1. 13 Jan, 2016 2 commits
    • Randomize metrics sample intervals · 057eb824
      Yorick Peterse authored
      Sampling data at a fixed interval means we can potentially miss data
      from events occurring between sampling intervals. For example, say we
      sample data every 15 seconds but Unicorn workers get killed after 10
      seconds. In this particular case it's possible to miss interesting data
      as the sampler will never get to actually submit its data.
      
      To work around this (at least for the most part) the sampling interval
      is randomized as follows:
      
      1. Take the user-specified sampling interval (15 seconds by default)
      2. Divide it by 2 (referred to as "half" below)
      3. Generate a range (using a step of 0.1) from -"half" to "half"
      4. Every time the sampler goes to sleep we'll grab the user-provided
         interval and add a randomly chosen "adjustment" from that range to
         it, while making sure we don't pick the same value twice in a row.
      
      For a specified interval of 15 seconds this means the actual intervals
      can be anywhere between 7.5 and 22.5 seconds, but the same interval can
      never be used twice in a row.
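
The steps above can be sketched roughly as follows. This is a minimal illustration and not the code from the commit itself: the class name `RandomizedInterval` and the method name `next_sleep` are hypothetical, and values are rounded to two decimals only to keep float noise out of the comparison.

```ruby
# Hypothetical sketch of the randomized interval described above.
# The class and method names are assumptions made for illustration.
class RandomizedInterval
  # interval: the user-specified sampling interval in seconds.
  # step: granularity of the adjustments (0.1 in the description above).
  def initialize(interval = 15, step = 0.1)
    @interval = interval
    half = interval.to_f / 2

    # Candidate adjustments from -half to +half in increments of `step`.
    @adjustments = (-half).step(half, step).map { |value| value.round(2) }
    @last_adjustment = nil
  end

  # Returns the number of seconds to sleep before taking the next sample.
  def next_sleep
    adjustment = @adjustments.sample

    # Re-draw until the adjustment differs from the previous one. The size
    # check avoids looping forever when only one candidate exists.
    if @adjustments.size > 1
      adjustment = @adjustments.sample while adjustment == @last_adjustment
    end

    @last_adjustment = adjustment
    (@interval + adjustment).round(2)
  end
end
```

A sampler loop would then call something like `sleep(randomizer.next_sleep)` instead of sleeping for the fixed interval, so consecutive sleeps always differ and stay within half an interval of the configured value.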
      
      The rationale behind this change is that on dev.gitlab.org I'm sometimes
      seeing certain Gitlab::Git/Rugged objects being retained, but only for a
      few minutes every 24 hours. Knowing the code of GitLab and how much
      memory it uses/leaks, I suspect we're missing data due to workers getting
      terminated before the sampler can write its data to InfluxDB.
    • 23671600
  2. 12 Jan, 2016 17 commits
  3. 11 Jan, 2016 21 commits