1. 07 Mar, 2025 1 commit
  2. 03 Mar, 2025 3 commits
  3. 28 Feb, 2025 5 commits
  4. 07 Feb, 2025 1 commit
    • check_slow_queries_digest_result: fix encoding on file opening · 5b441e84
      Titouan Soulard authored
      The analyzed file can sometimes contain non-ASCII characters, giving
      errors such as:
      
      ```
      codec can't decode byte 0xc3 in position 5138: ordinal not in range(128)
      ```
      
      There seems to be no reason not to support UTF-8 and, more generally, to
      ignore encoding errors here. This change has been running for a few weeks
      on Nexedi ERP5 production without problems.
      
      /reviewed-by @tomo @jerome
      /reviewed-on !140
      /cc @xavier_thompson
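
The kind of change described above can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the helper name `read_digest_text` and the sample file content are hypothetical, and it assumes Python 3's built-in `open`.

```python
import os
import tempfile

def read_digest_text(path):
    """Read a file as UTF-8, silently dropping bytes that cannot be decoded.

    Hypothetical helper: the real plugin may open its file differently.
    """
    with open(path, encoding="utf-8", errors="ignore") as f:
        return f.read()

# Usage: a file holding one valid non-ASCII character and one stray invalid byte.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("SELECT 'é'".encode("utf-8") + b"\xff\n")
text = read_digest_text(tmp.name)
os.remove(tmp.name)
```

With `errors="ignore"` the valid `é` is decoded normally and the invalid `0xff` byte is dropped instead of raising `UnicodeDecodeError`.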
  5. 23 Jan, 2025 2 commits
  6. 18 Dec, 2024 2 commits
  7. 16 Dec, 2024 3 commits
  8. 03 Dec, 2024 1 commit
  9. 14 Nov, 2024 1 commit
    • promise/plugin/check_cpri_lock: Fix it to work with Amarisoft 2024-06-15 · f155972d
      Kirill Smelkov authored
      In that release rf_info started to emit the "Clock tune:" entry without
      indentation, so our code ignored every line after it. As a result the
      CPRI_option entry was never loaded and the promise failed with
      "no CPRI entry".

      -> Fix that by adding a quirk to treat the "Clock tune:" line specially so
         that processing of per sdr/port records does not stop on it.
      
      /cc @lu.xu
      /proposed-for-review-on !135
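
A minimal sketch of such a quirk, assuming records are delimited by an unindented `PCIe CPRI /dev/sdrX@Y:` header followed by indented `key: value` lines. The function name `parse_records`, the regex, and the sample text are illustrative assumptions, not the plugin's code and not verbatim output of the 2024-06-15 release.

```python
import re

# Assumed header shape, taken from the rf_info excerpts quoted in this history.
_DEV_RE = re.compile(r"^PCIe CPRI (/dev/sdr\d+@\d+):\s*$")

def parse_records(text):
    """Group 'key: value' lines under the device header that precedes them."""
    records = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        m = _DEV_RE.match(line)
        if m:
            current = records.setdefault(m.group(1), {})
            continue
        indented = line[0].isspace()
        # Quirk: an unindented "Clock tune:" line still belongs to the current
        # record; any other unindented line ends the record.
        if not indented and not line.startswith("Clock tune:"):
            current = None
            continue
        if current is None or ":" not in line:
            continue
        key, value = line.split(":", 1)
        current[key.strip()] = value.strip()
    return records

# Constructed sample with "Clock tune:" unindented (illustrative only).
sample = """\
PCIe CPRI /dev/sdr0@0:
  Hardware ID: 0x4b12
Clock tune: 0.0 ppm
  NUMA: 0
  CPRI_option: '5' (x8) lock=HW+SW rx/tx=46.606us
"""
records = parse_records(sample)
```

Without the special case, the unindented line would close the record and the `CPRI_option` entry after it would be lost.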
  10. 30 Jul, 2024 1 commit
  11. 18 Jul, 2024 1 commit
  12. 15 Jul, 2024 4 commits
  13. 02 Jul, 2024 3 commits
  14. 17 Apr, 2024 1 commit
    • promise/plugin/check_cpri_lock: Fix it to work ok with Amarisoft 2022 · 90cacfe6
      Kirill Smelkov authored
      Old Amarisoft versions emit data that differs from what 640c0130 expected:

      - there can be trailing spaces after /dev/sdrX@Y
        fix: adjust /dev/sdr regex
      - there is an empty line at the end that was failing in l.split(':', 1)
        fix: ignore empty lines and make detection of : in the line more
             robust, reporting which line is not valid instead of just
             "ERROR not enough values to unpack (expected 2, got 1)"
      - there is a CPRI key instead of CPRI_option, and the data in the two differ.
        fix: detect automatically whether CPRI_option or CPRI is present.
      
      /co-authored-by lu.xu <lu.xu@nexedi.com>
      /reviewed-on !132
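
The three fixes listed above could look roughly like this. It is a hedged sketch only: the names `DEV_HEADER_RE`, `split_field`, `parse_fields` and `cpri_field` are hypothetical, the exact header layout of old Amarisoft releases is an assumption, and the value stored under the old `CPRI` key is a placeholder because its real format is not shown here.

```python
import re

# Tolerate whitespace around the trailing colon of the device header (assumed layout).
DEV_HEADER_RE = re.compile(r"^PCIe CPRI (/dev/sdr\d+@\d+)\s*:?\s*$")

def split_field(line):
    """Split 'key: value', naming the offending line when ':' is missing."""
    if ":" not in line:
        raise ValueError("invalid rf_info line: %r" % line)
    key, value = line.split(":", 1)
    return key.strip(), value.strip()

def parse_fields(lines):
    """Parse the field lines of one device record, skipping empty lines."""
    fields = {}
    for line in lines:
        if not line.strip():
            continue
        key, value = split_field(line)
        fields[key] = value
    return fields

def cpri_field(fields):
    """Return (key, value) for whichever of CPRI_option / CPRI is present."""
    for key in ("CPRI_option", "CPRI"):
        if key in fields:
            return key, fields[key]
    raise KeyError("no CPRI entry")

# Usage with a trailing empty line and the old-style key (placeholder value).
fields = parse_fields(["  Hardware ID: 0x4b12", "  CPRI: <old-format value>", ""])
key, value = cpri_field(fields)
header = DEV_HEADER_RE.match("PCIe CPRI /dev/sdr0@0:   ")
```

Raising an error that quotes the bad line makes the failure easier to diagnose than a bare unpacking error.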
  15. 15 Feb, 2024 4 commits
  16. 14 Feb, 2024 1 commit
  17. 18 Jan, 2024 1 commit
  18. 29 Nov, 2023 2 commits
  19. 02 Nov, 2023 3 commits
    • slapos/promise/plugin/check_cpri_lock: Don't check whatever device blindly · 640c0130
      Kirill Smelkov authored
      Currently this promise is implemented by grepping the whole rf_info output
      for the "HW" and "SW" strings. But this does not work correctly in the
      presence of multiple CPRI devices. Imagine, for example, that one device
      has CPRI lock while the other does not:
      
          PCIe CPRI /dev/sdr2@1:
            Hardware ID: 0x4b12
            DNA: [0x0048248a334a7054]
            Serial: ''
            FPGA revision: 2023-06-23  10:05:24
            FPGA vccint: 0.98 V
            FPGA vccaux: 1.76 V
            FPGA vccbram: 0.98 V
            FPGA temperature: 71.9 °C
            Clock tune: 0.0 ppm
            NUMA: 0
            CPRI_option: '5' (x8) lock=no                     <-- NOTE
            DMA0: TX fifo: 66.67us  Usage=16/32768 (0%)
            DMA0: RX fifo: 66.67us  Usage=16/32768 (0%)
            DMA0 Underflows: 0
            DMA0 Overflows: 0
          PCIe CPRI /dev/sdr3@1:
            Hardware ID: 0x4b12
            DNA: [0x0048248a334a7054]
            Serial: ''
            FPGA revision: 2023-06-23  10:05:24
            FPGA vccint: 0.98 V
            FPGA vccaux: 1.77 V
            FPGA vccbram: 0.98 V
            FPGA temperature: 71.7 °C
            Clock tune: 0.0 ppm
            NUMA: 0
            CPRI_option: '5' (x8) lock=HW+SW rx/tx=46.606us   <-- NOTE
              Port #0: T14=46.606us
            DMA0: TX fifo: 66.67us  Usage=16/32768 (0%)
            DMA0: RX fifo: 66.67us  Usage=16/32768 (0%)
            DMA0 Underflows: 0
            DMA0 Overflows: 0
      
      The old code would still report "CPRI locked all ok", and would do so
      globally, without indicating which CPRI channel is locked.

      -> Fix it by adjusting check_cpri_lock to parse the rf_info text more
      precisely, detect the devices listed there, and determine which device
      has CPRI lock and which does not.
      
      For now this change is accompanied by the following change in
      ors-amarisoft SR to keep it working:
      
          --- a/software/ors-amarisoft/instance-enb.jinja2.cfg
          +++ b/software/ors-amarisoft/instance-enb.jinja2.cfg
          @@ -35,7 +35,6 @@ parts =
             check-lopcomm-sync.py
             check-lopcomm-config-log.py
             check-lopcomm-stats-log.py
          -  check-cpri-lock.py
           {% endif %}
           {% if slapparameter_dict.get("dnsmasq", None) %}
             dnsmasq-service
          @@ -48,6 +47,7 @@ parts =
           {% endif %}
             monitor-base
             publish-connection-information
          +{% set extra_part_list = [] %}
      
           extends = {{ monitor_template }}
      
          @@ -688,12 +688,21 @@ config-testing = {{ slapparameter_dict.get("testing", False) }}
           config-config-log = ${lopcomm-rrh-config-template:log-output}
           config-stats-period = {{ slapparameter_dict.get("enb_stats_fetch_period", 60) }}
      
          -[check-cpri-lock.py]
          +{%  if ru == "lopcomm" %}
          +{%-   set cell_list = slapparameter_dict.get('cell_list', {'default': {}}) %}
          +{%-   for i, k in enumerate(cell_list) %}
          +{%-     set sfp_port = cell_list[k].get('cpri_port_number', i) %}
          +{%-     do extra_part_list.append('SFP{{sfp_port}}-cpri-lock.py') %}
          +[SFP{{sfp_port}}-cpri-lock.py]
           <= macro.promise
           promise = check_cpri_lock
           config-testing = {{ slapparameter_dict.get("testing", False) }}
          +config-sdr_dev  = {{ slapparameter_dict.get('sdr_number', 0) }}
          +config-sfp_port = {{ sfp_port }}
           config-amarisoft-rf-info-log = ${amarisoft-rf-info-template:log-output}
           config-stats-period = {{ slapparameter_dict.get("enb_stats_fetch_period", 60) }}
          +{%-  endfor %}
          +{% endif %}
      
           [check-rx-saturated.py]
           <= macro.promise
          @@ -702,3 +711,9 @@ config-testing = {{ slapparameter_dict.get("testing", False) }}
           config-amarisoft-stats-log = ${amarisoft-stats-template:log-output}
           config-stats-period = {{ slapparameter_dict.get("enb_stats_fetch_period", 60) }}
           config-max-rx-sample-db = {{ slapparameter_dict.get("max_rx_sample_db", 0) }}
          +
          +[buildout]
          +parts +=
          +{%- for part in extra_part_list %}
          +    {{ part }}
          +{%- endfor %}
      
      (posted in slapos!1461)
      
      The way the rf_info text is parsed could also be useful in the future,
      e.g. to detect the FPGA revision of the boards and report their recency
      status via a promise.
      
      /cc @jhuge, @tomo, @xavier_thompson, @Daetalus
      /reviewed-by @lu.xu
      /reviewed-on !127
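
The per-device check described in this commit can be sketched as below, using a shortened form of the rf_info excerpt quoted in the message. The names `cpri_lock_by_device` and `is_locked` are hypothetical, reading X and Y of `/dev/sdrX@Y` as SDR device and port numbers is an assumption, and "locked" is taken to mean that both `HW` and `SW` appear in the `lock=` value, mirroring the strings the old code grepped for.

```python
import re

DEV_RE = re.compile(r"^PCIe CPRI /dev/sdr(\d+)@(\d+):\s*$")
LOCK_RE = re.compile(r"\block=(\S+)")

def cpri_lock_by_device(rf_info_text):
    """Map (sdr number, port number) to the lock= value of its CPRI_option line."""
    locks = {}
    dev = None
    for line in rf_info_text.splitlines():
        stripped = line.strip()
        m = DEV_RE.match(stripped)
        if m:
            dev = (int(m.group(1)), int(m.group(2)))
            continue
        if dev is not None and stripped.startswith("CPRI_option:"):
            m = LOCK_RE.search(stripped)
            locks[dev] = m.group(1) if m else None
    return locks

def is_locked(lock):
    """Assumed criterion: both HW and SW lock are reported."""
    return lock is not None and {"HW", "SW"} <= set(lock.split("+"))

# Shortened excerpt of the rf_info output quoted in the commit message.
sample = """\
PCIe CPRI /dev/sdr2@1:
  Hardware ID: 0x4b12
  CPRI_option: '5' (x8) lock=no
PCIe CPRI /dev/sdr3@1:
  Hardware ID: 0x4b12
  CPRI_option: '5' (x8) lock=HW+SW rx/tx=46.606us
    Port #0: T14=46.606us
"""
locks = cpri_lock_by_device(sample)
status = {dev: is_locked(lock) for dev, lock in locks.items()}
```

Here `status` distinguishes the unlocked `/dev/sdr2@1` from the locked `/dev/sdr3@1`, whereas grepping the whole text for "HW" and "SW" would report success for both.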
    • promise/plugin/check_cpri_lock: Fix it to detect stale data · 7c3b240f
      Kirill Smelkov authored
      This promise plugin was using tail_file(self.amarisoft_rf_info_log) to
      get the most recent log entry to check. But tail_file simply returns the
      last line of the log file without checking the timestamp of the returned
      entry. So the last line could be very old and report "CPRI locked ok";
      then, if no more rf_info entries arrive for one reason or another, the
      CPRI-locked promise would stay green even though the data has become
      stale.

      -> Fix it by explicitly checking that the timestamp of the last log entry
      is within the expected recent range.
      
      /cc @jhuge, @tomo, @xavier_thompson, @Daetalus
      /reviewed-by @lu.xu
      /reviewed-on !127
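
A minimal sketch of such a freshness check. The log format is an assumption made for illustration (one JSON object per line with a `time` field in epoch seconds), and both the function name `last_entry_is_fresh` and the `max_age` parameter are hypothetical; the actual plugin may derive its window differently, for example from the configured stats period.

```python
import json
import time

def last_entry_is_fresh(last_line, max_age, now=None):
    """Return True if the entry's timestamp lies within [now - max_age, now].

    Assumes `last_line` is a JSON object with a "time" field in epoch seconds.
    """
    now = time.time() if now is None else now
    entry_time = float(json.loads(last_line)["time"])
    return now - max_age <= entry_time <= now

# Usage with a fixed clock so the result does not depend on when it runs.
line = json.dumps({"time": 1000.0, "data": "rf_info text"})
fresh = last_entry_is_fresh(line, max_age=120, now=1060.0)
stale = last_entry_is_fresh(line, max_age=120, now=2000.0)
```

A stale entry would then make the promise fail instead of repeating the last "locked" verdict indefinitely.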
    • promise/plugin/check_cpri_lock: test: Factor common code to write to log and... · 47f4f8a6
      Kirill Smelkov authored
      promise/plugin/check_cpri_lock: test: Factor common code to write to log and promise into common place
      
      We are going to add more tests. Keeping the code that initializes the
      test environment duplicated over and over would be inconvenient.
      
      /cc @jhuge, @tomo, @xavier_thompson, @Daetalus
      /reviewed-by @lu.xu
      /reviewed-on !127