• Douglas Anderson's avatar
    platform/chrome: cros_ec_spi: Transfer messages at high priority · 37a18622
    Douglas Anderson authored
    The software running on the Chrome OS Embedded Controller (cros_ec)
    handles SPI transfers in a bit of a wonky way.  Specifically if the EC
    sees too long of a delay in a SPI transfer it will give up and the
    transfer will be counted as failed.  Unfortunately the timeout is
    fairly short, though the actual number may be different for different
    EC codebases.
    
    We can end up tripping the timeout pretty easily if we happen to
    preempt the task running the SPI transfer and don't get back to it for
    a little while.
    
    Historically this hasn't been a _huge_ deal because:
    1. On old devices Chrome OS used to run PREEMPT_VOLUNTARY.  That meant
       we were pretty unlikely to take a big break from the transfer.
    2. On recent devices we had faster / more processors.
    3. Recent devices didn't use "cros-ec-spi-pre-delay".  Using that
       delay makes us more likely to trip this use case.
    4. For whatever reasons (I didn't dig) old kernels seem to be less
       likely to trip this.
    5. For the most part it's kinda OK if a few transfers to the EC fail.
       Mostly we're just polling the battery or doing some other task
       where we'll try again.
    
    Even with the above things, this issue has reared its ugly head
    periodically.  We could solve this in a nice way by adding reliable
    retries to the EC protocol [1] or by re-designing the code in the EC
    codebase to allow it to wait longer, but that code doesn't ever seem
    to get changed.  ...and even if it did, it wouldn't help old devices.
    
    It's now time to finally take a crack at making this a little better.
    This patch isn't guaranteed to make every cros_ec SPI transfer
    perfect, but it should improve things by a few orders of magnitude.
    Specifically you can try this on a rk3288-veyron Chromebook (which is
    slower and also _does_ need "cros-ec-spi-pre-delay"):
      md5sum /dev/zero &
      md5sum /dev/zero &
      md5sum /dev/zero &
      md5sum /dev/zero &
      while true; do
        cat /sys/class/power_supply/sbs-20-000b/charge_now > /dev/null;
      done
    ...before this patch you'll see boatloads of errors.  After this patch I
    don't see any in the testing I did.
    
    The way this patch works is by effectively boosting the priority of
    the cros_ec transfers.  As far as I know there is no simple way to
    just boost the priority of the current process temporarily so the way
    we accomplish this is by queuing the work on the system_highpri_wq.
    
    NOTE: this patch relies on the fact that the SPI framework attempts to
    push the messages out on the calling context (which is the one that is
    boosted to high priority).  As I understand from earlier (long ago)
    discussions with Mark Brown this should be a fine assumption.  Even if
    it isn't true sometimes this patch will still not make things worse.
    
    [1] https://crbug.com/678675Signed-off-by: default avatarDouglas Anderson <dianders@chromium.org>
    Reviewed-by: default avatarMatthias Kaehlcke <mka@chromium.org>
    Reviewed-by: default avatarBrian Norris <briannorris@chromium.org>
    Signed-off-by: default avatarEnric Balletbo i Serra <enric.balletbo@collabora.com>
    37a18622
cros_ec_spi.c 20.2 KB