• Douglas Anderson's avatar
    r8152: Block future register access if register access fails · d9962b0d
    Douglas Anderson authored
    Even though the functions to read/write registers can fail, most of
    the places in the r8152 driver that read/write register values don't
    check error codes. The lack of error code checking is problematic in
    at least two ways.
    
    The first problem is that the r8152 driver often uses code patterns
    similar to this:
      x = read_register()
      x = x | SOME_BIT;
      write_register(x);
    
    ...with the above pattern, if the read_register() fails and returns
    garbage then we'll end up trying to write modified garbage back to the
    Realtek adapter. If the write_register() succeeds that's bad. Note
    that as of commit f53a7ad1 ("r8152: Set memory to all 0xFFs on
    failed reg reads") the "garbage" returned by read_register() will at
    least be consistent garbage, but it is still garbage.
    
    It turns out that this problem is very serious. Writing garbage to
    some of the hardware registers on the Ethernet adapter can put the
    adapter in such a bad state that it needs to be power cycled (fully
    unplugged and plugged in again) before it can enumerate again.
    
    The second problem is that the r8152 driver generally has functions
    that are long sequences of register writes. Assuming everything will
    be OK if a random register write fails in the middle isn't a great
    assumption.
    
    One might wonder if the above two problems are real. You could ask if
    we would really have a successful write after a failed read. It turns
    out that the answer appears to be "yes, this can happen". In fact,
    we've seen at least two distinct failure modes where this happens.
    
    On a sc7180-trogdor Chromebook if you drop into kdb for a while and
    then resume, you can see:
    1. We get a "Tx timeout"
    2. The "Tx timeout" queues up a USB reset.
    3. In rtl8152_pre_reset() we try to reinit the hardware.
    4. The first several (2-9) register accesses fail with a timeout, then
       things recover.
    
    The above test case was actually fixed by the patch ("r8152: Increase
    USB control msg timeout to 5000ms as per spec") but at least shows
    that we really can see successful calls after failed ones.
    
    On a different (AMD) based Chromebook with a particular adapter, we
    found that during reboot tests we'd also sometimes get a transitory
    failure. In this case we saw -EPIPE being returned sometimes. Retrying
    worked, but retrying is not always safe for all register accesses
    since reading/writing some registers might have side effects (like
    registers that clear on read).
    
    Let's fully lock out all register access if a register access fails.
    When we do this, we'll try to queue up a USB reset and try to unlock
    register access after the reset. This is slightly tricker than it
    sounds since the r8152 driver has an optimized reset sequence that
    only works reliably after probe happens. In order to handle this, we
    avoid the optimized reset if probe didn't finish. Instead, we simply
    retry the probe routine in this case.
    
    When locking out access, we'll use the existing infrastructure that
    the driver was using when it detected we were unplugged. This keeps us
    from getting stuck in delay loops in some parts of the driver.
    Signed-off-by: default avatarDouglas Anderson <dianders@chromium.org>
    Reviewed-by: default avatarGrant Grundler <grundler@chromium.org>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    d9962b0d
r8152.c 240 KB