1. 01 Jun, 2022 1 commit
    • Guoju Fang's avatar
      net: sched: add barrier to fix packet stuck problem for lockless qdisc · 2e8728c9
      Guoju Fang authored
      In qdisc_run_end(), the spin_unlock() only has store-release semantic,
      which guarantees all earlier memory access are visible before it. But
      the subsequent test_bit() has no barrier semantics so may be reordered
      ahead of the spin_unlock(). The store-load reordering may cause a packet
      stuck problem.
      
      The concurrent operations can be described as below,
               CPU 0                      |          CPU 1
         qdisc_run_end()                  |     qdisc_run_begin()
                .                         |           .
       ----> /* may be reorderd here */   |           .
      |         .                         |           .
      |     spin_unlock()                 |         set_bit()
      |         .                         |         smp_mb__after_atomic()
       ---- test_bit()                    |         spin_trylock()
                .                         |          .
      
      Consider the following sequence of events:
          CPU 0 reorder test_bit() ahead and see MISSED = 0
          CPU 1 calls set_bit()
          CPU 1 calls spin_trylock() and return fail
          CPU 0 executes spin_unlock()
      
      At the end of the sequence, CPU 0 calls spin_unlock() and does nothing
      because it see MISSED = 0. The skb on CPU 1 has beed enqueued but no one
      take it, until the next cpu pushing to the qdisc (if ever ...) will
      notice and dequeue it.
      
      This patch fix this by adding one explicit barrier. As spin_unlock() and
      test_bit() ordering is a store-load ordering, a full memory barrier
      smp_mb() is needed here.
      
      Fixes: a90c57f2 ("net: sched: fix packet stuck problem for lockless qdisc")
      Signed-off-by: default avatarGuoju Fang <gjfang@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20220528101628.120193-1-gjfang@linux.alibaba.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      2e8728c9
  2. 31 May, 2022 5 commits
  3. 29 May, 2022 3 commits
    • David S. Miller's avatar
      Merge branch 'sfc-fixes' · 90343f57
      David S. Miller authored
      Íñigo Huguet says:
      
      ====================
      sfc: fix some efx_separate_tx_channels errors
      
      Trying to load sfc driver with modparam efx_separate_tx_channels=1
      resulted in errors during initialization and not being able to use the
      NIC. This patches fix a few bugs and make it work again.
      
      v2:
      * added Martin's patch instead of a previous mine. Mine one solved some
      of the initialization errors, but Martin's solves them also in all
      possible cases.
      * removed whitespaces cleanup, as requested by Jakub
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      90343f57
    • Íñigo Huguet's avatar
      sfc: fix wrong tx channel offset with efx_separate_tx_channels · c308dfd1
      Íñigo Huguet authored
      tx_channel_offset is calculated in efx_allocate_msix_channels, but it is
      also calculated again in efx_set_channels because it was originally done
      there, and when efx_allocate_msix_channels was introduced it was
      forgotten to be removed from efx_set_channels.
      
      Moreover, the old calculation is wrong when using
      efx_separate_tx_channels because now we can have XDP channels after the
      TX channels, so n_channels - n_tx_channels doesn't point to the first TX
      channel.
      
      Remove the old calculation from efx_set_channels, and add the
      initialization of this variable if MSI or legacy interrupts are used,
      next to the initialization of the rest of the related variables, where
      it was missing.
      
      Fixes: 3990a8ff ("sfc: allocate channels for XDP tx queues")
      Reported-by: default avatarTianhao Zhao <tizhao@redhat.com>
      Signed-off-by: default avatarÍñigo Huguet <ihuguet@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c308dfd1
    • Martin Habets's avatar
      sfc: fix considering that all channels have TX queues · 2e102b53
      Martin Habets authored
      Normally, all channels have RX and TX queues, but this is not true if
      modparam efx_separate_tx_channels=1 is used. In that cases, some
      channels only have RX queues and others only TX queues (or more
      preciselly, they have them allocated, but not initialized).
      
      Fix efx_channel_has_tx_queues to return the correct value for this case
      too.
      
      Messages shown at probe time before the fix:
       sfc 0000:03:00.0 ens6f0np0: MC command 0x82 inlen 544 failed rc=-22 (raw=0) arg=0
       ------------[ cut here ]------------
       netdevice: ens6f0np0: failed to initialise TXQ -1
       WARNING: CPU: 1 PID: 626 at drivers/net/ethernet/sfc/ef10.c:2393 efx_ef10_tx_init+0x201/0x300 [sfc]
       [...] stripped
       RIP: 0010:efx_ef10_tx_init+0x201/0x300 [sfc]
       [...] stripped
       Call Trace:
        efx_init_tx_queue+0xaa/0xf0 [sfc]
        efx_start_channels+0x49/0x120 [sfc]
        efx_start_all+0x1f8/0x430 [sfc]
        efx_net_open+0x5a/0xe0 [sfc]
        __dev_open+0xd0/0x190
        __dev_change_flags+0x1b3/0x220
        dev_change_flags+0x21/0x60
       [...] stripped
      
      Messages shown at remove time before the fix:
       sfc 0000:03:00.0 ens6f0np0: failed to flush 10 queues
       sfc 0000:03:00.0 ens6f0np0: failed to flush queues
      
      Fixes: 8700aff0 ("sfc: fix channel allocation with brute force")
      Reported-by: default avatarTianhao Zhao <tizhao@redhat.com>
      Signed-off-by: default avatarMartin Habets <habetsm.xilinx@gmail.com>
      Tested-by: default avatarÍñigo Huguet <ihuguet@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2e102b53
  4. 28 May, 2022 13 commits
  5. 27 May, 2022 15 commits
  6. 26 May, 2022 3 commits