1. 04 Mar, 2021 15 commits
    • net: dsa: sja1105: fix SGMII PCS being forced to SPEED_UNKNOWN instead of SPEED_10 · 053d8ad1
      Vladimir Oltean authored
      When using MLO_AN_PHY or MLO_AN_FIXED, the MII_BMCR of the SGMII PCS is
      read before resetting the switch so it can be reprogrammed afterwards.
      This works for the speeds of 1Gbps and 100Mbps, but not for 10Mbps,
      because SPEED_10 is actually 0, so AND-ing anything with 0 is false,
      therefore that last branch is dead code.
      
      Do what others do (genphy_read_status_fixed, phy_mii_ioctl) and just
      remove the check for SPEED_10, let it fall into the default case.
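
The dead branch can be seen in a small stand-alone sketch. The BMCR_* values below mirror the MII basic mode control register bit definitions; the helper name bmcr_to_speed() is hypothetical, and this is an illustration of the fall-through approach described above, not the sja1105 driver code itself.

```c
#include <stdint.h>

/* MII_BMCR speed-select bits. Note that the 10 Mbps encoding is zero. */
#define BMCR_SPEED1000 0x0040
#define BMCR_SPEED100  0x2000
#define BMCR_SPEED10   0x0000

/* Hypothetical helper: decode the link speed in Mbps from a BMCR value. */
static int bmcr_to_speed(uint16_t bmcr)
{
	if (bmcr & BMCR_SPEED1000)
		return 1000;
	if (bmcr & BMCR_SPEED100)
		return 100;
	/*
	 * (bmcr & BMCR_SPEED10) is always 0, so a test for it can never
	 * succeed. 10 Mbps therefore has to be the default case.
	 */
	return 10;
}
```

With this shape, a BMCR value with neither speed bit set decodes to 10 Mbps instead of falling through to an "unknown" result.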
      
      Fixes: ffe10e67 ("net: dsa: sja1105: Add support for the SGMII port")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: properly reject destination IP keys in VCAP IS1 · f1becbed
      Vladimir Oltean authored
      An attempt is made to warn the user about the fact that VCAP IS1 cannot
      offload keys matching on destination IP (at least given the current half
      key format), but sadly that warning fails miserably in practice, due to
      the fact that it operates on an uninitialized "match" variable. We must
      first decode the keys from the flow rule.
      
      Fixes: 75944fda ("net: mscc: ocelot: offload ingress skbedit and vlan actions to VCAP IS1")
      Reported-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'nexthop-blackhole' · 87e5e094
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      nexthop: Do not flush blackhole nexthops when loopback goes down
      
      Patch #1 prevents blackhole nexthops from being flushed when the
      loopback device goes down given that as far as user space is concerned,
      these nexthops do not have a nexthop device.
      
      Patch #2 adds a test case.
      
      There are no regressions in fib_nexthops.sh with this change:
      
       # ./fib_nexthops.sh
       ...
       Tests passed: 165
       Tests failed:   0
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: fib_nexthops: Test blackhole nexthops when loopback goes down · 3a1099d3
      Ido Schimmel authored
      Test that blackhole nexthops are not flushed when the loopback device
      goes down.
      
      Output without previous patch:
      
       # ./fib_nexthops.sh -t basic
      
       Basic functional tests
       ----------------------
       TEST: List with nothing defined                                     [ OK ]
       TEST: Nexthop get on non-existent id                                [ OK ]
       TEST: Nexthop with no device or gateway                             [ OK ]
       TEST: Nexthop with down device                                      [ OK ]
       TEST: Nexthop with device that is linkdown                          [ OK ]
       TEST: Nexthop with device only                                      [ OK ]
       TEST: Nexthop with duplicate id                                     [ OK ]
       TEST: Blackhole nexthop                                             [ OK ]
       TEST: Blackhole nexthop with other attributes                       [ OK ]
       TEST: Blackhole nexthop with loopback device down                   [FAIL]
       TEST: Create group                                                  [ OK ]
       TEST: Create group with blackhole nexthop                           [FAIL]
       TEST: Create multipath group where 1 path is a blackhole            [ OK ]
       TEST: Multipath group can not have a member replaced by blackhole   [ OK ]
       TEST: Create group with non-existent nexthop                        [ OK ]
       TEST: Create group with same nexthop multiple times                 [ OK ]
       TEST: Replace nexthop with nexthop group                            [ OK ]
       TEST: Replace nexthop group with nexthop                            [ OK ]
       TEST: Nexthop group and device                                      [ OK ]
       TEST: Test proto flush                                              [ OK ]
       TEST: Nexthop group and blackhole                                   [ OK ]
      
       Tests passed:  19
       Tests failed:   2
      
      Output with previous patch:
      
       # ./fib_nexthops.sh -t basic
      
       Basic functional tests
       ----------------------
       TEST: List with nothing defined                                     [ OK ]
       TEST: Nexthop get on non-existent id                                [ OK ]
       TEST: Nexthop with no device or gateway                             [ OK ]
       TEST: Nexthop with down device                                      [ OK ]
       TEST: Nexthop with device that is linkdown                          [ OK ]
       TEST: Nexthop with device only                                      [ OK ]
       TEST: Nexthop with duplicate id                                     [ OK ]
       TEST: Blackhole nexthop                                             [ OK ]
       TEST: Blackhole nexthop with other attributes                       [ OK ]
       TEST: Blackhole nexthop with loopback device down                   [ OK ]
       TEST: Create group                                                  [ OK ]
       TEST: Create group with blackhole nexthop                           [ OK ]
       TEST: Create multipath group where 1 path is a blackhole            [ OK ]
       TEST: Multipath group can not have a member replaced by blackhole   [ OK ]
       TEST: Create group with non-existent nexthop                        [ OK ]
       TEST: Create group with same nexthop multiple times                 [ OK ]
       TEST: Replace nexthop with nexthop group                            [ OK ]
       TEST: Replace nexthop group with nexthop                            [ OK ]
       TEST: Nexthop group and device                                      [ OK ]
       TEST: Test proto flush                                              [ OK ]
       TEST: Nexthop group and blackhole                                   [ OK ]
      
       Tests passed:  21
       Tests failed:   0
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nexthop: Do not flush blackhole nexthops when loopback goes down · 76c03bf8
      Ido Schimmel authored
      As far as user space is concerned, blackhole nexthops do not have a
      nexthop device and therefore should not be affected by the
      administrative or carrier state of any netdev.
      
      However, when the loopback netdev goes down all the blackhole nexthops
      are flushed. This happens because internally the kernel associates
      blackhole nexthops with the loopback netdev.
      
      This behavior is both confusing to those not familiar with kernel
      internals and also diverges from the legacy API where blackhole IPv4
      routes are not flushed when the loopback netdev goes down:
      
       # ip route add blackhole 198.51.100.0/24
       # ip link set dev lo down
       # ip route show 198.51.100.0/24
       blackhole 198.51.100.0/24
      
      Blackhole IPv6 routes are flushed, but at least user space knows that
      they are associated with the loopback netdev:
      
       # ip -6 route show 2001:db8:1::/64
       blackhole 2001:db8:1::/64 dev lo metric 1024 pref medium
      
      Fix this by only flushing blackhole nexthops when the loopback netdev is
      unregistered.
      
      Fixes: ab84be7e ("net: Initial nexthop code")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reported-by: Donald Sharp <sharpd@nvidia.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sctp: trivial: fix typo in comment · d93ef301
      Drew Fustini authored
      Fix typo of 'overflow' for comment in sctp_tsnmap_check().
      Reported-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: Drew Fustini <drew@beagleboard.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · e216674a
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-03-03
      
      This series contains updates to ixgbe and ixgbevf drivers.
      
      Bartosz Golaszewski does not error on -ENODEV from ixgbe_mii_bus_init()
      as this is valid for some devices with a shared bus for ixgbe.
      
      Antony Antony adds a check to fail for non transport mode SA with
      offload as this is not supported for ixgbe and ixgbevf.
      
      Dinghao Liu fixes a memory leak on failure to program a perfect filter
      for ixgbe.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ixgbe: Fix memleak in ixgbe_configure_clsu32 · 7a766381
      Dinghao Liu authored
      When ixgbe_fdir_write_perfect_filter_82599() fails,
      input allocated by kzalloc() has not been freed,
      which leads to memleak.
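
The general shape of this kind of fix can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the ixgbe code: calloc()/free() stand in for kzalloc()/kfree(), and program_filter() and configure_filter() are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a hardware programming step that can fail. */
static int program_filter(const int *input, int should_fail)
{
	(void)input;
	return should_fail ? -1 : 0;
}

/* Hypothetical configure routine: returns 0 on success, -1 on failure. */
static int configure_filter(int should_fail, int **out)
{
	int *input = calloc(1, sizeof(*input));

	if (!input)
		return -1;

	if (program_filter(input, should_fail))
		goto err_free;	/* every failure path must release input */

	*out = input;		/* ownership passes to the caller on success */
	return 0;

err_free:
	free(input);
	return -1;
}
```

Routing each failure after the allocation through a single cleanup label makes it harder to miss a free when a new error check is added later.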
      Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
      Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ixgbe: fail to create xfrm offload of IPsec tunnel mode SA · d785e1fe
      Antony Antony authored
      Based on talks and indirect references, ixgbe IPsec offload does not
      support IPsec tunnel mode offload. It can only support IPsec transport
      mode offload. Now explicitly fail when creating a non-transport mode SA
      with offload to avoid false performance expectations.
      
      Fixes: 63a67fe2 ("ixgbe: add ipsec offload add and remove SA")
      Signed-off-by: Antony Antony <antony@phenome.org>
      Acked-by: Shannon Nelson <snelson@pensando.io>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • zhang kai's avatar
      a9ecb0cb
    • net: 9p: advance iov on empty read · d65614a0
      Jisheng Zhang authored
      I hit the warning below when cat-ing a small (about 80 bytes) text file
      on 9pfs (with msize=2097152 passed as a 9p mount option). The reason is
      that we skip iov_iter_advance() if the read count is 0 in the zerocopy
      case, so the pipe is not truncated and iov_iter_pipe() then thinks the
      pipe is full. Fix it by removing the exception for 0, so that
      iov_iter_advance() is called even on an empty read in the zerocopy case.
      
      [    8.279568] WARNING: CPU: 0 PID: 39 at lib/iov_iter.c:1203 iov_iter_pipe+0x31/0x40
      [    8.280028] Modules linked in:
      [    8.280561] CPU: 0 PID: 39 Comm: cat Not tainted 5.11.0+ #6
      [    8.281260] RIP: 0010:iov_iter_pipe+0x31/0x40
      [    8.281974] Code: 2b 42 54 39 42 5c 76 22 c7 07 20 00 00 00 48 89 57 18 8b 42 50 48 c7 47 08 b
      [    8.283169] RSP: 0018:ffff888000cbbd80 EFLAGS: 00000246
      [    8.283512] RAX: 0000000000000010 RBX: ffff888000117d00 RCX: 0000000000000000
      [    8.283876] RDX: ffff88800031d600 RSI: 0000000000000000 RDI: ffff888000cbbd90
      [    8.284244] RBP: ffff888000cbbe38 R08: 0000000000000000 R09: ffff8880008d2058
      [    8.284605] R10: 0000000000000002 R11: ffff888000375510 R12: 0000000000000050
      [    8.284964] R13: ffff888000cbbe80 R14: 0000000000000050 R15: ffff88800031d600
      [    8.285439] FS:  00007f24fd8af600(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
      [    8.285844] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    8.286150] CR2: 00007f24fd7d7b90 CR3: 0000000000c97000 CR4: 00000000000406b0
      [    8.286710] Call Trace:
      [    8.288279]  generic_file_splice_read+0x31/0x1a0
      [    8.289273]  ? do_splice_to+0x2f/0x90
      [    8.289511]  splice_direct_to_actor+0xcc/0x220
      [    8.289788]  ? pipe_to_sendpage+0xa0/0xa0
      [    8.290052]  do_splice_direct+0x8b/0xd0
      [    8.290314]  do_sendfile+0x1ad/0x470
      [    8.290576]  do_syscall_64+0x2d/0x40
      [    8.290818]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [    8.291409] RIP: 0033:0x7f24fd7dca0a
      [    8.292511] Code: c3 0f 1f 80 00 00 00 00 4c 89 d2 4c 89 c6 e9 bd fd ff ff 0f 1f 44 00 00 31 8
      [    8.293360] RSP: 002b:00007ffc20932818 EFLAGS: 00000206 ORIG_RAX: 0000000000000028
      [    8.293800] RAX: ffffffffffffffda RBX: 0000000001000000 RCX: 00007f24fd7dca0a
      [    8.294153] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001
      [    8.294504] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
      [    8.294867] R10: 0000000001000000 R11: 0000000000000206 R12: 0000000000000003
      [    8.295217] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
      [    8.295782] ---[ end trace 63317af81b3ca24b ]---
      Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "r8152: adjust the settings about MAC clock speed down for RTL8153" · 4b5dc1a9
      Hayes Wang authored
      This reverts commit 134f98bc.
      
      The r8153_mac_clk_spd() is used for RTL8153A only, because the register
      table of RTL8153B is different from RTL8153A. However, this function would
      be called when RTL8153B calls r8153_first_init() and r8153_enter_oob().
      That causes RTL8153B to become unstable when suspending and resuming. In
      the worst case, the device may stop working.
      
      Besides, reverting this commit disables MAC clock speed down for RTL8153A.
      That avoids a known issue when enabling U1: the data of the first
      control transfer may be wrong when exiting U1.
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: l2tp: reduce log level of messages in receive path, add counter instead · 3e59e885
      Matthias Schiffer authored
      Commit 5ee759cd ("l2tp: use standard API for warning log messages")
      changed a number of warnings about invalid packets in the receive path
      so that they are always shown, instead of only when a special L2TP debug
      flag is set. Even with rate limiting these warnings can easily cause
      significant log spam - potentially triggered by a malicious party
      sending invalid packets on purpose.
      
      In addition these warnings were noticed by projects like Tunneldigger [1],
      which uses L2TP for its data path, but implements its own control
      protocol (which is sufficiently different from L2TP data packets that it
      would always be passed up to userspace even with future extensions of
      L2TP).
      
      Some of the warnings were already redundant, as l2tp_stats has a counter
      for these packets. This commit adds one additional counter for invalid
      packets that are passed up to userspace. Packets with unknown session are
      not counted as invalid, as there is nothing wrong with the format of
      these packets.
      
      With the additional counter, all of these messages are either redundant
      or benign, so we reduce them to pr_debug_ratelimited().
      
      [1] https://github.com/wlanslovenija/tunneldigger/issues/160
      
      Fixes: 5ee759cd ("l2tp: use standard API for warning log messages")
      Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: macb: Add default usrio config to default gem config · b1242236
      Atish Patra authored
      There is no usrio config defined for the default gem config, leading to
      a kernel panic on devices that don't define any data. This issue can be
      reproduced with the Microchip PolarFire SoC, where the compatible string
      is defined as "cdns,macb".
      
      Fixes: edac6386 ("add userio bits as platform configuration")
      Signed-off-by: Atish Patra <atish.patra@wdc.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'wireless-drivers-2021-03-03' of... · ef9a6df0
      David S. Miller authored
      Merge tag 'wireless-drivers-2021-03-03' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for v5.12
      
      Second set of fixes for v5.12. Only three iwlwifi fixes this time, the
      crash with MVM being the most important one and reported by multiple
      people.
      
      iwlwifi
      
      * fix kernel crash regression when using LTO with MVM devices
      
      * fix printk format warnings
      
      * fix potential deadlock found by lockdep
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 03 Mar, 2021 7 commits
    • docs: networking: drop special stable handling · dbbe7c96
      Jakub Kicinski authored
      Leave it to Greg.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: fix incorrect DMA channel intr enable setting of EQoS v4.10 · 879c348c
      Ong Boon Leong authored
      We introduce dwmac410_dma_init_channel() here for both EQoS v4.10 and
      above which use different DMA_CH(n)_Interrupt_Enable bit definitions for
      NIE and AIE.
      
      Fixes: 48863ce5 ("stmmac: add DMA support for GMAC 4.xx")
      Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: Ramesh Babu B <ramesh.babu.b@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Fix possibly uninitialized old_num_tx_queues variable warning. · 6881b07f
      Michal Suchanek authored
      GCC 7.5 reports:
      ../drivers/net/ethernet/ibm/ibmvnic.c: In function 'ibmvnic_reset_init':
      ../drivers/net/ethernet/ibm/ibmvnic.c:5373:51: warning: 'old_num_tx_queues' may be used uninitialized in this function [-Wmaybe-uninitialized]
      ../drivers/net/ethernet/ibm/ibmvnic.c:5373:6: warning: 'old_num_rx_queues' may be used uninitialized in this function [-Wmaybe-uninitialized]
      
      The variable is initialized only if(reset) and used only if(reset &&
      something) so this is a false positive. However, there is no reason to
      not initialize the variables unconditionally avoiding the warning.
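
The pattern can be sketched as follows. This is a simplified, hypothetical stand-in (the function and parameter names are not from the driver): the variable is assigned unconditionally, so the compiler no longer has to prove that the later use is guarded by the same condition.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: report whether queues need to be reallocated
 * after a reset. Returns 1 if so, 0 otherwise.
 */
static int queues_need_realloc(bool reset, int current_queues, int new_queues)
{
	/*
	 * Initialized unconditionally. Assigning it only inside
	 * "if (reset)" would be functionally equivalent here, but some
	 * compilers then emit -Wmaybe-uninitialized for the use below.
	 */
	int old_num_queues = current_queues;

	if (reset && old_num_queues != new_queues)
		return 1;

	return 0;
}
```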
      
      Fixes: 635e442f ("ibmvnic: merge ibmvnic_reset_init and ibmvnic_init")
      Signed-off-by: Michal Suchanek <msuchanek@suse.de>
      Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-af: cn10k: fix an array overflow in is_lmac_valid() · 2378b2c9
      Dan Carpenter authored
      The value of "lmac_id" can be controlled by the user and if it is larger
      than the number of bits in a long then it reads outside the bitmap.
      The highest valid value is less than MAX_LMAC_PER_CGX (4).
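
A bounds check of this kind might look like the sketch below. It is a user-space illustration only: lmac_id_is_set() is a hypothetical name, and a plain shift on an unsigned long stands in for the kernel bitmap helpers. Only MAX_LMAC_PER_CGX and its value of 4 come from the message above.

```c
#include <stdbool.h>

#define MAX_LMAC_PER_CGX 4

/*
 * Hypothetical helper: test whether lmac_id is marked in a one-word
 * bitmap, rejecting out-of-range ids before touching the bitmap.
 */
static bool lmac_id_is_set(unsigned long lmac_bmap, int lmac_id)
{
	if (lmac_id < 0 || lmac_id >= MAX_LMAC_PER_CGX)
		return false;

	return (lmac_bmap >> lmac_id) & 1UL;
}
```

Checking the range first means a user-controlled id can never index past the bits that are actually meaningful.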
      
      Fixes: 91c6945e ("octeontx2-af: cn10k: Add RPM MAC support")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • iwlwifi: don't call netif_napi_add() with rxq->lock held (was Re: Lockdep... · 295d4cd8
      Jiri Kosina authored
      iwlwifi: don't call netif_napi_add() with rxq->lock held (was Re: Lockdep warning in iwl_pcie_rx_handle())
      
      We can't call netif_napi_add() with rxq->lock held, as there is a potential
      for deadlock as spotted by lockdep (see below). rxq->lock is not
      protecting anything over the netif_napi_add() codepath anyway, so let's
      drop it just before calling into NAPI.
      
       ========================================================
       WARNING: possible irq lock inversion dependency detected
       5.12.0-rc1-00002-gbada49429032 #5 Not tainted
       --------------------------------------------------------
       irq/136-iwlwifi/565 just changed the state of lock:
       ffff89f28433b0b0 (&rxq->lock){+.-.}-{2:2}, at: iwl_pcie_rx_handle+0x7f/0x960 [iwlwifi]
       but this lock took another, SOFTIRQ-unsafe lock in the past:
        (napi_hash_lock){+.+.}-{2:2}
      
       and interrupts could create inverse lock ordering between them.
      
       other info that might help us debug this:
        Possible interrupt unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(napi_hash_lock);
                                      local_irq_disable();
                                      lock(&rxq->lock);
                                      lock(napi_hash_lock);
         <Interrupt>
           lock(&rxq->lock);
      
        *** DEADLOCK ***
      
       1 lock held by irq/136-iwlwifi/565:
        #0: ffff89f2b1440170 (sync_cmd_lockdep_map){+.+.}-{0:0}, at: iwl_pcie_irq_handler+0x5/0xb30
      
       the shortest dependencies between 2nd lock and 1st lock:
        -> (napi_hash_lock){+.+.}-{2:2} {
           HARDIRQ-ON-W at:
                             lock_acquire+0x277/0x3d0
                             _raw_spin_lock+0x2c/0x40
                             netif_napi_add+0x14b/0x270
                             e1000_probe+0x2fe/0xee0 [e1000e]
                             local_pci_probe+0x42/0x90
                             pci_device_probe+0x10b/0x1c0
                             really_probe+0xef/0x4b0
                             driver_probe_device+0xde/0x150
                             device_driver_attach+0x4f/0x60
                             __driver_attach+0x9c/0x140
                             bus_for_each_dev+0x79/0xc0
                             bus_add_driver+0x18d/0x220
                             driver_register+0x5b/0xf0
                             do_one_initcall+0x5b/0x300
                             do_init_module+0x5b/0x21c
                             load_module+0x1dae/0x22c0
                             __do_sys_finit_module+0xad/0x110
                             do_syscall_64+0x33/0x80
                             entry_SYSCALL_64_after_hwframe+0x44/0xae
           SOFTIRQ-ON-W at:
                             lock_acquire+0x277/0x3d0
                             _raw_spin_lock+0x2c/0x40
                             netif_napi_add+0x14b/0x270
                             e1000_probe+0x2fe/0xee0 [e1000e]
                             local_pci_probe+0x42/0x90
                             pci_device_probe+0x10b/0x1c0
                             really_probe+0xef/0x4b0
                             driver_probe_device+0xde/0x150
                             device_driver_attach+0x4f/0x60
                             __driver_attach+0x9c/0x140
                             bus_for_each_dev+0x79/0xc0
                             bus_add_driver+0x18d/0x220
                             driver_register+0x5b/0xf0
                             do_one_initcall+0x5b/0x300
                             do_init_module+0x5b/0x21c
                             load_module+0x1dae/0x22c0
                             __do_sys_finit_module+0xad/0x110
                             do_syscall_64+0x33/0x80
                             entry_SYSCALL_64_after_hwframe+0x44/0xae
           INITIAL USE at:
                            lock_acquire+0x277/0x3d0
                            _raw_spin_lock+0x2c/0x40
                            netif_napi_add+0x14b/0x270
                            e1000_probe+0x2fe/0xee0 [e1000e]
                            local_pci_probe+0x42/0x90
                            pci_device_probe+0x10b/0x1c0
                            really_probe+0xef/0x4b0
                            driver_probe_device+0xde/0x150
                            device_driver_attach+0x4f/0x60
                            __driver_attach+0x9c/0x140
                            bus_for_each_dev+0x79/0xc0
                            bus_add_driver+0x18d/0x220
                            driver_register+0x5b/0xf0
                            do_one_initcall+0x5b/0x300
                            do_init_module+0x5b/0x21c
                            load_module+0x1dae/0x22c0
                            __do_sys_finit_module+0xad/0x110
                            do_syscall_64+0x33/0x80
                            entry_SYSCALL_64_after_hwframe+0x44/0xae
         }
         ... key      at: [<ffffffffae84ef38>] napi_hash_lock+0x18/0x40
         ... acquired at:
          _raw_spin_lock+0x2c/0x40
          netif_napi_add+0x14b/0x270
          _iwl_pcie_rx_init+0x1f4/0x710 [iwlwifi]
          iwl_pcie_rx_init+0x1b/0x3b0 [iwlwifi]
          iwl_trans_pcie_start_fw+0x2ac/0x6a0 [iwlwifi]
          iwl_mvm_load_ucode_wait_alive+0x116/0x460 [iwlmvm]
          iwl_run_init_mvm_ucode+0xa4/0x3a0 [iwlmvm]
          iwl_op_mode_mvm_start+0x9ed/0xbf0 [iwlmvm]
          _iwl_op_mode_start.isra.4+0x42/0x80 [iwlwifi]
          iwl_opmode_register+0x71/0xe0 [iwlwifi]
          iwl_mvm_init+0x34/0x1000 [iwlmvm]
          do_one_initcall+0x5b/0x300
          do_init_module+0x5b/0x21c
          load_module+0x1dae/0x22c0
          __do_sys_finit_module+0xad/0x110
          do_syscall_64+0x33/0x80
          entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [ ... lockdep output trimmed .... ]
      
      Fixes: 25edc8f2 ("iwlwifi: pcie: properly implement NAPI")
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Acked-by: Luca Coelho <luciano.coelho@intel.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2103021134060.12405@cbobk.fhfr.pm
    • iwlwifi: fix ARCH=i386 compilation warnings · 436b2656
      Pierre-Louis Bossart authored
      An unsigned long variable should rely on '%lu' format strings, not '%zd'
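
As a small illustration of the rule (the helper name format_len() is hypothetical): '%zd' expects a signed size-type argument, and on 32-bit x86 size_t is unsigned int rather than unsigned long, so passing an unsigned long there triggers a format warning. Matching the specifier to the variable's declared type avoids it on every architecture.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format an unsigned long length into buf. */
static int format_len(char *buf, size_t size, unsigned long len)
{
	/* %lu matches unsigned long regardless of the width of size_t. */
	return snprintf(buf, size, "len=%lu", len);
}
```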
      
      Fixes: a1a6a4cf ("iwlwifi: pnvm: implement reading PNVM from UEFI")
      Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
      Acked-by: Luca Coelho <luciano.coelho@intel.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20210302011640.1276636-1-pierre-louis.bossart@linux.intel.com
    • iwlwifi: mvm: add terminate entry for dmi_system_id tables · a22549f1
      Wei Yongjun authored
      Make sure dmi_system_id tables are NULL terminated. This crashed when LTO was enabled:
      
      BUG: KASAN: global-out-of-bounds in dmi_check_system+0x5a/0x70
      Read of size 1 at addr ffffffffc16af750 by task NetworkManager/1913
      
      CPU: 4 PID: 1913 Comm: NetworkManager Not tainted 5.12.0-rc1+ #10057
      Hardware name: LENOVO 20THCTO1WW/20THCTO1WW, BIOS N2VET27W (1.12 ) 12/21/2020
      Call Trace:
       dump_stack+0x90/0xbe
       print_address_description.constprop.0+0x1d/0x140
       ? dmi_check_system+0x5a/0x70
       ? dmi_check_system+0x5a/0x70
       kasan_report.cold+0x7b/0xd4
       ? dmi_check_system+0x5a/0x70
       __asan_load1+0x4d/0x50
       dmi_check_system+0x5a/0x70
       iwl_mvm_up+0x1360/0x1690 [iwlmvm]
       ? iwl_mvm_send_recovery_cmd+0x270/0x270 [iwlmvm]
       ? setup_object.isra.0+0x27/0xd0
       ? kasan_poison+0x20/0x50
       ? ___slab_alloc.constprop.0+0x483/0x5b0
       ? mempool_kmalloc+0x17/0x20
       ? ftrace_graph_ret_addr+0x2a/0xb0
       ? kasan_poison+0x3c/0x50
       ? cfg80211_iftype_allowed+0x2e/0x90 [cfg80211]
       ? __kasan_check_write+0x14/0x20
       ? mutex_lock+0x86/0xe0
       ? __mutex_lock_slowpath+0x20/0x20
       __iwl_mvm_mac_start+0x49/0x290 [iwlmvm]
       iwl_mvm_mac_start+0x37/0x50 [iwlmvm]
       drv_start+0x73/0x1b0 [mac80211]
       ieee80211_do_open+0x53e/0xf10 [mac80211]
       ? ieee80211_check_concurrent_iface+0x266/0x2e0 [mac80211]
       ieee80211_open+0xb9/0x100 [mac80211]
       __dev_open+0x1b8/0x280
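
Why the terminating entry matters can be shown with a simplified sentinel-terminated table. This is a sketch under stated assumptions: struct board_id, the vendor strings, and board_is_listed() are hypothetical and far simpler than the real dmi_system_id machinery, but the walker relies on the same convention of stopping at an empty entry.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified table entry. */
struct board_id {
	const char *vendor;
};

static const struct board_id approved[] = {
	{ "VendorA" },
	{ "VendorB" },
	{ NULL },	/* terminating entry: the walker stops here */
};

/* Walk the table until the terminator; return 1 on a match, else 0. */
static int board_is_listed(const struct board_id *table, const char *vendor)
{
	const struct board_id *e;

	for (e = table; e->vendor; e++)
		if (strcmp(e->vendor, vendor) == 0)
			return 1;

	return 0;
}
```

Without the final empty entry, the loop would keep reading whatever memory follows the array, which is the kind of out-of-bounds read KASAN reports in the trace above.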
      
      Fixes: a2ac0f48 ("iwlwifi: mvm: implement approved list for the PPAG feature")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Tested-by: Victor Michel <vic.michel.web@gmail.com>
      Acked-by: Luca Coelho <luciano.coelho@intel.com>
      [kvalo@codeaurora.org: improve commit log]
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20210223140039.1708534-1-weiyongjun1@huawei.com
  3. 02 Mar, 2021 2 commits
    • net: ethernet: mtk-star-emac: fix wrong unmap in RX handling · 95b39f07
      Biao Huang authored
      mtk_star_dma_unmap_rx() should unmap the dma_addr of old skb rather than
      that of new skb.
      Assign new_dma_addr to desc_data.dma_addr after all handling of old skb
      ends to avoid unexpected receive side error.
      
      Fixes: f96e9641 ("net: ethernet: mtk-star-emac: fix error path in RX handling")
      Signed-off-by: Biao Huang <biao.huang@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • stmmac: intel: Fix mdio bus registration issue for TGL-H/ADL-S · fa706dce
      Wong Vee Khee authored
      On Intel platforms which have two Ethernet controllers, such as
      TGL-H and ADL-S, a unique MDIO bus id is required for the MDIO bus to be
      successfully registered:
      
      [   13.076133] sysfs: cannot create duplicate filename '/class/mdio_bus/stmmac-1'
      [   13.083404] CPU: 8 PID: 1898 Comm: systemd-udevd Tainted: G     U            5.11.0-net-next #106
      [   13.092410] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-S ADP-S DRR4 CRB, BIOS ADLIFSI1.R00.1494.B00.2012031421 12/03/2020
      [   13.105709] Call Trace:
      [   13.108176]  dump_stack+0x64/0x7c
      [   13.111553]  sysfs_warn_dup+0x56/0x70
      [   13.115273]  sysfs_do_create_link_sd.isra.2+0xbd/0xd0
      [   13.120371]  device_add+0x4df/0x840
      [   13.123917]  ? complete_all+0x2a/0x40
      [   13.127636]  __mdiobus_register+0x98/0x310 [libphy]
      [   13.132572]  stmmac_mdio_register+0x1c5/0x3f0 [stmmac]
      [   13.137771]  ? stmmac_napi_add+0xa5/0xf0 [stmmac]
      [   13.142493]  stmmac_dvr_probe+0x806/0xee0 [stmmac]
      [   13.147341]  intel_eth_pci_probe+0x1cb/0x250 [dwmac_intel]
      [   13.152884]  pci_device_probe+0xd2/0x150
      [   13.156897]  really_probe+0xf7/0x4d0
      [   13.160527]  driver_probe_device+0x5d/0x140
      [   13.164761]  device_driver_attach+0x4f/0x60
      [   13.168996]  __driver_attach+0xa2/0x140
      [   13.172891]  ? device_driver_attach+0x60/0x60
      [   13.177300]  bus_for_each_dev+0x76/0xc0
      [   13.181188]  bus_add_driver+0x189/0x230
      [   13.185083]  ? 0xffffffffc0795000
      [   13.188446]  driver_register+0x5b/0xf0
      [   13.192249]  ? 0xffffffffc0795000
      [   13.195577]  do_one_initcall+0x4d/0x210
      [   13.199467]  ? kmem_cache_alloc_trace+0x2ff/0x490
      [   13.204228]  do_init_module+0x5b/0x21c
      [   13.208031]  load_module+0x2a0c/0x2de0
      [   13.211838]  ? __do_sys_finit_module+0xb1/0x110
      [   13.216420]  __do_sys_finit_module+0xb1/0x110
      [   13.220825]  do_syscall_64+0x33/0x40
      [   13.224451]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [   13.229515] RIP: 0033:0x7fc2b1919ccd
      [   13.233113] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 31 0c 00 f7 d8 64 89 01 48
      [   13.251912] RSP: 002b:00007ffcea2e5b98 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      [   13.259527] RAX: ffffffffffffffda RBX: 0000560558920f10 RCX: 00007fc2b1919ccd
      [   13.266706] RDX: 0000000000000000 RSI: 00007fc2b1a881e3 RDI: 0000000000000012
      [   13.273887] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
      [   13.281036] R10: 0000000000000012 R11: 0000000000000246 R12: 00007fc2b1a881e3
      [   13.288183] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffcea2e5d58
      [   13.295389] libphy: mii_bus stmmac-1 failed to register
      
      Fixes: 88af9bd4 ("stmmac: intel: Add ADL-S 1Gbps PCI IDs")
      Fixes: 8450e23f ("stmmac: intel: Add PCI IDs for TGL-H platform")
      Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fa706dce
  4. 01 Mar, 2021 16 commits
    • tcp: add sanity tests to TCP_QUEUE_SEQ · 8811f4a9
      Eric Dumazet authored
      Qingyu Li reported a syzkaller bug where the repro
      changes RCV SEQ _after_ restoring data in the receive queue.
      
      mprotect(0x4aa000, 12288, PROT_READ)    = 0
      mmap(0x1ffff000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1ffff000
      mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
      mmap(0x21000000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x21000000
      socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
      setsockopt(3, SOL_TCP, TCP_REPAIR, [1], 4) = 0
      connect(3, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28) = 0
      setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0
      sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="0x0000000000000003\0\0", iov_len=20}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
      setsockopt(3, SOL_TCP, TCP_REPAIR, [0], 4) = 0
      setsockopt(3, SOL_TCP, TCP_QUEUE_SEQ, [128], 4) = 0
      recvfrom(3, NULL, 20, 0, NULL, NULL)    = -1 ECONNRESET (Connection reset by peer)
      
      syslog shows:
      [  111.205099] TCP recvmsg seq # bug 2: copied 80, seq 0, rcvnxt 80, fl 0
      [  111.207894] WARNING: CPU: 1 PID: 356 at net/ipv4/tcp.c:2343 tcp_recvmsg_locked+0x90e/0x29a0
      
      This should not be allowed. TCP_QUEUE_SEQ should only be used
      when queues are empty.
      
      This patch fixes this case, and the tx path as well.
      
      Fixes: ee995283 ("tcp: Initial repair mode")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=212005
      Reported-by: Qingyu Li <ieatmuttonchuan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8811f4a9
    • hv_netvsc: Fix validation in netvsc_linkstatus_callback() · 3946688e
      Andrea Parri (Microsoft) authored
      Contrary to the RNDIS protocol specification, certain (pre-Fe)
      implementations of Hyper-V's vSwitch did not account for the status
      buffer field in the length of an RNDIS packet; the bug was fixed in
      newer implementations.  Validate the status buffer fields using the
      length of the 'vmtransfer_page' packet (all implementations), that
      is known/validated to be less than or equal to the receive section
      size and not smaller than the length of the RNDIS message.
      Reported-by: Dexuan Cui <decui@microsoft.com>
      Suggested-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
      Fixes: 505e3f00 ("hv_netvsc: Add (more) validation for untrusted Hyper-V values")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3946688e
    • net: dsa: tag_mtk: fix 802.1ad VLAN egress · 9200f515
      DENG Qingfang authored
      A different TPID bit is used for 802.1ad VLAN frames.
      Reported-by: Ilario Gelmetti <iochesonome@gmail.com>
      Fixes: f0af3431 ("net: dsa: mediatek: combine MediaTek tag with VLAN tag")
      Signed-off-by: DENG Qingfang <dqfext@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9200f515
    • net: expand textsearch ts_state to fit skb_seq_state · b228c9b0
      Willem de Bruijn authored
      The referenced commit expands the skb_seq_state used by
      skb_find_text with a 4B frag_off field, growing it to 48B.
      
      This exceeds container ts_state->cb, causing a stack corruption:
      
      [   73.238353] Kernel panic - not syncing: stack-protector: Kernel stack
      is corrupted in: skb_find_text+0xc5/0xd0
      [   73.247384] CPU: 1 PID: 376 Comm: nping Not tainted 5.11.0+ #4
      [   73.252613] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS 1.14.0-2 04/01/2014
      [   73.260078] Call Trace:
      [   73.264677]  dump_stack+0x57/0x6a
      [   73.267866]  panic+0xf6/0x2b7
      [   73.270578]  ? skb_find_text+0xc5/0xd0
      [   73.273964]  __stack_chk_fail+0x10/0x10
      [   73.277491]  skb_find_text+0xc5/0xd0
      [   73.280727]  string_mt+0x1f/0x30
      [   73.283639]  ipt_do_table+0x214/0x410
      
      The struct is passed between skb_find_text and its callbacks
      skb_prepare_seq_read, skb_seq_read and skb_abort_seq_read through
      the textsearch interface using TS_SKB_CB.
      
      I assumed that this mapped to skb->cb like other .._SKB_CB wrappers.
      skb->cb is 48B. But it maps to ts_state->cb, which is only 40B.
      
      skb->cb was increased from 40B to 48B after ts_state was introduced,
      in commit 3e3850e9 ("[NETFILTER]: Fix xfrm lookup in
      ip_route_me_harder/ip6_route_me_harder").
      
      Increase ts_state.cb[] to 48 to fit the struct.
      
      Also add a BUILD_BUG_ON to avoid a repeat.
      
      The alternative is to directly add a dependency from textsearch onto
      linux/skbuff.h, but I think the intent is textsearch to have no such
      dependencies on its callers.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=211911
      Fixes: 97550f6f ("net: compound page support in skb_seq_read")
      Reported-by: Kris Karas <bugs-a17@moonlit-rail.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b228c9b0
    • docs: networking: bonding.rst Fix a typo in bonding.rst · 2353db75
      Masanari Iida authored
      This patch fixes a spelling typo in bonding.rst.
      Signed-off-by: Masanari Iida <standby24x7@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2353db75
    • Merge tag 'linux-can-fixes-for-5.12-20210301' of... · 2eb48982
      David S. Miller authored
      Merge tag 'linux-can-fixes-for-5.12-20210301' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2021-03-01
      
      this is a pull request of 6 patches for net/master.
      
      The first 3 patches are by Joakim Zhang for the flexcan driver and fix
      the probing and starting of the chip.
      
      The next patch is by me, for the mcp251xfd driver and reverts the BQL
      support. BQL support got mainline with rc1 and assumes that CAN frames
      are always echoed, which is not the case. A proper fix requires
      more changes and will be rolled out via linux-can-next later.
      
      Oleksij Rempel's patch fixes the socket ref counting if socket was
      closed before setting skb ownership.
      
      Torin Cooper-Bennun's patch for the tcan4x5x driver fixes a race
      condition, where the chip is first attached to the bus and then the
      MRAM is initialized, which may result in lost data.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2eb48982
    • Merge branch 'enetc-fixes' · 8a00946e
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Fixes for NXP ENETC driver
      
      This contains an assorted set of fixes collected over the past 2 weeks
      on the enetc driver. Some are related to VLAN processing, some to
      physical link settings, some are fixups of previous hardware workarounds,
      and some are simply zero-day data path bugs that for some reason were
      never caught or at least identified.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8a00946e
    • net: enetc: keep RX ring consumer index in sync with hardware · 3a5d12c9
      Vladimir Oltean authored
      The RX rings have a producer index owned by hardware, where newly
      received frame buffers are placed, and a consumer index owned by
      software, where newly allocated buffers are placed, in expectation of
      hardware being able to place frame data in them.
      
      Hardware increments the producer index when a frame is received, however
      it is not allowed to increment the producer index to match the consumer
      index (RBCIR) since the ring can hold at most RBLENR[LENGTH]-1 received
      BDs. Whenever the producer index matches the value of the consumer
      index, the ring has no unprocessed received frames and all BDs in the
      ring have been initialized/prepared by software, i.e. hardware owns all
      BDs in the ring.
      
      The code uses the next_to_clean variable to keep track of the producer
      index, and the next_to_use variable to keep track of the consumer index.
      
      The RX rings are seeded from enetc_refill_rx_ring, which is called from
      two places:
      
      1. initially the ring is seeded until full with enetc_bd_unused(rx_ring),
         i.e. with 511 buffers. This will make next_to_clean=0 and next_to_use=511:
      
      .ndo_open
      -> enetc_open
         -> enetc_setup_bdrs
            -> enetc_setup_rxbdr
               -> enetc_refill_rx_ring
      
      2. then during the data path processing, it is refilled with 16 buffers
         at a time:
      
      enetc_msix
      -> napi_schedule
         -> enetc_poll
            -> enetc_clean_rx_ring
               -> enetc_refill_rx_ring
      
      There is just one problem: the initial seeding done during .ndo_open
      updates just the producer index (ENETC_RBPIR) with 0, and the software
      next_to_clean and next_to_use variables. Notably, it will not update the
      consumer index to make the hardware aware of the newly added buffers.
      
      Wait, what? So how does it work?
      
      Well, the reset values of the producer index and of the consumer index
      of a ring are both zero. As per the description in the second paragraph,
      it means that the ring is full of buffers waiting for hardware to put
      frames in them, which by coincidence is almost true, because we have in
      fact seeded 511 buffers into the ring.
      
      But will the hardware attempt to access the 512th entry of the ring,
      which has an invalid BD in it? Well, no, because in order to do that, it
      would have to first populate the first 511 entries, and the NAPI
      enetc_poll will kick in by then. Eventually, after 16 processed slots
      have become available in the RX ring, enetc_clean_rx_ring will call
      enetc_refill_rx_ring and then will [ finally ] update the consumer index
      with the new software next_to_use variable. From now on, the
      next_to_clean and next_to_use variables are in sync with the producer
      and consumer ring indices.
      
      So the day is saved, right? Well, not quite. Freeing the memory
      allocated for the rings is done in:
      
      enetc_close
      -> enetc_clear_bdrs
         -> enetc_clear_rxbdr
            -> this just disables the ring
      -> enetc_free_rxtx_rings
         -> enetc_free_rx_ring
            -> sets next_to_clean and next_to_use to 0
      
      but again, nothing is committed to the hardware producer and consumer
      indices (yay!). The assumption is that the ring is disabled, so the
      indices don't matter anyway, and it's the responsibility of the "open"
      code path to set those up.
      
      .. Except that the "open" code path does not set those up properly.
      
      While initially, things almost work, during subsequent enetc_close ->
      enetc_open sequences, we have problems. To be precise, the enetc_open
      that is subsequent to enetc_close will again refill the ring with 511
      entries, but it will leave the consumer index untouched. Untouched
      means, of course, equal to the value it had before disabling the ring
      and draining the old buffers in enetc_close.
      
      But as mentioned, enetc_setup_rxbdr will at least update the producer
      index though, through this line of code:
      
      	enetc_rxbdr_wr(hw, idx, ENETC_RBPIR, 0);
      
      so at this stage we'll have:
      
      next_to_clean=0 (in hardware 0)
      next_to_use=511 (in hardware we'll have the refill index prior to enetc_close)
      
      Again, the next_to_clean and producer index are in sync and set to
      correct values, so the driver manages to limp on. Eventually, 16 ring
      entries will be consumed by enetc_poll, and the savior
      enetc_clean_rx_ring will come and call enetc_refill_rx_ring, and then
      update the hardware consumer ring based upon the new next_to_use.
      
      So.. it works?
      Well, by coincidence, it almost does, but there's a circumstance where
      enetc_clean_rx_ring won't be there to save us. If the previous value of
      the consumer index was 15, there's a problem, because the NAPI poll
      sequence will only issue a refill when 16 or more buffers have been
      consumed.
      
      It's easiest to illustrate this with an example:
      
      ip link set eno0 up
      ip addr add 192.168.100.1/24 dev eno0
      ping 192.168.100.1 -c 20 # ping this port from another board
      ip link set eno0 down
      ip link set eno0 up
      ping 192.168.100.1 -c 20 # ping it again from the same other board
      
      One by one:
      
      1. ip link set eno0 up
      -> calls enetc_setup_rxbdr:
         -> calls enetc_refill_rx_ring(511 buffers)
         -> next_to_clean=0 (in hw 0)
         -> next_to_use=511 (in hw 0)
      
      2. ping 192.168.100.1 -c 20 # ping this port from another board
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=15 next_to_clean 14 (in hw 15) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: enetc_refill_rx_ring(16) increments next_to_use by 16 (mod 512) and writes it to hw
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=0 next_to_clean 15 (in hw 16) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 16 (in hw 17) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 17 (in hw 18) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 18 (in hw 19) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 19 (in hw 20) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 20 (in hw 21) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 21 (in hw 22) next_to_use 15 (in hw 15)
      
      20 packets transmitted, 20 packets received, 0% packet loss
      
      3. ip link set eno0 down
      enetc_free_rx_ring: next_to_clean 0 (in hw 22), next_to_use 0 (in hw 15)
      
      4. ip link set eno0 up
      -> calls enetc_setup_rxbdr:
         -> calls enetc_refill_rx_ring(511 buffers)
         -> next_to_clean=0 (in hw 0)
         -> next_to_use=511 (in hw 15)
      
      5. ping 192.168.100.1 -c 20 # ping it again from the same other board
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 15)
      
      20 packets transmitted, 12 packets received, 40% packet loss
      
      And there it dies. No enetc_refill_rx_ring (because cleaned_cnt must be equal
      to 15 for that to happen), no nothing. The hardware enters the condition where
      the producer (14) + 1 is equal to the consumer (15) index, which makes it
      believe it has no more free buffers to put packets in, so it starts discarding
      them:
      
      ip netns exec ns0 ethtool -S eno0 | grep -v ': 0'
      NIC statistics:
           Rx ring  0 discarded frames: 8
      
      In summary, if the interface receives between 16 and 32 (mod 512) frames
      and then there is a link flap, the port will eventually die with no
      way to recover. If it receives fewer than 16 (mod 512) frames, the
      initial NAPI poll [ before the link flap ] will not update the consumer
      index in hardware (it will remain zero), which is fine once the buffers
      are later reinitialized. If more than 32 (mod 512) frames are received,
      the initial NAPI poll has the chance to refill the ring twice, updating
      the consumer index to at least 32. So after the link flap, the consumer
      index is still wrong, but the post-flap NAPI poll gets a chance to
      refill the ring once (because it passes through cleaned_cnt=15) and
      brings the consumer index back in sync with next_to_use.
      
      The solution to this problem is actually simple: we just need to write
      next_to_use into the hardware consumer index at enetc_open time, which
      always brings it back in sync after an initial buffer seeding process.
      
      The simpler thing would be to put the write to the consumer index into
      enetc_refill_rx_ring directly, but there are issues with the MDIO
      locking: in the NAPI poll code we have the enetc_lock_mdio() taken from
      top-level and we use the unlocked enetc_wr_reg_hot, whereas in
      enetc_open, the enetc_lock_mdio() is not taken at the top level, but
      instead by each individual enetc_wr_reg, so we are forced to put an
      additional enetc_wr_reg in enetc_setup_rxbdr. Better organization of
      the code is left as a refactoring exercise.
      
      Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3a5d12c9
    • net: enetc: remove bogus write to SIRXIDR from enetc_setup_rxbdr · 96a5223b
      Vladimir Oltean authored
      The Station Interface Receive Interrupt Detect Register (SIRXIDR)
      contains a 16-bit wide mask of 'interrupt detected' events for each ring
      associated with a port. Bit i is write-1-to-clear for RX ring i.
      
      I have no explanation whatsoever how this line of code came to be
      inserted in the blamed commit. I checked the downstream versions of that
      patch and none of them have it.
      
      The somewhat comical aspect of it is that we're writing a binary number
      to the SIRXIDR register, which is derived from enetc_bd_unused(rx_ring).
      Since the RX rings have 512 buffer descriptors, we end up writing 511 to
      this register, which is 0x1ff, so we are effectively clearing the
      'interrupt detected' event for rings 0-8.
      
      This register is not what is used for interrupt handling though - it
      only provides a summary for the entire SI. The hardware provides one
      separate Interrupt Detect Register per RX ring, which auto-clears upon
      read. So there doesn't seem to be any adverse effect caused by this
      bogus write.
      
      There is, however, one reason why this should be handled as a bugfix:
      next_to_clean _should_ be committed to hardware, just not to that
      register, and this was obscuring the fact that it wasn't. This is fixed
      in the next patch, and removing the bogus line now allows the fix patch
      to be backported beyond that point.
      
      Fixes: fd5736bf ("enetc: Workaround for MDIO register access issue")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      96a5223b
    • net: enetc: force the RGMII speed and duplex instead of operating in inband mode · c76a9721
      Vladimir Oltean authored
      The ENETC port 0 MAC supports in-band status signaling coming from a PHY
      when operating in RGMII mode, and this feature is enabled by default.
      
      It has been reported that RGMII is broken in fixed-link, and that is not
      surprising considering the fact that no PHY is attached to the MAC in
      that case, but a switch.
      
      This brings us to the topic of the patch: the enetc driver should not
      have enabled the optional in-band status signaling for RGMII
      unconditionally, but should have forced the speed and duplex to what
      was resolved by phylink.
      
      Note that phylink does not accept the RGMII modes as valid for in-band
      signaling, and these operate a bit differently than 1000base-x and SGMII
      (notably there is no clause 37 state machine so no ACK required from the
      MAC, instead the PHY sends extra code words on RXD[3:0] whenever it is
      not transmitting something else, so it should be safe to leave a PHY
      with this option unconditionally enabled even if we ignore it). The spec
      talks about this here:
      https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/138/RGMIIv1_5F00_3.pdf
      
      Fixes: 71b77a7a ("enetc: Migrate to PHYLINK and PCS_LYNX")
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c76a9721
    • net: enetc: don't disable VLAN filtering in IFF_PROMISC mode · a74dbce9
      Vladimir Oltean authored
      Quoting from the blamed commit:
      
          In promiscuous mode, it is more intuitive that all traffic is received,
          including VLAN tagged traffic. It appears that it is necessary to set
          the flag in PSIPVMR for that to be the case, so VLAN promiscuous mode is
          also temporarily enabled. On exit from promiscuous mode, the setting
          made by ethtool is restored.
      
      Intuitive or not, there isn't any definition issued by a standards body
      which says that promiscuity has anything to do with VLAN filtering - it
      only has to do with accepting packets regardless of destination MAC address.
      
      In fact people are already trying to use this misunderstanding/bug of
      the enetc driver as a justification to transform promiscuity into
      something it never was about: accepting every packet (maybe that would
      be the "rx-all" netdev feature?):
      https://lore.kernel.org/netdev/20201110153958.ci5ekor3o2ekg3ky@ipetronik.com/
      
      This is relevant because there are use cases in the kernel (such as
      tc-flower rules with the protocol 802.1Q and a vlan_id key) which do not
      (yet) use the vlan_vid_add API to be compatible with VLAN-filtering NICs
      such as enetc, so for those, disabling rx-vlan-filter is currently the
      only right solution to make these setups work:
      https://lore.kernel.org/netdev/CA+h21hoxwRdhq4y+w8Kwgm74d4cA0xLeiHTrmT-VpSaM7obhkg@mail.gmail.com/
      The blamed patch has unintentionally introduced one more way for this to
      work, which is to enable IFF_PROMISC, however this is non-portable
      because port promiscuity is not meant to disable VLAN filtering.
      Therefore, it could invite people to write broken scripts for enetc, and
      then wonder why they are broken when migrating to other drivers that
      don't handle promiscuity in the same way.
      
      Fixes: 7070eea5 ("enetc: permit configuration of rx-vlan-filter with ethtool")
      Cc: Markus Blöchl <Markus.Bloechl@ipetronik.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a74dbce9
    • net: enetc: fix incorrect TPID when receiving 802.1ad tagged packets · 827b6fd0
      Vladimir Oltean authored
      When the enetc ports have rx-vlan-offload enabled, they report a TPID of
      ETH_P_8021Q regardless of what was actually in the packet. When
      rx-vlan-offload is disabled, packets have the proper TPID. Fix this
      inconsistency by finishing the TODO left in the code.
      
      Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      827b6fd0
    • net: enetc: take the MDIO lock only once per NAPI poll cycle · 6d36ecdb
      Vladimir Oltean authored
      The workaround for the ENETC MDIO erratum caused a performance
      degradation of 82 Kpps (seen with IP forwarding of two 1Gbps streams of
      64B packets). This is due to excessive locking and unlocking in the fast
      path, which can be avoided.
      
      By taking the MDIO read-side lock only once per NAPI poll cycle, we are
      able to regain 54 Kpps (65%) of the performance hit. The rest of the
      performance degradation comes from the TX data path, but unfortunately
      it doesn't look like we can optimize that away easily. Even with
      netdev_xmit_more(), there just isn't any skb batching done that would
      let us take the MDIO lock less often than once per packet.
      
      We need to change the register accessor type for enetc_get_tx_tstamp,
      because it now runs under the enetc_lock_mdio as per the new call path
      detailed below:
      
      enetc_msix
      -> napi_schedule
         -> enetc_poll
            -> enetc_lock_mdio
            -> enetc_clean_tx_ring
               -> enetc_get_tx_tstamp
            -> enetc_clean_rx_ring
            -> enetc_unlock_mdio
      
      Fixes: fd5736bf ("enetc: Workaround for MDIO register access issue")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6d36ecdb
    • net: enetc: initialize RFS/RSS memories for unused ports too · 3222b5b6
      Vladimir Oltean authored
      Michael reports that since linux-next-20210211, the AER messages for ECC
      errors have started reappearing, and this time they can be reliably
      reproduced with the first ping on one of his LS1028A boards.
      
      $ ping 1[   33.258069] pcieport 0000:00:1f.0: AER: Multiple Corrected error received: 0000:00:00.0
      72.16.0.1
      PING [   33.267050] pcieport 0000:00:1f.0: AER: can't find device of ID0000
      172.16.0.1 (172.16.0.1): 56 data bytes
      64 bytes from 172.16.0.1: seq=0 ttl=64 time=17.124 ms
      64 bytes from 172.16.0.1: seq=1 ttl=64 time=0.273 ms
      
      $ devmem 0x1f8010e10 32
      0xC0000006
      
      It isn't clear why this is necessary, but it seems that for the errors
      to go away, we must clear the entire RFS and RSS memory, not just for
      the ports in use.
      
      Sadly the code is structured in such a way that we can't have unified
      logic for the used and unused ports. For the minimal initialization of
      an unused port, we just need to enable and ioremap the PF memory space
      and set up a control buffer descriptor ring (CBDR). Unused ports must
      then free the CBDR because the driver will exit, but used ports cannot
      pick up from where that code path left off, since the CBDR API does not
      reinitialize a ring when setting it up, so its producer and consumer
      indices would be out of sync between the software and hardware state.
      So a separate enetc_init_unused_port function was created, and it gets
      called right after the PF memory space is enabled.
      
      Fixes: 07bf34a5 ("net: enetc: initialize the RFS and RSS memories")
      Reported-by: Michael Walle <michael@walle.cc>
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Michael Walle <michael@walle.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: don't overwrite the RSS indirection table when initializing · c646d10d
      Vladimir Oltean authored
      After the blamed patch, all RX traffic gets hashed to CPU 0 because the
      hashing indirection table set up in:
      
      enetc_pf_probe
      -> enetc_alloc_si_resources
         -> enetc_configure_si
            -> enetc_setup_default_rss_table
      
      is overwritten later in:
      
      enetc_pf_probe
      -> enetc_init_port_rss_memory
      
      which zero-initializes the entire port RSS table in order to avoid ECC errors.
      
      The trouble is that enetc_init_port_rss_memory needs
      enetc_alloc_si_resources to have been called first, because it depends
      upon enetc_alloc_cbdr and enetc_setup_cbdr. But that whole
      enetc_configure_si thing could have been better thought out: it has no
      place in a function called "alloc_si_resources", especially since its
      counterpart, "free_si_resources", does nothing to unwind the
      configuration of the SI.
      
      The point is, we need to pull enetc_configure_si out of
      enetc_alloc_si_resources, and move it after enetc_init_port_rss_memory.
      This allows us to set up the default RSS indirection table after
      initializing the memory.
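
The ordering problem can be reproduced in a few lines of userspace C. This is only a sketch under stated assumptions: the mock_* functions, the table length and the round-robin default are hypothetical stand-ins for the driver functions named above, not their actual implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RSS_TABLE_LEN 64 /* hypothetical table size, for illustration only */

/* Stand-in for the ECC workaround: zero the whole table. */
static void mock_init_port_rss_memory(unsigned int *table, size_t len)
{
    memset(table, 0, len * sizeof(*table));
}

/* Stand-in for the default table: spread entries round-robin (assumed). */
static void mock_setup_default_rss_table(unsigned int *table, size_t len,
                                         unsigned int num_groups)
{
    assert(num_groups > 0);
    for (size_t i = 0; i < len; i++)
        table[i] = (unsigned int)(i % num_groups);
}

/* Blamed order: the default table is written, then wiped by the zeroing. */
static void probe_blamed_order(unsigned int *table, size_t len,
                               unsigned int num_groups)
{
    mock_setup_default_rss_table(table, len, num_groups);
    mock_init_port_rss_memory(table, len);
}

/* Fixed order: zero the memory first, then write the default table. */
static void probe_fixed_order(unsigned int *table, size_t len,
                              unsigned int num_groups)
{
    mock_init_port_rss_memory(table, len);
    mock_setup_default_rss_table(table, len, num_groups);
}

/* Highest group index present in the table; 0 means "everything on 0". */
static unsigned int max_entry(const unsigned int *table, size_t len)
{
    unsigned int max = 0;

    for (size_t i = 0; i < len; i++)
        if (table[i] > max)
            max = table[i];
    return max;
}
```

After probe_blamed_order() every entry is 0, which corresponds to all RX traffic landing on CPU 0. After probe_fixed_order() with two groups, the entries alternate between 0 and 1.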
      
      Fixes: 07bf34a5 ("net: enetc: initialize the RFS and RSS memories")
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inetpeer: use div64_ul() and clamp_val() calculate inet_peer_threshold · 8bd2a055
      Yejune Deng authored
      In inet_initpeers(), struct inet_peer on IA32 uses 128 bytes nowadays.
      Get rid of the cascade and use div64_ul() and clamp_val() for a
      calculation that will not need to be adjusted in the future, as
      suggested by Eric Dumazet.
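
A divide-then-clamp calculation of this kind can be sketched in userspace C as follows. The helper names, the 1% fraction and the bounds passed by the caller are illustrative assumptions, not values taken from the patch. Plain 64-bit division stands in for div64_ul(), and clamp_u64() stands in for clamp_val().

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for clamp_val(): bound v to the range [lo, hi]. */
static uint64_t clamp_u64(uint64_t v, uint64_t lo, uint64_t hi)
{
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

/*
 * Hypothetical helper: how many entries of entry_bytes each fit into 1%
 * of ram_bytes, bounded to [lo, hi]. The 1% fraction is an assumption.
 */
static uint64_t peer_threshold(uint64_t ram_bytes, uint64_t entry_bytes,
                               uint64_t lo, uint64_t hi)
{
    uint64_t nr_entries;

    assert(entry_bytes > 0);
    nr_entries = ram_bytes / (100 * entry_bytes);
    return clamp_u64(nr_entries, lo, hi);
}
```

Unlike a cascade of memory-size comparisons, the result scales with both the amount of RAM and the entry size, so it does not need retuning when the structure grows.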
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>