1. 27 Aug, 2020 1 commit
  2. 26 Aug, 2020 20 commits
    • Merge branch 'net-fix-netpoll-crash-with-bnxt' · 5875568a
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      net: fix netpoll crash with bnxt
      
      Rob ran into crashes when using XDP on bnxt. Upon investigation
      it turns out that during driver reconfig the irq core produces
      a warning message when IRQs are requested. This triggers netpoll,
      which in turn accesses uninitialized driver state. The same crash can
      also be triggered on this platform by changing the number of rings.
      
      Looks like we have two missing pieces here: netif_napi_add() has
      to make sure we start out with netpoll blocked, and the driver
      has to be more careful about when napi gets enabled.
      
      Tested XDP and channel count changes, the warning message no longer
      causes a crash. Not sure if the memory barriers added in patch 1
      are necessary, but it seems we should have them.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt: don't enable NAPI until rings are ready · 96ecdcc9
      Jakub Kicinski authored
      Netpoll can try to poll napi as soon as napi_enable() is called.
      It crashes trying to access a doorbell which is still NULL:
      
       BUG: kernel NULL pointer dereference, address: 0000000000000000
       CPU: 59 PID: 6039 Comm: ethtool Kdump: loaded Tainted: G S                5.9.0-rc1-00469-g5fd99b5d-dirty #26
       RIP: 0010:bnxt_poll+0x121/0x1c0
       Code: c4 20 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41 8b 86 a0 01 00 00 41 23 85 18 01 00 00 49 8b 96 a8 01 00 00 0d 00 00 00 24 <89> 02 41 f6 45 77 02 74 cb 49 8b ae d8 01 00 00 31 c0 c7 44 24 1a
        netpoll_poll_dev+0xbd/0x1a0
        __netpoll_send_skb+0x1b2/0x210
        netpoll_send_udp+0x2c9/0x406
        write_ext_msg+0x1d7/0x1f0
        console_unlock+0x23c/0x520
        vprintk_emit+0xe0/0x1d0
        printk+0x58/0x6f
        x86_vector_activate.cold+0xf/0x46
        __irq_domain_activate_irq+0x50/0x80
        __irq_domain_activate_irq+0x32/0x80
        __irq_domain_activate_irq+0x32/0x80
        irq_domain_activate_irq+0x25/0x40
        __setup_irq+0x2d2/0x700
        request_threaded_irq+0xfb/0x160
        __bnxt_open_nic+0x3b1/0x750
        bnxt_open_nic+0x19/0x30
        ethtool_set_channels+0x1ac/0x220
        dev_ethtool+0x11ba/0x2240
        dev_ioctl+0x1cf/0x390
        sock_do_ioctl+0x95/0x130
      Reported-by: Rob Sherwood <rsher@fb.com>
      Fixes: c0c050c5 ("bnxt_en: New Broadcom ethernet driver.")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: disable netpoll on fresh napis · 96e97bc0
      Jakub Kicinski authored
      napi_disable() makes sure to set the NAPI_STATE_NPSVC bit to prevent
      netpoll from accessing rings before init is complete. However, the
      same is not done for fresh napi instances in netif_napi_add(),
      even though we expect NAPI instances to be added as disabled.
      
      This causes crashes during driver reconfiguration (enabling XDP,
      changing the channel count) - if there is any printk() after
      netif_napi_add() but before napi_enable().
      
      To ensure memory ordering is correct we need to use RCU accessors.
      Reported-by: Rob Sherwood <rsher@fb.com>
      Fixes: 2d8bff12 ("netpoll: Close race condition between poll_one_napi and napi_disable")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
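The idea in the commit above can be modelled in user space. The following is a minimal sketch, not kernel code: the `Napi` class, the `fixed` switch, the `rings_ready` field and the bit value are all hypothetical stand-ins, and only the NAPI_STATE_NPSVC bit named in the commit message is modelled. It shows why a freshly added instance must start out with netpoll blocked.

```python
from dataclasses import dataclass

# Illustrative bit value only; the kernel's actual enum value may differ.
NAPI_STATE_NPSVC = 1 << 0


@dataclass
class Napi:
    """Hypothetical stand-in for a NAPI instance."""
    state: int = 0
    rings_ready: bool = False  # assumption: set by the driver once rings exist


def netif_napi_add(fixed: bool) -> Napi:
    """Create an instance; with the fix it starts with netpoll blocked."""
    napi = Napi()
    if fixed:
        napi.state |= NAPI_STATE_NPSVC
    return napi


def napi_enable(napi: Napi) -> None:
    """Unblock netpoll once the driver considers the instance ready."""
    napi.state &= ~NAPI_STATE_NPSVC


def netpoll_poll(napi: Napi) -> str:
    """Model of netpoll trying to service one instance."""
    if napi.state & NAPI_STATE_NPSVC:
        return "skipped"  # blocked: netpoll leaves the instance alone
    napi.state |= NAPI_STATE_NPSVC
    try:
        if not napi.rings_ready:
            return "crash"  # would touch uninitialized ring state
        return "polled"
    finally:
        napi.state &= ~NAPI_STATE_NPSVC
```

In this model, a poll that arrives between `netif_napi_add(fixed=True)` and `napi_enable()` is skipped, whereas the unfixed variant reaches the uninitialized rings.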
    • ipv4: Silence suspicious RCU usage warning · 7f6f32bb
      Ido Schimmel authored
      fib_info_notify_update() is always called with RTNL held, but not from
      an RCU read-side critical section. This leads to the following warning
      [1] when the FIB table list is traversed with
      hlist_for_each_entry_rcu(), but without a proper lockdep expression.
      
      Since modification of the list is protected by RTNL, silence the warning
      by adding a lockdep expression which verifies RTNL is held.
      
      [1]
       =============================
       WARNING: suspicious RCU usage
       5.9.0-rc1-custom-14233-g2f26e122d62f #129 Not tainted
       -----------------------------
       net/ipv4/fib_trie.c:2124 RCU-list traversed in non-reader section!!
      
       other info that might help us debug this:
      
       rcu_scheduler_active = 2, debug_locks = 1
       1 lock held by ip/834:
        #0: ffffffff85a3b6b0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0
      
       stack backtrace:
       CPU: 0 PID: 834 Comm: ip Not tainted 5.9.0-rc1-custom-14233-g2f26e122d62f #129
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
       Call Trace:
        dump_stack+0x100/0x184
        lockdep_rcu_suspicious+0x143/0x14d
        fib_info_notify_update+0x8d1/0xa60
        __nexthop_replace_notify+0xd2/0x290
        rtm_new_nexthop+0x35e2/0x5946
        rtnetlink_rcv_msg+0x4f7/0xbd0
        netlink_rcv_skb+0x17a/0x480
        rtnetlink_rcv+0x22/0x30
        netlink_unicast+0x5ae/0x890
        netlink_sendmsg+0x98a/0xf40
        ____sys_sendmsg+0x879/0xa00
        ___sys_sendmsg+0x122/0x190
        __sys_sendmsg+0x103/0x1d0
        __x64_sys_sendmsg+0x7d/0xb0
        do_syscall_64+0x32/0x50
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7fde28c3be57
       Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
       RSP: 002b:00007ffc09330028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fde28c3be57
       RDX: 0000000000000000 RSI: 00007ffc09330090 RDI: 0000000000000003
       RBP: 000000005f45f911 R08: 0000000000000001 R09: 00007ffc0933012c
       R10: 0000000000000076 R11: 0000000000000246 R12: 0000000000000001
       R13: 00007ffc09330290 R14: 00007ffc09330eee R15: 00005610e48ed020
      
      Fixes: 1bff1a0c ("ipv4: Add function to send route updates")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers/net/wan/lapbether: Set network_header before transmitting · 91244d10
      Xie He authored
      Set the skb's network_header before it is passed to the underlying
      Ethernet device for transmission.
      
      This patch fixes the following issue:
      
      When we use this driver with AF_PACKET sockets, error messages
      of the form:
         protocol 0805 is buggy, dev (Ethernet interface name)
      are printed in the system "dmesg" log.
      
      This is because skbs passed down to the Ethernet device for transmission
      don't have their network_header properly set, and the dev_queue_xmit_nit
      function in net/core/dev.c complains about this.
      
      Reason for setting the network_header at this place (at the end of the
      Ethernet header, and at the beginning of the Ethernet payload):

      When this driver receives an skb from the Ethernet device, the
      network_header is also set at this place.
      
      Cc: Martin Schiller <ms@dev.tdt.de>
      Signed-off-by: Xie He <xie.he.0141@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: free acked data before waiting for more memory · 1cec170d
      Florian Westphal authored
      After subflow lock is dropped, more wmem might have been made available.
      
      This fixes a deadlock in mptcp_connect.sh 'mmap' mode: wmem is exhausted.
      But as the mptcp socket holds on to already-acked data (for retransmit)
      no wakeup will occur.
      
      Using 'goto restart' calls mptcp_clean_una(sk), which will free pages
      that have been acked completely in the meantime.
      
      Fixes: fb529e62 ("mptcp: break and restart in case mptcp sndbuf is full")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • taprio: Fix using wrong queues in gate mask · 09e31cf0
      Vinicius Costa Gomes authored
      Since commit 9c66d156 ("taprio: Add support for hardware
      offloading") there's a bit of inconsistency when offloading schedules
      to the hardware:
      
      In software mode, the gate masks are specified in terms of traffic
      classes, so "sched-entry S 03 20000", for example, means that traffic
      classes 0 and 1 are open for 20us; when taprio is offloaded to
      hardware, the gate masks are specified in terms of hardware queues.

      The idea here is to fix hardware offloading, so schedules in hardware
      and software mode have the same behavior. What is needed is to
      map traffic classes to queues when applying the offload to the driver.
      
      Fixes: 9c66d156 ("taprio: Add support for hardware offloading")
      Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
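The mapping step described in the taprio commit can be sketched as follows. This is a user-space illustration under stated assumptions, not the kernel implementation: the function name `tc_mask_to_queue_mask` and the `(offset, count)` table layout are hypothetical, chosen to mirror the common convention that each traffic class owns a contiguous range of TX queues.

```python
from typing import Sequence, Tuple


def tc_mask_to_queue_mask(tc_mask: int,
                          tc_to_txq: Sequence[Tuple[int, int]]) -> int:
    """Translate a gate mask over traffic classes into one over TX queues.

    tc_to_txq[tc] is an (offset, count) pair: traffic class tc owns the
    queues offset .. offset + count - 1 (an assumption of this sketch).
    """
    queue_mask = 0
    for tc, (offset, count) in enumerate(tc_to_txq):
        if not tc_mask & (1 << tc):
            continue  # gate for this traffic class is closed
        # Set one bit per queue that belongs to the open traffic class.
        queue_mask |= ((1 << count) - 1) << offset
    return queue_mask
```

With a made-up layout in which traffic class 0 owns queues 0 and 1, class 1 owns queue 2 and class 2 owns queue 3, the gate mask `0x03` from the "sched-entry S 03 20000" example opens queues 0 to 2, so the hardware mask becomes `0x07` rather than `0x03`.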
    • net: cdc_ncm: Fix build error · 5fd99b5d
      YueHaibing authored
      If USB_NET_CDC_NCM is y and USB_NET_CDCETHER is m, build fails:
      
      drivers/net/usb/cdc_ncm.o:(.rodata+0x1d8): undefined reference to `usbnet_cdc_update_filter'
      
      Select USB_NET_CDCETHER for USB_NET_CDC_NCM to fix this.
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Fixes: e10dcb1b ("net: cdc_ncm: hook into set_rx_mode to admit multicast traffic")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: Fix for geneve tx checksum bug · a156998f
      Yi Li authored
      When skb->encapsulation is 0, skb->ip_summed is CHECKSUM_PARTIAL,
      and the packet is a UDP packet whose destination port is the
      IANA-assigned GENEVE port, the hardware is expected to do the
      checksum offload. However, the hardware will not do the checksum
      offload when the UDP destination port is 6081.
      
      This patch fixes it by doing the checksum in software.
      Reported-by: Li Bing <libing@winhong.com>
      Signed-off-by: Yi Li <yili@winhong.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
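The condition described in the hns3 commit can be written as a small predicate. This is an illustrative sketch only: `needs_software_checksum` is a hypothetical name, the arguments stand in for the skb fields mentioned in the message, and the numeric value given to CHECKSUM_PARTIAL is a placeholder.

```python
CHECKSUM_PARTIAL = 3    # placeholder value for this sketch
IPPROTO_UDP = 17        # IANA protocol number for UDP
GENEVE_UDP_PORT = 6081  # IANA-assigned GENEVE port


def needs_software_checksum(encapsulation: bool, ip_summed: int,
                            l4_proto: int, udp_dest_port: int) -> bool:
    """Return True for the packets the hardware fails to checksum.

    These are non-encapsulated UDP packets that request checksum offload
    and happen to use the GENEVE destination port.
    """
    return (not encapsulation
            and ip_summed == CHECKSUM_PARTIAL
            and l4_proto == IPPROTO_UDP
            and udp_dest_port == GENEVE_UDP_PORT)
```

A transmit path built on this idea would compute the checksum in software whenever the predicate is true and leave all other packets to the hardware offload.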
    • Merge branch 'bnxt_en-Bug-fixes' · 0a3445b8
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Bug fixes.
      
      This set of driver patches includes bug fixes for ethtool get channels,
      ethtool statistics, ethtool NVRAM, AER recovery, a firmware reset issue
      that could potentially crash, a hwmon temperature reporting issue on VF,
      and 2 fixes for regressions introduced by the recent user-defined RSS
      map feature.
      
      Please queue patches 1 to 6 for -stable.  Thanks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Setup default RSS map in all scenarios. · b43b9f53
      Michael Chan authored
      The recent changes to support user-defined RSS map assume that RX
      rings are always reserved and the default RSS map is set after the
      RX rings are successfully reserved.  If the firmware spec is older
      than 1.6.1, no ring reservations are required and the default RSS
      map is not set up at all.  In another scenario where the fw Resource
      Manager is older, RX rings are not reserved and we also end up with
      no valid RSS map.

      Fix both issues in bnxt_need_reserve_rings().  In both scenarios
      described above, we don't need to reserve RX rings so we need to
      call this new function bnxt_check_rss_map_no_rmgr() to set up the
      default RSS map when needed.

      Without a valid RSS map, the NIC won't receive packets properly.
      
      Fixes: 1667cbf6 ("bnxt_en: Add logical RSS indirection table structure.")
      Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: init RSS table for Minimal-Static VF reservation · 5fa65524
      Edwin Peer authored
      There are no VF rings available during probe when the device is configured
      using the Minimal-Static reservation strategy. In this case, the RSS
      indirection table can only be initialized later, during bnxt_open_nic().
      However, this was not happening because the rings will already have been
      reserved via bnxt_init_dflt_ring_mode(), causing bnxt_need_reserve_rings()
      to return false in bnxt_reserve_rings() and bypass the RSS table init.
      
      Solve this by pushing the call to bnxt_set_dflt_rss_indir_tbl() into
      __bnxt_reserve_rings(), which is common to both paths and is called
      whenever ring configuration is changed. After doing this, the RSS table
      init that must be called from bnxt_init_one() happens implicitly via
      bnxt_set_default_rings(), necessitating doing the allocation earlier in
      order to avoid a null pointer dereference.
      
      Fixes: bd3191b5 ("bnxt_en: Implement ethtool -X to set indirection table.")
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: fix HWRM error when querying VF temperature · 12cce90b
      Edwin Peer authored
      Firmware returns RESOURCE_ACCESS_DENIED for HWRM_TEMP_MONITOR_QUERY for
      VFs. This produces unpleasant error messages in the log when temp1_input
      is queried via the hwmon sysfs interface from a VF.
      
      The error is harmless and expected, so silence it and return unknown as
      the value. Since the device temperature is not particularly sensitive
      information, provide flexibility to change this policy in future by
      silencing the error rather than avoiding the HWRM call entirely for VFs.
      
      Fixes: cde49a42 ("bnxt_en: Add hwmon sysfs support to read temperature")
      Cc: Marc Smith <msmith626@gmail.com>
      Reported-by: Marc Smith <msmith626@gmail.com>
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Fix possible crash in bnxt_fw_reset_task(). · b148bb23
      Michael Chan authored
      bnxt_fw_reset_task() is run from a delayed workqueue.  The current
      code is not cancelling the workqueue in the driver's .remove()
      method and it can potentially crash if the device is removed with
      the workqueue still pending.
      
      The fix is to clear the BNXT_STATE_IN_FW_RESET flag and then cancel
      the delayed workqueue in bnxt_remove_one().  bnxt_queue_fw_reset_work()
      also needs to check that this flag is set before scheduling.  This
      will guarantee that no rescheduling will be done after it is cancelled.
      
      Fixes: 230d1f0d ("bnxt_en: Handle firmware reset.")
      Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
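The ordering argument in the commit above (clear the flag, then cancel, and have the queueing helper check the flag) can be modelled with a few lines. This is a toy sketch under loud assumptions: the `Device` class and its fields are hypothetical, a boolean stands in for the pending delayed work item, and no real concurrency is modelled.

```python
class Device:
    """Hypothetical stand-in for the driver's per-device state."""

    def __init__(self) -> None:
        self.in_fw_reset = False   # models BNXT_STATE_IN_FW_RESET
        self.work_pending = False  # models a queued delayed work item

    def queue_fw_reset_work(self) -> bool:
        """Schedule the reset task only while a reset is in progress."""
        if not self.in_fw_reset:
            return False
        self.work_pending = True
        return True

    def remove_one(self) -> None:
        """Tear-down path: clear the flag first, then cancel the work."""
        self.in_fw_reset = False
        self.work_pending = False  # stands in for cancelling the work
```

Because the flag is cleared before the cancel, any later attempt to reschedule is refused, so nothing can be left pending after removal in this model.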
    • bnxt_en: Fix PCI AER error recovery flow · df3875ec
      Vasundhara Volam authored
      When a PCI error is detected, the PCI state could be corrupt. Save
      the PCI state after initialization and restore it after the slot
      reset.
      
      Fixes: 6316ea6d ("bnxt_en: Enable AER support.")
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Fix ethtool -S statistics with XDP or TCs enabled. · 7de65149
      Michael Chan authored
      We are returning the wrong count for ETH_SS_STATS in get_sset_count()
      when XDP or TCs are enabled.  In a recent commit, we got rid of
      irrelevant counters when the ring is RX only or TX only, but we
      did not make the proper adjustments for the count.  As a result,
      when we have XDP or TCs enabled, we are returning an excess count
      because some of the rings are TX only.  This causes ethtool -S to
      display extra counters with no counter names.
      
      Fix bnxt_get_num_ring_stats() by not assuming that all rings will
      always have RX and TX counters in combined mode.
      
      Fixes: 125592fb ("bnxt_en: show only relevant ethtool stats for a TX or RX ring")
      Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
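The counting problem described above can be illustrated with simple arithmetic. The sketch below is hypothetical throughout: the function names, the per-ring counter counts and the ring numbers are invented for illustration and are not taken from the driver. It contrasts a count that assumes every ring carries both RX and TX counters with one that counts each ring type separately.

```python
def ring_stats_assuming_combined(rings: int, rx_per_ring: int,
                                 tx_per_ring: int, common_per_ring: int) -> int:
    """Overcounts when some rings are TX only (the kind of bug described)."""
    return rings * (rx_per_ring + tx_per_ring + common_per_ring)


def ring_stats_per_type(rx_rings: int, tx_rings: int, all_rings: int,
                        rx_per_ring: int, tx_per_ring: int,
                        common_per_ring: int) -> int:
    """Count RX, TX and common counters only for rings that have them."""
    return (rx_rings * rx_per_ring
            + tx_rings * tx_per_ring
            + all_rings * common_per_ring)
```

With made-up numbers (8 rings in total, of which 4 have an RX side and all 8 have a TX side, and 10 RX, 5 TX and 2 common counters per ring), the first function reports 136 entries while only 96 counters exist; the 40 extra slots correspond to the nameless counters that ethtool -S would display.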
    • bnxt_en: Check for zero dir entries in NVRAM. · dbbfa96a
      Vasundhara Volam authored
      If the firmware goes into an unstable state, the HWRM_NVM_GET_DIR_INFO
      firmware command may return zero dir entries. Return an error in such
      a case to avoid a zero-length DMA buffer request.
      
      Fixes: c0c050c5 ("bnxt_en: New Broadcom ethernet driver.")
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Don't query FW when netif_running() is false. · c1c2d774
      Pavan Chebbi authored
      In rare conditions, such as a two-stage OS installation,
      ethtool's get_channels function may be called when the
      device is in D3 state, leading to an uncorrectable PCI error.
      Check netif_running() first before making any query to FW
      which involves writing to BAR.
      
      Fixes: db4723b3 ("bnxt_en: Check max_tx_scheduler_inputs value from firmware.")
      Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dp83869: Fix RGMII internal delay configuration · 2e1ec861
      Daniel Gorsulowski authored
      The RGMII control register at 0x32 indicates the states for the bits
      RGMII_TX_CLK_DELAY and RGMII_RX_CLK_DELAY as follows:
      
        RGMII Transmit/Receive Clock Delay
          0x0 = RGMII transmit clock is shifted with respect to transmit/receive data.
          0x1 = RGMII transmit clock is aligned with respect to transmit/receive data.
      
      This commit fixes the inverted behavior of these bits.
      
      Fixes: 736b25af ("net: dp83869: Add RGMII internal delay configuration")
      Signed-off-by: Daniel Gorsulowski <daniel.gorsulowski@esd.eu>
      Acked-by: Dan Murphy <dmurphy@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
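The bit semantics quoted in the dp83869 commit (0 = clock shifted, 1 = clock aligned) can be turned into a small sketch of how a register value might be derived from the PHY interface mode. Everything here is an assumption for illustration: the bit positions, the constant names and the function `rgmii_ctrl_value` are hypothetical, and the mode strings simply follow the usual "rgmii", "rgmii-id", "rgmii-txid", "rgmii-rxid" naming.

```python
# Assumed bit positions, for illustration only.
RGMII_RX_CLK_ALIGNED = 1 << 0
RGMII_TX_CLK_ALIGNED = 1 << 1


def rgmii_ctrl_value(val: int, mode: str) -> int:
    """Return the control value for a mode; a cleared bit enables the delay."""
    # Start with both clocks aligned (no internal delay) ...
    val |= RGMII_TX_CLK_ALIGNED | RGMII_RX_CLK_ALIGNED
    # ... then clear the bit for each direction that needs a shifted clock.
    if mode == "rgmii-id":
        val &= ~(RGMII_TX_CLK_ALIGNED | RGMII_RX_CLK_ALIGNED)
    elif mode == "rgmii-txid":
        val &= ~RGMII_TX_CLK_ALIGNED
    elif mode == "rgmii-rxid":
        val &= ~RGMII_RX_CLK_ALIGNED
    elif mode != "rgmii":
        raise ValueError(f"unsupported mode: {mode}")
    return val
```

The point of the sketch is the polarity: requesting an internal delay means clearing the corresponding bit, which is the opposite of what an "enable" style bit name would suggest.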
    • ibmvnic fix NULL tx_pools and rx_pools issue at do_reset · 9f134573
      Mingming Cao authored
      At the time of do_reset, ibmvnic tries to re-initialize the tx_pools
      and rx_pools to avoid re-allocating the long term buffer. However,
      there is a window inside do_reset in which the tx_pools and
      rx_pools have been freed but not yet re-initialized, making it
      possible to dereference null pointers.

      This patch fixes the issue by always checking whether the tx_pool
      and rx_pool are NULL after ibmvnic_login and, if so, re-allocating
      the pools. This avoids calling reset_tx/rx_pools with a
      NULL adapter tx_pools/rx_pools pointer. It also adds a null pointer check in
      reset_tx_pools and reset_rx_pools to safely handle the NULL pointer case.
      Signed-off-by: Mingming Cao <mmc@linux.vnet.ibm.com>
      Signed-off-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 25 Aug, 2020 12 commits
  4. 24 Aug, 2020 7 commits