1. 15 Sep, 2020 2 commits
  2. 11 Sep, 2020 1 commit
  3. 08 Sep, 2020 2 commits
    • Tony Ambardar's avatar
      tools/libbpf: Avoid counting local symbols in ABI check · 746f534a
      Tony Ambardar authored
      Encountered the following failure building libbpf from kernel 5.8.5 sources
      with GCC 8.4.0 and binutils 2.34: (long paths shortened)
      
        Warning: Num of global symbols in sharedobjs/libbpf-in.o (234) does NOT
        match with num of versioned symbols in libbpf.so (236). Please make sure
        all LIBBPF_API symbols are versioned in libbpf.map.
        --- libbpf_global_syms.tmp    2020-09-02 07:30:58.920084380 +0000
        +++ libbpf_versioned_syms.tmp 2020-09-02 07:30:58.924084388 +0000
        @@ -1,3 +1,5 @@
        +_fini
        +_init
         bpf_btf_get_fd_by_id
         bpf_btf_get_next_id
         bpf_create_map
        make[4]: *** [Makefile:210: check_abi] Error 1
      
      Investigation shows _fini and _init are actually local symbols counted
      amongst global ones:
      
        $ readelf --dyn-syms --wide libbpf.so|head -10
      
        Symbol table '.dynsym' contains 343 entries:
           Num:    Value  Size Type    Bind   Vis      Ndx Name
             0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
             1: 00004098     0 SECTION LOCAL  DEFAULT   11
             2: 00004098     8 FUNC    LOCAL  DEFAULT   11 _init@@LIBBPF_0.0.1
             3: 00023040     8 FUNC    LOCAL  DEFAULT   14 _fini@@LIBBPF_0.0.1
             4: 00000000     0 OBJECT  GLOBAL DEFAULT  ABS LIBBPF_0.0.4
             5: 00000000     0 OBJECT  GLOBAL DEFAULT  ABS LIBBPF_0.0.1
             6: 0000ffa4     8 FUNC    GLOBAL DEFAULT   12 bpf_object__find_map_by_offset@@LIBBPF_0.0.1
      
      A previous commit filtered global symbols in sharedobjs/libbpf-in.o. Do the
      same with the libbpf.so DSO for consistent comparison.
      
      Fixes: 306b267c ("libbpf: Verify versioned symbols")
      Signed-off-by: default avatarTony Ambardar <Tony.Ambardar@gmail.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200905214831.1565465-1-Tony.Ambardar@gmail.com
      746f534a
    • Daniel Borkmann's avatar
      bpf: Fix clobbering of r2 in bpf_gen_ld_abs · e6a18d36
      Daniel Borkmann authored
      Bryce reported that he saw the following with:
      
        0:  r6 = r1
        1:  r1 = 12
        2:  r0 = *(u16 *)skb[r1]
      
      The xlated sequence was incorrectly clobbering r2 with pointer
      value of r6 ...
      
        0: (bf) r6 = r1
        1: (b7) r1 = 12
        2: (bf) r1 = r6
        3: (bf) r2 = r1
        4: (85) call bpf_skb_load_helper_16_no_cache#7692160
      
      ... and hence call to the load helper never succeeded given the
      offset was too high. Fix it by reordering the load of r6 to r1.
      
      Other than that the insn has similar calling convention than BPF
      helpers, that is, r0 - r5 are scratch regs, so nothing else
      affected after the insn.
      
      Fixes: e0cea7ce ("bpf: implement ld_abs/ld_ind in native bpf")
      Reported-by: default avatarBryce Kahle <bryce.kahle@datadoghq.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/cace836e4d07bb63b1a53e49c5dfb238a040c298.1599512096.git.daniel@iogearbox.net
      e6a18d36
  4. 04 Sep, 2020 3 commits
    • Alexei Starovoitov's avatar
      Merge branch 'hashmap_iter_bucket_lock_fix' · e6135df4
      Alexei Starovoitov authored
      Yonghong Song says:
      
      ====================
      Currently, the bpf hashmap iterator takes a bucket_lock, a spin_lock,
      before visiting each element in the bucket. This will cause a deadlock
      if a map update/delete operates on an element with the same
      bucket id of the visited map.
      
      To avoid the deadlock, let us just use rcu_read_lock instead of
      bucket_lock. This may result in visiting stale elements, missing some elements,
      or repeating some elements, if concurrent map delete/update happens for the
      same map. I think using rcu_read_lock is a reasonable compromise.
      For users caring stale/missing/repeating element issues, bpf map batch
      access syscall interface can be used.
      
      Note that another approach is during bpf_iter link stage, we check
      whether the iter program might be able to do update/delete to the visited
      map. If it is, reject the link_create. Verifier needs to record whether
      an update/delete operation happens for each map for this approach.
      I just feel this checking is too specialized, hence still prefer
      rcu_read_lock approach.
      
      Patch #1 has the kernel implementation and Patch #2 added a selftest
      which can trigger deadlock without Patch #1.
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      e6135df4
    • Yonghong Song's avatar
      selftests/bpf: Add bpf_{update, delete}_map_elem in hashmap iter program · 4daab713
      Yonghong Song authored
      Added bpf_{updata,delete}_map_elem to the very map element the
      iter program is visiting. Due to rcu protection, the visited map
      elements, although stale, should still contain correct values.
        $ ./test_progs -n 4/18
        #4/18 bpf_hash_map:OK
        #4 bpf_iter:OK
        Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200902235341.2001534-1-yhs@fb.com
      4daab713
    • Yonghong Song's avatar
      bpf: Do not use bucket_lock for hashmap iterator · dc0988bb
      Yonghong Song authored
      Currently, for hashmap, the bpf iterator will grab a bucket lock, a
      spinlock, before traversing the elements in the bucket. This can ensure
      all bpf visted elements are valid. But this mechanism may cause
      deadlock if update/deletion happens to the same bucket of the
      visited map in the program. For example, if we added bpf_map_update_elem()
      call to the same visited element in selftests bpf_iter_bpf_hash_map.c,
      we will have the following deadlock:
      
        ============================================
        WARNING: possible recursive locking detected
        5.9.0-rc1+ #841 Not tainted
        --------------------------------------------
        test_progs/1750 is trying to acquire lock:
        ffff9a5bb73c5e70 (&htab->buckets[i].raw_lock){....}-{2:2}, at: htab_map_update_elem+0x1cf/0x410
      
        but task is already holding lock:
        ffff9a5bb73c5e20 (&htab->buckets[i].raw_lock){....}-{2:2}, at: bpf_hash_map_seq_find_next+0x94/0x120
      
        other info that might help us debug this:
         Possible unsafe locking scenario:
      
               CPU0
               ----
          lock(&htab->buckets[i].raw_lock);
          lock(&htab->buckets[i].raw_lock);
      
         *** DEADLOCK ***
         ...
        Call Trace:
         dump_stack+0x78/0xa0
         __lock_acquire.cold.74+0x209/0x2e3
         lock_acquire+0xba/0x380
         ? htab_map_update_elem+0x1cf/0x410
         ? __lock_acquire+0x639/0x20c0
         _raw_spin_lock_irqsave+0x3b/0x80
         ? htab_map_update_elem+0x1cf/0x410
         htab_map_update_elem+0x1cf/0x410
         ? lock_acquire+0xba/0x380
         bpf_prog_ad6dab10433b135d_dump_bpf_hash_map+0x88/0xa9c
         ? find_held_lock+0x34/0xa0
         bpf_iter_run_prog+0x81/0x16e
         __bpf_hash_map_seq_show+0x145/0x180
         bpf_seq_read+0xff/0x3d0
         vfs_read+0xad/0x1c0
         ksys_read+0x5f/0xe0
         do_syscall_64+0x33/0x40
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        ...
      
      The bucket_lock first grabbed in seq_ops->next() called by bpf_seq_read(),
      and then grabbed again in htab_map_update_elem() in the bpf program, causing
      deadlocks.
      
      Actually, we do not need bucket_lock here, we can just use rcu_read_lock()
      similar to netlink iterator where the rcu_read_{lock,unlock} likes below:
       seq_ops->start():
           rcu_read_lock();
       seq_ops->next():
           rcu_read_unlock();
           /* next element */
           rcu_read_lock();
       seq_ops->stop();
           rcu_read_unlock();
      
      Compared to old bucket_lock mechanism, if concurrent updata/delete happens,
      we may visit stale elements, miss some elements, or repeat some elements.
      I think this is a reasonable compromise. For users wanting to avoid
      stale, missing/repeated accesses, bpf_map batch access syscall interface
      can be used.
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200902235340.2001375-1-yhs@fb.com
      dc0988bb
  5. 03 Sep, 2020 1 commit
  6. 01 Sep, 2020 1 commit
  7. 31 Aug, 2020 1 commit
  8. 28 Aug, 2020 6 commits
  9. 27 Aug, 2020 12 commits
  10. 26 Aug, 2020 11 commits
    • David S. Miller's avatar
      Merge branch 'net-fix-netpoll-crash-with-bnxt' · 5875568a
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      net: fix netpoll crash with bnxt
      
      Rob run into crashes when using XDP on bnxt. Upon investigation
      it turns out that during driver reconfig irq core produces
      a warning message when IRQs are requested. This triggers netpoll,
      which in turn accesses uninitialized driver state. Same crash can
      also be triggered on this platform by changing the number of rings.
      
      Looks like we have two missing pieces here, netif_napi_add() has
      to make sure we start out with netpoll blocked. The driver also
      has to be more careful about when napi gets enabled.
      
      Tested XDP and channel count changes, the warning message no longer
      causes a crash. Not sure if the memory barriers added in patch 1
      are necessary, but it seems we should have them.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5875568a
    • Jakub Kicinski's avatar
      bnxt: don't enable NAPI until rings are ready · 96ecdcc9
      Jakub Kicinski authored
      Netpoll can try to poll napi as soon as napi_enable() is called.
      It crashes trying to access a doorbell which is still NULL:
      
       BUG: kernel NULL pointer dereference, address: 0000000000000000
       CPU: 59 PID: 6039 Comm: ethtool Kdump: loaded Tainted: G S                5.9.0-rc1-00469-g5fd99b5d-dirty #26
       RIP: 0010:bnxt_poll+0x121/0x1c0
       Code: c4 20 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41 8b 86 a0 01 00 00 41 23 85 18 01 00 00 49 8b 96 a8 01 00 00 0d 00 00 00 24 <89> 02
      41 f6 45 77 02 74 cb 49 8b ae d8 01 00 00 31 c0 c7 44 24 1a
        netpoll_poll_dev+0xbd/0x1a0
        __netpoll_send_skb+0x1b2/0x210
        netpoll_send_udp+0x2c9/0x406
        write_ext_msg+0x1d7/0x1f0
        console_unlock+0x23c/0x520
        vprintk_emit+0xe0/0x1d0
        printk+0x58/0x6f
        x86_vector_activate.cold+0xf/0x46
        __irq_domain_activate_irq+0x50/0x80
        __irq_domain_activate_irq+0x32/0x80
        __irq_domain_activate_irq+0x32/0x80
        irq_domain_activate_irq+0x25/0x40
        __setup_irq+0x2d2/0x700
        request_threaded_irq+0xfb/0x160
        __bnxt_open_nic+0x3b1/0x750
        bnxt_open_nic+0x19/0x30
        ethtool_set_channels+0x1ac/0x220
        dev_ethtool+0x11ba/0x2240
        dev_ioctl+0x1cf/0x390
        sock_do_ioctl+0x95/0x130
      Reported-by: default avatarRob Sherwood <rsher@fb.com>
      Fixes: c0c050c5 ("bnxt_en: New Broadcom ethernet driver.")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Reviewed-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      96ecdcc9
    • Jakub Kicinski's avatar
      net: disable netpoll on fresh napis · 96e97bc0
      Jakub Kicinski authored
      napi_disable() makes sure to set the NAPI_STATE_NPSVC bit to prevent
      netpoll from accessing rings before init is complete. However, the
      same is not done for fresh napi instances in netif_napi_add(),
      even though we expect NAPI instances to be added as disabled.
      
      This causes crashes during driver reconfiguration (enabling XDP,
      changing the channel count) - if there is any printk() after
      netif_napi_add() but before napi_enable().
      
      To ensure memory ordering is correct we need to use RCU accessors.
      Reported-by: default avatarRob Sherwood <rsher@fb.com>
      Fixes: 2d8bff12 ("netpoll: Close race condition between poll_one_napi and napi_disable")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      96e97bc0
    • Ido Schimmel's avatar
      ipv4: Silence suspicious RCU usage warning · 7f6f32bb
      Ido Schimmel authored
      fib_info_notify_update() is always called with RTNL held, but not from
      an RCU read-side critical section. This leads to the following warning
      [1] when the FIB table list is traversed with
      hlist_for_each_entry_rcu(), but without a proper lockdep expression.
      
      Since modification of the list is protected by RTNL, silence the warning
      by adding a lockdep expression which verifies RTNL is held.
      
      [1]
       =============================
       WARNING: suspicious RCU usage
       5.9.0-rc1-custom-14233-g2f26e122d62f #129 Not tainted
       -----------------------------
       net/ipv4/fib_trie.c:2124 RCU-list traversed in non-reader section!!
      
       other info that might help us debug this:
      
       rcu_scheduler_active = 2, debug_locks = 1
       1 lock held by ip/834:
        #0: ffffffff85a3b6b0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0
      
       stack backtrace:
       CPU: 0 PID: 834 Comm: ip Not tainted 5.9.0-rc1-custom-14233-g2f26e122d62f #129
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
       Call Trace:
        dump_stack+0x100/0x184
        lockdep_rcu_suspicious+0x143/0x14d
        fib_info_notify_update+0x8d1/0xa60
        __nexthop_replace_notify+0xd2/0x290
        rtm_new_nexthop+0x35e2/0x5946
        rtnetlink_rcv_msg+0x4f7/0xbd0
        netlink_rcv_skb+0x17a/0x480
        rtnetlink_rcv+0x22/0x30
        netlink_unicast+0x5ae/0x890
        netlink_sendmsg+0x98a/0xf40
        ____sys_sendmsg+0x879/0xa00
        ___sys_sendmsg+0x122/0x190
        __sys_sendmsg+0x103/0x1d0
        __x64_sys_sendmsg+0x7d/0xb0
        do_syscall_64+0x32/0x50
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7fde28c3be57
       Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51
      c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
      RSP: 002b:00007ffc09330028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fde28c3be57
      RDX: 0000000000000000 RSI: 00007ffc09330090 RDI: 0000000000000003
      RBP: 000000005f45f911 R08: 0000000000000001 R09: 00007ffc0933012c
      R10: 0000000000000076 R11: 0000000000000246 R12: 0000000000000001
      R13: 00007ffc09330290 R14: 00007ffc09330eee R15: 00005610e48ed020
      
      Fixes: 1bff1a0c ("ipv4: Add function to send route updates")
      Signed-off-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7f6f32bb
    • Xie He's avatar
      drivers/net/wan/lapbether: Set network_header before transmitting · 91244d10
      Xie He authored
      Set the skb's network_header before it is passed to the underlying
      Ethernet device for transmission.
      
      This patch fixes the following issue:
      
      When we use this driver with AF_PACKET sockets, there would be error
      messages of:
         protocol 0805 is buggy, dev (Ethernet interface name)
      printed in the system "dmesg" log.
      
      This is because skbs passed down to the Ethernet device for transmission
      don't have their network_header properly set, and the dev_queue_xmit_nit
      function in net/core/dev.c complains about this.
      
      Reason of setting the network_header to this place (at the end of the
      Ethernet header, and at the beginning of the Ethernet payload):
      
      Because when this driver receives an skb from the Ethernet device, the
      network_header is also set at this place.
      
      Cc: Martin Schiller <ms@dev.tdt.de>
      Signed-off-by: default avatarXie He <xie.he.0141@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      91244d10
    • Florian Westphal's avatar
      mptcp: free acked data before waiting for more memory · 1cec170d
      Florian Westphal authored
      After subflow lock is dropped, more wmem might have been made available.
      
      This fixes a deadlock in mptcp_connect.sh 'mmap' mode: wmem is exhausted.
      But as the mptcp socket holds on to already-acked data (for retransmit)
      no wakeup will occur.
      
      Using 'goto restart' calls mptcp_clean_una(sk) which will free pages
      that have been acked completely in the mean time.
      
      Fixes: fb529e62 ("mptcp: break and restart in case mptcp sndbuf is full")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1cec170d
    • Vinicius Costa Gomes's avatar
      taprio: Fix using wrong queues in gate mask · 09e31cf0
      Vinicius Costa Gomes authored
      Since commit 9c66d156 ("taprio: Add support for hardware
      offloading") there's a bit of inconsistency when offloading schedules
      to the hardware:
      
      In software mode, the gate masks are specified in terms of traffic
      classes, so if say "sched-entry S 03 20000", it means that the traffic
      classes 0 and 1 are open for 20us; when taprio is offloaded to
      hardware, the gate masks are specified in terms of hardware queues.
      
      The idea here is to fix hardware offloading, so schedules in hardware
      and software mode have the same behavior. What's needed to do is to
      map traffic classes to queues when applying the offload to the driver.
      
      Fixes: 9c66d156 ("taprio: Add support for hardware offloading")
      Signed-off-by: default avatarVinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      09e31cf0
    • YueHaibing's avatar
      net: cdc_ncm: Fix build error · 5fd99b5d
      YueHaibing authored
      If USB_NET_CDC_NCM is y and USB_NET_CDCETHER is m, build fails:
      
      drivers/net/usb/cdc_ncm.o:(.rodata+0x1d8): undefined reference to `usbnet_cdc_update_filter'
      
      Select USB_NET_CDCETHER for USB_NET_CDC_NCM to fix this.
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Fixes: e10dcb1b ("net: cdc_ncm: hook into set_rx_mode to admit multicast traffic")
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5fd99b5d
    • Yi Li's avatar
      net: hns3: Fix for geneve tx checksum bug · a156998f
      Yi Li authored
      when skb->encapsulation is 0, skb->ip_summed is CHECKSUM_PARTIAL
      and it is udp packet, which has a dest port as the IANA assigned.
      the hardware is expected to do the checksum offload, but the
      hardware will not do the checksum offload when udp dest port is
      6081.
      
      This patch fixes it by doing the checksum in software.
      Reported-by: default avatarLi Bing <libing@winhong.com>
      Signed-off-by: default avatarYi Li <yili@winhong.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a156998f
    • David S. Miller's avatar
      Merge branch 'bnxt_en-Bug-fixes' · 0a3445b8
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Bug fixes.
      
      This set of driver patches include bug fixes for ethtool get channels,
      ethtool statistics, ethtool NVRAM, AER recovery, a firmware reset issue
      that could potentially crash, hwmon temperature reporting issue on VF,
      and 2 fixes for regressions introduced by the recent user-defined RSS
      map feature.
      
      Please queue patches 1 to 6 for -stable.  Thanks.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0a3445b8
    • Michael Chan's avatar
      bnxt_en: Setup default RSS map in all scenarios. · b43b9f53
      Michael Chan authored
      The recent changes to support user-defined RSS map assume that RX
      rings are always reserved and the default RSS map is set after the
      RX rings are successfully reserved.  If the firmware spec is older
      than 1.6.1, no ring reservations are required and the default RSS
      map is not setup at all.  In another scenario where the fw Resource
      Manager is older, RX rings are not reserved and we also end up with
      no valid RSS map.
      
      Fix both issues in bnxt_need_reserve_rings().  In both scenarios
      described above, we don't need to reserve RX rings so we need to
      call this new function bnxt_check_rss_map_no_rmgr() to setup the
      default RSS map when needed.
      
      Without valid RSS map, the NIC won't receive packets properly.
      
      Fixes: 1667cbf6 ("bnxt_en: Add logical RSS indirection table structure.")
      Reviewed-by: default avatarVasundhara Volam <vasundhara-v.volam@broadcom.com>
      Reviewed-by: default avatarEdwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b43b9f53