1. 10 Sep, 2020 10 commits
    • hv_netvsc: Cache the current data path to avoid duplicate call and message · da26658c
      Dexuan Cui authored
      The previous change "hv_netvsc: Switch the data path at the right time
      during hibernation" adds the call of netvsc_vf_changed() upon
      NETDEV_CHANGE, so it's necessary to avoid the duplicate call and message
      when the VF is brought UP or DOWN.
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
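The caching idea described in da26658c can be illustrated with a small userspace sketch. The struct and function names below are hypothetical and are not taken from the hv_netvsc driver; the counter only stands in for the host message and log line that the real code would emit.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-device state; not the driver's structure. */
struct datapath_state {
    bool vf_active;     /* cached: is traffic currently routed via the VF? */
    int switch_count;   /* number of switches actually issued */
};

/*
 * Request a data path. Returns true if a switch was issued, false if the
 * cached state already matched and the duplicate request was skipped.
 */
static bool datapath_set(struct datapath_state *st, bool want_vf)
{
    if (st->vf_active == want_vf)
        return false;       /* duplicate event: no call, no message */
    st->vf_active = want_vf;
    st->switch_count++;     /* assumed placeholder for the real switch */
    return true;
}
```

With this shape, two consecutive events that request the same path result in a single switch, which is the behaviour the commit message asks for when both the UP/DOWN and CHANGE notifications arrive.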
    • hv_netvsc: Switch the data path at the right time during hibernation · de214e52
      Dexuan Cui authored
      When netvsc_resume() is called, the mlx5 VF NIC has not been resumed yet,
      so in the future the host might silently fail the call netvsc_vf_changed()
      -> netvsc_switch_datapath() there, even if the call works now.
      
      Call netvsc_vf_changed() in the NETDEV_CHANGE event handler: at that time
      the mlx5 VF NIC has been resumed.
      
      Fixes: 19162fd4 ("hv_netvsc: Fix hibernation for mlx5 VF driver")
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sch_generic: avoid concurrent reset and enqueue op for lockless qdisc · 2fb541c8
      Yunsheng Lin authored
      Currently, a reset and an enqueue operation can run concurrently on
      the same lockless qdisc, because no lock synchronizes the
      q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in
      qdisc_deactivate() called by dev_deactivate_queue(). This may cause
      an out-of-bounds access to priv->ring[] in the hns3 driver: if the
      user has requested a smaller queue num, __dev_xmit_skb() may still
      enqueue an skb with a larger queue_mapping after the corresponding
      qdisc is reset, and hns3_nic_net_xmit() is later called with that skb.

      Reuse the existing synchronize_net() in dev_deactivate_many() to
      make sure that an skb with a larger queue_mapping enqueued to the
      old qdisc (which is saved in dev_queue->qdisc_sleeping) will always
      be reset when dev_reset_queue() is called.
      
      Fixes: 6b3ba914 ("net: sched: allow qdiscs to handle locking")
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: look for phy-mode in port nodes · edecfa98
      Helmut Grohne authored
      Documentation/devicetree/bindings/net/dsa/dsa.txt says that the phy-mode
      property should be specified on port nodes. However, the microchip
      drivers read it from the switch node.
      
      Let the driver use the per-port property and fall back to the old
      location with a warning.
      
      Fix in-tree users.
      Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de>
      Link: https://lore.kernel.org/netdev/20200617082235.GA1523@laureti-dev/
      Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: fix kmalloc flag in mptcp_pm_nl_get_local_id · f612eb76
      Geliang Tang authored
      mptcp_pm_nl_get_local_id may be called in interrupt context, so we need to
      use GFP_ATOMIC flag to allocate memory to avoid sleeping in atomic context.
      
      [  280.209809] BUG: sleeping function called from invalid context at mm/slab.h:498
      [  280.209812] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1680, name: kworker/1:3
      [  280.209814] INFO: lockdep is turned off.
      [  280.209816] CPU: 1 PID: 1680 Comm: kworker/1:3 Tainted: G        W         5.9.0-rc3-mptcp+ #146
      [  280.209818] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  280.209820] Workqueue: events mptcp_worker
      [  280.209822] Call Trace:
      [  280.209824]  <IRQ>
      [  280.209826]  dump_stack+0x77/0xa0
      [  280.209829]  ___might_sleep.cold+0xa6/0xb6
      [  280.209832]  kmem_cache_alloc_trace+0x1d1/0x290
      [  280.209835]  mptcp_pm_nl_get_local_id+0x23c/0x410
      [  280.209840]  subflow_init_req+0x1e9/0x2ea
      [  280.209843]  ? inet_reqsk_alloc+0x1c/0x120
      [  280.209845]  ? kmem_cache_alloc+0x264/0x290
      [  280.209849]  tcp_conn_request+0x303/0xae0
      [  280.209854]  ? printk+0x53/0x6a
      [  280.209857]  ? tcp_rcv_state_process+0x28f/0x1374
      [  280.209859]  tcp_rcv_state_process+0x28f/0x1374
      [  280.209864]  ? tcp_v4_do_rcv+0xb3/0x1f0
      [  280.209866]  tcp_v4_do_rcv+0xb3/0x1f0
      [  280.209869]  tcp_v4_rcv+0xed6/0xfa0
      [  280.209873]  ip_protocol_deliver_rcu+0x28/0x270
      [  280.209875]  ip_local_deliver_finish+0x89/0x120
      [  280.209877]  ip_local_deliver+0x180/0x220
      [  280.209881]  ip_rcv+0x166/0x210
      [  280.209885]  __netif_receive_skb_one_core+0x82/0x90
      [  280.209888]  process_backlog+0xd6/0x230
      [  280.209891]  net_rx_action+0x13a/0x410
      [  280.209895]  __do_softirq+0xcf/0x468
      [  280.209899]  asm_call_on_stack+0x12/0x20
      [  280.209901]  </IRQ>
      [  280.209903]  ? ip_finish_output2+0x240/0x9a0
      [  280.209906]  do_softirq_own_stack+0x4d/0x60
      [  280.209908]  do_softirq.part.0+0x2b/0x60
      [  280.209911]  __local_bh_enable_ip+0x9a/0xa0
      [  280.209913]  ip_finish_output2+0x264/0x9a0
      [  280.209916]  ? rcu_read_lock_held+0x4d/0x60
      [  280.209920]  ? ip_output+0x7a/0x250
      [  280.209922]  ip_output+0x7a/0x250
      [  280.209925]  ? __ip_finish_output+0x330/0x330
      [  280.209928]  __ip_queue_xmit+0x1dc/0x5a0
      [  280.209931]  __tcp_transmit_skb+0xa0f/0xc70
      [  280.209937]  tcp_connect+0xb03/0xff0
      [  280.209939]  ? lockdep_hardirqs_on_prepare+0xe7/0x190
      [  280.209942]  ? ktime_get_with_offset+0x125/0x150
      [  280.209944]  ? trace_hardirqs_on+0x1c/0xe0
      [  280.209948]  tcp_v4_connect+0x449/0x550
      [  280.209953]  __inet_stream_connect+0xbb/0x320
      [  280.209955]  ? mark_held_locks+0x49/0x70
      [  280.209958]  ? lockdep_hardirqs_on_prepare+0xe7/0x190
      [  280.209960]  ? __local_bh_enable_ip+0x6b/0xa0
      [  280.209963]  inet_stream_connect+0x32/0x50
      [  280.209966]  __mptcp_subflow_connect+0x1fd/0x242
      [  280.209972]  mptcp_pm_create_subflow_or_signal_addr+0x2db/0x600
      [  280.209975]  mptcp_worker+0x543/0x7a0
      [  280.209980]  process_one_work+0x26d/0x5b0
      [  280.209984]  ? process_one_work+0x5b0/0x5b0
      [  280.209987]  worker_thread+0x48/0x3d0
      [  280.209990]  ? process_one_work+0x5b0/0x5b0
      [  280.209993]  kthread+0x117/0x150
      [  280.209996]  ? kthread_park+0x80/0x80
      [  280.209998]  ret_from_fork+0x22/0x30
      
      Fixes: 01cacb00 ("mptcp: add netlink-based PM")
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mptcp-fix-subflow-s-local_id-remote_id-issues' · d697f42a
      David S. Miller authored
      Geliang Tang says:
      
      ====================
      mptcp: fix subflow's local_id/remote_id issues
      
      v2:
       - add Fixes tags;
       - simplify with 'return addresses_equal';
       - use 'reversed Xmas tree' way.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: fix subflow's remote_id issues · 2ff0e566
      Geliang Tang authored
      This patch sets the initial remote_id to zero; otherwise it would be a
      random number.

      It also adds the missing subflow remote_id setting code in both
      __mptcp_subflow_connect and subflow_ulp_clone.
      
      Fixes: 01cacb00 ("mptcp: add netlink-based PM")
      Fixes: ec3edaa7 ("mptcp: Add handling of outgoing MP_JOIN requests")
      Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests")
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: fix subflow's local_id issues · 57025817
      Geliang Tang authored
      In mptcp_pm_nl_get_local_id, skc_local is the same as msk_local, so the
      function always returns 0. As a result, every subflow's local_id is 0,
      which is incorrect.

      This patch fixes the issue.

      Also, we need to ignore the zero address here, such as 0.0.0.0 in IPv4.
      Using the zero address as a local address means that any one of the
      local addresses can be used. The zero address is not a new address and
      does not need to be added to the PM, so this patch adds a new function,
      address_zero, to check whether an address is the zero address; if it
      is, the address is ignored.
      
      Fixes: 01cacb00 ("mptcp: add netlink-based PM")
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
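The zero-address check described in 57025817 can be sketched in userspace as follows. This is only an illustration under stated assumptions: `struct addr_rec` and `addr_is_zero` are hypothetical names, and the kernel's address_zero operates on MPTCP's own address type rather than on these socket API types.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical userspace address record; the kernel uses its own type. */
struct addr_rec {
    int family;                 /* AF_INET or AF_INET6 */
    union {
        struct in_addr v4;
        struct in6_addr v6;
    } u;
};

/* True if the record holds the wildcard ("any") address of its family. */
static bool addr_is_zero(const struct addr_rec *a)
{
    static const struct in6_addr any6 = IN6ADDR_ANY_INIT;

    if (a->family == AF_INET)
        return a->u.v4.s_addr == htonl(INADDR_ANY);    /* 0.0.0.0 */
    if (a->family == AF_INET6)
        return memcmp(&a->u.v6, &any6, sizeof(any6)) == 0;    /* :: */
    return false;               /* unknown family: not treated as zero */
}
```

A caller that assigns local IDs could test `addr_is_zero()` first and skip registration for the wildcard address, since it does not name a specific local address.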
    • tipc: fix shutdown() of connection oriented socket · a4b5cc9e
      Tetsuo Handa authored
      I confirmed that the problem fixed by commit 2a63866c ("tipc: fix
      shutdown() of connectionless socket") also applies to stream sockets.
      
      ----------
      #include <sys/socket.h>
      #include <unistd.h>
      #include <sys/wait.h>
      
      int main(int argc, char *argv[])
      {
              int fds[2] = { -1, -1 };
              socketpair(PF_TIPC, SOCK_STREAM /* or SOCK_DGRAM */, 0, fds);
              if (fork() == 0)
                      _exit(read(fds[0], NULL, 1));
              shutdown(fds[0], SHUT_RDWR); /* This must make read() return. */
              wait(NULL); /* To be woken up by _exit(). */
              return 0;
      }
      ----------
      
      Since shutdown(SHUT_RDWR) should affect all processes sharing that socket,
      unconditionally setting sk->sk_shutdown to SHUTDOWN_MASK will be the right
      behavior.
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • connector: Move maintenance under networking drivers umbrella. · 46cf789b
      David S. Miller authored
      Evgeniy does not have the time nor capacity to maintain the
      connector subsystem any longer, so just move it under networking
      as that is effectively what has been happening lately.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 09 Sep, 2020 19 commits
    • Merge branch 'net-qed-disable-aRFS-in-NPAR-and-100G' · 9b29e26f
      David S. Miller authored
      Igor Russkikh says:
      
      ====================
      net: qed disable aRFS in NPAR and 100G
      
      This patchset fixes some recent issues found by customers.
      
      v3:
        resending on Dmitry's behalf
      
      v2:
        correct hash in Fixes tag
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qed: RDMA personality shouldn't fail VF load · ce1cf9e5
      Dmitry Bogdanov authored
      Fix the assert during VF driver installation when the personality is iWARP
      
      Fixes: 1fe614d1 ("qed: Relax VF firmware requirements")
      Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
      Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qede: Disable aRFS for NPAR and 100G · 0367f058
      Dmitry Bogdanov authored
      In some configurations aRFS cannot be used, so disable it if the device
      is not capable.
      
      Fixes: e4917d46 ("qede: Add aRFS support")
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
      Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qed: Disable aRFS for NPAR and 100G · 2d2fe843
      Dmitry Bogdanov authored
      In CMT and NPAR the PF is unknown when the GFS block processes the
      packet. Therefore the searcher cannot be used, as it has a per-PF
      database, and thus aRFS must be disabled.
      
      Fixes: d51e4af5 ("qed: aRFS infrastructure support")
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
      Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'wireless-drivers-2020-09-09' of... · a19454b6
      David S. Miller authored
      Merge tag 'wireless-drivers-2020-09-09' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for v5.9
      
      First set of fixes for v5.9, small but important.
      
      brcmfmac
      
      * fix a throughput regression on bcm4329
      
      mt76
      
      * fix a regression with stations reconnecting on mt7616
      
      * properly free tx skbs, it was working by accident before
      
      mwifiex
      
      * fix a regression with 256 bit encryption keys
      
      wlcore
      
      * revert AES CMAC support as it caused a regression
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'wireguard-fixes' · 99dc4a5d
      David S. Miller authored
      Jason A. Donenfeld says:
      
      ====================
      wireguard fixes for 5.9-rc5
      
      Yesterday, Eric reported a race condition found by syzbot. This series
      contains two commits, one that fixes the direct issue, and another that
      addresses the more general issue, as a defense in depth.
      
      1) The basic problem syzbot unearthed was that one particular mutation
         of handshake->entry was not protected by the handshake mutex like the
         other cases, so this patch basically just reorders a line to make
         sure the mutex is actually taken at the right point. Most of the work
         here went into making sure the race was fully understood and making a
         reproducer (which syzbot was unable to do itself, due to the rarity
         of the race).
      
      2) Eric's initial suggestion for fixing this was taking a spinlock
         around the hash table replace function where the null ptr deref was
         happening. This doesn't address the main problem in the most precise
         possible way like (1) does, but it is a good suggestion for
         defense-in-depth, in case related issues come up in the future, and
         basically costs nothing from a performance perspective. I thought it
         aided in implementing a good general rule: all mutators of that hash
         table take the table lock. So that's part of this series as a
         companion.
      
      Both of these contain Fixes: tags and are good candidates for stable.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • wireguard: peerlookup: take lock before checking hash in replace operation · 6147f7b1
      Jason A. Donenfeld authored
      Eric's suggested fix for the previous commit's mentioned race condition
      was to simply take the table->lock in wg_index_hashtable_replace(). The
      table->lock of the hash table is supposed to protect the bucket heads,
      not the entries, but actually, since all the mutator functions are
      already taking it, it makes sense to take it too for the test to
      hlist_unhashed, as a defense in depth measure, so that it no longer
      races with deletions, regardless of what other locks are protecting
      individual entries. This is sensible from a performance perspective
      because, as Eric pointed out, the case of being unhashed is already the
      unlikely case, so this won't add common contention. And comparing
      instructions, this basically doesn't make much of a difference other
      than pushing and popping %r13, used by the new `bool ret`. More
      generally, I like the idea of locking consistency across table mutator
      functions, and this might let me rest slightly easier at night.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/wireguard/20200908145911.4090480-1-edumazet@google.com/
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
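The rule that every table mutator takes the table lock, including the "is it still hashed?" test in the replace path, can be sketched with a pthread mutex. This is a userspace analogy only: the single-slot `struct table`, the `hashed` flag, and the function names are hypothetical and do not reflect WireGuard's hashtable, RCU usage, or spinlocks.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-slot "table"; a stand-in for a real hash table. */
struct entry {
    bool hashed;                /* true while the entry is in the table */
    int id;
};

struct table {
    pthread_mutex_t lock;       /* taken by every mutator below */
    struct entry *slot;
};

static void table_insert(struct table *t, struct entry *e)
{
    pthread_mutex_lock(&t->lock);
    t->slot = e;
    e->hashed = true;
    pthread_mutex_unlock(&t->lock);
}

static void table_remove(struct table *t, struct entry *e)
{
    pthread_mutex_lock(&t->lock);
    if (e->hashed) {
        t->slot = NULL;
        e->hashed = false;
    }
    pthread_mutex_unlock(&t->lock);
}

/* Returns false, changing nothing, if old_e was already removed. */
static bool table_replace(struct table *t, struct entry *old_e,
                          struct entry *new_e)
{
    bool ret;

    pthread_mutex_lock(&t->lock);
    ret = old_e->hashed;        /* the check happens under the lock ... */
    if (ret) {                  /* ... so a removal cannot slip in here */
        t->slot = new_e;
        new_e->hashed = true;
        old_e->hashed = false;
    }
    pthread_mutex_unlock(&t->lock);
    return ret;
}
```

Because the test and the replacement sit in one critical section, a concurrent `table_remove()` either completes before the check or waits until the replacement is done, which is the property the commit message describes.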
    • wireguard: noise: take lock when removing handshake entry from table · 9179ba31
      Jason A. Donenfeld authored
      Eric reported that syzkaller found a race of this variety:
      
      CPU 1                                       CPU 2
      -------------------------------------------|---------------------------------------
      wg_index_hashtable_replace(old, ...)       |
        if (hlist_unhashed(&old->index_hash))    |
                                                 | wg_index_hashtable_remove(old)
                                                 |   hlist_del_init_rcu(&old->index_hash)
                                                 |     old->index_hash.pprev = NULL
        hlist_replace_rcu(&old->index_hash, ...) |
          *old->index_hash.pprev                 |
      
      Syzbot wasn't actually able to reproduce this more than once or create a
      reproducer, because the race window between checking "hlist_unhashed" and
      calling "hlist_replace_rcu" is just so small. Adding an mdelay(5) or
      similar there helps make this demonstrable using this simple script:
      
          #!/bin/bash
          set -ex
          trap 'kill $pid1; kill $pid2; ip link del wg0; ip link del wg1' EXIT
          ip link add wg0 type wireguard
          ip link add wg1 type wireguard
          wg set wg0 private-key <(wg genkey) listen-port 9999
          wg set wg1 private-key <(wg genkey) peer $(wg show wg0 public-key) endpoint 127.0.0.1:9999 persistent-keepalive 1
          wg set wg0 peer $(wg show wg1 public-key)
          ip link set wg0 up
          yes link set wg1 up | ip -force -batch - &
          pid1=$!
          yes link set wg1 down | ip -force -batch - &
          pid2=$!
          wait
      
      The fundamental underlying problem is that we permit calls to
      wg_index_hashtable_remove(handshake.entry) without requiring the caller
      to take the handshake mutex that is intended to protect members of
      handshake during mutations. The mutex is consistently taken around calls
      to wg_index_hashtable_insert(handshake.entry) and
      wg_index_hashtable_replace(handshake.entry), but it's missing from a
      pertinent callsite of wg_index_hashtable_remove(handshake.entry). So,
      this patch makes sure that mutex is taken.
      
      The original code was a little bit funky though, in the form of:
      
          remove(handshake.entry)
          lock(), memzero(handshake.some_members), unlock()
          remove(handshake.entry)
      
      The original intention of that double removal pattern outside the lock
      appears to be some attempt to prevent insertions that might happen while
      locks are dropped during expensive crypto operations, but actually, all
      callers of wg_index_hashtable_insert(handshake.entry) take the write
      lock and then explicitly check handshake.state, as they should, which
      the aforementioned memzero clears, which means an insertion should
      already be impossible. And regardless, the original intention was
      necessarily racy, since it wasn't guaranteed that something else would
      run after the unlock() instead of after the remove(). So, from a
      soundness perspective, it seems positive to remove what looks like a
      hack at best.
      
      The crash from both syzbot and from the script above is as follows:
      
        general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
        KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
        CPU: 0 PID: 7395 Comm: kworker/0:3 Not tainted 5.9.0-rc4-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Workqueue: wg-kex-wg1 wg_packet_handshake_receive_worker
        RIP: 0010:hlist_replace_rcu include/linux/rculist.h:505 [inline]
        RIP: 0010:wg_index_hashtable_replace+0x176/0x330 drivers/net/wireguard/peerlookup.c:174
        Code: 00 fc ff df 48 89 f9 48 c1 e9 03 80 3c 01 00 0f 85 44 01 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 10 48 89 c6 48 c1 ee 03 <80> 3c 0e 00 0f 85 06 01 00 00 48 85 d2 4c 89 28 74 47 e8 a3 4f b5
        RSP: 0018:ffffc90006a97bf8 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffff888050ffc4f8 RCX: dffffc0000000000
        RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88808e04e010
        RBP: ffff88808e04e000 R08: 0000000000000001 R09: ffff8880543d0000
        R10: ffffed100a87a000 R11: 000000000000016e R12: ffff8880543d0000
        R13: ffff88808e04e008 R14: ffff888050ffc508 R15: ffff888050ffc500
        FS:  0000000000000000(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000f5505db0 CR3: 0000000097cf7000 CR4: 00000000001526f0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
        wg_noise_handshake_begin_session+0x752/0xc9a drivers/net/wireguard/noise.c:820
        wg_receive_handshake_packet drivers/net/wireguard/receive.c:183 [inline]
        wg_packet_handshake_receive_worker+0x33b/0x730 drivers/net/wireguard/receive.c:220
        process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
        worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
        kthread+0x3b5/0x4a0 kernel/kthread.c:292
        ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reported-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/wireguard/20200908145911.4090480-1-edumazet@google.com/
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hsr: avoid newline at end of message in NL_SET_ERR_MSG_MOD · b87f9fe1
      Ye Bin authored
      Clean up the following coccicheck warnings:
      net//hsr/hsr_netlink.c:94:8-42: WARNING avoid newline at end of message
      in NL_SET_ERR_MSG_MOD
      net//hsr/hsr_netlink.c:87:30-57: WARNING avoid newline at end of message
      in NL_SET_ERR_MSG_MOD
      net//hsr/hsr_netlink.c:79:29-53: WARNING avoid newline at end of message
      in NL_SET_ERR_MSG_MOD
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-skb_put_padto-fixes' · 0ddaa278
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      net: skb_put_padto() fixes
      
      syzbot reported a bug in qrtr leading to use-after-free.

      First patch fixes the issue.

      Second patch adds __must_check attribute to avoid similar
      issues in the future.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add __must_check to skb_put_padto() · 4a009cb0
      Eric Dumazet authored
      skb_put_padto() and __skb_put_padto() callers
      must check return values or risk use-after-free.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
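The effect of a must-check annotation on a "pads or frees" helper can be sketched in userspace with the GCC/Clang `warn_unused_result` attribute. Everything below is an assumption made for illustration: `MUST_CHECK`, `struct buf`, and `buf_put_padto` are hypothetical names that only mimic the contract described above (on failure the buffer has been freed and must not be touched).

```c
#include <stdlib.h>
#include <string.h>

/* Assumed userspace equivalent of a must-check annotation (GCC/Clang). */
#define MUST_CHECK __attribute__((warn_unused_result))

/* Hypothetical heap buffer; not an skb. */
struct buf {
    unsigned char *data;
    size_t len;
};

/*
 * Zero-pad b up to min_len bytes. Returns 0 on success. On failure the
 * data has been freed and b->data is NULL, so the caller must stop
 * using it; ignoring the return value triggers a compiler warning.
 */
static int MUST_CHECK buf_put_padto(struct buf *b, size_t min_len)
{
    unsigned char *p;

    if (b->len >= min_len)
        return 0;               /* already long enough */

    p = realloc(b->data, min_len);
    if (!p) {
        free(b->data);          /* mirror the "freed on error" contract */
        b->data = NULL;
        b->len = 0;
        return -1;
    }
    memset(p + b->len, 0, min_len - b->len);
    b->data = p;
    b->len = min_len;
    return 0;
}
```

A caller is expected to write `if (buf_put_padto(&b, n)) return;` and avoid any further access to the buffer on the error path; a bare `buf_put_padto(&b, n);` statement would be flagged at compile time.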
    • net: qrtr: check skb_put_padto() return value · 3ca1a42a
      Eric Dumazet authored
      If skb_put_padto() returns an error, skb has been freed.
      Better not touch it anymore, as reported by syzbot [1]
      
      Note to qrtr maintainers : this suggests qrtr_sendmsg()
      should adjust sock_alloc_send_skb() second parameter
      to account for the potential added alignment to avoid
      reallocation.
      
      [1]
      
      BUG: KASAN: use-after-free in __skb_insert include/linux/skbuff.h:1907 [inline]
      BUG: KASAN: use-after-free in __skb_queue_before include/linux/skbuff.h:2016 [inline]
      BUG: KASAN: use-after-free in __skb_queue_tail include/linux/skbuff.h:2049 [inline]
      BUG: KASAN: use-after-free in skb_queue_tail+0x6b/0x120 net/core/skbuff.c:3146
      Write of size 8 at addr ffff88804d8ab3c0 by task syz-executor.4/4316
      
      CPU: 1 PID: 4316 Comm: syz-executor.4 Not tainted 5.9.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1d6/0x29e lib/dump_stack.c:118
       print_address_description+0x66/0x620 mm/kasan/report.c:383
       __kasan_report mm/kasan/report.c:513 [inline]
       kasan_report+0x132/0x1d0 mm/kasan/report.c:530
       __skb_insert include/linux/skbuff.h:1907 [inline]
       __skb_queue_before include/linux/skbuff.h:2016 [inline]
       __skb_queue_tail include/linux/skbuff.h:2049 [inline]
       skb_queue_tail+0x6b/0x120 net/core/skbuff.c:3146
       qrtr_tun_send+0x1a/0x40 net/qrtr/tun.c:23
       qrtr_node_enqueue+0x44f/0xc00 net/qrtr/qrtr.c:364
       qrtr_bcast_enqueue+0xbe/0x140 net/qrtr/qrtr.c:861
       qrtr_sendmsg+0x680/0x9c0 net/qrtr/qrtr.c:960
       sock_sendmsg_nosec net/socket.c:651 [inline]
       sock_sendmsg net/socket.c:671 [inline]
       sock_write_iter+0x317/0x470 net/socket.c:998
       call_write_iter include/linux/fs.h:1882 [inline]
       new_sync_write fs/read_write.c:503 [inline]
       vfs_write+0xa96/0xd10 fs/read_write.c:578
       ksys_write+0x11b/0x220 fs/read_write.c:631
       do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45d5b9
      Code: 5d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f84b5b81c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000038b40 RCX: 000000000045d5b9
      RDX: 0000000000000055 RSI: 0000000020001240 RDI: 0000000000000003
      RBP: 00007f84b5b81ca0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000f
      R13: 00007ffcbbf86daf R14: 00007f84b5b829c0 R15: 000000000118cf4c
      
      Allocated by task 4316:
       kasan_save_stack mm/kasan/common.c:48 [inline]
       kasan_set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc+0x100/0x130 mm/kasan/common.c:461
       slab_post_alloc_hook+0x3e/0x290 mm/slab.h:518
       slab_alloc mm/slab.c:3312 [inline]
       kmem_cache_alloc+0x1c1/0x2d0 mm/slab.c:3482
       skb_clone+0x1b2/0x370 net/core/skbuff.c:1449
       qrtr_bcast_enqueue+0x6d/0x140 net/qrtr/qrtr.c:857
       qrtr_sendmsg+0x680/0x9c0 net/qrtr/qrtr.c:960
       sock_sendmsg_nosec net/socket.c:651 [inline]
       sock_sendmsg net/socket.c:671 [inline]
       sock_write_iter+0x317/0x470 net/socket.c:998
       call_write_iter include/linux/fs.h:1882 [inline]
       new_sync_write fs/read_write.c:503 [inline]
       vfs_write+0xa96/0xd10 fs/read_write.c:578
       ksys_write+0x11b/0x220 fs/read_write.c:631
       do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 4316:
       kasan_save_stack mm/kasan/common.c:48 [inline]
       kasan_set_track+0x3d/0x70 mm/kasan/common.c:56
       kasan_set_free_info+0x17/0x30 mm/kasan/generic.c:355
       __kasan_slab_free+0xdd/0x110 mm/kasan/common.c:422
       __cache_free mm/slab.c:3418 [inline]
       kmem_cache_free+0x82/0xf0 mm/slab.c:3693
       __skb_pad+0x3f5/0x5a0 net/core/skbuff.c:1823
       __skb_put_padto include/linux/skbuff.h:3233 [inline]
       skb_put_padto include/linux/skbuff.h:3252 [inline]
       qrtr_node_enqueue+0x62f/0xc00 net/qrtr/qrtr.c:360
       qrtr_bcast_enqueue+0xbe/0x140 net/qrtr/qrtr.c:861
       qrtr_sendmsg+0x680/0x9c0 net/qrtr/qrtr.c:960
       sock_sendmsg_nosec net/socket.c:651 [inline]
       sock_sendmsg net/socket.c:671 [inline]
       sock_write_iter+0x317/0x470 net/socket.c:998
       call_write_iter include/linux/fs.h:1882 [inline]
       new_sync_write fs/read_write.c:503 [inline]
       vfs_write+0xa96/0xd10 fs/read_write.c:578
       ksys_write+0x11b/0x220 fs/read_write.c:631
       do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The buggy address belongs to the object at ffff88804d8ab3c0
       which belongs to the cache skbuff_head_cache of size 224
      The buggy address is located 0 bytes inside of
       224-byte region [ffff88804d8ab3c0, ffff88804d8ab4a0)
      The buggy address belongs to the page:
      page:00000000ea8cccfb refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88804d8abb40 pfn:0x4d8ab
      flags: 0xfffe0000000200(slab)
      raw: 00fffe0000000200 ffffea0002237ec8 ffffea00029b3388 ffff88821bb66800
      raw: ffff88804d8abb40 ffff88804d8ab000 000000010000000b 0000000000000000
      page dumped because: kasan: bad access detected
      
      Fixes: ce57785b ("net: qrtr: fix len of skb_put_padto in qrtr_node_enqueue")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Carl Huang <cjhuang@codeaurora.org>
      Cc: Wen Gong <wgong@codeaurora.org>
      Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
      Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Wei Wang's avatar
      ip: fix tos reflection in ack and reset packets · ba9e04a7
      Wei Wang authored
      Currently, in tcp_v4_reqsk_send_ack() and tcp_v4_send_reset(), we
      echo the TOS value of the received packets in the response.
      However, we do not want to echo the lower 2 ECN bits in accordance
      with RFC 3168 6.1.5 robustness principles.
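
      The masking described above can be sketched in plain C. This is an
      illustration only, not the patch itself: reflect_tos() and ECN_MASK
      are hypothetical names, with ECN_MASK set to the same value as the
      kernel's INET_ECN_MASK (the low two bits of the TOS byte).

```c
#include <stdint.h>

/* Low two bits of the TOS byte carry ECN (RFC 3168). Same value as the
 * kernel's INET_ECN_MASK, redefined here so the sketch stands alone. */
#define ECN_MASK 0x03u

/* Hypothetical helper: TOS value to place in an ACK or RST reply.
 * The DSCP bits are echoed, the ECN bits are always cleared. */
static inline uint8_t reflect_tos(uint8_t rx_tos)
{
	return (uint8_t)(rx_tos & ~ECN_MASK);
}
```

      With this helper, a received TOS of 0xbb (DSCP EF plus both ECN bits)
      is reflected as 0xb8.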
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ba9e04a7
    • David S. Miller's avatar
      Merge tag 'ieee802154-for-davem-2020-09-08' of... · 6fd40d32
      David S. Miller authored
      Merge tag 'ieee802154-for-davem-2020-09-08' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
      
      Stefan Schmidt says:
      
      ====================
      pull-request: ieee802154 for net 2020-09-08
      
      An update from ieee802154 for your *net* tree.
      
      A potential memory leak fix for ca8210 from Liu Jian,
      a check on the return for a register read in adf7242
      and finally a use-after-free fix in the softmac tx
      function from Eric found by syzkaller.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6fd40d32
    • Jakub Kicinski's avatar
      MAINTAINERS: remove John Allen from ibmvnic · 2a154988
      Jakub Kicinski authored
      John's email has bounced and Thomas confirms he no longer
      works on ibmvnic.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2a154988
    • Brian Vazquez's avatar
      fib: fix fib_rule_ops indirect call wrappers when CONFIG_IPV6=m · 923f614c
      Brian Vazquez authored
      If CONFIG_IPV6=m, the IPV6 functions won't be found by the linker:
      
      ld: net/core/fib_rules.o: in function `fib_rules_lookup':
      fib_rules.c:(.text+0x606): undefined reference to `fib6_rule_match'
      ld: fib_rules.c:(.text+0x611): undefined reference to `fib6_rule_match'
      ld: fib_rules.c:(.text+0x68c): undefined reference to `fib6_rule_action'
      ld: fib_rules.c:(.text+0x693): undefined reference to `fib6_rule_action'
      ld: fib_rules.c:(.text+0x6aa): undefined reference to `fib6_rule_suppress'
      ld: fib_rules.c:(.text+0x6bc): undefined reference to `fib6_rule_suppress'
      make: *** [Makefile:1166: vmlinux] Error 1
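
      A userspace sketch of why the link fails and how a wrapper can avoid
      it, under stated assumptions: an indirect call wrapper compares the
      function pointer against known targets and calls them by name, so
      built-in code ends up referencing the target's symbol. All names
      below (match_v4, match_v6, CALL_MATCH, V6_BUILTIN) are hypothetical
      stand-ins, and V6_BUILTIN only models CONFIG_IPV6=y; this is not the
      code from the patch.

```c
/* Hypothetical stand-in for the built-in IPv4 callback. */
static int match_v4(int key) { return key == 4; }

#ifdef V6_BUILTIN /* assumption: models CONFIG_IPV6=y */
static int match_v6(int key) { return key == 6; }
#endif

/* Compare the pointer against known targets and call them directly. */
#define CALL_1(f, f1, ...) \
	((f) == (f1) ? f1(__VA_ARGS__) : (f)(__VA_ARGS__))
#define CALL_2(f, f2, f1, ...) \
	((f) == (f2) ? f2(__VA_ARGS__) : CALL_1(f, f1, __VA_ARGS__))

#ifdef V6_BUILTIN
#define CALL_MATCH(f, ...) CALL_2(f, match_v6, match_v4, __VA_ARGS__)
#else
/* The v6 callback is not built in: never name its symbol here, so the
 * linker has nothing to resolve. It is still reached through f. */
#define CALL_MATCH(f, ...) CALL_1(f, match_v4, __VA_ARGS__)
#endif

/* Models a callback that is only known through its pointer, as a
 * modular implementation would be. */
static int match_other(int key) { return key == 6; }

static int lookup(int (*match)(int), int key)
{
	return CALL_MATCH(match, key);
}
```

      Without V6_BUILTIN, lookup() still works for any callback; it only
      loses the direct-call fast path for the one it may not name.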
      Reported-by: Sven Joachim <svenjoac@gmx.de>
      Fixes: b9aaec8f ("fib: use indirect call wrappers in the most common fib_rules_ops")
      Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
      Signed-off-by: Brian Vazquez <brianvv@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      923f614c
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 2650be2c
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ===================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Allow conntrack entries with l3num == NFPROTO_IPV4 or == NFPROTO_IPV6
         only via ctnetlink, from Will McVicker.
      
      2) Batch notifications to userspace to improve netlink socket receive
         utilization.
      
      3) Restore mark based dump filtering via ctnetlink, from Martin Willi.
      
      4) nf_conncount_init() fails with -EPROTO with CONFIG_IPV6, from
         Eelco Chaudron.
      
      5) Containers fail to match on meta skuid and skgid, use socket user_ns
         to retrieve meta skuid and skgid.
      ===================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2650be2c
    • Eric Dumazet's avatar
      ipv6: avoid lockdep issue in fib6_del() · 843d926b
      Eric Dumazet authored
      syzbot reported twice a lockdep issue in fib6_del() [1]
      which I think is caused by net->ipv6.fib6_null_entry
      having a NULL fib6_table pointer.
      
      fib6_del() already checks for fib6_null_entry special
      case, we only need to return earlier.
      
      Bug seems to occur very rarely, I have thus chosen
      a 'bug origin' that makes backports not too complex.
      
      [1]
      WARNING: suspicious RCU usage
      5.9.0-rc4-syzkaller #0 Not tainted
      -----------------------------
      net/ipv6/ip6_fib.c:1996 suspicious rcu_dereference_protected() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      4 locks held by syz-executor.5/8095:
       #0: ffffffff8a7ea708 (rtnl_mutex){+.+.}-{3:3}, at: ppp_release+0x178/0x240 drivers/net/ppp/ppp_generic.c:401
       #1: ffff88804c422dd8 (&net->ipv6.fib6_gc_lock){+.-.}-{2:2}, at: spin_trylock_bh include/linux/spinlock.h:414 [inline]
       #1: ffff88804c422dd8 (&net->ipv6.fib6_gc_lock){+.-.}-{2:2}, at: fib6_run_gc+0x21b/0x2d0 net/ipv6/ip6_fib.c:2312
       #2: ffffffff89bd6a40 (rcu_read_lock){....}-{1:2}, at: __fib6_clean_all+0x0/0x290 net/ipv6/ip6_fib.c:2613
       #3: ffff8880a82e6430 (&tb->tb6_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:359 [inline]
       #3: ffff8880a82e6430 (&tb->tb6_lock){+.-.}-{2:2}, at: __fib6_clean_all+0x107/0x290 net/ipv6/ip6_fib.c:2245
      
      stack backtrace:
      CPU: 1 PID: 8095 Comm: syz-executor.5 Not tainted 5.9.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x198/0x1fd lib/dump_stack.c:118
       fib6_del+0x12b4/0x1630 net/ipv6/ip6_fib.c:1996
       fib6_clean_node+0x39b/0x570 net/ipv6/ip6_fib.c:2180
       fib6_walk_continue+0x4aa/0x8e0 net/ipv6/ip6_fib.c:2102
       fib6_walk+0x182/0x370 net/ipv6/ip6_fib.c:2150
       fib6_clean_tree+0xdb/0x120 net/ipv6/ip6_fib.c:2230
       __fib6_clean_all+0x120/0x290 net/ipv6/ip6_fib.c:2246
       fib6_clean_all net/ipv6/ip6_fib.c:2257 [inline]
       fib6_run_gc+0x113/0x2d0 net/ipv6/ip6_fib.c:2320
       ndisc_netdev_event+0x217/0x350 net/ipv6/ndisc.c:1805
       notifier_call_chain+0xb5/0x200 kernel/notifier.c:83
       call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:2033
       call_netdevice_notifiers_extack net/core/dev.c:2045 [inline]
       call_netdevice_notifiers net/core/dev.c:2059 [inline]
       dev_close_many+0x30b/0x650 net/core/dev.c:1634
       rollback_registered_many+0x3a8/0x1210 net/core/dev.c:9261
       rollback_registered net/core/dev.c:9329 [inline]
       unregister_netdevice_queue+0x2dd/0x570 net/core/dev.c:10410
       unregister_netdevice include/linux/netdevice.h:2774 [inline]
       ppp_release+0x216/0x240 drivers/net/ppp/ppp_generic.c:403
       __fput+0x285/0x920 fs/file_table.c:281
       task_work_run+0xdd/0x190 kernel/task_work.c:141
       tracehook_notify_resume include/linux/tracehook.h:188 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:163 [inline]
       exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190
       syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 421842ed ("net/ipv6: Add fib6_null_entry")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      843d926b
    • Vladimir Oltean's avatar
      net: dsa: link interfaces with the DSA master to get rid of lockdep warnings · 2f1e8ea7
      Vladimir Oltean authored
      Since commit 845e0ebb ("net: change addr_list_lock back to static
      key"), cascaded DSA setups (DSA switch port as DSA master for another
      DSA switch port) are emitting this lockdep warning:
      
      ============================================
      WARNING: possible recursive locking detected
      5.8.0-rc1-00133-g923e4b5032dd-dirty #208 Not tainted
      --------------------------------------------
      dhcpcd/323 is trying to acquire lock:
      ffff000066dd4268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90
      
      but task is already holding lock:
      ffff00006608c268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&dsa_master_addr_list_lock_key/1);
        lock(&dsa_master_addr_list_lock_key/1);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      3 locks held by dhcpcd/323:
       #0: ffffdbd1381dda18 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x30
       #1: ffff00006614b268 (_xmit_ETHER){+...}-{2:2}, at: dev_set_rx_mode+0x28/0x48
       #2: ffff00006608c268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90
      
      stack backtrace:
      Call trace:
       dump_backtrace+0x0/0x1e0
       show_stack+0x20/0x30
       dump_stack+0xec/0x158
       __lock_acquire+0xca0/0x2398
       lock_acquire+0xe8/0x440
       _raw_spin_lock_nested+0x64/0x90
       dev_mc_sync+0x44/0x90
       dsa_slave_set_rx_mode+0x34/0x50
       __dev_set_rx_mode+0x60/0xa0
       dev_mc_sync+0x84/0x90
       dsa_slave_set_rx_mode+0x34/0x50
       __dev_set_rx_mode+0x60/0xa0
       dev_set_rx_mode+0x30/0x48
       __dev_open+0x10c/0x180
       __dev_change_flags+0x170/0x1c8
       dev_change_flags+0x2c/0x70
       devinet_ioctl+0x774/0x878
       inet_ioctl+0x348/0x3b0
       sock_do_ioctl+0x50/0x310
       sock_ioctl+0x1f8/0x580
       ksys_ioctl+0xb0/0xf0
       __arm64_sys_ioctl+0x28/0x38
       el0_svc_common.constprop.0+0x7c/0x180
       do_el0_svc+0x2c/0x98
       el0_sync_handler+0x9c/0x1b8
       el0_sync+0x158/0x180
      
      Since DSA never made use of the netdev API for describing links between
      upper devices and lower devices, the dev->lower_level value of a DSA
      switch interface would be 1, which would warn when it is a DSA master.
      
      We can use netdev_upper_dev_link() to describe the relationship between
      a DSA slave and a DSA master. To be precise, a DSA "slave" (switch port)
      is an "upper" to a DSA "master" (host port). The relationship is "many
      uppers to one lower", like in the case of VLAN. So, for that reason, we
      use the same function as VLAN uses.
      
      There might be a chance that somebody will try to take hold of this
      interface and use it immediately after register_netdev() and before
      netdev_upper_dev_link(). To avoid that, we do the registration and
      linkage while holding the RTNL, and we use the RTNL-locked cousin of
      register_netdev(), which is register_netdevice().
      
      Since this warning was not there when lockdep was using dynamic keys for
      addr_list_lock, we are blaming the lockdep patch itself. The network
      stack _has_ been using static lockdep keys before, and it _is_ likely
      that stacked DSA setups have been triggering these lockdep warnings
      since forever, however I can't test very old kernels on this particular
      stacked DSA setup, to ensure I'm not in fact introducing regressions.
      
      Fixes: 845e0ebb ("net: change addr_list_lock back to static key")
      Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2f1e8ea7
  3. 08 Sep, 2020 7 commits
    • Eric Dumazet's avatar
      mac802154: tx: fix use-after-free · 0ff4628f
      Eric Dumazet authored
      syzbot reported a bug in ieee802154_tx() [1]
      
      A similar issue in ieee802154_xmit_worker() is also fixed in this patch.
      
      [1]
      BUG: KASAN: use-after-free in ieee802154_tx+0x3d2/0x480 net/mac802154/tx.c:88
      Read of size 4 at addr ffff8880251a8c70 by task syz-executor.3/928
      
      CPU: 0 PID: 928 Comm: syz-executor.3 Not tainted 5.9.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x198/0x1fd lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
       __kasan_report mm/kasan/report.c:513 [inline]
       kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
       ieee802154_tx+0x3d2/0x480 net/mac802154/tx.c:88
       ieee802154_subif_start_xmit+0xbe/0xe4 net/mac802154/tx.c:130
       __netdev_start_xmit include/linux/netdevice.h:4634 [inline]
       netdev_start_xmit include/linux/netdevice.h:4648 [inline]
       dev_direct_xmit+0x4e9/0x6e0 net/core/dev.c:4203
       packet_snd net/packet/af_packet.c:2989 [inline]
       packet_sendmsg+0x2413/0x5290 net/packet/af_packet.c:3014
       sock_sendmsg_nosec net/socket.c:651 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:671
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2353
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2407
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45d5b9
      Code: 5d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fc98e749c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 000000000002ccc0 RCX: 000000000045d5b9
      RDX: 0000000000000000 RSI: 0000000020007780 RDI: 000000000000000b
      RBP: 000000000118d020 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118cfec
      R13: 00007fff690c720f R14: 00007fc98e74a9c0 R15: 000000000118cfec
      
      Allocated by task 928:
       kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
       kasan_set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461
       slab_post_alloc_hook mm/slab.h:518 [inline]
       slab_alloc_node mm/slab.c:3254 [inline]
       kmem_cache_alloc_node+0x136/0x3e0 mm/slab.c:3574
       __alloc_skb+0x71/0x550 net/core/skbuff.c:198
       alloc_skb include/linux/skbuff.h:1094 [inline]
       alloc_skb_with_frags+0x92/0x570 net/core/skbuff.c:5771
       sock_alloc_send_pskb+0x72a/0x880 net/core/sock.c:2348
       packet_alloc_skb net/packet/af_packet.c:2837 [inline]
       packet_snd net/packet/af_packet.c:2932 [inline]
       packet_sendmsg+0x19fb/0x5290 net/packet/af_packet.c:3014
       sock_sendmsg_nosec net/socket.c:651 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:671
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2353
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2407
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 928:
       kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
       kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
       kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
       __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422
       __cache_free mm/slab.c:3418 [inline]
       kmem_cache_free.part.0+0x74/0x1e0 mm/slab.c:3693
       kfree_skbmem+0xef/0x1b0 net/core/skbuff.c:622
       __kfree_skb net/core/skbuff.c:679 [inline]
       consume_skb net/core/skbuff.c:838 [inline]
       consume_skb+0xcf/0x160 net/core/skbuff.c:832
       __dev_kfree_skb_any+0x9c/0xc0 net/core/dev.c:3107
       fakelb_hw_xmit+0x20e/0x2a0 drivers/net/ieee802154/fakelb.c:81
       drv_xmit_async net/mac802154/driver-ops.h:16 [inline]
       ieee802154_tx+0x282/0x480 net/mac802154/tx.c:81
       ieee802154_subif_start_xmit+0xbe/0xe4 net/mac802154/tx.c:130
       __netdev_start_xmit include/linux/netdevice.h:4634 [inline]
       netdev_start_xmit include/linux/netdevice.h:4648 [inline]
       dev_direct_xmit+0x4e9/0x6e0 net/core/dev.c:4203
       packet_snd net/packet/af_packet.c:2989 [inline]
       packet_sendmsg+0x2413/0x5290 net/packet/af_packet.c:3014
       sock_sendmsg_nosec net/socket.c:651 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:671
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2353
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2407
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The buggy address belongs to the object at ffff8880251a8c00
       which belongs to the cache skbuff_head_cache of size 224
      The buggy address is located 112 bytes inside of
       224-byte region [ffff8880251a8c00, ffff8880251a8ce0)
      The buggy address belongs to the page:
      page:0000000062b6a4f1 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x251a8
      flags: 0xfffe0000000200(slab)
      raw: 00fffe0000000200 ffffea0000435c88 ffffea00028b6c08 ffff8880a9055d00
      raw: 0000000000000000 ffff8880251a80c0 000000010000000c 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8880251a8b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880251a8b80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff8880251a8c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                                   ^
       ffff8880251a8c80: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
       ffff8880251a8d00: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
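
      The report above shows the skb being freed inside the driver hook
      (fakelb_hw_xmit() via drv_xmit_async() at tx.c:81) and then read
      again at tx.c:88. A minimal userspace sketch of the usual remedy,
      which is to copy what is needed before handing the buffer over,
      follows. struct pkt, drv_xmit() and tx() are hypothetical stand-ins,
      and the assumption that the late read was the length used for
      statistics is inferred from the Fixes tag, not confirmed here.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff. */
struct pkt {
	size_t len;
};

struct tx_stats {
	unsigned long packets;
	unsigned long bytes;
};

/* Models a driver xmit hook that consumes (frees) the packet. */
static int drv_xmit(struct pkt *p)
{
	free(p);
	return 0;
}

/* Read everything needed from the packet before ownership moves. */
static int tx(struct pkt *p, struct tx_stats *st)
{
	size_t len = p->len;   /* cached while p is still valid */
	int ret = drv_xmit(p); /* p must not be touched after this */

	if (ret)
		return ret;
	st->packets++;
	st->bytes += len;      /* p->len here would be a use-after-free */
	return 0;
}
```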
      
      Fixes: 409c3b0c ("mac802154: tx: move stats tx increment")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Alexander Aring <alex.aring@gmail.com>
      Cc: Stefan Schmidt <stefan@datenfreihafen.org>
      Cc: linux-wpan@vger.kernel.org
      Link: https://lore.kernel.org/r/20200908104025.4009085-1-edumazet@google.com
      Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
      0ff4628f
    • Pablo Neira Ayuso's avatar
      netfilter: nft_meta: use socket user_ns to retrieve skuid and skgid · 0c92411b
      Pablo Neira Ayuso authored
      ... instead of using init_user_ns.
      
      Fixes: 96518518 ("netfilter: add nftables")
      Tested-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      0c92411b
    • Eelco Chaudron's avatar
      netfilter: conntrack: nf_conncount_init is failing with IPv6 disabled · 526e81b9
      Eelco Chaudron authored
      The openvswitch module fails initialization when used in a kernel
      without IPv6 enabled. nf_conncount_init() fails because the ct code
      unconditionally tries to initialize the netns IPv6 related bit,
      regardless of the build option. The change below ignores the IPv6
      part if not enabled.
      
      Note that the corresponding _put() function already has this IPv6
      configuration check.
      
      Fixes: 11efd5cb ("openvswitch: Support conntrack zone limit")
      Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      526e81b9
    • Martin Willi's avatar
      netfilter: ctnetlink: fix mark based dump filtering regression · 6c0d95d1
      Martin Willi authored
      conntrack mark based dump filtering may falsely skip entries if a mask
      is given: If the mask-based check does not filter out the entry, the
      else-if check is always true and compares the mark without considering
      the mask. The if/else-if logic seems wrong.
      
      Given that the mask during filter setup is implicitly set to 0xffffffff
      if not specified explicitly, the mark filtering flags seem to just
      complicate things. Restore the previously used approach by always
      matching against a zero mask if no filter mark is given.
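
      The single masked comparison can be sketched as follows. This is a
      userspace illustration under stated assumptions, not the kernel
      code: struct mark_filter, make_filter() and mark_matches() are
      hypothetical names. With a zero mask and zero value, every entry
      matches, so no separate "filter present" flag is needed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filter: val and mask stay zero when no mark is given. */
struct mark_filter {
	uint32_t val;
	uint32_t mask;
};

/* Filter setup: a mark without an explicit mask implies 0xffffffff. */
static struct mark_filter make_filter(bool has_mark, uint32_t val,
				      bool has_mask, uint32_t mask)
{
	struct mark_filter f = { 0, 0 };

	if (has_mark) {
		f.val = val;
		f.mask = has_mask ? mask : 0xffffffffu;
	}
	return f;
}

/* One comparison covers all cases, including "no filter". */
static bool mark_matches(const struct mark_filter *f, uint32_t ct_mark)
{
	return (ct_mark & f->mask) == f->val;
}
```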
      
      Fixes: cb8aa9a3 ("netfilter: ctnetlink: add kernel side filtering for dump")
      Signed-off-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      6c0d95d1
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: coalesce multiple notifications into one skbuff · 67cc570e
      Pablo Neira Ayuso authored
      On x86_64, each notification results in one skbuff allocation which
      consumes at least 768 bytes due to the skbuff overhead.
      
      This patch coalesces several notifications into one single skbuff, so
      each notification consumes at least ~211 bytes, which is ~3.5 times
      less memory. As a result, this reduces the chances of exhausting
      the netlink socket receive buffer.
      
      Rule of thumb is that each notification batch only contains netlink
      messages whose report flag is the same, nfnetlink_send() requires this
      to do appropriate delivery to userspace, either via unicast (echo
      mode) or multicast (monitor mode).
      
      The skbuff control buffer is used to annotate the report flag for later
      handling at the new coalescing routine.
      
      The batch skbuff notification size is NLMSG_GOODSIZE, using a larger
      skbuff would allow for more socket receiver buffer savings (to amortize
      the cost of the skbuff even more), however, going over that size might
      break userspace applications, so let's be conservative and stick to
      NLMSG_GOODSIZE.
      Reported-by: Phil Sutter <phil@nwl.cc>
      Acked-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      67cc570e
    • Will McVicker's avatar
      netfilter: ctnetlink: add a range check for l3/l4 protonum · 1cc5ef91
      Will McVicker authored
      The indexes to the nf_nat_l[34]protos arrays come from userspace. So
      check the tuple's family, e.g. l3num, when creating the conntrack in
      order to prevent an OOB memory access during setup.  Here is an example
      kernel panic on 4.14.180 when userspace passes in an index greater than
      NFPROTO_NUMPROTO.
      
      Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
      Modules linked in:...
      Process poc (pid: 5614, stack limit = 0x00000000a3933121)
      CPU: 4 PID: 5614 Comm: poc Tainted: G S      W  O    4.14.180-g051355490483
      Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM
      task: 000000002a3dfffe task.stack: 00000000a3933121
      pc : __cfi_check_fail+0x1c/0x24
      lr : __cfi_check_fail+0x1c/0x24
      ...
      Call trace:
      __cfi_check_fail+0x1c/0x24
      name_to_dev_t+0x0/0x468
      nfnetlink_parse_nat_setup+0x234/0x258
      ctnetlink_parse_nat_setup+0x4c/0x228
      ctnetlink_new_conntrack+0x590/0xc40
      nfnetlink_rcv_msg+0x31c/0x4d4
      netlink_rcv_skb+0x100/0x184
      nfnetlink_rcv+0xf4/0x180
      netlink_unicast+0x360/0x770
      netlink_sendmsg+0x5a0/0x6a4
      ___sys_sendmsg+0x314/0x46c
      SyS_sendmsg+0xb4/0x108
      el0_svc_naked+0x34/0x38
      
      This crash no longer happens on 5.4+; however, ctnetlink still
      allows creating entries with an unsupported layer 3 protocol number.
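
      A sketch of the kind of check involved: reject any layer 3 protocol
      number from userspace other than IPv4 or IPv6 before it is used as
      an index. check_l3num() and the L3_* names are hypothetical; the
      values 2 and 10 match NFPROTO_IPV4 and NFPROTO_IPV6, and the
      -EOPNOTSUPP return code is an assumption of this sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Same values as the kernel's NFPROTO_IPV4 / NFPROTO_IPV6, redefined
 * locally so the sketch stands alone. */
enum { L3_IPV4 = 2, L3_IPV6 = 10 };

/* Hypothetical validator for a userspace-supplied l3num. */
static int check_l3num(uint8_t l3num)
{
	if (l3num != L3_IPV4 && l3num != L3_IPV6)
		return -EOPNOTSUPP; /* error code is an assumption */
	return 0;
}
```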
      
      Fixes: c1d10adb ("[NETFILTER]: Add ctnetlink port for nf_conntrack")
      Signed-off-by: Will McVicker <willmcvicker@google.com>
      [pablo@netfilter.org: rebased original patch on top of nf.git]
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      1cc5ef91
    • Dexuan Cui's avatar
      hv_netvsc: Fix hibernation for mlx5 VF driver · 19162fd4
      Dexuan Cui authored
      mlx5_suspend()/resume() keep the network interface, so during hibernation
      netvsc_unregister_vf() and netvsc_register_vf() are not called, and hence
      netvsc_resume() should call netvsc_vf_changed() to switch the data path
      back to the VF after hibernation. Note: after we close and re-open the
      vmbus channel of the netvsc NIC in netvsc_suspend() and netvsc_resume(),
      the data path is implicitly switched to the netvsc NIC. Similarly,
      netvsc_suspend() should not call netvsc_unregister_vf(), otherwise the VF
      can no longer be used after hibernation.
      
      For mlx4, since the VF network interface is explicitly destroyed and
      re-created during hibernation (see mlx4_suspend()/resume()), hv_netvsc
      already explicitly switches the data path from and to the VF automatically
      via netvsc_register_vf() and netvsc_unregister_vf(), so mlx4 doesn't need
      this fix. Note: mlx4 can still work with the fix because in
      netvsc_suspend()/resume() ndev_ctx->vf_netdev is NULL for mlx4.
      
      Fixes: 0efeea5f ("hv_netvsc: Add the support of hibernation")
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      19162fd4
  4. 07 Sep, 2020 4 commits
    • Taehee Yoo's avatar
      Revert "netns: don't disable BHs when locking "nsid_lock"" · e1f469cd
      Taehee Yoo authored
      This reverts commit 8d7e5dee.
      
      To protect netns id, the nsid_lock is used when netns id is being
      allocated and removed by peernet2id_alloc() and unhash_nsid().
      The nsid_lock can be used in BH context but only spin_lock() is used
      in this code.
      Using spin_lock() instead of spin_lock_bh() can result in a deadlock in
      the following scenario reported by the lockdep.
      In order to avoid a deadlock, the spin_lock_bh() should be used instead
      of spin_lock() to acquire nsid_lock.
      
      Test commands:
          ip netns del nst
          ip netns add nst
          ip link add veth1 type veth peer name veth2
          ip link set veth1 netns nst
          ip netns exec nst ip link add name br1 type bridge vlan_filtering 1
          ip netns exec nst ip link set dev br1 up
          ip netns exec nst ip link set dev veth1 master br1
          ip netns exec nst ip link set dev veth1 up
          ip netns exec nst ip link add macvlan0 link br1 up type macvlan
      
      Splat looks like:
      [   33.615860][  T607] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
      [   33.617194][  T607] 5.9.0-rc1+ #665 Not tainted
      [ ... ]
      [   33.670615][  T607] Chain exists of:
      [   33.670615][  T607]   &mc->mca_lock --> &bridge_netdev_addr_lock_key --> &net->nsid_lock
      [   33.670615][  T607]
      [   33.673118][  T607]  Possible interrupt unsafe locking scenario:
      [   33.673118][  T607]
      [   33.674599][  T607]        CPU0                    CPU1
      [   33.675557][  T607]        ----                    ----
      [   33.676516][  T607]   lock(&net->nsid_lock);
      [   33.677306][  T607]                                local_irq_disable();
      [   33.678517][  T607]                                lock(&mc->mca_lock);
      [   33.679725][  T607]                                lock(&bridge_netdev_addr_lock_key);
      [   33.681166][  T607]   <Interrupt>
      [   33.681791][  T607]     lock(&mc->mca_lock);
      [   33.682579][  T607]
      [   33.682579][  T607]  *** DEADLOCK ***
      [ ... ]
      [   33.922046][  T607] stack backtrace:
      [   33.922999][  T607] CPU: 3 PID: 607 Comm: ip Not tainted 5.9.0-rc1+ #665
      [   33.924099][  T607] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [   33.925714][  T607] Call Trace:
      [   33.926238][  T607]  dump_stack+0x78/0xab
      [   33.926905][  T607]  check_irq_usage+0x70b/0x720
      [   33.927708][  T607]  ? iterate_chain_key+0x60/0x60
      [   33.928507][  T607]  ? check_path+0x22/0x40
      [   33.929201][  T607]  ? check_noncircular+0xcf/0x180
      [   33.930024][  T607]  ? __lock_acquire+0x1952/0x1f20
      [   33.930860][  T607]  __lock_acquire+0x1952/0x1f20
      [   33.931667][  T607]  lock_acquire+0xaf/0x3a0
      [   33.932366][  T607]  ? peernet2id_alloc+0x3a/0x170
      [   33.933147][  T607]  ? br_port_fill_attrs+0x54c/0x6b0 [bridge]
      [   33.934140][  T607]  ? br_port_fill_attrs+0x5de/0x6b0 [bridge]
      [   33.935113][  T607]  ? kvm_sched_clock_read+0x14/0x30
      [   33.935974][  T607]  _raw_spin_lock+0x30/0x70
      [   33.936728][  T607]  ? peernet2id_alloc+0x3a/0x170
      [   33.937523][  T607]  peernet2id_alloc+0x3a/0x170
      [   33.938313][  T607]  rtnl_fill_ifinfo+0xb5e/0x1400
      [   33.939091][  T607]  rtmsg_ifinfo_build_skb+0x8a/0xf0
      [   33.939953][  T607]  rtmsg_ifinfo_event.part.39+0x17/0x50
      [   33.940863][  T607]  rtmsg_ifinfo+0x1f/0x30
      [   33.941571][  T607]  __dev_notify_flags+0xa5/0xf0
      [   33.942376][  T607]  ? __irq_work_queue_local+0x49/0x50
      [   33.943249][  T607]  ? irq_work_queue+0x1d/0x30
      [   33.943993][  T607]  ? __dev_set_promiscuity+0x7b/0x1a0
      [   33.944878][  T607]  __dev_set_promiscuity+0x7b/0x1a0
      [   33.945758][  T607]  dev_set_promiscuity+0x1e/0x50
      [   33.946582][  T607]  br_port_set_promisc+0x1f/0x40 [bridge]
      [   33.947487][  T607]  br_manage_promisc+0x8b/0xe0 [bridge]
      [   33.948388][  T607]  __dev_set_promiscuity+0x123/0x1a0
      [   33.949244][  T607]  __dev_set_rx_mode+0x68/0x90
      [   33.950021][  T607]  dev_uc_add+0x50/0x60
      [   33.950720][  T607]  macvlan_open+0x18e/0x1f0 [macvlan]
      [   33.951601][  T607]  __dev_open+0xd6/0x170
      [   33.952269][  T607]  __dev_change_flags+0x181/0x1d0
      [   33.953056][  T607]  rtnl_configure_link+0x2f/0xa0
      [   33.953884][  T607]  __rtnl_newlink+0x6b9/0x8e0
      [   33.954665][  T607]  ? __lock_acquire+0x95d/0x1f20
      [   33.955450][  T607]  ? lock_acquire+0xaf/0x3a0
      [   33.956193][  T607]  ? is_bpf_text_address+0x5/0xe0
      [   33.956999][  T607]  rtnl_newlink+0x47/0x70
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Fixes: 8d7e5dee ("netns: don't disable BHs when locking "nsid_lock"")
      Reported-by: syzbot+3f960c64a104eaa2c813@syzkaller.appspotmail.com
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      e1f469cd
    • Jakub Kicinski's avatar
      ibmvnic: add missing parenthesis in do_reset() · 8ae4dff8
      Jakub Kicinski authored
      Indentation and logic clearly show that this code is missing
      parenthesis.
      
      Fixes: 9f134573 ("ibmvnic fix NULL tx_pools and rx_tools issue at do_reset")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8ae4dff8
    • Randy Dunlap's avatar
      netdevice.h: fix xdp_state kernel-doc warning · ffa59b0b
      Randy Dunlap authored
      Fix kernel-doc warning in <linux/netdevice.h>:
      
      ../include/linux/netdevice.h:2158: warning: Function parameter or member 'xdp_state' not described in 'net_device'
      
      Fixes: 7f0a8382 ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ffa59b0b
    • Randy Dunlap's avatar
      netdevice.h: fix proto_down_reason kernel-doc warning · eb02d39a
      Randy Dunlap authored
      Fix kernel-doc warning in <linux/netdevice.h>:
      
      ../include/linux/netdevice.h:2158: warning: Function parameter or member 'proto_down_reason' not described in 'net_device'
      
      Fixes: 829eb208 ("rtnetlink: add support for protodown reason")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      eb02d39a