1. 27 Feb, 2020 16 commits
    • Luo bin's avatar
      hinic: fix a bug of rss configuration · 386d4716
      Luo bin authored
      should use real receive queue number to configure hw rss
      indirect table rather than maximal queue number
      Signed-off-by: default avatarLuo bin <luobin9@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      386d4716
    • Luo bin's avatar
      hinic: fix a bug of setting hw_ioctxt · d2ed69ce
      Luo bin authored
      a reserved field is used to signify prime physical function index
      in the latest firmware version, so we must assign a value to it
      correctly
      Signed-off-by: default avatarLuo bin <luobin9@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d2ed69ce
    • Luo bin's avatar
      hinic: fix a irq affinity bug · 0bff777b
      Luo bin authored
      can not use a local variable as an input parameter of
      irq_set_affinity_hint
      Signed-off-by: default avatarLuo bin <luobin9@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0bff777b
    • Karsten Graul's avatar
      net/smc: check for valid ib_client_data · a2f2ef4a
      Karsten Graul authored
      In smc_ib_remove_dev() check if the provided ib device was actually
      initialized for SMC before.
      
      Reported-by: syzbot+84484ccebdd4e5451d91@syzkaller.appspotmail.com
      Fixes: a4cf0443 ("smc: introduce SMC as an IB-client")
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a2f2ef4a
    • Aaro Koskinen's avatar
      net: stmmac: fix notifier registration · 474a31e1
      Aaro Koskinen authored
      We cannot register the same netdev notifier multiple times when probing
      stmmac devices. Register the notifier only once in module init, and also
      make debugfs creation/deletion safe against simultaneous notifier call.
      
      Fixes: 481a7d15 ("stmmac: debugfs entry name is not be changed when udev rename device name.")
      Signed-off-by: default avatarAaro Koskinen <aaro.koskinen@nokia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      474a31e1
    • Antoine Tenart's avatar
      net: phy: mscc: fix firmware paths · c87a9d6f
      Antoine Tenart authored
      The firmware paths for the VSC8584 PHYs not not contain the leading
      'microchip/' directory, as used in linux-firmware, resulting in an
      error when probing the driver. This patch fixes it.
      
      Fixes: a5afc167 ("net: phy: mscc: add support for VSC8584 PHY")
      Signed-off-by: default avatarAntoine Tenart <antoine.tenart@bootlin.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c87a9d6f
    • Paolo Abeni's avatar
      mptcp: add dummy icsk_sync_mss() · dc24f8b4
      Paolo Abeni authored
      syzbot noted that the master MPTCP socket lacks the icsk_sync_mss
      callback, and was able to trigger a null pointer dereference:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD 8e171067 P4D 8e171067 PUD 93fa2067 PMD 0
      Oops: 0010 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 8984 Comm: syz-executor066 Not tainted 5.6.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:0x0
      Code: Bad RIP value.
      RSP: 0018:ffffc900020b7b80 EFLAGS: 00010246
      RAX: 1ffff110124ba600 RBX: 0000000000000000 RCX: ffff88809fefa600
      RDX: ffff8880994cdb18 RSI: 0000000000000000 RDI: ffff8880925d3140
      RBP: ffffc900020b7bd8 R08: ffffffff870225be R09: fffffbfff140652a
      R10: fffffbfff140652a R11: 0000000000000000 R12: ffff8880925d35d0
      R13: ffff8880925d3140 R14: dffffc0000000000 R15: 1ffff110124ba6ba
      FS:  0000000001a0b880(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffd6 CR3: 00000000a6d6f000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       cipso_v4_sock_setattr+0x34b/0x470 net/ipv4/cipso_ipv4.c:1888
       netlbl_sock_setattr+0x2a7/0x310 net/netlabel/netlabel_kapi.c:989
       smack_netlabel security/smack/smack_lsm.c:2425 [inline]
       smack_inode_setsecurity+0x3da/0x4a0 security/smack/smack_lsm.c:2716
       security_inode_setsecurity+0xb2/0x140 security/security.c:1364
       __vfs_setxattr_noperm+0x16f/0x3e0 fs/xattr.c:197
       vfs_setxattr fs/xattr.c:224 [inline]
       setxattr+0x335/0x430 fs/xattr.c:451
       __do_sys_fsetxattr fs/xattr.c:506 [inline]
       __se_sys_fsetxattr+0x130/0x1b0 fs/xattr.c:495
       __x64_sys_fsetxattr+0xbf/0xd0 fs/xattr.c:495
       do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x440199
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffcadc19e48 EFLAGS: 00000246 ORIG_RAX: 00000000000000be
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440199
      RDX: 0000000020000200 RSI: 00000000200001c0 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000003 R09: 00000000004002c8
      R10: 0000000000000009 R11: 0000000000000246 R12: 0000000000401a20
      R13: 0000000000401ab0 R14: 0000000000000000 R15: 0000000000000000
      Modules linked in:
      CR2: 0000000000000000
      
      Address the issue adding a dummy icsk_sync_mss callback.
      To properly sync the subflows mss and options list we need some
      additional infrastructure, which will land to net-next.
      
      Reported-by: syzbot+f4dfece964792d80b139@syzkaller.appspotmail.com
      Fixes: 2303f994 ("mptcp: Associate MPTCP context with TCP socket")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dc24f8b4
    • Sudheesh Mavila's avatar
      net: phy: corrected the return value for genphy_check_and_restart_aneg and... · 4f31c532
      Sudheesh Mavila authored
      net: phy: corrected the return value for genphy_check_and_restart_aneg and genphy_c45_check_and_restart_aneg
      
      When auto-negotiation is not required, return value should be zero.
      
      Changes v1->v2:
      - improved comments and code as Andrew Lunn and Heiner Kallweit suggestion
      - fixed issue in genphy_c45_check_and_restart_aneg as Russell King
        suggestion.
      
      Fixes: 2a10ab04 ("net: phy: add genphy_check_and_restart_aneg()")
      Fixes: 1af9f168 ("net: phy: add genphy_c45_check_and_restart_aneg()")
      Signed-off-by: default avatarSudheesh Mavila <sudheesh.mavila@amd.com>
      Reviewed-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4f31c532
    • yangerkun's avatar
      slip: not call free_netdev before rtnl_unlock in slip_open · f596c870
      yangerkun authored
      As the description before netdev_run_todo, we cannot call free_netdev
      before rtnl_unlock, fix it by reorder the code.
      Signed-off-by: default avataryangerkun <yangerkun@huawei.com>
      Reviewed-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f596c870
    • Eric Dumazet's avatar
      ipv6: restrict IPV6_ADDRFORM operation · b6f61189
      Eric Dumazet authored
      IPV6_ADDRFORM is able to transform IPv6 socket to IPv4 one.
      While this operation sounds illogical, we have to support it.
      
      One of the things it does for TCP socket is to switch sk->sk_prot
      to tcp_prot.
      
      We now have other layers playing with sk->sk_prot, so we should make
      sure to not interfere with them.
      
      This patch makes sure sk_prot is the default pointer for TCP IPv6 socket.
      
      syzbot reported :
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD a0113067 P4D a0113067 PUD a8771067 PMD 0
      Oops: 0010 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 10686 Comm: syz-executor.0 Not tainted 5.6.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:0x0
      Code: Bad RIP value.
      RSP: 0018:ffffc9000281fce0 EFLAGS: 00010246
      RAX: 1ffffffff15f48ac RBX: ffffffff8afa4560 RCX: dffffc0000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a69a8f40
      RBP: ffffc9000281fd10 R08: ffffffff86ed9b0c R09: ffffed1014d351f5
      R10: ffffed1014d351f5 R11: 0000000000000000 R12: ffff8880920d3098
      R13: 1ffff1101241a613 R14: ffff8880a69a8f40 R15: 0000000000000000
      FS:  00007f2ae75db700(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffd6 CR3: 00000000a3b85000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       inet_release+0x165/0x1c0 net/ipv4/af_inet.c:427
       __sock_release net/socket.c:605 [inline]
       sock_close+0xe1/0x260 net/socket.c:1283
       __fput+0x2e4/0x740 fs/file_table.c:280
       ____fput+0x15/0x20 fs/file_table.c:313
       task_work_run+0x176/0x1b0 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:188 [inline]
       exit_to_usermode_loop arch/x86/entry/common.c:164 [inline]
       prepare_exit_to_usermode+0x480/0x5b0 arch/x86/entry/common.c:195
       syscall_return_slowpath+0x113/0x4a0 arch/x86/entry/common.c:278
       do_syscall_64+0x11f/0x1c0 arch/x86/entry/common.c:304
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x45c429
      Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f2ae75dac78 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: 0000000000000000 RBX: 00007f2ae75db6d4 RCX: 000000000045c429
      RDX: 0000000000000001 RSI: 000000000000011a RDI: 0000000000000004
      RBP: 000000000076bf20 R08: 0000000000000038 R09: 0000000000000000
      R10: 0000000020000180 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 0000000000000a9d R14: 00000000004ccfb4 R15: 000000000076bf2c
      Modules linked in:
      CR2: 0000000000000000
      ---[ end trace 82567b5207e87bae ]---
      RIP: 0010:0x0
      Code: Bad RIP value.
      RSP: 0018:ffffc9000281fce0 EFLAGS: 00010246
      RAX: 1ffffffff15f48ac RBX: ffffffff8afa4560 RCX: dffffc0000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a69a8f40
      RBP: ffffc9000281fd10 R08: ffffffff86ed9b0c R09: ffffed1014d351f5
      R10: ffffed1014d351f5 R11: 0000000000000000 R12: ffff8880920d3098
      R13: 1ffff1101241a613 R14: ffff8880a69a8f40 R15: 0000000000000000
      FS:  00007f2ae75db700(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffd6 CR3: 00000000a3b85000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: syzbot+1938db17e275e85dc328@syzkaller.appspotmail.com
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b6f61189
    • Ursula Braun's avatar
      net/smc: fix cleanup for linkgroup setup failures · 51e3dfa8
      Ursula Braun authored
      If an SMC connection to a certain peer is setup the first time,
      a new linkgroup is created. In case of setup failures, such a
      linkgroup is unusable and should disappear. As a first step the
      linkgroup is removed from the linkgroup list in smc_lgr_forget().
      
      There are 2 problems:
      smc_listen_decline() might be called before linkgroup creation
      resulting in a crash due to calling smc_lgr_forget() with
      parameter NULL.
      If a setup failure occurs after linkgroup creation, the connection
      is never unregistered from the linkgroup, preventing linkgroup
      freeing.
      
      This patch introduces an enhanced smc_lgr_cleanup_early() function
      which
      * contains a linkgroup check for early smc_listen_decline()
        invocations
      * invokes smc_conn_free() to guarantee unregistering of the
        connection.
      * schedules fast linkgroup removal of the unusable linkgroup
      
      And the unused function smcd_conn_free() is removed from smc_core.h.
      
      Fixes: 3b2dec26 ("net/smc: restructure client and server code in af_smc")
      Fixes: 2a0674ff ("net/smc: improve abnormal termination of link groups")
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      51e3dfa8
    • Nicolas Saenz Julienne's avatar
      net: bcmgenet: Clear ID_MODE_DIS in EXT_RGMII_OOB_CTRL when not needed · 402482a6
      Nicolas Saenz Julienne authored
      Outdated Raspberry Pi 4 firmware might configure the external PHY as
      rgmii although the kernel currently sets it as rgmii-rxid. This makes
      connections unreliable as ID_MODE_DIS is left enabled. To avoid this,
      explicitly clear that bit whenever we don't need it.
      
      Fixes: da388022 ("net: bcmgenet: Add RGMII_RXID support")
      Signed-off-by: default avatarNicolas Saenz Julienne <nsaenzjulienne@suse.de>
      Acked-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      402482a6
    • Jiri Pirko's avatar
      sched: act: count in the size of action flags bitfield · 1521a67e
      Jiri Pirko authored
      The put of the flags was added by the commit referenced in fixes tag,
      however the size of the message was not extended accordingly.
      
      Fix this by adding size of the flags bitfield to the message size.
      
      Fixes: e3822678 ("net: sched: update action implementations to support flags")
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1521a67e
    • Madhuparna Bhowmik's avatar
      net: core: devlink.c: Use built-in RCU list checking · 2eb51c75
      Madhuparna Bhowmik authored
      list_for_each_entry_rcu() has built-in RCU and lock checking.
      
      Pass cond argument to list_for_each_entry_rcu() to silence
      false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled.
      
      The devlink->lock is held when devlink_dpipe_table_find()
      is called in non RCU read side section. Therefore, pass struct devlink
      to devlink_dpipe_table_find() for lockdep checking.
      Signed-off-by: default avatarMadhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2eb51c75
    • Florian Fainelli's avatar
      net: dsa: bcm_sf2: Forcibly configure IMP port for 1Gb/sec · 98c5f7d4
      Florian Fainelli authored
      We are still experiencing some packet loss with the existing advanced
      congestion buffering (ACB) settings with the IMP port configured for
      2Gb/sec, so revert to conservative link speeds that do not produce
      packet loss until this is resolved.
      
      Fixes: 8f1880cb ("net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec")
      Fixes: de34d708 ("net: dsa: bcm_sf2: Only 7278 supports 2Gb/sec IMP port")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      98c5f7d4
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 574b238f
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes:
      
      1) Perform garbage collection from workqueue to fix rcu detected
         stall in ipset hash set types, from Jozsef Kadlecsik.
      
      2) Fix the forceadd evaluation path, also from Jozsef.
      
      3) Fix nft_set_pipapo selftest, from Stefano Brivio.
      
      4) Crash when add-flush-add element in pipapo set, also from Stefano.
         Add test to cover this crash.
      
      5) Remove sysctl entry under mutex in hashlimit, from Cong Wang.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      574b238f
  2. 26 Feb, 2020 7 commits
    • Jonathan Lemon's avatar
      bnxt_en: add newline to netdev_*() format strings · 9a005c38
      Jonathan Lemon authored
      Add missing newlines to netdev_* format strings so the lines
      aren't buffered by the printk subsystem.
      Nitpicked-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Acked-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9a005c38
    • Cong Wang's avatar
      netfilter: xt_hashlimit: unregister proc file before releasing mutex · 99b79c39
      Cong Wang authored
      Before releasing the global mutex, we only unlink the hashtable
      from the hash list, its proc file is still not unregistered at
      this point. So syzbot could trigger a race condition where a
      parallel htable_create() could register the same file immediately
      after the mutex is released.
      
      Move htable_remove_proc_entry() back to mutex protection to
      fix this. And, fold htable_destroy() into htable_put() to make
      the code slightly easier to understand.
      
      Reported-and-tested-by: syzbot+d195fd3b9a364ddd6731@syzkaller.appspotmail.com
      Fixes: c4a3922d ("netfilter: xt_hashlimit: reduce hashlimit_mutex scope for htable_put()")
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      99b79c39
    • Michal Kubecek's avatar
      ethtool: limit bitset size · e34f1753
      Michal Kubecek authored
      Syzbot reported that ethnl_compact_sanity_checks() can be tricked into
      reading past the end of ETHTOOL_A_BITSET_VALUE and ETHTOOL_A_BITSET_MASK
      attributes and even the message by passing a value between (u32)(-31)
      and (u32)(-1) as ETHTOOL_A_BITSET_SIZE.
      
      The problem is that DIV_ROUND_UP(attr_nbits, 32) is 0 for such values so
      that zero length ETHTOOL_A_BITSET_VALUE will pass the length check but
      ethnl_bitmap32_not_zero() check would try to access up to 512 MB of
      attribute "payload".
      
      Prevent this overflow byt limiting the bitset size. Technically, compact
      bitset format would allow bitset sizes up to almost 2^18 (so that the
      nest size does not exceed U16_MAX) but bitsets used by ethtool are much
      shorter. S16_MAX, the largest value which can be directly used as an
      upper limit in policy, should be a reasonable compromise.
      
      Fixes: 10b518d4 ("ethtool: netlink bitset handling")
      Reported-by: syzbot+7fd4ed5b4234ab1fdccd@syzkaller.appspotmail.com
      Reported-by: syzbot+709b7a64d57978247e44@syzkaller.appspotmail.com
      Reported-by: syzbot+983cb8fb2d17a7af549d@syzkaller.appspotmail.com
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e34f1753
    • Amritha Nambiar's avatar
      net: Fix Tx hash bound checking · 6e11d157
      Amritha Nambiar authored
      Fixes the lower and upper bounds when there are multiple TCs and
      traffic is on the the same TC on the same device.
      
      The lower bound is represented by 'qoffset' and the upper limit for
      hash value is 'qcount + qoffset'. This gives a clean Rx to Tx queue
      mapping when there are multiple TCs, as the queue indices for upper TCs
      will be offset by 'qoffset'.
      
      v2: Fixed commit description based on comments.
      
      Fixes: 1b837d48 ("net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash")
      Fixes: eadec877 ("net: Add support for subordinate traffic classes to netdev_pick_tx")
      Signed-off-by: default avatarAmritha Nambiar <amritha.nambiar@intel.com>
      Reviewed-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: default avatarSridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6e11d157
    • Stefano Brivio's avatar
      selftests: nft_concat_range: Add test for reported add/flush/add issue · 0954df70
      Stefano Brivio authored
      Add a specific test for the crash reported by Phil Sutter and addressed
      in the previous patch. The test cases that, in my intention, should
      have covered these cases, that is, the ones from the 'concurrency'
      section, don't run these sequences tightly enough and spectacularly
      failed to catch this.
      
      While at it, define a convenient way to add these kind of tests, by
      adding a "reported issues" test section.
      
      It's more convenient, for this particular test, to execute the set
      setup in its own function. However, future test cases like this one
      might need to call setup functions, and will typically need no tools
      other than nft, so allow for this in check_tools().
      
      The original form of the reproducer used here was provided by Phil.
      Reported-by: default avatarPhil Sutter <phil@nwl.cc>
      Signed-off-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      0954df70
    • Stefano Brivio's avatar
      nft_set_pipapo: Actually fetch key data in nft_pipapo_remove() · 212d58c1
      Stefano Brivio authored
      Phil reports that adding elements, flushing and re-adding them
      right away:
      
        nft add table t '{ set s { type ipv4_addr . inet_service; flags interval; }; }'
        nft add element t s '{ 10.0.0.1 . 22-25, 10.0.0.1 . 10-20 }'
        nft flush set t s
        nft add element t s '{ 10.0.0.1 . 10-20, 10.0.0.1 . 22-25 }'
      
      triggers, almost reliably, a crash like this one:
      
        [   71.319848] general protection fault, probably for non-canonical address 0x6f6b6e696c2e756e: 0000 [#1] PREEMPT SMP PTI
        [   71.321540] CPU: 3 PID: 1201 Comm: kworker/3:2 Not tainted 5.6.0-rc1-00377-g2bb07f4e #192
        [   71.322746] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014
        [   71.324430] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
        [   71.325387] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables]
        [   71.326164] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b
        [   71.328423] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282
        [   71.329225] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0
        [   71.330365] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a
        [   71.331473] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000
        [   71.332627] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2
        [   71.333615] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0
        [   71.334596] FS:  0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
        [   71.335780] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [   71.336577] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0
        [   71.337533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [   71.338557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [   71.339718] Call Trace:
        [   71.340093]  nft_pipapo_destroy+0x7a/0x170 [nf_tables_set]
        [   71.340973]  nft_set_destroy+0x20/0x50 [nf_tables]
        [   71.341879]  nf_tables_trans_destroy_work+0x246/0x260 [nf_tables]
        [   71.342916]  process_one_work+0x1d5/0x3c0
        [   71.343601]  worker_thread+0x4a/0x3c0
        [   71.344229]  kthread+0xfb/0x130
        [   71.344780]  ? process_one_work+0x3c0/0x3c0
        [   71.345477]  ? kthread_park+0x90/0x90
        [   71.346129]  ret_from_fork+0x35/0x40
        [   71.346748] Modules linked in: nf_tables_set nf_tables nfnetlink 8021q [last unloaded: nfnetlink]
        [   71.348153] ---[ end trace 2eaa8149ca759bcc ]---
        [   71.349066] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables]
        [   71.350016] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b
        [   71.350017] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282
        [   71.350019] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0
        [   71.350019] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a
        [   71.350020] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000
        [   71.350021] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2
        [   71.350022] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0
        [   71.350025] FS:  0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
        [   71.350026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [   71.350027] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0
        [   71.350028] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [   71.350028] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [   71.350030] Kernel panic - not syncing: Fatal exception
        [   71.350412] Kernel Offset: disabled
        [   71.365922] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      which is caused by dangling elements that have been deactivated, but
      never removed.
      
      On a flush operation, nft_pipapo_walk() walks through all the elements
      in the mapping table, which are then deactivated by nft_flush_set(),
      one by one, and added to the commit list for removal. Element data is
      then freed.
      
      On transaction commit, nft_pipapo_remove() is called, and failed to
      remove these elements, leading to the stale references in the mapping.
      The first symptom of this, revealed by KASan, is a one-byte
      use-after-free in subsequent calls to nft_pipapo_walk(), which is
      usually not enough to trigger a panic. When stale elements are used
      more heavily, though, such as double-free via nft_pipapo_destroy()
      as in Phil's case, the problem becomes more noticeable.
      
      The issue comes from that fact that, on a flush operation,
      nft_pipapo_remove() won't get the actual key data via elem->key,
      elements to be deleted upon commit won't be found by the lookup via
      pipapo_get(), and removal will be skipped. Key data should be fetched
      via nft_set_ext_key(), instead.
      Reported-by: default avatarPhil Sutter <phil@nwl.cc>
      Fixes: 3c4287f6 ("nf_tables: Add set type for arbitrary concatenation of ranges")
      Signed-off-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      212d58c1
    • Pablo Neira Ayuso's avatar
      Merge branch 'master' of git://blackhole.kfki.hu/nf · 9ea4894b
      Pablo Neira Ayuso authored
      Jozsef Kadlecsik says:
      
      ====================
      ipset patches for nf
      
      The first one is larger than usual, but the issue could not be solved simpler.
      Also, it's a resend of the patch I submitted a few days ago, with a one line
      fix on top of that: the size of the comment extensions was not taken into
      account at reporting the full size of the set.
      
      - Fix "INFO: rcu detected stall in hash_xxx" reports of syzbot
        by introducing region locking and using workqueue instead of timer based
        gc of timed out entries in hash types of sets in ipset.
      - Fix the forceadd evaluation path - the bug was also uncovered by the syzbot.
      ====================
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      9ea4894b
  3. 25 Feb, 2020 2 commits
  4. 24 Feb, 2020 15 commits
    • David S. Miller's avatar
      Merge tag 'mac80211-for-net-2020-02-24' of... · 3614d05b
      David S. Miller authored
      Merge tag 'mac80211-for-net-2020-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg
      
      ====================
      A few fixes:
       * remove a double mutex-unlock
       * fix a leak in an error path
       * NULL pointer check
       * include if_vlan.h where needed
       * avoid RCU list traversal when not under RCU
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3614d05b
    • Nikolay Aleksandrov's avatar
      net: bridge: fix stale eth hdr pointer in br_dev_xmit · 823d81b0
      Nikolay Aleksandrov authored
      In br_dev_xmit() we perform vlan filtering in br_allowed_ingress() but
      if the packet has the vlan header inside (e.g. bridge with disabled
      tx-vlan-offload) then the vlan filtering code will use skb_vlan_untag()
      to extract the vid before filtering which in turn calls pskb_may_pull()
      and we may end up with a stale eth pointer. Moreover the cached eth header
      pointer will generally be wrong after that operation. Remove the eth header
      caching and just use eth_hdr() directly, the compiler does the right thing
      and calculates it only once so we don't lose anything.
      
      Fixes: 057658cb ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      823d81b0
    • David S. Miller's avatar
      Merge branch 'net-ll_temac-Bugfixes' · e4686c2d
      David S. Miller authored
      Esben Haabendal says:
      
      ====================
      net: ll_temac: Bugfixes
      
      Fix a number of bugs which have been present since the first commit.
      
      The bugs fixed in patch 1,2 and 4 have all been observed in real systems, and
      was relatively easy to reproduce given an appropriate stress setup.
      
      Changes since v1:
      
      - Changed error handling of of dma_map_single() in temac_start_xmit() to drop
        packet instead of returning NETDEV_TX_BUSY.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e4686c2d
    • Esben Haabendal's avatar
      net: ll_temac: Handle DMA halt condition caused by buffer underrun · 1d63b8d6
      Esben Haabendal authored
      The SDMA engine used by TEMAC halts operation when it has finished
      processing of the last buffer descriptor in the buffer ring.
      Unfortunately, no interrupt event is generated when this happens,
      so we need to setup another mechanism to make sure DMA operation is
      restarted when enough buffers have been added to the ring.
      
      Fixes: 92744989 ("net: add Xilinx ll_temac device driver")
      Signed-off-by: default avatarEsben Haabendal <esben@geanix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1d63b8d6
    • Esben Haabendal's avatar
      net: ll_temac: Fix RX buffer descriptor handling on GFP_ATOMIC pressure · 770d9c67
      Esben Haabendal authored
      Failures caused by GFP_ATOMIC memory pressure have been observed, and
      due to the missing error handling, results in kernel crash such as
      
      [1876998.350133] kernel BUG at mm/slub.c:3952!
      [1876998.350141] invalid opcode: 0000 [#1] PREEMPT SMP PTI
      [1876998.350147] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.3.0-scnxt #1
      [1876998.350150] Hardware name: N/A N/A/COMe-bIP2, BIOS CCR2R920 03/01/2017
      [1876998.350160] RIP: 0010:kfree+0x1ca/0x220
      [1876998.350164] Code: 85 db 74 49 48 8b 95 68 01 00 00 48 31 c2 48 89 10 e9 d7 fe ff ff 49 8b 04 24 a9 00 00 01 00 75 0b 49 8b 44 24 08 a8 01 75 02 <0f> 0b 49 8b 04 24 31 f6 a9 00 00 01 00 74 06 41 0f b6 74 24
       5b
      [1876998.350172] RSP: 0018:ffffc900000f0df0 EFLAGS: 00010246
      [1876998.350177] RAX: ffffea00027f0708 RBX: ffff888008d78000 RCX: 0000000000391372
      [1876998.350181] RDX: 0000000000000000 RSI: ffffe8ffffd01400 RDI: ffff888008d78000
      [1876998.350185] RBP: ffff8881185a5d00 R08: ffffc90000087dd8 R09: 000000000000280a
      [1876998.350189] R10: 0000000000000002 R11: 0000000000000000 R12: ffffea0000235e00
      [1876998.350193] R13: ffff8881185438a0 R14: 0000000000000000 R15: ffff888118543870
      [1876998.350198] FS:  0000000000000000(0000) GS:ffff88811f300000(0000) knlGS:0000000000000000
      [1876998.350203] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      s#1 Part1
      [1876998.350206] CR2: 00007f8dac7b09f0 CR3: 000000011e20a006 CR4: 00000000001606e0
      [1876998.350210] Call Trace:
      [1876998.350215]  <IRQ>
      [1876998.350224]  ? __netif_receive_skb_core+0x70a/0x920
      [1876998.350229]  kfree_skb+0x32/0xb0
      [1876998.350234]  __netif_receive_skb_core+0x70a/0x920
      [1876998.350240]  __netif_receive_skb_one_core+0x36/0x80
      [1876998.350245]  process_backlog+0x8b/0x150
      [1876998.350250]  net_rx_action+0xf7/0x340
      [1876998.350255]  __do_softirq+0x10f/0x353
      [1876998.350262]  irq_exit+0xb2/0xc0
      [1876998.350265]  do_IRQ+0x77/0xd0
      [1876998.350271]  common_interrupt+0xf/0xf
      [1876998.350274]  </IRQ>
      
      In order to handle such failures more graceful, this change splits the
      receive loop into one for consuming the received buffers, and one for
      allocating new buffers.
      
      When GFP_ATOMIC allocations fail, the receive will continue with the
      buffers that is still there, and with the expectation that the allocations
      will succeed in a later call to receive.
      
      Fixes: 92744989 ("net: add Xilinx ll_temac device driver")
      Signed-off-by: default avatarEsben Haabendal <esben@geanix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      770d9c67
    • Esben Haabendal's avatar
      net: ll_temac: Add more error handling of dma_map_single() calls · d07c849c
      Esben Haabendal authored
      This adds error handling to the remaining dma_map_single() calls, so that
      behavior is well defined if/when we run out of DMA memory.
      
      Fixes: 92744989 ("net: add Xilinx ll_temac device driver")
      Signed-off-by: default avatarEsben Haabendal <esben@geanix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d07c849c
    • Esben Haabendal's avatar
      net: ll_temac: Fix race condition causing TX hang · 84823ff8
      Esben Haabendal authored
      It is possible that the interrupt handler fires and frees up space in
      the TX ring in between checking for sufficient TX ring space and
      stopping the TX queue in temac_start_xmit. If this happens, the
      queue wake from the interrupt handler will occur before the queue is
      stopped, causing a lost wakeup and the adapter's transmit hanging.
      
      To avoid this, after stopping the queue, check again whether there is
      sufficient space in the TX ring. If so, wake up the queue again.
      
      This is a port of the similar fix in axienet driver,
      commit 7de44285 ("net: axienet: Fix race condition causing TX hang").
      
      Fixes: 23ecc4bd ("net: ll_temac: fix checksum offload logic")
      Signed-off-by: default avatarEsben Haabendal <esben@geanix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      84823ff8
    • Madhuparna Bhowmik's avatar
      mac80211: rx: avoid RCU list traversal under mutex · 253216ff
      Madhuparna Bhowmik authored
      local->sta_mtx is held in __ieee80211_check_fast_rx_iface().
      No need to use list_for_each_entry_rcu() as it also requires
      a cond argument to avoid false lockdep warnings when not used in
      RCU read-side section (with CONFIG_PROVE_RCU_LIST).
      Therefore use list_for_each_entry();
      Signed-off-by: default avatarMadhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
      Link: https://lore.kernel.org/r/20200223143302.15390-1-madhuparnabhowmik10@gmail.comSigned-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      253216ff
    • Johannes Berg's avatar
      nl80211: explicitly include if_vlan.h · e3ae39ed
      Johannes Berg authored
      We use that here, and do seem to get it through some recursive
      include, but better include it explicitly.
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Link: https://lore.kernel.org/r/20200224093814.1b9c258fec67.I45ac150d4e11c72eb263abec9f1f0c7add9bef2b@changeidSigned-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      e3ae39ed
    • Madhuparna Bhowmik's avatar
      net: core: devlink.c: Hold devlink->lock from the beginning of devlink_dpipe_table_register() · 6132c1d9
      Madhuparna Bhowmik authored
      devlink_dpipe_table_find() should be called under either
      rcu_read_lock() or devlink->lock. devlink_dpipe_table_register()
      calls devlink_dpipe_table_find() without holding the lock
      and acquires it later. Therefore hold the devlink->lock
      from the beginning of devlink_dpipe_table_register().
      Suggested-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarMadhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6132c1d9
    • Florian Fainelli's avatar
      net: phy: Avoid multiple suspends · 503ba7c6
      Florian Fainelli authored
      It is currently possible for a PHY device to be suspended as part of a
      network device driver's suspend call while it is still being attached to
      that net_device, either via phy_suspend() or implicitly via phy_stop().
      
      Later on, when the MDIO bus controller get suspended, we would attempt
      to suspend again the PHY because it is still attached to a network
      device.
      
      This is both a waste of time and creates an opportunity for improper
      clock/power management bugs to creep in.
      
      Fixes: 803dd9c7 ("net: phy: avoid suspending twice a PHY")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      503ba7c6
    • Marek Vasut's avatar
      net: ks8851-ml: Fix IRQ handling and locking · 44343418
      Marek Vasut authored
      The KS8851 requires that packet RX and TX are mutually exclusive.
      Currently, the driver hopes to achieve this by disabling interrupt
      from the card by writing the card registers and by disabling the
      interrupt on the interrupt controller. This however is racy on SMP.
      
      Replace this approach by expanding the spinlock used around the
      ks_start_xmit() TX path to ks_irq() RX path to assure true mutual
      exclusion and remove the interrupt enabling/disabling, which is
      now not needed anymore. Furthermore, disable interrupts also in
      ks_net_stop(), which was missing before.
      
      Note that a massive improvement here would be to re-use the KS8851
      driver approach, which is to move the TX path into a worker thread,
      interrupt handling to threaded interrupt, and synchronize everything
      with mutexes, but that would be a much bigger rework, for a separate
      patch.
      Signed-off-by: default avatarMarek Vasut <marex@denx.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Lukas Wunner <lukas@wunner.de>
      Cc: Petr Stetiar <ynezz@true.cz>
      Cc: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      44343418
    • Jonathan Neuschäfer's avatar
      docs: networking: phy: Rephrase paragraph for clarity · 52df1e56
      Jonathan Neuschäfer authored
      Let's make it a little easier to read.
      Signed-off-by: default avatarJonathan Neuschäfer <j.neuschaefer@gmx.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      52df1e56
    • Neal Cardwell's avatar
      tcp: fix TFO SYNACK undo to avoid double-timestamp-undo · dad8cea7
      Neal Cardwell authored
      In a rare corner case the new logic for undo of SYNACK RTO could
      result in triggering the warning in tcp_fastretrans_alert() that says:
              WARN_ON(tp->retrans_out != 0);
      
      The warning looked like:
      
      WARNING: CPU: 1 PID: 1 at net/ipv4/tcp_input.c:2818 tcp_ack+0x13e0/0x3270
      
      The sequence that tickles this bug is:
       - Fast Open server receives TFO SYN with data, sends SYNACK
       - (client receives SYNACK and sends ACK, but ACK is lost)
       - server app sends some data packets
       - (N of the first data packets are lost)
       - server receives client ACK that has a TS ECR matching first SYNACK,
         and also SACKs suggesting the first N data packets were lost
          - server performs TS undo of SYNACK RTO, then immediately
            enters recovery
          - buggy behavior then performed a *second* undo that caused
            the connection to be in CA_Open with retrans_out != 0
      
      Basically, the incoming ACK packet with SACK blocks causes us to first
      undo the cwnd reduction from the SYNACK RTO, but then immediately
      enters fast recovery, which then makes us eligible for undo again. And
      then tcp_rcv_synrecv_state_fastopen() accidentally performs an undo
      using a "mash-up" of state from two different loss recovery phases: it
      uses the timestamp info from the ACK of the original SYNACK, and the
      undo_marker from the fast recovery.
      
      This fix refines the logic to only invoke the tcp_try_undo_loss()
      inside tcp_rcv_synrecv_state_fastopen() if the connection is still in
      CA_Loss.  If peer SACKs triggered fast recovery, then
      tcp_rcv_synrecv_state_fastopen() can't safely undo.
      
      Fixes: 794200d6 ("tcp: undo cwnd on Fast Open spurious SYNACK retransmit")
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dad8cea7
    • Haiyang Zhang's avatar
      hv_netvsc: Fix unwanted wakeup in netvsc_attach() · f6f13c12
      Haiyang Zhang authored
      When netvsc_attach() is called by operations like changing MTU, etc.,
      an extra wakeup may happen while netvsc_attach() calling
      rndis_filter_device_add() which sends rndis messages when queue is
      stopped in netvsc_detach(). The completion message will wake up queue 0.
      
      We can reproduce the issue by changing MTU etc., then the wake_queue
      counter from "ethtool -S" will increase beyond stop_queue counter:
           stop_queue: 0
           wake_queue: 1
      The issue causes queue wake up, and counter increment, no other ill
      effects in current code. So we didn't see any network problem for now.
      
      To fix this, initialize tx_disable to true, and set it to false when
      the NIC is ready to be attached or registered.
      
      Fixes: 7b2ee50c ("hv_netvsc: common detach logic")
      Signed-off-by: default avatarHaiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f6f13c12