1. 14 May, 2020 7 commits
    • Eric Dumazet's avatar
      tcp: fix error recovery in tcp_zerocopy_receive() · e776af60
      Eric Dumazet authored
      If user provides wrong virtual address in TCP_ZEROCOPY_RECEIVE
      operation we want to return -EINVAL error.
      
      But depending on zc->recv_skip_hint content, we might return
      -EIO error if the socket has SOCK_DONE set.
      
      Make sure to return -EINVAL in this case.
      
      BUG: KMSAN: uninit-value in tcp_zerocopy_receive net/ipv4/tcp.c:1833 [inline]
      BUG: KMSAN: uninit-value in do_tcp_getsockopt+0x4494/0x6320 net/ipv4/tcp.c:3685
      CPU: 1 PID: 625 Comm: syz-executor.0 Not tainted 5.7.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x220 lib/dump_stack.c:118
       kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121
       __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
       tcp_zerocopy_receive net/ipv4/tcp.c:1833 [inline]
       do_tcp_getsockopt+0x4494/0x6320 net/ipv4/tcp.c:3685
       tcp_getsockopt+0xf8/0x1f0 net/ipv4/tcp.c:3728
       sock_common_getsockopt+0x13f/0x180 net/core/sock.c:3131
       __sys_getsockopt+0x533/0x7b0 net/socket.c:2177
       __do_sys_getsockopt net/socket.c:2192 [inline]
       __se_sys_getsockopt+0xe1/0x100 net/socket.c:2189
       __x64_sys_getsockopt+0x62/0x80 net/socket.c:2189
       do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:297
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45c829
      Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f1deeb72c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000037
      RAX: ffffffffffffffda RBX: 00000000004e01e0 RCX: 000000000045c829
      RDX: 0000000000000023 RSI: 0000000000000006 RDI: 0000000000000009
      RBP: 000000000078bf00 R08: 0000000020000200 R09: 0000000000000000
      R10: 00000000200001c0 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000000001d8 R14: 00000000004d3038 R15: 00007f1deeb736d4
      
      Local variable ----zc@do_tcp_getsockopt created at:
       do_tcp_getsockopt+0x1a74/0x6320 net/ipv4/tcp.c:3670
       do_tcp_getsockopt+0x1a74/0x6320 net/ipv4/tcp.c:3670
      
      Fixes: 05255b82 ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e776af60
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 1b54f4fa
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Fix gcc-10 compilation warning in nf_conntrack, from Arnd Bergmann.
      
      2) Add NF_FLOW_HW_PENDING to avoid races between stats and deletion
         commands, from Paul Blakey.
      
      3) Remove WQ_MEM_RECLAIM from the offload workqueue, from Roi Dayan.
      
      4) Infinite loop when removing nf_conntrack module, from Florian Westphal.
      
      5) Set NF_FLOW_TEARDOWN bit on expiration to avoid races when refreshing
         the timeout from the software path.
      
      6) Missing nft_set_elem_expired() check in the rbtree, from Phil Sutter.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1b54f4fa
    • David S. Miller's avatar
      c9e2053d
    • Ursula Braun's avatar
      MAINTAINERS: another add of Karsten Graul for S390 networking · 865e525d
      Ursula Braun authored
      Complete adding of Karsten as maintainer for all S390 networking
      parts in the kernel.
      
      Cc: Julian Wiedmann <jwi@linux.ibm.com>
      Acked-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      865e525d
    • Wang Wenhu's avatar
      drivers: ipa: fix typos for ipa_smp2p structure doc · 16bb1b50
      Wang Wenhu authored
      Remove the duplicate "mutex", and change "Motex" to "Mutex". Also I
      recommend it's easier for understanding to make the "ready-interrupt"
      a bundle for it is a parallel description as "shutdown" which is appended
      after the slash.
      Signed-off-by: default avatarWang Wenhu <wenhu.wang@vivo.com>
      Cc: Alex Elder <elder@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      16bb1b50
    • Guillaume Nault's avatar
      pppoe: only process PADT targeted at local interfaces · b8c15839
      Guillaume Nault authored
      We don't want to disconnect a session because of a stray PADT arriving
      while the interface is in promiscuous mode.
      Furthermore, multicast and broadcast packets make no sense here, so
      only PACKET_HOST is accepted.
      Reported-by: default avatarDavid Balažic <xerces9@gmail.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b8c15839
    • Vinod Koul's avatar
      net: stmmac: fix num_por initialization · fd4a5177
      Vinod Koul authored
      Driver missed initializing num_por which is one of the por values that
      driver configures to hardware. In order to get these values, add a new
      structure ethqos_emac_driver_data which holds por and num_por values
      and populate that in driver probe.
      
      Fixes: a7c30e62 ("net: stmmac: Add driver for Qualcomm ethqos")
      Reported-by: default avatarRahul Ankushrao Kawadgave <rahulak@qti.qualcomm.com>
      Signed-off-by: default avatarVinod Koul <vkoul@kernel.org>
      Reviewed-by: default avatarAmit Kucheria <amit.kucheria@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fd4a5177
  2. 13 May, 2020 10 commits
  3. 12 May, 2020 9 commits
  4. 11 May, 2020 7 commits
    • David S. Miller's avatar
      Merge branch 'net-ipa-fix-cleanup-after-modem-crash' · 1abfb181
      David S. Miller authored
      Alex Elder says:
      
      ====================
      net: ipa: fix cleanup after modem crash
      
      The first patch in this series fixes a bug where the size of a data
      transfer request was never set, meaning it was 0.  The consequence
      of this was that such a transfer request would never complete if
      attempted, and led to a hung task timeout.
      
      This data transfer is required for cleaning up IPA hardware state
      when recovering from a modem crash.  The code to implement this
      cleanup is already present, but its use was commented out because
      it hit the bug described above.  So the second patch in this series
      enables the use of that "tag process" cleanup code.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1abfb181
    • Alex Elder's avatar
      net: ipa: use tag process on modem crash · 2c4bb809
      Alex Elder authored
      One part of recovering from a modem crash is performing a "tag
      sequence" of several IPA immediate commands, to clear the hardware
      pipeline.  The sequence ends with a data transfer request on the
      command endpoint (which is not otherwise done).  Unfortunately,
      attempting to do the data transfer led to a hang, so that request
      plus two other commands were commented out.
      
      The previous commit fixes the bug that was causing that hang.  And
      with that bug fixed we can properly issue the tag sequence when the
      modem crashes, to return the hardware to a known state.
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2c4bb809
    • Alex Elder's avatar
      net: ipa: set DMA length in gsi_trans_cmd_add() · c781e1d4
      Alex Elder authored
      When a command gets added to a transaction for the AP->command
      channel we set the DMA address of its scatterlist entry, but not
      its DMA length.  Fix this bug.
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c781e1d4
    • Florian Westphal's avatar
      netfilter: conntrack: fix infinite loop on rmmod · 54ab49fd
      Florian Westphal authored
      'rmmod nf_conntrack' can hang forever, because the netns exit
      gets stuck in nf_conntrack_cleanup_net_list():
      
      i_see_dead_people:
       busy = 0;
       list_for_each_entry(net, net_exit_list, exit_list) {
        nf_ct_iterate_cleanup(kill_all, net, 0, 0);
        if (atomic_read(&net->ct.count) != 0)
         busy = 1;
       }
       if (busy) {
        schedule();
        goto i_see_dead_people;
       }
      
      When nf_ct_iterate_cleanup iterates the conntrack table, all nf_conn
      structures can be found twice:
      once for the original tuple and once for the conntracks reply tuple.
      
      get_next_corpse() only calls the iterator when the entry is
      in original direction -- the idea was to avoid unneeded invocations
      of the iterator callback.
      
      When support for clashing entries was added, the assumption that
      all nf_conn objects are added twice, once in original, once for reply
      tuple no longer holds -- NF_CLASH_BIT entries are only added in
      the non-clashing reply direction.
      
      Thus, if at least one NF_CLASH entry is in the list then
      nf_conntrack_cleanup_net_list() always skips it completely.
      
      During normal netns destruction, this causes a hang of several
      seconds, until the gc worker removes the entry (NF_CLASH entries
      always have a 1 second timeout).
      
      But in the rmmod case, the gc worker has already been stopped, so
      ct.count never becomes 0.
      
      We can fix this in two ways:
      
      1. Add a second test for CLASH_BIT and call iterator for those
         entries as well, or:
      2. Skip the original tuple direction and use the reply tuple.
      
      2) is simpler, so do that.
      
      Fixes: 6a757c07 ("netfilter: conntrack: allow insertion of clashing entries")
      Reported-by: default avatarChen Yi <yiche@redhat.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      54ab49fd
    • Roi Dayan's avatar
      netfilter: flowtable: Remove WQ_MEM_RECLAIM from workqueue · 1d10da0e
      Roi Dayan authored
      This workqueue is in charge of handling offloaded flow tasks like
      add/del/stats we should not use WQ_MEM_RECLAIM flag.
      The flag can result in the following warning.
      
      [  485.557189] ------------[ cut here ]------------
      [  485.562976] workqueue: WQ_MEM_RECLAIM nf_flow_table_offload:flow_offload_worr
      [  485.562985] WARNING: CPU: 7 PID: 3731 at kernel/workqueue.c:2610 check_flush0
      [  485.590191] Kernel panic - not syncing: panic_on_warn set ...
      [  485.597100] CPU: 7 PID: 3731 Comm: kworker/u112:8 Not tainted 5.7.0-rc1.21802
      [  485.606629] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/177
      [  485.615487] Workqueue: nf_flow_table_offload flow_offload_work_handler [nf_f]
      [  485.624834] Call Trace:
      [  485.628077]  dump_stack+0x50/0x70
      [  485.632280]  panic+0xfb/0x2d7
      [  485.636083]  ? check_flush_dependency+0x110/0x130
      [  485.641830]  __warn.cold.12+0x20/0x2a
      [  485.646405]  ? check_flush_dependency+0x110/0x130
      [  485.652154]  ? check_flush_dependency+0x110/0x130
      [  485.657900]  report_bug+0xb8/0x100
      [  485.662187]  ? sched_clock_cpu+0xc/0xb0
      [  485.666974]  do_error_trap+0x9f/0xc0
      [  485.671464]  do_invalid_op+0x36/0x40
      [  485.675950]  ? check_flush_dependency+0x110/0x130
      [  485.681699]  invalid_op+0x28/0x30
      
      Fixes: 7da182a9 ("netfilter: flowtable: Use work entry per offload command")
      Reported-by: default avatarMarcelo Ricardo Leitner <mleitner@redhat.com>
      Signed-off-by: default avatarRoi Dayan <roid@mellanox.com>
      Reviewed-by: default avatarPaul Blakey <paulb@mellanox.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      1d10da0e
    • Paul Blakey's avatar
      netfilter: flowtable: Add pending bit for offload work · 2c889795
      Paul Blakey authored
      Gc step can queue offloaded flow del work or stats work.
      Those work items can race each other and a flow could be freed
      before the stats work is executed and querying it.
      To avoid that, add a pending bit that if a work exists for a flow
      don't queue another work for it.
      This will also avoid adding multiple stats works in case stats work
      didn't complete but gc step started again.
      Signed-off-by: default avatarPaul Blakey <paulb@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      2c889795
    • Luo bin's avatar
      hinic: fix a bug of ndo_stop · e8a1b0ef
      Luo bin authored
      if some function in ndo_stop interface returns failure because of
      hardware fault, must go on excuting rest steps rather than return
      failure directly, otherwise will cause memory leak.And bump the
      timeout for SET_FUNC_STATE to ensure that cmd won't return failure
      when hw is busy. Otherwise hw may stomp host memory if we free
      memory regardless of the return value of SET_FUNC_STATE.
      
      Fixes: 51ba902a ("net-next/hinic: Initialize hw interface")
      Signed-off-by: default avatarLuo bin <luobin9@huawei.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      e8a1b0ef
  5. 10 May, 2020 3 commits
    • Arnd Bergmann's avatar
      netfilter: conntrack: avoid gcc-10 zero-length-bounds warning · 2c407aca
      Arnd Bergmann authored
      gcc-10 warns around a suspicious access to an empty struct member:
      
      net/netfilter/nf_conntrack_core.c: In function '__nf_conntrack_alloc':
      net/netfilter/nf_conntrack_core.c:1522:9: warning: array subscript 0 is outside the bounds of an interior zero-length array 'u8[0]' {aka 'unsigned char[0]'} [-Wzero-length-bounds]
       1522 |  memset(&ct->__nfct_init_offset[0], 0,
            |         ^~~~~~~~~~~~~~~~~~~~~~~~~~
      In file included from net/netfilter/nf_conntrack_core.c:37:
      include/net/netfilter/nf_conntrack.h:90:5: note: while referencing '__nfct_init_offset'
         90 |  u8 __nfct_init_offset[0];
            |     ^~~~~~~~~~~~~~~~~~
      
      The code is correct but a bit unusual. Rework it slightly in a way that
      does not trigger the warning, using an empty struct instead of an empty
      array. There are probably more elegant ways to do this, but this is the
      smallest change.
      
      Fixes: c41884ce ("netfilter: conntrack: avoid zeroing timer")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      2c407aca
    • Florian Fainelli's avatar
      net: dsa: loop: Add module soft dependency · 3047211c
      Florian Fainelli authored
      There is a soft dependency against dsa_loop_bdinfo.ko which sets up the
      MDIO device registration, since there are no symbols referenced by
      dsa_loop.ko, there is no automatic loading of dsa_loop_bdinfo.ko which
      is needed.
      
      Fixes: 98cd1552 ("net: dsa: Mock-up driver")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      3047211c
    • Zefan Li's avatar
      netprio_cgroup: Fix unlimited memory leak of v2 cgroups · 090e28b2
      Zefan Li authored
      If systemd is configured to use hybrid mode which enables the use of
      both cgroup v1 and v2, systemd will create new cgroup on both the default
      root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach
      task to the two cgroups. If the task does some network thing then the v2
      cgroup can never be freed after the session exited.
      
      One of our machines ran into OOM due to this memory leak.
      
      In the scenario described above when sk_alloc() is called
      cgroup_sk_alloc() thought it's in v2 mode, so it stores
      the cgroup pointer in sk->sk_cgrp_data and increments
      the cgroup refcnt, but then sock_update_netprioidx()
      thought it's in v1 mode, so it stores netprioidx value
      in sk->sk_cgrp_data, so the cgroup refcnt will never be freed.
      
      Currently we do the mode switch when someone writes to the ifpriomap
      cgroup control file. The easiest fix is to also do the switch when
      a task is attached to a new cgroup.
      
      Fixes: bd1060a1 ("sock, cgroup: add sock->sk_cgroup")
      Reported-by: default avatarYang Yingliang <yangyingliang@huawei.com>
      Tested-by: default avatarYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      090e28b2
  6. 09 May, 2020 4 commits
    • Arnd Bergmann's avatar
      net: freescale: select CONFIG_FIXED_PHY where needed · 99352c79
      Arnd Bergmann authored
      I ran into a randconfig build failure with CONFIG_FIXED_PHY=m
      and CONFIG_GIANFAR=y:
      
      x86_64-linux-ld: drivers/net/ethernet/freescale/gianfar.o:(.rodata+0x418): undefined reference to `fixed_phy_change_carrier'
      
      It seems the same thing can happen with dpaa and ucc_geth, so change
      all three to do an explicit 'select FIXED_PHY'.
      
      The fixed-phy driver actually has an alternative stub function that
      theoretically allows building network drivers when fixed-phy is
      disabled, but I don't see how that would help here, as the drivers
      presumably would not work then.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      99352c79
    • Paolo Abeni's avatar
      net: ipv4: really enforce backoff for redirects · 57644431
      Paolo Abeni authored
      In commit b406472b ("net: ipv4: avoid mixed n_redirects and
      rate_tokens usage") I missed the fact that a 0 'rate_tokens' will
      bypass the backoff algorithm.
      
      Since rate_tokens is cleared after a redirect silence, and never
      incremented on redirects, if the host keeps receiving packets
      requiring redirect it will reply ignoring the backoff.
      
      Additionally, the 'rate_last' field will be updated with the
      cadence of the ingress packet requiring redirect. If that rate is
      high enough, that will prevent the host from generating any
      other kind of ICMP messages
      
      The check for a zero 'rate_tokens' value was likely a shortcut
      to avoid the more complex backoff algorithm after a redirect
      silence period. Address the issue checking for 'n_redirects'
      instead, which is incremented on successful redirect, and
      does not interfere with other ICMP replies.
      
      Fixes: b406472b ("net: ipv4: avoid mixed n_redirects and rate_tokens usage")
      Reported-and-tested-by: default avatarColin Walters <walters@redhat.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      57644431
    • Wei Yongjun's avatar
      octeontx2-vf: Fix error return code in otx2vf_probe() · 9302bead
      Wei Yongjun authored
      Fix to return negative error code -ENOMEM from the alloc failed error
      handling case instead of 0, as done elsewhere in this function.
      
      Fixes: 3184fb5b ("octeontx2-vf: Virtual function driver support")
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarWei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      9302bead
    • Vincent Minet's avatar
      umh: fix memory leak on execve failure · db803036
      Vincent Minet authored
      If a UMH process created by fork_usermode_blob() fails to execute,
      a pair of struct file allocated by umh_pipe_setup() will leak.
      
      Under normal conditions, the caller (like bpfilter) needs to manage the
      lifetime of the UMH and its two pipes. But when fork_usermode_blob()
      fails, the caller doesn't really have a way to know what needs to be
      done. It seems better to do the cleanup ourselves in this case.
      
      Fixes: 449325b5 ("umh: introduce fork_usermode_blob() helper")
      Signed-off-by: default avatarVincent Minet <v.minet@criteo.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      db803036