1. 16 Jan, 2022 1 commit
    • Eric Dumazet's avatar
      ipv4: update fib_info_cnt under spinlock protection · 0a6e6b3c
      Eric Dumazet authored
      In the past, free_fib_info() was supposed to be called
      under RTNL protection.
      
      This eventually was no longer the case.
      
      Instead of enforcing RTNL it seems we simply can
      move fib_info_cnt changes to occur when fib_info_lock
      is held.
      
      v2: David Laight suggested to update fib_info_cnt
      only when an entry is added/deleted to/from the hash table,
      as fib_info_cnt is used to make sure hash table size
      is optimal.
      
      BUG: KCSAN: data-race in fib_create_info / free_fib_info
      
      write to 0xffffffff86e243a0 of 4 bytes by task 26429 on cpu 0:
       fib_create_info+0xe78/0x3440 net/ipv4/fib_semantics.c:1428
       fib_table_insert+0x148/0x10c0 net/ipv4/fib_trie.c:1224
       fib_magic+0x195/0x1e0 net/ipv4/fib_frontend.c:1087
       fib_add_ifaddr+0xd0/0x2e0 net/ipv4/fib_frontend.c:1109
       fib_netdev_event+0x178/0x510 net/ipv4/fib_frontend.c:1466
       notifier_call_chain kernel/notifier.c:83 [inline]
       raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:391
       __dev_notify_flags+0x1d3/0x3b0
       dev_change_flags+0xa2/0xc0 net/core/dev.c:8872
       do_setlink+0x810/0x2410 net/core/rtnetlink.c:2719
       rtnl_group_changelink net/core/rtnetlink.c:3242 [inline]
       __rtnl_newlink net/core/rtnetlink.c:3396 [inline]
       rtnl_newlink+0xb10/0x13b0 net/core/rtnetlink.c:3506
       rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5571
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2496
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5589
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x5fc/0x6c0 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x726/0x840 net/netlink/af_netlink.c:1921
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2409
       ___sys_sendmsg net/socket.c:2463 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2492
       __do_sys_sendmsg net/socket.c:2501 [inline]
       __se_sys_sendmsg net/socket.c:2499 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2499
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffffffff86e243a0 of 4 bytes by task 31505 on cpu 1:
       free_fib_info+0x35/0x80 net/ipv4/fib_semantics.c:252
       fib_info_put include/net/ip_fib.h:575 [inline]
       nsim_fib4_rt_destroy drivers/net/netdevsim/fib.c:294 [inline]
       nsim_fib4_rt_replace drivers/net/netdevsim/fib.c:403 [inline]
       nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:431 [inline]
       nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
       nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
       nsim_fib_event_work+0x15ca/0x2cf0 drivers/net/netdevsim/fib.c:1477
       process_one_work+0x3fc/0x980 kernel/workqueue.c:2298
       process_scheduled_works kernel/workqueue.c:2361 [inline]
       worker_thread+0x7df/0xa70 kernel/workqueue.c:2447
       kthread+0x2c7/0x2e0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00000d2d -> 0x00000d2e
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 31505 Comm: kworker/1:21 Not tainted 5.16.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events nsim_fib_event_work
      
      Fixes: 48bb9eb4 ("netdevsim: fib: Add dummy implementation for FIB offload")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: Ido Schimmel <idosch@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0a6e6b3c
  2. 15 Jan, 2022 7 commits
    • Wen Gu's avatar
      net/smc: Remove unused function declaration · 9404bc1e
      Wen Gu authored
      The declaration of smc_wr_tx_dismiss_slots() is unused.
      So remove it.
      
      Fixes: 349d4312 ("net/smc: fix kernel panic caused by race of smc_sock")
      Signed-off-by: default avatarWen Gu <guwen@linux.alibaba.com>
      Reviewed-by: default avatarDust Li <dust.li@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9404bc1e
    • Slark Xiao's avatar
      net: wwan: Fix MRU mismatch issue which may lead to data connection lost · f542cdfa
      Slark Xiao authored
      In pci_generic.c there is a 'mru_default' in struct mhi_pci_dev_info.
      This value shall be used for whole mhi if it's given a value for a specific product.
      But in function mhi_net_rx_refill_work(), it's still using hard code value MHI_DEFAULT_MRU.
      'mru_default' shall have higher priority than MHI_DEFAULT_MRU.
      And after checking, this change could help fix a data connection lost issue.
      
      Fixes: 5c2c8531 ("bus: mhi: pci-generic: configurable network interface MRU")
      Signed-off-by: default avatarShujun Wang <wsj20369@163.com>
      Signed-off-by: default avatarSlark Xiao <slark_xiao@163.com>
      Reviewed-by: default avatarLoic Poulain <loic.poulain@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f542cdfa
    • Mohammad Athari Bin Ismail's avatar
      net: phy: marvell: add Marvell specific PHY loopback · 020a45af
      Mohammad Athari Bin Ismail authored
      Existing genphy_loopback() is not applicable for Marvell PHY. Besides
      configuring bit-6 and bit-13 in Page 0 Register 0 (Copper Control
      Register), it is also required to configure same bits  in Page 2
      Register 21 (MAC Specific Control Register 2) according to speed of
      the loopback is operating.
      
      Tested working on Marvell88E1510 PHY for all speeds (1000/100/10Mbps).
      
      FIXME: Based on trial and error test, it seem 1G need to have delay between
      soft reset and loopback enablement.
      
      Fixes: 014068dc ("net: phy: genphy_loopback: add link speed configuration")
      Cc: <stable@vger.kernel.org> # 5.15.x
      Signed-off-by: default avatarMohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      020a45af
    • Christophe JAILLET's avatar
      net: ethernet: sun4i-emac: Fix an error handling path in emac_probe() · 9a9acdcc
      Christophe JAILLET authored
      A dma_request_chan() call is hidden in emac_configure_dma().
      It must be released in the probe if an error occurs, as already done in
      the remove function.
      
      Add the corresponding dma_release_channel() call.
      
      Fixes: 47869e82 ("sun4i-emac.c: add dma support")
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9a9acdcc
    • Tom Rix's avatar
      net: ethernet: mtk_eth_soc: fix error checking in mtk_mac_config() · 214b3369
      Tom Rix authored
      Clang static analysis reports this problem
      mtk_eth_soc.c:394:7: warning: Branch condition evaluates
        to a garbage value
                      if (err)
                          ^~~
      
      err is not initialized and only conditionally set.
      So intitialize err.
      
      Fixes: 7e538372 ("net: ethernet: mediatek: Re-add support SGMII")
      Signed-off-by: default avatarTom Rix <trix@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      214b3369
    • Vladimir Oltean's avatar
      net: mscc: ocelot: don't dereference NULL pointers with shared tc filters · 80f15f3b
      Vladimir Oltean authored
      The following command sequence:
      
      tc qdisc del dev swp0 clsact
      tc qdisc add dev swp0 ingress_block 1 clsact
      tc qdisc add dev swp1 ingress_block 1 clsact
      tc filter add block 1 flower action drop
      tc qdisc del dev swp0 clsact
      
      produces the following NPD:
      
      Unable to handle kernel NULL pointer dereference at virtual address 0000000000000014
      pc : vcap_entry_set+0x14/0x70
      lr : ocelot_vcap_filter_del+0x198/0x234
      Call trace:
       vcap_entry_set+0x14/0x70
       ocelot_vcap_filter_del+0x198/0x234
       ocelot_cls_flower_destroy+0x94/0xe4
       felix_cls_flower_del+0x70/0x84
       dsa_slave_setup_tc_block_cb+0x13c/0x60c
       dsa_slave_setup_tc_block_cb_ig+0x20/0x30
       tc_setup_cb_reoffload+0x44/0x120
       fl_reoffload+0x280/0x320
       tcf_block_playback_offloads+0x6c/0x184
       tcf_block_unbind+0x80/0xe0
       tcf_block_setup+0x174/0x214
       tcf_block_offload_cmd.isra.0+0x100/0x13c
       tcf_block_offload_unbind+0x5c/0xa0
       __tcf_block_put+0x54/0x174
       tcf_block_put_ext+0x5c/0x74
       clsact_destroy+0x40/0x60
       qdisc_destroy+0x4c/0x150
       qdisc_put+0x70/0x90
       qdisc_graft+0x3f0/0x4c0
       tc_get_qdisc+0x1cc/0x364
       rtnetlink_rcv_msg+0x124/0x340
      
      The reason is that the driver isn't prepared to receive two tc filters
      with the same cookie. It unconditionally creates a new struct
      ocelot_vcap_filter for each tc filter, and it adds all filters with the
      same identifier (cookie) to the ocelot_vcap_block.
      
      The problem is here, in ocelot_vcap_filter_del():
      
      	/* Gets index of the filter */
      	index = ocelot_vcap_block_get_filter_index(block, filter);
      	if (index < 0)
      		return index;
      
      	/* Delete filter */
      	ocelot_vcap_block_remove_filter(ocelot, block, filter);
      
      	/* Move up all the blocks over the deleted filter */
      	for (i = index; i < block->count; i++) {
      		struct ocelot_vcap_filter *tmp;
      
      		tmp = ocelot_vcap_block_find_filter_by_index(block, i);
      		vcap_entry_set(ocelot, i, tmp);
      	}
      
      what will happen is ocelot_vcap_block_get_filter_index() will return the
      index (@index) of the first filter found with that cookie. This is _not_
      the index of _this_ filter, but the other one with the same cookie,
      because ocelot_vcap_filter_equal() gets fooled.
      
      Then later, ocelot_vcap_block_remove_filter() is coded to remove all
      filters that are ocelot_vcap_filter_equal() with the passed @filter.
      So unexpectedly, both filters get deleted from the list.
      
      Then ocelot_vcap_filter_del() will attempt to move all the other filters
      up, again finding them by index (@i). The block count is 2, @index was 0,
      so it will attempt to move up filter @i=0 and @i=1. It assigns tmp =
      ocelot_vcap_block_find_filter_by_index(block, i), which is now a NULL
      pointer because ocelot_vcap_block_remove_filter() has removed more than
      one filter.
      
      As far as I can see, this problem has been there since the introduction
      of tc offload support, however I cannot test beyond the blamed commit
      due to hardware availability. In any case, any fix cannot be backported
      that far, due to lots of changes to the code base.
      
      Therefore, let's go for the correct solution, which is to not call
      ocelot_vcap_filter_add() and ocelot_vcap_filter_del(), unless the filter
      is actually unique and not shared. For the shared filters, we should
      just modify the ingress port mask and call ocelot_vcap_filter_replace(),
      a function introduced by commit 95706be1 ("net: mscc: ocelot: create
      a function that replaces an existing VCAP filter"). This way,
      block->rules will only contain filters with unique cookies, by design.
      
      Fixes: 07d985ee ("net: dsa: felix: Wire up the ocelot cls_flower methods")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      80f15f3b
    • Eric Dumazet's avatar
      af_unix: annote lockless accesses to unix_tot_inflight & gc_in_progress · 9d6d7f1c
      Eric Dumazet authored
      wait_for_unix_gc() reads unix_tot_inflight & gc_in_progress
      without synchronization.
      
      Adds READ_ONCE()/WRITE_ONCE() and their associated comments
      to better document the intent.
      
      BUG: KCSAN: data-race in unix_inflight / wait_for_unix_gc
      
      write to 0xffffffff86e2b7c0 of 4 bytes by task 9380 on cpu 0:
       unix_inflight+0x1e8/0x260 net/unix/scm.c:63
       unix_attach_fds+0x10c/0x1e0 net/unix/scm.c:121
       unix_scm_to_skb net/unix/af_unix.c:1674 [inline]
       unix_dgram_sendmsg+0x679/0x16b0 net/unix/af_unix.c:1817
       unix_seqpacket_sendmsg+0xcc/0x110 net/unix/af_unix.c:2258
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2409
       ___sys_sendmsg net/socket.c:2463 [inline]
       __sys_sendmmsg+0x267/0x4c0 net/socket.c:2549
       __do_sys_sendmmsg net/socket.c:2578 [inline]
       __se_sys_sendmmsg net/socket.c:2575 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2575
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffffffff86e2b7c0 of 4 bytes by task 9375 on cpu 1:
       wait_for_unix_gc+0x24/0x160 net/unix/garbage.c:196
       unix_dgram_sendmsg+0x8e/0x16b0 net/unix/af_unix.c:1772
       unix_seqpacket_sendmsg+0xcc/0x110 net/unix/af_unix.c:2258
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2409
       ___sys_sendmsg net/socket.c:2463 [inline]
       __sys_sendmmsg+0x267/0x4c0 net/socket.c:2549
       __do_sys_sendmmsg net/socket.c:2578 [inline]
       __se_sys_sendmmsg net/socket.c:2575 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2575
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00000002 -> 0x00000004
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 9375 Comm: syz-executor.1 Not tainted 5.16.0-rc7-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 9915672d ("af_unix: limit unix_tot_inflight")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220114164328.2038499-1-eric.dumazet@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      9d6d7f1c
  3. 14 Jan, 2022 7 commits
  4. 13 Jan, 2022 12 commits
    • Kevin Bracey's avatar
      net_sched: restore "mpu xxx" handling · fb80445c
      Kevin Bracey authored
      commit 56b765b7 ("htb: improved accuracy at high rates") broke
      "overhead X", "linklayer atm" and "mpu X" attributes.
      
      "overhead X" and "linklayer atm" have already been fixed. This restores
      the "mpu X" handling, as might be used by DOCSIS or Ethernet shaping:
      
          tc class add ... htb rate X overhead 4 mpu 64
      
      The code being fixed is used by htb, tbf and act_police. Cake has its
      own mpu handling. qdisc_calculate_pkt_len still uses the size table
      containing values adjusted for mpu by user space.
      
      iproute2 tc has always passed mpu into the kernel via a tc_ratespec
      structure, but the kernel never directly acted on it, merely stored it
      so that it could be read back by `tc class show`.
      
      Rather, tc would generate length-to-time tables that included the mpu
      (and linklayer) in their construction, and the kernel used those tables.
      
      Since v3.7, the tables were no longer used. Along with "mpu", this also
      broke "overhead" and "linklayer" which were fixed in 01cb71d2
      ("net_sched: restore "overhead xxx" handling", v3.10) and 8a8e3d84
      ("net_sched: restore "linklayer atm" handling", v3.11).
      
      "overhead" was fixed by simply restoring use of tc_ratespec::overhead -
      this had originally been used by the kernel but was initially omitted
      from the new non-table-based calculations.
      
      "linklayer" had been handled in the table like "mpu", but the mode was
      not originally passed in tc_ratespec. The new implementation was made to
      handle it by getting new versions of tc to pass the mode in an extended
      tc_ratespec, and for older versions of tc the table contents were analysed
      at load time to deduce linklayer.
      
      As "mpu" has always been given to the kernel in tc_ratespec,
      accompanying the mpu-based table, we can restore system functionality
      with no userspace change by making the kernel act on the tc_ratespec
      value.
      
      Fixes: 56b765b7 ("htb: improved accuracy at high rates")
      Signed-off-by: default avatarKevin Bracey <kevin@bracey.fi>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Vimalkumar <j.vimal@gmail.com>
      Link: https://lore.kernel.org/r/20220112170210.1014351-1-kevin@bracey.fiSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      fb80445c
    • Kyoungkyu Park's avatar
      net: qmi_wwan: Add Hucom Wireless HM-211S/K · a6fadfd7
      Kyoungkyu Park authored
      The Hucom Wireless HM-211S/K is an LTE module based on Qualcomm MDM9207.
      This module supports LTE Band 1, 3, 5, 7, 8 and WCDMA Band 1.
      
      Manual testing showed that only interface
      number two replies to QMI messages.
      
      T:  Bus=01 Lev=02 Prnt=02 Port=01 Cnt=01 Dev#=  3 Spd=480  MxCh= 0
      D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
      P:  Vendor=22de ProdID=9051 Rev= 3.18
      S:  Manufacturer=Android
      S:  Product=Android
      S:  SerialNumber=0123456789ABCDEF
      C:* #Ifs= 4 Cfg#= 1 Atr=80 MxPwr=500mA
      I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
      E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
      E:  Ad=83(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      E:  Ad=85(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
      E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
      E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      Signed-off-by: default avatarKyoungkyu Park <choryu.park@choryu.space>
      Acked-by: default avatarBjørn Mork <bjorn@mork.no>
      Link: https://lore.kernel.org/r/Yd+nxAA6KorDpQFv@choryu-tfx5470hSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      a6fadfd7
    • Wen Gu's avatar
      net/smc: Resolve the race between SMC-R link access and clear · 20c9398d
      Wen Gu authored
      We encountered some crashes caused by the race between SMC-R
      link access and link clear that triggered by abnormal link
      group termination, such as port error.
      
      Here is an example of this kind of crashes:
      
       BUG: kernel NULL pointer dereference, address: 0000000000000000
       Workqueue: smc_hs_wq smc_listen_work [smc]
       RIP: 0010:smc_llc_flow_initiate+0x44/0x190 [smc]
       Call Trace:
        <TASK>
        ? __smc_buf_create+0x75a/0x950 [smc]
        smcr_lgr_reg_rmbs+0x2a/0xbf [smc]
        smc_listen_work+0xf72/0x1230 [smc]
        ? process_one_work+0x25c/0x600
        process_one_work+0x25c/0x600
        worker_thread+0x4f/0x3a0
        ? process_one_work+0x600/0x600
        kthread+0x15d/0x1a0
        ? set_kthread_struct+0x40/0x40
        ret_from_fork+0x1f/0x30
        </TASK>
      
      smc_listen_work()                     __smc_lgr_terminate()
      ---------------------------------------------------------------
                                          | smc_lgr_free()
                                          |  |- smcr_link_clear()
                                          |      |- memset(lnk, 0)
      smc_listen_rdma_reg()               |
       |- smcr_lgr_reg_rmbs()             |
           |- smc_llc_flow_initiate()     |
               |- access lnk->lgr (panic) |
      
      These crashes are similarly caused by clearing SMC-R link
      resources when some functions is still accessing to them.
      This patch tries to fix the issue by introducing reference
      count of SMC-R links and ensuring that the sensitive resources
      of links won't be cleared until reference count reaches zero.
      
      The operation to the SMC-R link reference count can be concluded
      as follows:
      
      object          [hold or initialized as 1]         [put]
      --------------------------------------------------------------------
      links           smcr_link_init()                   smcr_link_clear()
      connections     smc_conn_create()                  smc_conn_free()
      
      Through this way, the clear of SMC-R links is later than the
      free of all the smc connections above it, thus avoiding the
      unsafe reference to SMC-R links.
      Signed-off-by: default avatarWen Gu <guwen@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      20c9398d
    • Wen Gu's avatar
      net/smc: Introduce a new conn->lgr validity check helper · ea89c6c0
      Wen Gu authored
      It is no longer suitable to identify whether a smc connection
      is registered in a link group through checking if conn->lgr
      is NULL, because conn->lgr won't be reset even the connection
      is unregistered from a link group.
      
      So this patch introduces a new helper smc_conn_lgr_valid() and
      replaces all the check of conn->lgr in original implementation
      with the new helper to judge if conn->lgr is valid to use.
      Signed-off-by: default avatarWen Gu <guwen@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ea89c6c0
    • Eric Dumazet's avatar
      inet: frags: annotate races around fqdir->dead and fqdir->high_thresh · 91341fa0
      Eric Dumazet authored
      Both fields can be read/written without synchronization,
      add proper accessors and documentation.
      
      Fixes: d5dd8879 ("inet: fix various use-after-free in defrags units")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      91341fa0
    • David S. Miller's avatar
      Merge branch 'smc-race-fixes' · 3ba8c625
      David S. Miller authored
      Wen Gu says:
      
      ====================
      net/smc: Fixes for race in smc link group termination
      
      We encountered some crashes recently and they are caused by the
      race between the access and free of link/link group in abnormal
      smc link group termination. The crashes can be reproduced in
      frequent abnormal link group termination, like setting RNICs up/down.
      
      This set of patches tries to fix this by extending the life cycle
      of link/link group to ensure that they won't be referred to after
      cleared or freed.
      
      v1 -> v2:
      - Improve some comments.
      
      - Move codes of waking up lgrs_deleted wait queue from smc_lgr_free()
        to __smc_lgr_free().
      
      - Move codes of waking up links_deleted wait queue from smcr_link_clear()
        to __smcr_link_clear().
      
      - Move codes of smc_ibdev_cnt_dec() and put_device() from smcr_link_clear()
        to __smcr_link_clear()
      
      - Move smc_lgr_put() to the end of __smcr_link_clear().
      
      - Call smc_lgr_put() after 'out' tag in smcr_link_init() when link
        initialization fails.
      
      - Modify the location where smc connection holds the lgr or link.
      
          before:
            * hold lgr in smc_lgr_register_conn().
            * hold link in smcr_lgr_conn_assign_link().
          after:
            * hold both lgr and link in smc_conn_create().
      
        Modify the location to symmetrical with the place where smc connections
        put the lgr or link, which is smc_conn_free().
      
      - Initialize conn->freed as zero in smc_conn_create().
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3ba8c625
    • Wen Gu's avatar
      net/smc: Resolve the race between link group access and termination · 61f434b0
      Wen Gu authored
      We encountered some crashes caused by the race between the access
      and the termination of link groups.
      
      Here are some of panic stacks we met:
      
      1) Race between smc_clc_wait_msg() and __smc_lgr_terminate()
      
       BUG: kernel NULL pointer dereference, address: 00000000000002f0
       Workqueue: smc_hs_wq smc_listen_work [smc]
       RIP: 0010:smc_clc_wait_msg+0x3eb/0x5c0 [smc]
       Call Trace:
        <TASK>
        ? smc_clc_send_accept+0x45/0xa0 [smc]
        ? smc_clc_send_accept+0x45/0xa0 [smc]
        smc_listen_work+0x783/0x1220 [smc]
        ? finish_task_switch+0xc4/0x2e0
        ? process_one_work+0x1ad/0x3c0
        process_one_work+0x1ad/0x3c0
        worker_thread+0x4c/0x390
        ? rescuer_thread+0x320/0x320
        kthread+0x149/0x190
        ? set_kthread_struct+0x40/0x40
        ret_from_fork+0x1f/0x30
        </TASK>
      
      smc_listen_work()                abnormal case like port error
      ---------------------------------------------------------------
                                      | __smc_lgr_terminate()
                                      |  |- smc_conn_kill()
                                      |      |- smc_lgr_unregister_conn()
                                      |          |- set conn->lgr = NULL
      smc_clc_wait_msg()              |
       |- access conn->lgr (panic)    |
      
      2) Race between smc_setsockopt() and __smc_lgr_terminate()
      
       BUG: kernel NULL pointer dereference, address: 00000000000002e8
       RIP: 0010:smc_setsockopt+0x17a/0x280 [smc]
       Call Trace:
        <TASK>
        __sys_setsockopt+0xfc/0x190
        __x64_sys_setsockopt+0x20/0x30
        do_syscall_64+0x34/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae
        </TASK>
      
      smc_setsockopt()                 abnormal case like port error
      --------------------------------------------------------------
                                      | __smc_lgr_terminate()
                                      |  |- smc_conn_kill()
                                      |      |- smc_lgr_unregister_conn()
                                      |          |- set conn->lgr = NULL
      mod_delayed_work()              |
       |- access conn->lgr (panic)    |
      
      There are some other panic places and they are caused by the
      similar reason as described above, which is accessing link
      group after termination, thus getting a NULL pointer or invalid
      resource.
      
      Currently, there seems to be no synchronization between the
      link group access and a sudden termination of it. This patch
      tries to fix this by introducing reference count of link group
      and not freeing link group until reference count is zero.
      
      Link group might be referred to by links or smc connections. So
      the operation to the link group reference count can be concluded
      as follows:
      
      object          [hold or initialized as 1]       [put]
      -------------------------------------------------------------------
      link group      smc_lgr_create()                 smc_lgr_free()
      connections     smc_conn_create()                smc_conn_free()
      links           smcr_link_init()                 smcr_link_clear()
      
      Througth this way, we extend the life cycle of link group and
      ensure it is longer than the life cycle of connections and links
      above it, so that avoid invalid access to link group after its
      termination.
      Signed-off-by: default avatarWen Gu <guwen@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      61f434b0
    • Li Zhijian's avatar
      kselftests/net: adapt the timeout to the largest runtime · de0e4447
      Li Zhijian authored
      timeout in settings is used by each case under the same directory, so
      it should adapt to the maximum runtime.
      
      A normally running net/fib_nexthops.sh may be killed by this unsuitable
      timeout. Furthermore, since the defect[1] of kselftests framework,
      net/fib_nexthops.sh which might take at least (300 * 4) seconds would
      block the whole kselftests framework previously.
      $ git grep -w 'sleep 300' tools/testing/selftests/net
      tools/testing/selftests/net/fib_nexthops.sh:    sleep 300
      tools/testing/selftests/net/fib_nexthops.sh:    sleep 300
      tools/testing/selftests/net/fib_nexthops.sh:    sleep 300
      tools/testing/selftests/net/fib_nexthops.sh:    sleep 300
      
      Enlarge the timeout by plus 300 based on the obvious largest runtime
      to avoid the blocking.
      
      [1]: https://www.spinics.net/lists/kernel/msg4185370.htmlSigned-off-by: default avatarZhou Jie <zhoujie2011@fujitsu.com>
      Signed-off-by: default avatarLi Zhijian <lizhijian@fujitsu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      de0e4447
    • Vladimir Oltean's avatar
      net: mscc: ocelot: don't let phylink re-enable TX PAUSE on the NPI port · 33cb0ff3
      Vladimir Oltean authored
      Since commit b3964807 ("net: mscc: ocelot: disable flow control on
      NPI interface"), flow control should be disabled on the DSA CPU port
      when used in NPI mode.
      
      However, the commit blamed in the Fixes: tag below broke this, because
      it allowed felix_phylink_mac_link_up() to overwrite SYS_PAUSE_CFG_PAUSE_ENA
      for the DSA CPU port.
      
      This issue became noticeable since the device tree update from commit
      8fcea7be ("arm64: dts: ls1028a: mark internal links between Felix
      and ENETC as capable of flow control").
      
      The solution is to check whether this is the currently configured NPI
      port from ocelot_phylink_mac_link_up(), and to not modify the statically
      disabled PAUSE frame transmission if it is.
      
      When the port is configured for lossless mode as opposed to tail drop
      mode, but the link partner (DSA master) doesn't observe the transmitted
      PAUSE frames, the switch termination throughput is much worse, as can be
      seen below.
      
      Before:
      
      root@debian:~# iperf3 -c 192.168.100.2
      Connecting to host 192.168.100.2, port 5201
      [  5] local 192.168.100.1 port 37504 connected to 192.168.100.2 port 5201
      [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
      [  5]   0.00-1.00   sec  28.4 MBytes   238 Mbits/sec  357   22.6 KBytes
      [  5]   1.00-2.00   sec  33.6 MBytes   282 Mbits/sec  426   19.8 KBytes
      [  5]   2.00-3.00   sec  34.0 MBytes   285 Mbits/sec  343   21.2 KBytes
      [  5]   3.00-4.00   sec  32.9 MBytes   276 Mbits/sec  354   22.6 KBytes
      [  5]   4.00-5.00   sec  32.3 MBytes   271 Mbits/sec  297   18.4 KBytes
      ^C[  5]   5.00-5.06   sec  2.05 MBytes   270 Mbits/sec   45   19.8 KBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bitrate         Retr
      [  5]   0.00-5.06   sec   163 MBytes   271 Mbits/sec  1822             sender
      [  5]   0.00-5.06   sec  0.00 Bytes  0.00 bits/sec                  receiver
      
      After:
      
      root@debian:~# iperf3 -c 192.168.100.2
      Connecting to host 192.168.100.2, port 5201
      [  5] local 192.168.100.1 port 49470 connected to 192.168.100.2 port 5201
      [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
      [  5]   0.00-1.00   sec   112 MBytes   941 Mbits/sec  259    143 KBytes
      [  5]   1.00-2.00   sec   110 MBytes   920 Mbits/sec  329    144 KBytes
      [  5]   2.00-3.00   sec   112 MBytes   936 Mbits/sec  255    144 KBytes
      [  5]   3.00-4.00   sec   110 MBytes   927 Mbits/sec  355    105 KBytes
      [  5]   4.00-5.00   sec   110 MBytes   926 Mbits/sec  350    156 KBytes
      [  5]   5.00-6.00   sec   110 MBytes   925 Mbits/sec  305    148 KBytes
      [  5]   6.00-7.00   sec   110 MBytes   924 Mbits/sec  320    143 KBytes
      [  5]   7.00-8.00   sec   110 MBytes   925 Mbits/sec  273   97.6 KBytes
      [  5]   8.00-9.00   sec   109 MBytes   913 Mbits/sec  299    141 KBytes
      [  5]   9.00-10.00  sec   110 MBytes   922 Mbits/sec  287    146 KBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bitrate         Retr
      [  5]   0.00-10.00  sec  1.08 GBytes   926 Mbits/sec  3032             sender
      [  5]   0.00-10.00  sec  1.08 GBytes   925 Mbits/sec                  receiver
      
      Fixes: de274be3 ("net: dsa: felix: set TX flow control according to the phylink_mac_link_up resolution")
      Reported-by: default avatarXiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      33cb0ff3
    • Colin Ian King's avatar
      atm: iphase: remove redundant pointer skb · d7b43034
      Colin Ian King authored
      The pointer skb is redundant, it is assigned a value that is never
      read and hence can be removed. Cleans up clang scan warning:
      
      drivers/atm/iphase.c:205:18: warning: Although the value stored
      to 'skb' is used in the enclosing expression, the value is never
      actually read from 'skb' [deadcode.DeadStores]
      Signed-off-by: default avatarColin Ian King <colin.i.king@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d7b43034
    • Maxim Mikityanskiy's avatar
      sch_api: Don't skip qdisc attach on ingress · de2d807b
      Maxim Mikityanskiy authored
      The attach callback of struct Qdisc_ops is used by only a few qdiscs:
      mq, mqprio and htb. qdisc_graft() contains the following logic
      (pseudocode):
      
          if (!qdisc->ops->attach) {
              if (ingress)
                  do ingress stuff;
              else
                  do egress stuff;
          }
          if (!ingress) {
              ...
              if (qdisc->ops->attach)
                  qdisc->ops->attach(qdisc);
          } else {
              ...
          }
      
      As we see, the attach callback is not called if the qdisc is being
      attached to ingress (TC_H_INGRESS). That wasn't a problem for mq and
      mqprio, since they contain a check that they are attached to TC_H_ROOT,
      and they can't be attached to TC_H_INGRESS anyway.
      
      However, the commit cited below added the attach callback to htb. It is
      needed for the hardware offload, but in the non-offload mode it
      simulates the "do egress stuff" part of the pseudocode above. The
      problem is that when htb is attached to ingress, neither "do ingress
      stuff" nor attach() is called. It results in an inconsistency, and the
      following message is printed to dmesg:
      
      unregister_netdevice: waiting for lo to become free. Usage count = 2
      
      This commit addresses the issue by running "do ingress stuff" in the
      ingress flow even in the attach callback is present, which is fine,
      because attach isn't going to be called afterwards.
      
      The bug was found by syzbot and reported by Eric.
      
      Fixes: d03b195b ("sch_htb: Hierarchical QoS hardware offload")
      Signed-off-by: default avatarMaxim Mikityanskiy <maximmi@nvidia.com>
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      de2d807b
    • Pawel Dembicki's avatar
      net: qmi_wwan: add ZTE MF286D modem 19d2:1485 · 078c6a1c
      Pawel Dembicki authored
      Modem from ZTE MF286D is an Qualcomm MDM9250 based 3G/4G modem.
      
      T:  Bus=02 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  3 Spd=5000 MxCh= 0
      D:  Ver= 3.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 9 #Cfgs=  1
      P:  Vendor=19d2 ProdID=1485 Rev=52.87
      S:  Manufacturer=ZTE,Incorporated
      S:  Product=ZTE Technologies MSM
      S:  SerialNumber=MF286DZTED000000
      C:* #Ifs= 7 Cfg#= 1 Atr=80 MxPwr=896mA
      A:  FirstIf#= 0 IfCount= 2 Cls=02(comm.) Sub=06 Prot=00
      I:* If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=ff Driver=rndis_host
      E:  Ad=82(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
      I:* If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host
      E:  Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      E:  Ad=01(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      E:  Ad=83(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      E:  Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      E:  Ad=85(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=84(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      E:  Ad=03(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      E:  Ad=87(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=86(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      E:  Ad=04(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      I:* If#= 5 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      E:  Ad=88(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
      E:  Ad=8e(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      E:  Ad=0f(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      I:* If#= 6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=usbfs
      E:  Ad=05(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      E:  Ad=89(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      Signed-off-by: default avatarPawel Dembicki <paweldembicki@gmail.com>
      Acked-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      078c6a1c
  5. 12 Jan, 2022 13 commits
    • Ignat Korchagin's avatar
      sit: allow encapsulated IPv6 traffic to be delivered locally · ed6ae5ca
      Ignat Korchagin authored
      While experimenting with FOU encapsulation Amir noticed that encapsulated IPv6
      traffic fails to be delivered, if the peer IP address is configured locally.
      
      It can be easily verified by creating a sit interface like below:
      
      $ sudo ip link add name fou_test type sit remote 127.0.0.1 encap fou encap-sport auto encap-dport 1111
      $ sudo ip link set fou_test up
      
      and sending some IPv4 and IPv6 traffic to it
      
      $ ping -I fou_test -c 1 1.1.1.1
      $ ping6 -I fou_test -c 1 fe80::d0b0:dfff:fe4c:fcbc
      
      "tcpdump -i any udp dst port 1111" will confirm that only the first IPv4 ping
      was encapsulated and attempted to be delivered.
      
      This seems like a limitation: for example, in a cloud environment the "peer"
      service may be arbitrarily scheduled on any server within the cluster, where all
      nodes are trying to send encapsulated traffic. And the unlucky node will not be
      able to. Moreover, delivering encapsulated IPv4 traffic locally is allowed.
      
      But I may not have all the context about this restriction and this code predates
      the observable git history.
      Reported-by: default avatarAmir Razmjou <arazmjou@cloudflare.com>
      Signed-off-by: default avatarIgnat Korchagin <ignat@cloudflare.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20220107123842.211335-1-ignat@cloudflare.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ed6ae5ca
    • Yevhen Orlov's avatar
      net: marvell: prestera: Fix deinit sequence for router · e179f045
      Yevhen Orlov authored
      * Add missed call prestera_router_fini in prestera_switch_fini
      * Add prestera_router_hw_fini, which verify lists are empty
      
      Fixes: 69204174 ("net: marvell: prestera: Add prestera router infra")
      Signed-off-by: default avatarYevhen Orlov <yevhen.orlov@plvision.eu>
      Link: https://lore.kernel.org/r/20220111011129.5457-1-yevhen.orlov@plvision.euSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      e179f045
    • Yevhen Orlov's avatar
      net: marvell: prestera: Refactor router functions · 32d098bb
      Yevhen Orlov authored
      * Reverse xmas tree variables order
      * User friendly messages on error paths
      * Refactor __prestera_inetaddr_event to use early return
      Signed-off-by: default avatarYevhen Orlov <yevhen.orlov@plvision.eu>
      Link: https://lore.kernel.org/r/20220111011051.4941-1-yevhen.orlov@plvision.euSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      32d098bb
    • Yevhen Orlov's avatar
      net: marvell: prestera: Refactor get/put VR functions · 6a1ba875
      Yevhen Orlov authored
      * Use refcount, instead of uint
      * Increment/decrement recount inside get/put
      * Fix error path in __prestera_vr_create. Remove unnecessary kfree.
      * Make __prestera_vr_destroy symmetric to "create"
      
      Fixes: bca5859b ("net: marvell: prestera: add hardware router objects accounting")
      Signed-off-by: default avatarYevhen Orlov <yevhen.orlov@plvision.eu>
      Link: https://lore.kernel.org/r/20220111011014.4418-1-yevhen.orlov@plvision.euSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      6a1ba875
    • Yevhen Orlov's avatar
      net: marvell: prestera: Cleanup router struct · 9c0c2c7a
      Yevhen Orlov authored
      Field "aborted" was added in
      69204174 ("net: marvell: prestera: Add prestera router infra").
      It will not be used. So remove.
      Signed-off-by: default avatarYevhen Orlov <yevhen.orlov@plvision.eu>
      Link: https://lore.kernel.org/r/20220111010826.3779-1-yevhen.orlov@plvision.euSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      9c0c2c7a
    • Jakub Kicinski's avatar
      Merge branch 'arm-ox810se-add-ethernet-support' · 2716a527
      Jakub Kicinski authored
      Neil Armstrong says:
      
      ====================
      ARM: ox810se: Add Ethernet support
      
      This adds support for the Synopsys DWMAC controller found in the
      OX820SE SoC, by using almost the same glue code as the OX820.
      ====================
      
      Link: https://lore.kernel.org/r/20220104145646.135877-1-narmstrong@baylibre.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      2716a527
    • Neil Armstrong's avatar
      net: stmmac: dwmac-oxnas: Add support for OX810SE · 72f1f7e4
      Neil Armstrong authored
      Add support for OX810SE dwmac glue setup, which is a simplified version
      of the OX820 introduced later with more control on the PHY interface.
      Signed-off-by: default avatarNeil Armstrong <narmstrong@baylibre.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      72f1f7e4
    • Neil Armstrong's avatar
      dt-bindings: net: oxnas-dwmac: Add bindings for OX810SE · 8973d7b8
      Neil Armstrong authored
      Add SoC specific bindings for OX810SE support.
      Signed-off-by: default avatarNeil Armstrong <narmstrong@baylibre.com>
      Acked-by: default avatarRob Herring <robh@kernel.org>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      8973d7b8
    • Jie Wang's avatar
      net: bonding: fix bond_xmit_broadcast return value error bug · 4e5bd03a
      Jie Wang authored
      In Linux bonding scenario, one packet is copied to several copies and sent
      by all slave device of bond0 in mode 3(broadcast mode). The mode 3 xmit
      function bond_xmit_broadcast() only ueses the last slave device's tx result
      as the final result. In this case, if the last slave device is down, then
      it always return NET_XMIT_DROP, even though the other slave devices xmit
      success. It may cause the tx statistics error, and cause the application
      (e.g. scp) consider the network is unreachable.
      
      For example, use the following command to configure server A.
      
      echo 3 > /sys/class/net/bond0/bonding/mode
      ifconfig bond0 up
      ifenslave bond0 eth0 eth1
      ifconfig bond0 192.168.1.125
      ifconfig eth0 up
      ifconfig eth1 down
      The slave device eth0 and eth1 are connected to server B(192.168.1.107).
      Run the ping 192.168.1.107 -c 3 -i 0.2 command, the following information
      is displayed.
      
      PING 192.168.1.107 (192.168.1.107) 56(84) bytes of data.
      64 bytes from 192.168.1.107: icmp_seq=1 ttl=64 time=0.077 ms
      64 bytes from 192.168.1.107: icmp_seq=2 ttl=64 time=0.056 ms
      64 bytes from 192.168.1.107: icmp_seq=3 ttl=64 time=0.051 ms
      
       192.168.1.107 ping statistics
      0 packets transmitted, 3 received
      
      Actually, the slave device eth0 of the bond successfully sends three
      ICMP packets, but the result shows that 0 packets are transmitted.
      
      Also if we use scp command to get remote files, the command end with the
      following printings.
      
      ssh_exchange_identification: read: Connection timed out
      
      So this patch modifies the bond_xmit_broadcast to return NET_XMIT_SUCCESS
      if one slave device in the bond sends packets successfully. If all slave
      devices send packets fail, the discarded packets stats is increased. The
      skb is released when there is no slave device in the bond or the last slave
      device is down.
      
      Fixes: ae46f184 ("bonding: propagate transmit status")
      Signed-off-by: default avatarJie Wang <wangjie125@huawei.com>
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4e5bd03a
    • Eric Dumazet's avatar
      net/smc: fix possible NULL deref in smc_pnet_add_eth() · 7b9b1d44
      Eric Dumazet authored
      I missed that @ndev value can be NULL.
      
      I prefer not factorizing this NULL check, and instead
      clearly document where a NULL might be expected.
      
      general protection fault, probably for non-canonical address 0xdffffc00000000ba: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x00000000000005d0-0x00000000000005d7]
      CPU: 0 PID: 19875 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__lock_acquire+0xd7a/0x5470 kernel/locking/lockdep.c:4897
      Code: 14 0e 41 bf 01 00 00 00 0f 86 c8 00 00 00 89 05 5c 20 14 0e e9 bd 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 9f 2e 00 00 49 81 3e 20 c5 1a 8f 0f 84 52 f3 ff
      RSP: 0018:ffffc900057071d0 EFLAGS: 00010002
      RAX: dffffc0000000000 RBX: 1ffff92000ae0e65 RCX: 1ffff92000ae0e4c
      RDX: 00000000000000ba RSI: 0000000000000000 RDI: 0000000000000001
      RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
      R10: fffffbfff1b24ae2 R11: 000000000008808a R12: 0000000000000000
      R13: ffff888040ca4000 R14: 00000000000005d0 R15: 0000000000000000
      FS:  00007fbd683e0700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b2be22000 CR3: 0000000013fea000 CR4: 00000000003526f0
      Call Trace:
       <TASK>
       lock_acquire kernel/locking/lockdep.c:5637 [inline]
       lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5602
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0x39/0x50 kernel/locking/spinlock.c:162
       ref_tracker_alloc+0x182/0x440 lib/ref_tracker.c:84
       netdev_tracker_alloc include/linux/netdevice.h:3859 [inline]
       smc_pnet_add_eth net/smc/smc_pnet.c:372 [inline]
       smc_pnet_enter net/smc/smc_pnet.c:492 [inline]
       smc_pnet_add+0x49a/0x14d0 net/smc/smc_pnet.c:555
       genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731
       genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
       genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
       genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: b6064524 ("net/smc: add net device tracker to struct smc_pnetentry")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7b9b1d44
    • Eric Dumazet's avatar
      net: bridge: fix net device refcount tracking issue in error path · fcfb894d
      Eric Dumazet authored
      I left one dev_put() in br_add_if() error path and sure enough
      syzbot found its way.
      
      As the tracker is allocated in new_nbp(), we must make sure
      to properly free it.
      
      We have to call dev_put_track(dev, &p->dev_tracker) before
      @p object is freed, of course. This is not an issue because
      br_add_if() owns a reference on @dev.
      
      Fixes: b2dcdc7f ("net: bridge: add net device refcount tracker")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fcfb894d
    • David S. Miller's avatar
      Merge branch 'ipa-fixes' · 0bbed88a
      David S. Miller authored
      Alex Elder says:
      
      ====================
      net: ipa: fix two replenish bugs
      
      This series contains two fixes for bugs in the IPA receive buffer
      replenishing code.  The (new) second patch defines a bitmap to
      represent endpoint the replenish enabled flag.  Its purpose is to
      prepare for the third patch, which adds an additional flag.
      
      Version 2 of this series uses bitmap operations in the second bug
      fix rather than an atomic variable, as suggested by Jakub.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0bbed88a
    • Alex Elder's avatar
      net: ipa: prevent concurrent replenish · 998c0bd2
      Alex Elder authored
      We have seen cases where an endpoint RX completion interrupt arrives
      while replenishing for the endpoint is underway.  This causes another
      instance of replenishing to begin as part of completing the receive
      transaction.  If this occurs it can lead to transaction corruption.
      
      Use a new flag to ensure only one replenish instance for an endpoint
      executes at a time.
      
      Fixes: 84f9bd12 ("soc: qcom: ipa: IPA endpoints")
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      998c0bd2