1. 02 Feb, 2022 19 commits
    • net/mlx5e: Fix handling of wrong devices during bond netevent · ec41332e
      Maor Dickman authored
      The current implementation of the bond netevent handler only checks
      whether the handled netdev is a VF representor; it is missing a check
      that the VF representor is on the same physical device as the bond
      handling the netevent.
      
      Fix by adding the missing check and by optimizing the VF representor
      check so that it does not access uninitialized private data and crash.
      
      BUG: kernel NULL pointer dereference, address: 000000000000036c
      PGD 0 P4D 0
      Oops: 0000 [#1] SMP NOPTI
      Workqueue: eth3bond0 bond_mii_monitor [bonding]
      RIP: 0010:mlx5e_is_uplink_rep+0xc/0x50 [mlx5_core]
      RSP: 0018:ffff88812d69fd60 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: ffff8881cf800000 RCX: 0000000000000000
      RDX: ffff88812d69fe10 RSI: 000000000000001b RDI: ffff8881cf800880
      RBP: ffff8881cf800000 R08: 00000445cabccf2b R09: 0000000000000008
      R10: 0000000000000004 R11: 0000000000000008 R12: ffff88812d69fe10
      R13: 00000000fffffffe R14: ffff88820c0f9000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff88846fb00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000000036c CR3: 0000000103d80006 CR4: 0000000000370ea0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       mlx5e_eswitch_uplink_rep+0x31/0x40 [mlx5_core]
       mlx5e_rep_is_lag_netdev+0x94/0xc0 [mlx5_core]
       mlx5e_rep_esw_bond_netevent+0xeb/0x3d0 [mlx5_core]
       raw_notifier_call_chain+0x41/0x60
       call_netdevice_notifiers_info+0x34/0x80
       netdev_lower_state_changed+0x4e/0xa0
       bond_mii_monitor+0x56b/0x640 [bonding]
       process_one_work+0x1b9/0x390
       worker_thread+0x4d/0x3d0
       ? rescuer_thread+0x350/0x350
       kthread+0x124/0x150
       ? set_kthread_struct+0x40/0x40
       ret_from_fork+0x1f/0x30
      
      Fixes: 7e51891a ("net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule")
      Signed-off-by: Maor Dickman <maord@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix broken SKB allocation in HW-GRO · 7957837b
      Khalid Manaa authored
      If the HW doesn't perform header-data split, it writes the whole
      packet into the data buffer in the WQ. In this case the SHAMPO CQE handler
      cannot use the header entry to build the SKB; instead it should allocate
      new memory and build the SKB using the function
      mlx5e_skb_from_cqe_mpwrq_nonlinear.
      
      Fixes: f97d5c2a ("net/mlx5e: Add handle SHAMPO cqe support")
      Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix wrong calculation of header index in HW_GRO · b8d91145
      Khalid Manaa authored
      The HW doesn't wrap the CQE.shampo.header_index field according to the
      headers buffer size, instead it always increases it until reaching overflow
      of u16 size.
      
      Thus the mlx5e_handle_rx_cqe_mpwrq_shampo handler should mask the
      CQE header_index field to find the actual header index in the headers buffer.
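
The masking step can be sketched in plain C as follows. This is an illustrative userspace sketch, not the driver code: the helper name `shampo_header_slot` and the `entries` parameter are hypothetical, and it assumes the headers buffer holds a power-of-two number of entries so that a mask is equivalent to a modulo.

```c
#include <stdint.h>

/*
 * Hypothetical helper: map the free-running 16-bit header_index reported
 * in the CQE to a slot in a headers buffer with `entries` slots.
 * Assumption: `entries` is a power of two, so (entries - 1) is a valid mask.
 */
static inline uint16_t shampo_header_slot(uint16_t cqe_header_index,
                                          uint16_t entries)
{
    return cqe_header_index & (uint16_t)(entries - 1);
}
```

Because the hardware counter only wraps at the u16 limit, an index such as 1025 with a 1024-entry buffer refers to slot 1, not to an out-of-range entry.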
      
      Fixes: f97d5c2a ("net/mlx5e: Add handle SHAMPO cqe support")
      Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Bridge, Fix devlink deadlock on net namespace deletion · 880b5176
      Roi Dayan authored
      When changing mode to switchdev, the rep bridge init, which registers a
      netdevice notifier, holds the devlink lock and then takes pernet_ops_rwsem.
      At the same time, deleting a netns holds pernet_ops_rwsem and then takes
      the devlink lock.
      
      Example sequence is:
      $ ip netns add foo
      $ devlink dev eswitch set pci/0000:00:08.0 mode switchdev &
      $ ip netns del foo
      
      deleting netns trace:
      
      [ 1185.365555]  ? devlink_pernet_pre_exit+0x74/0x1c0
      [ 1185.368331]  ? mutex_lock_io_nested+0x13f0/0x13f0
      [ 1185.370984]  ? xt_find_table+0x40/0x100
      [ 1185.373244]  ? __mutex_lock+0x24a/0x15a0
      [ 1185.375494]  ? net_generic+0xa0/0x1c0
      [ 1185.376844]  ? wait_for_completion_io+0x280/0x280
      [ 1185.377767]  ? devlink_pernet_pre_exit+0x74/0x1c0
      [ 1185.378686]  devlink_pernet_pre_exit+0x74/0x1c0
      [ 1185.379579]  ? devlink_nl_cmd_get_dumpit+0x3a0/0x3a0
      [ 1185.380557]  ? xt_find_table+0xda/0x100
      [ 1185.381367]  cleanup_net+0x372/0x8e0
      
      changing mode to switchdev trace:
      
      [ 1185.411267]  down_write+0x13a/0x150
      [ 1185.412029]  ? down_write_killable+0x180/0x180
      [ 1185.413005]  register_netdevice_notifier+0x1e/0x210
      [ 1185.414000]  mlx5e_rep_bridge_init+0x181/0x360 [mlx5_core]
      [ 1185.415243]  mlx5e_uplink_rep_enable+0x269/0x480 [mlx5_core]
      [ 1185.416464]  ? mlx5e_uplink_rep_disable+0x210/0x210 [mlx5_core]
      [ 1185.417749]  mlx5e_attach_netdev+0x232/0x400 [mlx5_core]
      [ 1185.418906]  mlx5e_netdev_attach_profile+0x15b/0x1e0 [mlx5_core]
      [ 1185.420172]  mlx5e_netdev_change_profile+0x15a/0x1d0 [mlx5_core]
      [ 1185.421459]  mlx5e_vport_rep_load+0x557/0x780 [mlx5_core]
      [ 1185.422624]  ? mlx5e_stats_grp_vport_rep_num_stats+0x10/0x10 [mlx5_core]
      [ 1185.424006]  mlx5_esw_offloads_rep_load+0xdb/0x190 [mlx5_core]
      [ 1185.425277]  esw_offloads_enable+0xd74/0x14a0 [mlx5_core]
      
      Fix this by registering rep bridges with a per-net netdev notifier,
      which operates on the net namespace without holding pernet_ops_rwsem,
      instead of the global one.
      
      Fixes: 19e9bfa0 ("net/mlx5: Bridge, add offload infrastructure")
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE · 55b2ca70
      Dima Chumak authored
      Only prio 1 is supported for nic mode when there is no ignore flow level
      support in firmware. But for switchdev mode, which supports fixed number
      of statically pre-allocated prios, this restriction is not relevant so
      it can be relaxed.
      
      Fixes: d671e109 ("net/mlx5: Fix tc max supported prio for nic mode")
      Signed-off-by: Dima Chumak <dchumak@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: TC, Reject rules with forward and drop actions · 5623ef8a
      Roi Dayan authored
      Such rules are redundant but allowed and passed to the driver.
      The driver does not support offloading such rules so return an error.
      
      Fixes: 03a9d11e ("net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads")
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Use del_timer_sync in fw reset flow of halting poll · 3c5193a8
      Maher Sanalla authored
      Substitute del_timer() with del_timer_sync() in the fw reset polling
      deactivation flow, in order to prevent a race condition which occurs
      when del_timer() is called and the timer is deactivated while another
      process is handling the timer interrupt. This situation led to
      the following call trace:
      	RIP: 0010:run_timer_softirq+0x137/0x420
      	<IRQ>
      	recalibrate_cpu_khz+0x10/0x10
      	ktime_get+0x3e/0xa0
      	? sched_clock_cpu+0xb/0xc0
      	__do_softirq+0xf5/0x2ea
      	irq_exit_rcu+0xc1/0xf0
      	sysvec_apic_timer_interrupt+0x9e/0xc0
      	asm_sysvec_apic_timer_interrupt+0x12/0x20
      	</IRQ>
      
      Fixes: 38b9f903 ("net/mlx5: Handle sync reset request event")
      Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix module EEPROM query · 4a08a131
      Gal Pressman authored
      When querying the module EEPROM, the 'offset' variable and the
      'query.offset' field were used inconsistently.
      Fix that by always using 'offset' and assigning its value to
      'query.offset' right before the mcia register read call.
      
      While at it, the cross-pages read size adjustment was changed to be more
      intuitive.
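
A cross-page read size adjustment of the kind mentioned above could look roughly like the sketch below. It is not taken from the driver: the helper name `clamp_read_size` and the 256-byte page length are assumptions made for illustration only.

```c
#include <stdint.h>

/* Assumed page length for this sketch; the driver's constant may differ. */
#define EEPROM_PAGE_LEN 256u

/*
 * Hypothetical helper: if a read that starts inside the page would run
 * past the page boundary, shorten it so it ends exactly at the boundary.
 * The caller is expected to issue a follow-up read for the remainder.
 */
static uint16_t clamp_read_size(uint16_t offset, uint16_t size)
{
    if (offset < EEPROM_PAGE_LEN &&
        (uint32_t)offset + size > EEPROM_PAGE_LEN)
        return (uint16_t)(EEPROM_PAGE_LEN - offset);
    return size;
}
```

With these assumptions, a 16-byte read at offset 250 is shortened to 6 bytes, while a 16-byte read at offset 240 ends exactly on the boundary and is left unchanged.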
      
      Fixes: e19b0a34 ("net/mlx5: Refactor module EEPROM query")
      Reported-by: Wang Yugui <wangyugui@e16-tech.com>
      Signed-off-by: Gal Pressman <gal@nvidia.com>
      Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: TC, Reject rules with drop and modify hdr action · a2446bc7
      Roi Dayan authored
      This kind of action is not supported by firmware and generates a
      syndrome.
      
      kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)
      
      Fixes: d7e75a32 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Bridge, ensure dev_name is null-terminated · 350d9a82
      Vlad Buslov authored
      Even though net_device->name is guaranteed to be null-terminated string of
      size<=IFNAMSIZ, the test robot complains that return value of netdev_name()
      can be larger:
      
      In file included from include/trace/define_trace.h:102,
                          from drivers/net/ethernet/mellanox/mlx5/core/esw/diag/bridge_tracepoint.h:113,
                          from drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c:12:
         drivers/net/ethernet/mellanox/mlx5/core/esw/diag/bridge_tracepoint.h: In function 'trace_event_raw_event_mlx5_esw_bridge_fdb_template':
      >> drivers/net/ethernet/mellanox/mlx5/core/esw/diag/bridge_tracepoint.h:24:29: warning: 'strncpy' output may be truncated copying 16 bytes from a string of length 20 [-Wstringop-truncation]
            24 |                             strncpy(__entry->dev_name,
               |                             ^~~~~~~~~~~~~~~~~~~~~~~~~~
            25 |                                     netdev_name(fdb->dev),
               |                                     ~~~~~~~~~~~~~~~~~~~~~~
            26 |                                     IFNAMSIZ);
               |                                     ~~~~~~~~~
      
      This is caused by the fact that default value of IFNAMSIZ is 16, while
      placeholder value that is returned by netdev_name() for unnamed net devices
      is larger than that.
      
      The offending code is in a tracing function that is only called for mlx5
      representors, so there is no straightforward way to reproduce the issue, but
      let's fix it for correctness' sake by replacing strncpy() with strscpy() to
      ensure that the resulting string is always null-terminated.
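
The difference between the two calls can be illustrated in userspace. strscpy() is a kernel function and is not available in libc, so the sketch below uses a hypothetical stand-in, `copy_terminated`, that mimics the property that matters here: the destination is always null-terminated, and truncation is reported to the caller. The return value of -1 on truncation is a simplification chosen for this sketch.

```c
#include <stddef.h>

/*
 * Hypothetical userspace stand-in for a strscpy()-style copy.
 * Copies at most size - 1 characters and always writes a terminating
 * null byte when size > 0. Returns the number of characters copied,
 * or -1 if the source did not fit (or size is 0).
 */
static long copy_terminated(char *dst, const char *src, size_t size)
{
    size_t i;

    if (size == 0)
        return -1;

    for (i = 0; i < size - 1 && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';

    /* If we stopped before reaching the end of src, it was truncated. */
    return src[i] == '\0' ? (long)i : -1;
}
```

By contrast, strncpy() with a 16-byte limit and a 20-character source fills all 16 bytes and leaves no room for the terminator, which is what the compiler warning points at.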
      
      Fixes: 9724fd5d ("net/mlx5: Bridge, add tracepoints")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Bridge, take rtnl lock in init error handler · 04f8c12f
      Vlad Buslov authored
      The mlx5_esw_bridge_cleanup() is expected to be called with rtnl lock
      taken, which is true for mlx5e_rep_bridge_cleanup() function but not for
      error handling code in mlx5e_rep_bridge_init(). Add missing rtnl
      lock/unlock calls and extend both mlx5_esw_bridge_cleanup() and its dual
      function mlx5_esw_bridge_init() with ASSERT_RTNL() to verify the invariant
      from now on.
      
      Fixes: 7cd6a54a ("net/mlx5: Bridge, handle FDB events")
      Fixes: 19e9bfa0 ("net/mlx5: Bridge, add offload infrastructure")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · c7108979
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-01-31
      
      This series contains updates to i40e driver only.
      
      Jedrzej fixes a condition check which would cause an error when
      resetting bandwidth when DCB is active with one TC.
      
      Karen resolves a null pointer dereference that could occur when removing
      the driver while VSI rings are being disabled.
      
      * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        i40e: Fix reset path while removing the driver
        i40e: Fix reset bw limit when DCB enabled with 1 TC
      ====================
      
      Link: https://lore.kernel.org/r/20220201000522.505909-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: macsec: Verify that send_sci is on when setting Tx sci explicitly · d0cfa548
      Lior Nahmanson authored
      When the Tx sci is set explicitly, the Rx side is expected to use this
      sci and not recalculate it from the packet. However, when the Tx sci
      is explicit and send_sci is off, the receiver wrongly recalculates
      the sci from the source MAC address, which will most likely differ
      from the explicit sci.
      
      Fix by preventing such a configuration when a macsec newlink is established,
      returning an EINVAL error code in such cases.
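
The rejected combination can be expressed as a small validation step. The sketch below is a userspace illustration under stated assumptions: the `txsc_cfg` struct and `validate_tx_sci` function are hypothetical names and do not reflect the actual macsec netlink handling.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical view of the two settings that matter for this check. */
struct txsc_cfg {
    bool sci_explicit;  /* the user supplied a Tx sci */
    bool send_sci;      /* the sci is carried in each transmitted frame */
};

/*
 * Reject an explicit Tx sci when send_sci is off: the receiver would have
 * to derive the sci from the source MAC address and would likely get a
 * different value.
 */
static int validate_tx_sci(const struct txsc_cfg *cfg)
{
    if (cfg->sci_explicit && !cfg->send_sci)
        return -EINVAL;
    return 0;
}
```

All other combinations pass, including an implicit sci with send_sci off, where the receiver deriving the sci from the MAC address is the intended behavior.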
      
      Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver")
      Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
      Reviewed-by: Raed Salem <raeds@nvidia.com>
      Signed-off-by: Raed Salem <raeds@nvidia.com>
      Link: https://lore.kernel.org/r/1643542672-29403-1-git-send-email-raeds@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback · 63e4b45c
      Georgi Valkov authored
      When rx_buf is allocated we need to account for IPHETH_IP_ALIGN,
      which reduces the usable size by 2 bytes. Otherwise we have 1512
      bytes usable instead of 1514, and if we receive more than 1512
      bytes, ipheth_rcvbulk_callback is called with status -EOVERFLOW,
      after which the driver malfunctions and all communication stops.
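
The arithmetic behind the fix can be sketched as follows. The constants mirror the numbers quoted above (1514 usable bytes wanted, 2 bytes lost to alignment); the macro and function names are hypothetical and chosen for this illustration.

```c
#include <stddef.h>

#define FRAME_MAX    1514u  /* bytes that must remain usable for a frame */
#define IP_ALIGN_PAD 2u     /* stands in for IPHETH_IP_ALIGN */

/* Bytes left for the frame once the alignment padding is skipped. */
static size_t usable_bytes(size_t allocated)
{
    return allocated > IP_ALIGN_PAD ? allocated - IP_ALIGN_PAD : 0;
}

/* Allocation size that still leaves FRAME_MAX usable bytes. */
static size_t rx_alloc_size(void)
{
    return FRAME_MAX + IP_ALIGN_PAD;
}
```

Allocating only 1514 bytes leaves 1512 usable, which matches the overflow described above; allocating 1516 leaves the full 1514.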
      
      Resolves ipheth 2-1:4.2: ipheth_rcvbulk_callback: urb status: -75
      
      Fixes: f33d9e2b ("usbnet: ipheth: fix connectivity with iOS 14")
      Signed-off-by: Georgi Valkov <gvalkov@abv.bg>
      Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
      Link: https://lore.kernel.org/all/B60B8A4B-92A0-49B3-805D-809A2433B46C@abv.bg/
      Link: https://lore.kernel.org/all/24851bd2769434a5fc24730dce8e8a984c5a4505.1643699778.git.jan.kiszka@siemens.com/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: fix mem under-charging with zerocopy sendmsg() · 479f5547
      Eric Dumazet authored
      We got reports of the following warning in inet_sock_destruct():
      
      	WARN_ON(sk_forward_alloc_get(sk));
      
      Whenever we add a non zero-copy fragment to a pure zerocopy skb,
      we have to anticipate that whole skb->truesize will be uncharged
      when skb is finally freed.
      
      skb->data_len is the payload length. But the memory truesize
      estimated by __zerocopy_sg_from_iter() is page aligned.
      
      Fixes: 9b65b17d ("net: avoid double accounting for pure zerocopy skbs")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Talal Ahmad <talalahmad@google.com>
      Cc: Arjun Roy <arjunroy@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Link: https://lore.kernel.org/r/20220201065254.680532-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • af_packet: fix data-race in packet_setsockopt / packet_setsockopt · e42e70ad
      Eric Dumazet authored
      When packet_setsockopt( PACKET_FANOUT_DATA ) reads po->fanout,
      no lock is held, meaning that another thread can change po->fanout.
      
      Given that po->fanout can only be set once during the socket lifetime
      (it is only cleared from fanout_release()), we can use
      READ_ONCE()/WRITE_ONCE() to document the race.
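
The annotation pattern can be sketched in userspace. The macros below are simplified approximations of the kernel's READ_ONCE()/WRITE_ONCE() built on a volatile cast and the GCC/Clang `__typeof__` extension; the real kernel macros do more. The struct layouts and helper functions are hypothetical, and the check-then-set in `attach_fanout` is assumed to run under a lock held by the caller, which this sketch does not model.

```c
#include <stddef.h>

/* Simplified approximations; not the kernel's definitions. */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))

struct fanout { int id; };                      /* hypothetical */
struct packet_sock { struct fanout *fanout; };  /* hypothetical */

/* Writer side: the pointer is set at most once (caller holds a lock). */
static int attach_fanout(struct packet_sock *po, struct fanout *f)
{
    if (READ_ONCE(po->fanout))
        return -1;              /* already attached */
    WRITE_ONCE(po->fanout, f);
    return 0;
}

/* Lockless reader: load the pointer once, then use only the local copy. */
static int fanout_id(struct packet_sock *po)
{
    struct fanout *f = READ_ONCE(po->fanout);

    return f ? f->id : -1;
}
```

Reading into a local variable means the NULL check and the dereference see the same value even if another thread sets the pointer in between.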
      
      BUG: KCSAN: data-race in packet_setsockopt / packet_setsockopt
      
      write to 0xffff88813ae8e300 of 8 bytes by task 14653 on cpu 0:
       fanout_add net/packet/af_packet.c:1791 [inline]
       packet_setsockopt+0x22fe/0x24a0 net/packet/af_packet.c:3931
       __sys_setsockopt+0x209/0x2a0 net/socket.c:2180
       __do_sys_setsockopt net/socket.c:2191 [inline]
       __se_sys_setsockopt net/socket.c:2188 [inline]
       __x64_sys_setsockopt+0x62/0x70 net/socket.c:2188
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff88813ae8e300 of 8 bytes by task 14654 on cpu 1:
       packet_setsockopt+0x691/0x24a0 net/packet/af_packet.c:3935
       __sys_setsockopt+0x209/0x2a0 net/socket.c:2180
       __do_sys_setsockopt net/socket.c:2191 [inline]
       __se_sys_setsockopt net/socket.c:2188 [inline]
       __x64_sys_setsockopt+0x62/0x70 net/socket.c:2188
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x0000000000000000 -> 0xffff888106f8c000
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 14654 Comm: syz-executor.3 Not tainted 5.16.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 47dceb8e ("packet: add classic BPF fanout mode")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220201022358.330621-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • rtnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink() · c6f6f244
      Eric Dumazet authored
      While looking at one unrelated syzbot bug, I found the replay logic
      in __rtnl_newlink() to potentially trigger use-after-free.
      
      It is better to clear master_dev and m_ops inside the loop,
      in case we have to replay it.
      
      Fixes: ba7d49b1 ("rtnetlink: provide api for getting and setting slave info")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20220201012106.216495-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: sched: fix use-after-free in tc_new_tfilter() · 04c2a47f
      Eric Dumazet authored
      Whenever tc_new_tfilter() jumps back to replay: label,
      we need to make sure @q and @chain local variables are cleared again,
      or risk use-after-free as in [1]
      
      For consistency, apply the same fix in tc_ctl_chain()
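
The replay hazard can be reproduced in miniature. The sketch below is a userspace illustration with hypothetical names (`res`, `res_get`, `res_put`, `handle_request`); it models a released object with a `live` flag instead of freed memory so that the stale access can be counted safely.

```c
#include <stddef.h>

struct res { int live; };   /* hypothetical object with a release state */

static struct res pool;     /* a single slot is enough for this sketch */

static struct res *res_get(void) { pool.live = 1; return &pool; }
static void res_put(struct res *r) { r->live = 0; }

/*
 * Returns the number of accesses made through a pointer whose object had
 * already been released. With the reset at the replay label in place,
 * this is always 0.
 */
static int handle_request(void)
{
    struct res *q;
    int attempt = 0;
    int stale_accesses = 0;

replay:
    q = NULL;                   /* cleared on every pass, not just once */

    if (attempt == 0)
        q = res_get();          /* only the first pass looks q up */

    if (q && !q->live)
        stale_accesses++;       /* this would be a use-after-free */

    if (attempt == 0) {
        res_put(q);             /* drop the reference before retrying */
        attempt++;
        goto replay;
    }
    return stale_accesses;
}
```

If the `q = NULL` assignment were done only at declaration, the second pass would still hold the pointer released by the first pass and the stale-access counter would become 1.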
      
      BUG: KASAN: use-after-free in mini_qdisc_pair_swap+0x1b9/0x1f0 net/sched/sch_generic.c:1581
      Write of size 8 at addr ffff8880985c4b08 by task syz-executor.4/1945
      
      CPU: 0 PID: 1945 Comm: syz-executor.4 Not tainted 5.17.0-rc1-syzkaller-00495-gff58831f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0x8d/0x336 mm/kasan/report.c:255
       __kasan_report mm/kasan/report.c:442 [inline]
       kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
       mini_qdisc_pair_swap+0x1b9/0x1f0 net/sched/sch_generic.c:1581
       tcf_chain_head_change_item net/sched/cls_api.c:372 [inline]
       tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:386
       tcf_chain_tp_insert net/sched/cls_api.c:1657 [inline]
       tcf_chain_tp_insert_unique net/sched/cls_api.c:1707 [inline]
       tc_new_tfilter+0x1e67/0x2350 net/sched/cls_api.c:2086
       rtnetlink_rcv_msg+0x80d/0xb80 net/core/rtnetlink.c:5583
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       ____sys_sendmsg+0x331/0x810 net/socket.c:2413
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
       __sys_sendmmsg+0x195/0x470 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f2647172059
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f2645aa5168 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00007f2647285100 RCX: 00007f2647172059
      RDX: 040000000000009f RSI: 00000000200002c0 RDI: 0000000000000006
      RBP: 00007f26471cc08d R08: 0000000000000000 R09: 0000000000000000
      R10: 9e00000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007fffb3f7f02f R14: 00007f2645aa5300 R15: 0000000000022000
       </TASK>
      
      Allocated by task 1944:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:45 [inline]
       set_alloc_info mm/kasan/common.c:436 [inline]
       ____kasan_kmalloc mm/kasan/common.c:515 [inline]
       ____kasan_kmalloc mm/kasan/common.c:474 [inline]
       __kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:524
       kmalloc_node include/linux/slab.h:604 [inline]
       kzalloc_node include/linux/slab.h:726 [inline]
       qdisc_alloc+0xac/0xa10 net/sched/sch_generic.c:941
       qdisc_create.constprop.0+0xce/0x10f0 net/sched/sch_api.c:1211
       tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660
       rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5592
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       ____sys_sendmsg+0x331/0x810 net/socket.c:2413
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
       __sys_sendmmsg+0x195/0x470 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Freed by task 3609:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       kasan_set_track+0x21/0x30 mm/kasan/common.c:45
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
       ____kasan_slab_free mm/kasan/common.c:366 [inline]
       ____kasan_slab_free+0x130/0x160 mm/kasan/common.c:328
       kasan_slab_free include/linux/kasan.h:236 [inline]
       slab_free_hook mm/slub.c:1728 [inline]
       slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1754
       slab_free mm/slub.c:3509 [inline]
       kfree+0xcb/0x280 mm/slub.c:4562
       rcu_do_batch kernel/rcu/tree.c:2527 [inline]
       rcu_core+0x7b8/0x1540 kernel/rcu/tree.c:2778
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
      
      Last potentially related work creation:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       __kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
       __call_rcu kernel/rcu/tree.c:3026 [inline]
       call_rcu+0xb1/0x740 kernel/rcu/tree.c:3106
       qdisc_put_unlocked+0x6f/0x90 net/sched/sch_generic.c:1109
       tcf_block_release+0x86/0x90 net/sched/cls_api.c:1238
       tc_new_tfilter+0xc0d/0x2350 net/sched/cls_api.c:2148
       rtnetlink_rcv_msg+0x80d/0xb80 net/core/rtnetlink.c:5583
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       ____sys_sendmsg+0x331/0x810 net/socket.c:2413
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
       __sys_sendmmsg+0x195/0x470 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The buggy address belongs to the object at ffff8880985c4800
       which belongs to the cache kmalloc-1k of size 1024
      The buggy address is located 776 bytes inside of
       1024-byte region [ffff8880985c4800, ffff8880985c4c00)
      The buggy address belongs to the page:
      page:ffffea0002617000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x985c0
      head:ffffea0002617000 order:3 compound_mapcount:0 compound_pincount:0
      flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000010200 0000000000000000 dead000000000122 ffff888010c41dc0
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 3, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 1941, ts 1038999441284, free_ts 1033444432829
       prep_new_page mm/page_alloc.c:2434 [inline]
       get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4165
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5389
       alloc_pages+0x1aa/0x310 mm/mempolicy.c:2271
       alloc_slab_page mm/slub.c:1799 [inline]
       allocate_slab mm/slub.c:1944 [inline]
       new_slab+0x28a/0x3b0 mm/slub.c:2004
       ___slab_alloc+0x87c/0xe90 mm/slub.c:3018
       __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3105
       slab_alloc_node mm/slub.c:3196 [inline]
       slab_alloc mm/slub.c:3238 [inline]
       __kmalloc+0x2fb/0x340 mm/slub.c:4420
       kmalloc include/linux/slab.h:586 [inline]
       kzalloc include/linux/slab.h:715 [inline]
       __register_sysctl_table+0x112/0x1090 fs/proc/proc_sysctl.c:1335
       neigh_sysctl_register+0x2c8/0x5e0 net/core/neighbour.c:3787
       devinet_sysctl_register+0xb1/0x230 net/ipv4/devinet.c:2618
       inetdev_init+0x286/0x580 net/ipv4/devinet.c:278
       inetdev_event+0xa8a/0x15d0 net/ipv4/devinet.c:1532
       notifier_call_chain+0xb5/0x200 kernel/notifier.c:84
       call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1919
       call_netdevice_notifiers_extack net/core/dev.c:1931 [inline]
       call_netdevice_notifiers net/core/dev.c:1945 [inline]
       register_netdevice+0x1073/0x1500 net/core/dev.c:9698
       veth_newlink+0x59c/0xa90 drivers/net/veth.c:1722
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1352 [inline]
       free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1404
       free_unref_page_prepare mm/page_alloc.c:3325 [inline]
       free_unref_page+0x19/0x690 mm/page_alloc.c:3404
       release_pages+0x748/0x1220 mm/swap.c:956
       tlb_batch_pages_flush mm/mmu_gather.c:50 [inline]
       tlb_flush_mmu_free mm/mmu_gather.c:243 [inline]
       tlb_flush_mmu+0xe9/0x6b0 mm/mmu_gather.c:250
       zap_pte_range mm/memory.c:1441 [inline]
       zap_pmd_range mm/memory.c:1490 [inline]
       zap_pud_range mm/memory.c:1519 [inline]
       zap_p4d_range mm/memory.c:1540 [inline]
       unmap_page_range+0x1d1d/0x2a30 mm/memory.c:1561
       unmap_single_vma+0x198/0x310 mm/memory.c:1606
       unmap_vmas+0x16b/0x2f0 mm/memory.c:1638
       exit_mmap+0x201/0x670 mm/mmap.c:3178
       __mmput+0x122/0x4b0 kernel/fork.c:1114
       mmput+0x56/0x60 kernel/fork.c:1135
       exit_mm kernel/exit.c:507 [inline]
       do_exit+0xa3c/0x2a30 kernel/exit.c:793
       do_group_exit+0xd2/0x2f0 kernel/exit.c:935
       __do_sys_exit_group kernel/exit.c:946 [inline]
       __se_sys_exit_group kernel/exit.c:944 [inline]
       __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:944
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Memory state around the buggy address:
       ffff8880985c4a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880985c4a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8880985c4b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
       ffff8880985c4b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880985c4c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      
      Fixes: 470502de ("net: sched: unlock rules update API")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220131172018.3704490-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Jakub Kicinski's avatar
      ethernet: smc911x: fix indentation in get/set EEPROM · 6dde7acd
      Jakub Kicinski authored
      Build bot produced a smatch indentation warning.
      The code looks correct, but it mixes spaces and tabs.
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/r/20220131211730.3940875-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6dde7acd
  2. 01 Feb, 2022 4 commits
  3. 31 Jan, 2022 3 commits
    • Karen Sornek's avatar
      i40e: Fix reset path while removing the driver · 6533e558
      Karen Sornek authored
      Fix a kernel crash caused by a NULL pointer dereference when
      the driver is unloaded while the VSI rings are being stopped.
      
      The hardware requires 50 msec to finish disabling the RX queues.
      The driver spins in the mdelay function until the operation
      has completed.
      
      For example, changing the number of queues, which requires a reset,
      would fail in the following call stack:
      
      1) i40e_prep_for_reset
      2) i40e_pf_quiesce_all_vsi
      3) i40e_quiesce_vsi
      4) i40e_vsi_close
      5) i40e_down
      6) i40e_vsi_stop_rings
      7) i40e_vsi_control_rx -> disable requires the delay of 50msecs
      8) continue back in i40e_down function where
         i40e_clean_tx_ring(vsi->tx_rings[i]) is going to crash
      
      While the driver was spinning, vsi_release called
      i40e_vsi_free_arrays, which freed the vsi->tx_rings resources
      and set the pointer to NULL.
      
      Fixes: 5b6d4a7f ("i40e: Fix crash during removing i40e driver")
      Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
      Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
      Signed-off-by: Karen Sornek <karen.sornek@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      6533e558
    • Jedrzej Jagielski's avatar
      i40e: Fix reset bw limit when DCB enabled with 1 TC · 3d250466
      Jedrzej Jagielski authored
      There was an AQ error I40E_AQ_RC_EINVAL when trying
      to reset bw limit as part of bw allocation setup.
      This was caused by trying to reset bw limit with
      DCB enabled. Bw limit should not be reset when
      DCB is enabled. The code was relying on the pf->flags
      to check if DCB is enabled but if only 1 TC is available
      this flag will not be set even though DCB is enabled.
      Add a check for number of TC and if it is 1
      don't try to reset bw limit even if pf->flags shows
      DCB as disabled.
      
      Fixes: fa38e30a ("i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled")
      Suggested-by: Alexander Lobakin <alexandr.lobakin@intel.com> # Flatten the condition
      Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
      Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
      Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Tested-by: Imam Hassan Reza Biswas <imam.hassan.reza.biswas@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      3d250466
    • Wen Gu's avatar
      net/smc: Forward wakeup to smc socket waitqueue after fallback · 341adeec
      Wen Gu authored
      When we replace TCP with SMC and a fallback occurs, there may be
      some socket waitqueue entries remaining in smc socket->wq, such
      as eppoll_entries inserted by userspace applications.
      
      After the fallback, data flows over TCP/IP and only clcsock->wq
      will be woken up. Applications can't be notified by the entries
      which were inserted in smc socket->wq before fallback. So we need
      a mechanism to wake up smc socket->wq at the same time if some
      entries remain in it.
      
      The current workaround is to transfer the entries from smc socket->wq
      to clcsock->wq during the fallback. But this may cause a crash
      like this:
      
       general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] PREEMPT SMP PTI
       CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G E     5.16.0+ #107
       RIP: 0010:__wake_up_common+0x65/0x170
       Call Trace:
        <IRQ>
        __wake_up_common_lock+0x7a/0xc0
        sock_def_readable+0x3c/0x70
        tcp_data_queue+0x4a7/0xc40
        tcp_rcv_established+0x32f/0x660
        ? sk_filter_trim_cap+0xcb/0x2e0
        tcp_v4_do_rcv+0x10b/0x260
        tcp_v4_rcv+0xd2a/0xde0
        ip_protocol_deliver_rcu+0x3b/0x1d0
        ip_local_deliver_finish+0x54/0x60
        ip_local_deliver+0x6a/0x110
        ? tcp_v4_early_demux+0xa2/0x140
        ? tcp_v4_early_demux+0x10d/0x140
        ip_sublist_rcv_finish+0x49/0x60
        ip_sublist_rcv+0x19d/0x230
        ip_list_rcv+0x13e/0x170
        __netif_receive_skb_list_core+0x1c2/0x240
        netif_receive_skb_list_internal+0x1e6/0x320
        napi_complete_done+0x11d/0x190
        mlx5e_napi_poll+0x163/0x6b0 [mlx5_core]
        __napi_poll+0x3c/0x1b0
        net_rx_action+0x27c/0x300
        __do_softirq+0x114/0x2d2
        irq_exit_rcu+0xb4/0xe0
        common_interrupt+0xba/0xe0
        </IRQ>
        <TASK>
      
      The crash is caused by privately transferring waitqueue entries from
      smc socket->wq to clcsock->wq. The owners of these entries, such as
      epoll, have no idea that the entries have been transferred to a
      different socket wait queue and still use the original waitqueue
      spinlock (smc socket->wq.wait.lock) to serialize operations on the
      entries, which no longer protects them. Operations on the entries,
      such as removing them from the waitqueue (now clcsock->wq after
      fallback), may cause a crash if the clcsock waitqueue is being
      iterated over at that moment.
      
      This patch tries to fix this by no longer transferring wait queue
      entries privately. Instead, it introduces its own implementations of
      clcsock's callback functions for the fallback situation. The callback
      functions forward the wakeup to smc socket->wq if clcsock->wq is
      actually woken up and smc socket->wq has remaining entries.
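
The forwarding idea described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the code from the patch: the `demo_*` structures and functions are hypothetical, and a wait queue is reduced to two counters.

```c
#include <assert.h>
#include <stddef.h>

/* Toy wait queue: counts sleepers and how often it was woken. */
struct demo_wq {
	int waiters;
	int wakeups;
};

/* Toy model of the internal TCP socket (hypothetical layout). */
struct demo_clcsock {
	struct demo_wq wq;                                /* the socket's own queue */
	struct demo_wq *smc_wq;                           /* queue of the outer SMC socket */
	void (*data_ready)(struct demo_clcsock *sk);      /* active callback */
	void (*orig_data_ready)(struct demo_clcsock *sk); /* saved original */
};

/* Original callback: wakes only the clcsock's own queue. */
static inline void demo_tcp_data_ready(struct demo_clcsock *sk)
{
	sk->wq.wakeups++;
}

/* Replacement callback: run the original, then forward the wakeup. */
static inline void demo_fback_data_ready(struct demo_clcsock *sk)
{
	assert(sk->orig_data_ready);
	sk->orig_data_ready(sk);
	if (sk->smc_wq && sk->smc_wq->waiters > 0)
		sk->smc_wq->wakeups++;
}

/* On fallback, swap callbacks instead of moving wait queue entries. */
static inline void demo_switch_to_fallback(struct demo_clcsock *sk,
					   struct demo_wq *smc_wq)
{
	sk->smc_wq = smc_wq;
	sk->orig_data_ready = sk->data_ready;
	sk->data_ready = demo_fback_data_ready;
}
```

Because the entries never leave their original queue in this model, their owners keep using the lock they already know about; only the wakeup is forwarded.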
      
      Fixes: 2153bd1e ("net/smc: Transfer remaining wait queue entries during fallback")
      Suggested-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
      Acked-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      341adeec
  4. 28 Jan, 2022 10 commits
    • Jisheng Zhang's avatar
      net: stmmac: properly handle with runtime pm in stmmac_dvr_remove() · 64495203
      Jisheng Zhang authored
      There are two issues with runtime pm handling in stmmac_dvr_remove():
      
      1. the mac is runtime suspended before stopping dma and rx/tx. We
      need to ensure the device is properly resumed back.
      
      2. the stmmaceth clk enable/disable isn't balanced in either the exit
      or the error handling code path. Take the exit code path for example:
      when we unbind the driver or rmmod the driver module, the mac is
      runtime suspended as said above, so the stmmaceth clk is already
      disabled, but it is disabled again via
      	stmmac_dvr_remove()
      	  stmmac_remove_config_dt()
      	    clk_disable_unprepare()
      CCF will complain this time. The error handling code path suffers
      from a similar situation.
      
      Here are the kernel warnings in the error handling code path on the
      Allwinner D1 platform:
      
      [    1.604695] ------------[ cut here ]------------
      [    1.609328] bus-emac already disabled
      [    1.613015] WARNING: CPU: 0 PID: 38 at drivers/clk/clk.c:952 clk_core_disable+0xcc/0xec
      [    1.621039] CPU: 0 PID: 38 Comm: kworker/u2:1 Not tainted 5.14.0-rc4#1
      [    1.627653] Hardware name: Allwinner D1 NeZha (DT)
      [    1.632443] Workqueue: events_unbound deferred_probe_work_func
      [    1.638286] epc : clk_core_disable+0xcc/0xec
      [    1.642561]  ra : clk_core_disable+0xcc/0xec
      [    1.646835] epc : ffffffff8023c2ec ra : ffffffff8023c2ec sp : ffffffd00411bb10
      [    1.654054]  gp : ffffffff80ec9988 tp : ffffffe00143a800 t0 : ffffffff80ed6a6f
      [    1.661272]  t1 : ffffffff80ed6a60 t2 : 0000000000000000 s0 : ffffffe001509e00
      [    1.668489]  s1 : 0000000000000001 a0 : 0000000000000019 a1 : ffffffff80e80bd8
      [    1.675707]  a2 : 00000000ffffefff a3 : 00000000000000f4 a4 : 0000000000000002
      [    1.682924]  a5 : 0000000000000001 a6 : 0000000000000030 a7 : 00000000028f5c29
      [    1.690141]  s2 : 0000000000000800 s3 : ffffffe001375000 s4 : ffffffe01fdf7a80
      [    1.697358]  s5 : ffffffe001375010 s6 : ffffffff8001fc10 s7 : ffffffffffffffff
      [    1.704577]  s8 : 0000000000000001 s9 : ffffffff80ecb248 s10: ffffffe001b80000
      [    1.711794]  s11: ffffffe001b80760 t3 : 0000000000000062 t4 : ffffffffffffffff
      [    1.719012]  t5 : ffffffff80e0f6d8 t6 : ffffffd00411b8f0
      [    1.724321] status: 8000000201800100 badaddr: 0000000000000000 cause: 0000000000000003
      [    1.732233] [<ffffffff8023c2ec>] clk_core_disable+0xcc/0xec
      [    1.737810] [<ffffffff80240430>] clk_disable+0x38/0x78
      [    1.742956] [<ffffffff8001fc0c>] worker_thread+0x1a8/0x4d8
      [    1.748451] [<ffffffff8031a500>] stmmac_remove_config_dt+0x1c/0x4c
      [    1.754646] [<ffffffff8031c8ec>] sun8i_dwmac_probe+0x378/0x82c
      [    1.760484] [<ffffffff8001fc0c>] worker_thread+0x1a8/0x4d8
      [    1.765975] [<ffffffff8029a6c8>] platform_probe+0x64/0xf0
      [    1.771382] [<ffffffff8029833c>] really_probe.part.0+0x8c/0x30c
      [    1.777305] [<ffffffff8029865c>] __driver_probe_device+0xa0/0x148
      [    1.783402] [<ffffffff8029873c>] driver_probe_device+0x38/0x138
      [    1.789324] [<ffffffff802989cc>] __device_attach_driver+0xd0/0x170
      [    1.795508] [<ffffffff802988f8>] __driver_attach_async_helper+0xbc/0xc0
      [    1.802125] [<ffffffff802965ac>] bus_for_each_drv+0x68/0xb4
      [    1.807701] [<ffffffff80298d1c>] __device_attach+0xd8/0x184
      [    1.813277] [<ffffffff802967b0>] bus_probe_device+0x98/0xbc
      [    1.818852] [<ffffffff80297904>] deferred_probe_work_func+0x90/0xd4
      [    1.825122] [<ffffffff8001f8b8>] process_one_work+0x1e4/0x390
      [    1.830872] [<ffffffff8001fd80>] worker_thread+0x31c/0x4d8
      [    1.836362] [<ffffffff80026bf4>] kthreadd+0x94/0x188
      [    1.841335] [<ffffffff80026bf4>] kthreadd+0x94/0x188
      [    1.846304] [<ffffffff8001fa60>] process_one_work+0x38c/0x390
      [    1.852054] [<ffffffff80026564>] kthread+0x124/0x160
      [    1.857021] [<ffffffff8002643c>] set_kthread_struct+0x5c/0x60
      [    1.862770] [<ffffffff80001f08>] ret_from_syscall_rejected+0x8/0xc
      [    1.868956] ---[ end trace 8d5c6046255f84a0 ]---
      [    1.873675] ------------[ cut here ]------------
      [    1.878366] bus-emac already unprepared
      [    1.882378] WARNING: CPU: 0 PID: 38 at drivers/clk/clk.c:810 clk_core_unprepare+0xe4/0x168
      [    1.890673] CPU: 0 PID: 38 Comm: kworker/u2:1 Tainted: G        W	5.14.0-rc4 #1
      [    1.898674] Hardware name: Allwinner D1 NeZha (DT)
      [    1.903464] Workqueue: events_unbound deferred_probe_work_func
      [    1.909305] epc : clk_core_unprepare+0xe4/0x168
      [    1.913840]  ra : clk_core_unprepare+0xe4/0x168
      [    1.918375] epc : ffffffff8023d6cc ra : ffffffff8023d6cc sp : ffffffd00411bb10
      [    1.925593]  gp : ffffffff80ec9988 tp : ffffffe00143a800 t0 : 0000000000000002
      [    1.932811]  t1 : ffffffe01f743be0 t2 : 0000000000000040 s0 : ffffffe001509e00
      [    1.940029]  s1 : 0000000000000001 a0 : 000000000000001b a1 : ffffffe00143a800
      [    1.947246]  a2 : 0000000000000000 a3 : 00000000000000f4 a4 : 0000000000000001
      [    1.954463]  a5 : 0000000000000000 a6 : 0000000005fce2a5 a7 : 0000000000000001
      [    1.961680]  s2 : 0000000000000800 s3 : ffffffff80afeb90 s4 : ffffffe01fdf7a80
      [    1.968898]  s5 : ffffffe001375010 s6 : ffffffff8001fc10 s7 : ffffffffffffffff
      [    1.976115]  s8 : 0000000000000001 s9 : ffffffff80ecb248 s10: ffffffe001b80000
      [    1.983333]  s11: ffffffe001b80760 t3 : ffffffff80b39120 t4 : 0000000000000001
      [    1.990550]  t5 : 0000000000000000 t6 : ffffffe001600002
      [    1.995859] status: 8000000201800120 badaddr: 0000000000000000 cause: 0000000000000003
      [    2.003771] [<ffffffff8023d6cc>] clk_core_unprepare+0xe4/0x168
      [    2.009609] [<ffffffff802403a0>] clk_unprepare+0x24/0x3c
      [    2.014929] [<ffffffff8031a508>] stmmac_remove_config_dt+0x24/0x4c
      [    2.021125] [<ffffffff8031c8ec>] sun8i_dwmac_probe+0x378/0x82c
      [    2.026965] [<ffffffff8001fc0c>] worker_thread+0x1a8/0x4d8
      [    2.032463] [<ffffffff8029a6c8>] platform_probe+0x64/0xf0
      [    2.037871] [<ffffffff8029833c>] really_probe.part.0+0x8c/0x30c
      [    2.043795] [<ffffffff8029865c>] __driver_probe_device+0xa0/0x148
      [    2.049892] [<ffffffff8029873c>] driver_probe_device+0x38/0x138
      [    2.055815] [<ffffffff802989cc>] __device_attach_driver+0xd0/0x170
      [    2.061999] [<ffffffff802988f8>] __driver_attach_async_helper+0xbc/0xc0
      [    2.068616] [<ffffffff802965ac>] bus_for_each_drv+0x68/0xb4
      [    2.074193] [<ffffffff80298d1c>] __device_attach+0xd8/0x184
      [    2.079769] [<ffffffff802967b0>] bus_probe_device+0x98/0xbc
      [    2.085345] [<ffffffff80297904>] deferred_probe_work_func+0x90/0xd4
      [    2.091616] [<ffffffff8001f8b8>] process_one_work+0x1e4/0x390
      [    2.097367] [<ffffffff8001fd80>] worker_thread+0x31c/0x4d8
      [    2.102858] [<ffffffff80026bf4>] kthreadd+0x94/0x188
      [    2.107830] [<ffffffff80026bf4>] kthreadd+0x94/0x188
      [    2.112800] [<ffffffff8001fa60>] process_one_work+0x38c/0x390
      [    2.118551] [<ffffffff80026564>] kthread+0x124/0x160
      [    2.123520] [<ffffffff8002643c>] set_kthread_struct+0x5c/0x60
      [    2.129268] [<ffffffff80001f08>] ret_from_syscall_rejected+0x8/0xc
      [    2.135455] ---[ end trace 8d5c6046255f84a1 ]---
      
      Fixes: 5ec55823 ("net: stmmac: add clocks management for gmac driver")
      Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      64495203
    • David S. Miller's avatar
      Merge tag 'ieee802154-for-net-2022-01-28' of... · 010a2a66
      David S. Miller authored
      Merge tag 'ieee802154-for-net-2022-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
      
      Stefan Schmidt says:
      
      ====================
      pull-request: ieee802154 for net 2022-01-28
      
      An update from ieee802154 for your *net* tree.
      
      A bunch of fixes in drivers, all from Miquel Raynal.
      Clarifying the default channel in hwsim, leak fixes in at86rf230 and ca8210 as
      well as a symbol duration fix for mcr20a. Topping up the driver fixes with
      better error codes in nl802154 and a cleanup in MAINTAINERS for an orphaned
      driver.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      010a2a66
    • Haiyue Wang's avatar
      gve: fix the wrong AdminQ buffer queue index check · 1f84a945
      Haiyue Wang authored
      The 'tail' and 'head' are free-running counters of type 'unsigned int'.
      When 'head' overflows, the comparison 'int i (= tail) < u32 head' is
      false, so the loop body never runs.
      
      In the test program below, only the '- loop 0: idx = 63' result is
      shown, so the comparison needs to use 'int' type, which handles the
      overflow correctly.
      
      #include <stdio.h>
      #include <stdint.h>

      typedef uint32_t u32;
      
      int main()
      {
              u32 tail, head;
              int stail, shead;
              int i, loop;
      
              tail = 0xffffffff;
              head = 0x00000000;
      
              for (i = tail, loop = 0; i < head; i++) {
                      unsigned int idx = i & 63;
      
                      printf("+ loop %d: idx = %u\n", loop++, idx);
              }
      
              stail = tail;
              shead = head;
              for (i = stail, loop = 0; i < shead; i++) {
                      unsigned int idx = i & 63;
      
                      printf("- loop %d: idx = %u\n", loop++, idx);
              }
      
              return 0;
      }
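
As a complement to the test program above, here is a minimal sketch of another common way to walk a ring between two free-running 32-bit counters: take the number of pending entries from an unsigned subtraction, which is well defined modulo 2^32. This illustrates the general idiom and is not the code from the patch; `demo_pending()`, `demo_collect_idx()` and `DEMO_RING_SIZE` are hypothetical names chosen to match the `& 63` mask in the test program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_SIZE 64u /* power of two, matching the "& 63" mask */

/* Number of entries between two free-running counters (modulo 2^32). */
static inline uint32_t demo_pending(uint32_t tail, uint32_t head)
{
	return head - tail;
}

/*
 * Store the ring slot of every pending entry into out[] and return how
 * many slots were written. The caller must provide room for all of them.
 */
static inline size_t demo_collect_idx(uint32_t tail, uint32_t head,
				      uint32_t *out, size_t out_len)
{
	uint32_t n = demo_pending(tail, head);
	uint32_t k;

	assert(n <= out_len);
	for (k = 0; k < n; k++)
		out[k] = (tail + k) & (DEMO_RING_SIZE - 1);
	return n;
}
```

With `tail = 0xffffffff` and `head = 0`, `demo_pending()` yields 1 and the single slot visited is 63, the same index the signed loop in the test program reports.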
      
      Fixes: 5cdad90d ("gve: Batch AQ commands for creating and destroying queues.")
      Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1f84a945
    • David S. Miller's avatar
      Merge branch 'ax25-fixes' · 501c8f5e
      David S. Miller authored
      Duoming Zhou says:
      
      ====================
      ax25: fix NPD and UAF bugs when detaching ax25 device
      
      There are NPD and UAF bugs when detaching an ax25 device. We
      use a lock and a refcount to mitigate these bugs.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      501c8f5e
    • Duoming Zhou's avatar
      ax25: add refcount in ax25_dev to avoid UAF bugs · d01ffb9e
      Duoming Zhou authored
      If we dereference ax25_dev after we call kfree(ax25_dev) in
      ax25_dev_device_down(), it will lead to concurrency UAF bugs.
      Eight syscall functions suffer from UAF bugs, including
      ax25_bind(), ax25_release(), ax25_connect(), ax25_ioctl(),
      ax25_getname(), ax25_sendmsg(), ax25_getsockopt() and
      ax25_info_show().
      
      One of the concurrency UAF bugs is shown below:
      
        (USE)                       |    (FREE)
                                    |  ax25_device_event
                                    |    ax25_dev_device_down
      ax25_bind                     |    ...
        ...                         |      kfree(ax25_dev)
        ax25_fillin_cb()            |    ...
          ax25_fillin_cb_from_dev() |
        ...                         |
      
      The root cause of the UAF bugs is that kfree(ax25_dev) in
      ax25_dev_device_down() is not protected by any locks.
      When ax25_dev is released while there are still pointers
      pointing to it, the concurrency UAF bug will happen.
      
      This patch introduces a refcount into ax25_dev in order to
      guarantee that there are no pointers pointing to it when ax25_dev
      is released.
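
The reference counting pattern referred to above can be sketched in user space as follows. This is a generic illustration using C11 atomics, not the kernel code from the patch; `struct demo_dev` and the `demo_dev_*()` helpers are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device object shared between contexts. */
struct demo_dev {
	atomic_int refcnt;
	int payload;
};

/* Allocate an object holding one reference owned by the caller. */
static inline struct demo_dev *demo_dev_alloc(void)
{
	struct demo_dev *dev = calloc(1, sizeof(*dev));

	if (dev)
		atomic_init(&dev->refcnt, 1);
	return dev;
}

/* Take an extra reference before storing or using the pointer. */
static inline void demo_dev_hold(struct demo_dev *dev)
{
	atomic_fetch_add(&dev->refcnt, 1);
}

/*
 * Drop one reference. The object is freed only by whoever drops the
 * last one; returns true in that case so callers stop using the pointer.
 */
static inline bool demo_dev_put(struct demo_dev *dev)
{
	int old = atomic_fetch_sub(&dev->refcnt, 1);

	assert(old > 0);
	if (old == 1) {
		free(dev);
		return true;
	}
	return false;
}
```

In this model the "device down" path would call `demo_dev_put()` instead of freeing directly, so a concurrent user that took its own reference with `demo_dev_hold()` keeps the object alive until it is done.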
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d01ffb9e
    • Duoming Zhou's avatar
      ax25: improve the incomplete fix to avoid UAF and NPD bugs · 4e0f718d
      Duoming Zhou authored
      The previous commit 1ade48d0 ("ax25: NPD bug when detaching
      AX25 device") introduced lock_sock() into ax25_kill_by_device to
      prevent an NPD bug. But a concurrency NPD or UAF bug can still occur
      when lock_sock() or release_sock() dereferences the ax25_cb->sock.
      
      The NULL pointer dereference bug can be shown as below:
      
      ax25_kill_by_device()        | ax25_release()
                                   |   ax25_destroy_socket()
                                   |     ax25_cb_del()
        ...                        |     ...
                                   |     ax25->sk=NULL;
        lock_sock(s->sk); //(1)    |
        s->ax25_dev = NULL;        |     ...
        release_sock(s->sk); //(2) |
        ...                        |
      
      The root cause is that the sock is set to null before dereference
      site (1) or (2). Therefore, this patch extracts the ax25_cb->sock
      in advance, and uses ax25_list_lock to protect it, which can synchronize
      with ax25_cb_del() and ensure the value of sock is not null before
      the dereference sites.
      
      The concurrency UAF bug can be shown as below:
      
      ax25_kill_by_device()        | ax25_release()
                                   |   ax25_destroy_socket()
        ...                        |   ...
                                   |   sock_put(sk); //FREE
        lock_sock(s->sk); //(1)    |
        s->ax25_dev = NULL;        |   ...
        release_sock(s->sk); //(2) |
        ...                        |
      
      The root cause is that the sock is released before dereference
      site (1) or (2). Therefore, this patch uses sock_hold() to increase
      the refcount of sock and uses ax25_list_lock to protect it, which
      can synchronize with ax25_cb_del() in ax25_destroy_socket() and
      ensure the sock will not be released before the dereference sites.
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4e0f718d
    • Yuji Ishikawa's avatar
      net: stmmac: dwmac-visconti: No change to ETHER_CLOCK_SEL for unexpected speed request. · 928d6fe9
      Yuji Ishikawa authored
      Variable clk_sel_val is not initialized in the default case of the first switch statement.
      In that case, the function should return immediately without any changes to the hardware.
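
The early-return pattern described above can be sketched like this. It is a simplified model, not the driver code: `struct demo_hw`, the `DEMO_CLK_SEL_*` values and `demo_fix_speed()` are hypothetical, and a plain struct field stands in for the hardware register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register model and clock-select encodings. */
struct demo_hw {
	uint32_t clock_sel;
};

enum {
	DEMO_CLK_SEL_10   = 0x1,
	DEMO_CLK_SEL_100  = 0x2,
	DEMO_CLK_SEL_1000 = 0x4,
};

/*
 * Program the clock selector for a link speed given in Mbit/s.
 * An unexpected speed returns false before anything is written, so the
 * selector variable is never used uninitialized.
 */
static inline bool demo_fix_speed(struct demo_hw *hw, unsigned int speed_mbps)
{
	uint32_t clk_sel_val;

	assert(hw);
	switch (speed_mbps) {
	case 1000:
		clk_sel_val = DEMO_CLK_SEL_1000;
		break;
	case 100:
		clk_sel_val = DEMO_CLK_SEL_100;
		break;
	case 10:
		clk_sel_val = DEMO_CLK_SEL_10;
		break;
	default:
		/* Unexpected request: leave the hardware untouched. */
		return false;
	}

	hw->clock_sel = clk_sel_val;
	return true;
}
```

Returning from the default case keeps every later use of `clk_sel_val` reachable only through a path that assigned it.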
      Reported-by: kernel test robot <lkp@intel.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: b38dd98f ("net: stmmac: Add Toshiba Visconti SoCs glue driver")
      Signed-off-by: Yuji Ishikawa <yuji2.ishikawa@toshiba.co.jp>
      Reviewed-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      928d6fe9
    • Raju Rangoju's avatar
      net: amd-xgbe: ensure to reset the tx_timer_active flag · 7674b7b5
      Raju Rangoju authored
      Reset the tx_timer_active flag in xgbe_stop();
      otherwise a port restart may result in a tx timeout due to
      the uncleared flag.
      
      Fixes: c635eaac ("amd-xgbe: Remove Tx coalescing")
      Co-developed-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
      Signed-off-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
      Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
      Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
      Link: https://lore.kernel.org/r/20220127060222.453371-1-Raju.Rangoju@amd.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7674b7b5
    • Jakub Kicinski's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 33d12dc9
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      1) Remove leftovers from flowtable modules, from Geert Uytterhoeven.
      
      2) Missing refcount increment of conntrack template in nft_ct,
         from Florian Westphal.
      
      3) Reduce nft_zone selftest time, also from Florian.
      
      4) Add selftest to cover stateless NAT on fragments, from Florian Westphal.
      
      5) Do not set net_device for reject packets from the bridge path,
         from Phil Sutter.
      
      6) Cancel register tracking info on nft_byteorder operations.
      
      7) Extend nft_concat_range selftest to cover set reload with no elements,
         from Florian Westphal.
      
      8) Remove useless update of pointer in chain blob builder, reported
         by kbuild test robot.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
        netfilter: nf_tables: remove assignment with no effect in chain blob builder
        selftests: nft_concat_range: add test for reload with no element add/del
        netfilter: nft_byteorder: track register operations
        netfilter: nft_reject_bridge: Fix for missing reply from prerouting
        selftests: netfilter: check stateless nat udp checksum fixup
        selftests: netfilter: reduce zone stress test running time
        netfilter: nft_ct: fix use after free when attaching zone template
        netfilter: Remove flowtable relics
      ====================
      
      Link: https://lore.kernel.org/r/20220127235235.656931-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      33d12dc9
    • Shyam Sundar S K's avatar
      net: amd-xgbe: Fix skb data length underflow · 5aac9108
      Shyam Sundar S K authored
      A BUG_ON() in include/linux/skbuff.h is triggered, leading to an
      intermittent kernel panic, when an skb length underflow is detected.
      
      Fix this by dropping the packet if such length underflows are seen
      because of inconsistencies in the hardware descriptors.
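
The kind of check implied above can be sketched as follows: validate the lengths before subtracting, and report "drop" instead of producing a wrapped-around unsigned value. This is a generic illustration, not the driver code; `demo_buf_len()` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Compute how many bytes of a packet belong to the current buffer.
 *   total_len:    packet length reported by the (hypothetical) descriptor
 *   consumed_len: bytes already accounted for from earlier buffers
 * Returns false when the descriptor is inconsistent, so the caller can
 * drop the packet instead of using a wrapped-around length. *out_len is
 * written only on success.
 */
static inline bool demo_buf_len(unsigned int total_len,
				unsigned int consumed_len,
				unsigned int *out_len)
{
	assert(out_len);
	if (total_len < consumed_len)
		return false; /* subtraction would underflow: signal "drop" */

	*out_len = total_len - consumed_len;
	return true;
}
```

For example, `demo_buf_len(1500, 1000, &len)` succeeds with `len == 500`, while `demo_buf_len(100, 256, &len)` returns false and leaves `len` unchanged.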
      
      Fixes: 622c36f1 ("amd-xgbe: Fix jumbo MTU processing on newer hardware")
      Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
      Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
      Link: https://lore.kernel.org/r/20220127092003.2812745-1-Shyam-sundar.S-k@amd.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5aac9108
  5. 27 Jan, 2022 4 commits
    • Linus Torvalds's avatar
      Merge tag 'net-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 23a46422
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from netfilter and can.
      
        Current release - new code bugs:
      
         - tcp: add a missing sk_defer_free_flush() in tcp_splice_read()
      
         - tcp: add a stub for sk_defer_free_flush(), fix CONFIG_INET=n
      
         - nf_tables: set last expression in register tracking area
      
         - nft_connlimit: fix memleak if nf_ct_netns_get() fails
      
         - mptcp: fix removing ids bitmap setting
      
         - bonding: use rcu_dereference_rtnl when getting active slave
      
         - fix three cases of sleep in atomic context in drivers: lan966x, gve
      
         - handful of build fixes for esoteric drivers after netdev->dev_addr
           was made const
      
        Previous releases - regressions:
      
         - revert "ipv6: Honor all IPv6 PIO Valid Lifetime values", it broke
           Linux compatibility with USGv6 tests
      
         - procfs: show net device bound packet types
      
         - ipv4: fix ip option filtering for locally generated fragments
      
         - phy: broadcom: hook up soft_reset for BCM54616S
      
        Previous releases - always broken:
      
         - ipv4: raw: lock the socket in raw_bind()
      
         - ipv4: decrease the use of shared IPID generator to decrease the
           chance of attackers guessing the values
      
         - procfs: fix cross-netns information leakage in /proc/net/ptype
      
         - ethtool: fix link extended state for big endian
      
         - bridge: vlan: fix single net device option dumping
      
         - ping: fix the sk_bound_dev_if match in ping_lookup"
      
      * tag 'net-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (86 commits)
        net: bridge: vlan: fix memory leak in __allowed_ingress
        net: socket: rename SKB_DROP_REASON_SOCKET_FILTER
        ipv4: remove sparse error in ip_neigh_gw4()
        ipv4: avoid using shared IP generator for connected sockets
        ipv4: tcp: send zero IPID in SYNACK messages
        ipv4: raw: lock the socket in raw_bind()
        MAINTAINERS: add missing IPv4/IPv6 header paths
        MAINTAINERS: add more files to eth PHY
        net: stmmac: dwmac-sun8i: use return val of readl_poll_timeout()
        net: bridge: vlan: fix single net device option dumping
        net: stmmac: skip only stmmac_ptp_register when resume from suspend
        net: stmmac: configure PTP clock source prior to PTP initialization
        Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values"
        connector/cn_proc: Use task_is_in_init_pid_ns()
        pid: Introduce helper task_is_in_init_pid_ns()
        gve: Fix GFP flags when allocing pages
        net: lan966x: Fix sleep in atomic context when updating MAC table
        net: lan966x: Fix sleep in atomic context when injecting frames
        ethernet: seeq/ether3: don't write directly to netdev->dev_addr
        ethernet: 8390/etherh: don't write directly to netdev->dev_addr
        ...
      23a46422
    • Tim Yi's avatar
      net: bridge: vlan: fix memory leak in __allowed_ingress · fd20d973
      Tim Yi authored
      When using per-vlan state, if vlan snooping and stats are disabled,
      an untagged or priority-tagged ingress frame goes on to check the pvid
      state. If the port state is forwarding and the pvid state is not
      learning/forwarding, the untagged or priority-tagged frame is dropped
      but the skb memory is not freed.
      The skb should be freed when __allowed_ingress returns false.
      
      Fixes: a580c76d ("net: bridge: vlan: add per-vlan state")
      Signed-off-by: Tim Yi <tim.yi@pica8.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Link: https://lore.kernel.org/r/20220127074953.12632-1-tim.yi@pica8.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fd20d973
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: remove assignment with no effect in chain blob builder · b07f4137
      Pablo Neira Ayuso authored
      cppcheck possible warnings:
      
      >> net/netfilter/nf_tables_api.c:2014:2: warning: Assignment of function parameter has no effect outside the function. Did you forget dereferencing it? [uselessAssignmentPtrArg]
          ptr += offsetof(struct nft_rule_dp, data);
          ^
      Reported-by: kernel test robot <yujie.liu@intel.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      b07f4137
    • Menglong Dong's avatar
      net: socket: rename SKB_DROP_REASON_SOCKET_FILTER · 364df53c
      Menglong Dong authored
      Rename SKB_DROP_REASON_SOCKET_FILTER, which is used
      as the reason for an skb being dropped by the socket filter,
      before it becomes part of a released kernel. It will be used
      for more protocols than just TCP in a future series.
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/all/20220127091308.91401-2-imagedong@tencent.com/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      364df53c