1. 12 May, 2022 1 commit
  2. 11 May, 2022 12 commits
  3. 10 May, 2022 9 commits
  4. 09 May, 2022 10 commits
  5. 06 May, 2022 8 commits
    • Kees Cook's avatar
      net: chelsio: cxgb4: Avoid potential negative array offset · 1c7ab9cd
      Kees Cook authored
      Using min_t(int, ...) as a potential array index implies to the compiler
      that negative offsets should be allowed. This is not the case, though.
      Replace "int" with "unsigned int". Fixes the following warning exposed
      under future CONFIG_FORTIFY_SOURCE improvements:
      
      In file included from include/linux/string.h:253,
                       from include/linux/bitmap.h:11,
                       from include/linux/cpumask.h:12,
                       from include/linux/smp.h:13,
                       from include/linux/lockdep.h:14,
                       from include/linux/rcupdate.h:29,
                       from include/linux/rculist.h:11,
                       from include/linux/pid.h:5,
                       from include/linux/sched.h:14,
                       from include/linux/delay.h:23,
                       from drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:35:
      drivers/net/ethernet/chelsio/cxgb4/t4_hw.c: In function 't4_get_raw_vpd_params':
      include/linux/fortify-string.h:46:33: warning: '__builtin_memcpy' pointer overflow between offset 29 and size [2147483648, 4294967295] [-Warray-bounds]
         46 | #define __underlying_memcpy     __builtin_memcpy
            |                                 ^
      include/linux/fortify-string.h:388:9: note: in expansion of macro '__underlying_memcpy'
        388 |         __underlying_##op(p, q, __fortify_size);                        \
            |         ^~~~~~~~~~~~~
      include/linux/fortify-string.h:433:26: note: in expansion of macro '__fortify_memcpy_chk'
        433 | #define memcpy(p, q, s)  __fortify_memcpy_chk(p, q, s,                  \
            |                          ^~~~~~~~~~~~~~~~~~~~
      drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2796:9: note: in expansion of macro 'memcpy'
       2796 |         memcpy(p->id, vpd + id, min_t(int, id_len, ID_LEN));
            |         ^~~~~~
      include/linux/fortify-string.h:46:33: warning: '__builtin_memcpy' pointer overflow between offset 0 and size [2147483648, 4294967295] [-Warray-bounds]
         46 | #define __underlying_memcpy     __builtin_memcpy
            |                                 ^
      include/linux/fortify-string.h:388:9: note: in expansion of macro '__underlying_memcpy'
        388 |         __underlying_##op(p, q, __fortify_size);                        \
            |         ^~~~~~~~~~~~~
      include/linux/fortify-string.h:433:26: note: in expansion of macro '__fortify_memcpy_chk'
        433 | #define memcpy(p, q, s)  __fortify_memcpy_chk(p, q, s,                  \
            |                          ^~~~~~~~~~~~~~~~~~~~
      drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2798:9: note: in expansion of macro 'memcpy'
       2798 |         memcpy(p->sn, vpd + sn, min_t(int, sn_len, SERNUM_LEN));
            |         ^~~~~~
      
      Additionally remove needless cast from u8[] to char * in last strim()
      call.
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/lkml/202205031926.FVP7epJM-lkp@intel.com
      Fixes: fc927929 ("cxgb4: Search VPD with pci_vpd_find_ro_info_keyword()")
      Fixes: 24c521f8 ("cxgb4: Use pci_vpd_find_id_string() to find VPD ID string")
      Cc: Raju Rangoju <rajur@chelsio.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220505233101.1224230-1-keescook@chromium.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      1c7ab9cd
    • Eric Dumazet's avatar
      netlink: do not reset transport header in netlink_recvmsg() · d5076fe4
      Eric Dumazet authored
      netlink_recvmsg() does not need to change transport header.
      
      If transport header was needed, it should have been reset
      by the producer (netlink_dump()), not the consumer(s).
      
      The following trace probably happened when multiple threads
      were using MSG_PEEK.
      
      BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg
      
      write to 0xffff88811e9f15b2 of 2 bytes by task 32012 on cpu 1:
       skb_reset_transport_header include/linux/skbuff.h:2760 [inline]
       netlink_recvmsg+0x1de/0x790 net/netlink/af_netlink.c:1978
       sock_recvmsg_nosec net/socket.c:948 [inline]
       sock_recvmsg net/socket.c:966 [inline]
       __sys_recvfrom+0x204/0x2c0 net/socket.c:2097
       __do_sys_recvfrom net/socket.c:2115 [inline]
       __se_sys_recvfrom net/socket.c:2111 [inline]
       __x64_sys_recvfrom+0x74/0x90 net/socket.c:2111
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      write to 0xffff88811e9f15b2 of 2 bytes by task 32005 on cpu 0:
       skb_reset_transport_header include/linux/skbuff.h:2760 [inline]
       netlink_recvmsg+0x1de/0x790 net/netlink/af_netlink.c:1978
       ____sys_recvmsg+0x162/0x2f0
       ___sys_recvmsg net/socket.c:2674 [inline]
       __sys_recvmsg+0x209/0x3f0 net/socket.c:2704
       __do_sys_recvmsg net/socket.c:2714 [inline]
       __se_sys_recvmsg net/socket.c:2711 [inline]
       __x64_sys_recvmsg+0x42/0x50 net/socket.c:2711
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0xffff -> 0x0000
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 32005 Comm: syz-executor.4 Not tainted 5.18.0-rc1-syzkaller-00328-ge1f700eb-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220505161946.2867638-1-eric.dumazet@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d5076fe4
    • Lokesh Dhoundiyal's avatar
      ipv4: drop dst in multicast routing path · 9e6c6d17
      Lokesh Dhoundiyal authored
      kmemleak reports the following when routing multicast traffic over an
      ipsec tunnel.
      
      Kmemleak output:
      unreferenced object 0x8000000044bebb00 (size 256):
        comm "softirq", pid 0, jiffies 4294985356 (age 126.810s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 80 00 00 00 05 13 74 80  ..............t.
          80 00 00 00 04 9b bf f9 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f83947e0>] __kmalloc+0x1e8/0x300
          [<00000000b7ed8dca>] metadata_dst_alloc+0x24/0x58
          [<0000000081d32c20>] __ipgre_rcv+0x100/0x2b8
          [<00000000824f6cf1>] gre_rcv+0x178/0x540
          [<00000000ccd4e162>] gre_rcv+0x7c/0xd8
          [<00000000c024b148>] ip_protocol_deliver_rcu+0x124/0x350
          [<000000006a483377>] ip_local_deliver_finish+0x54/0x68
          [<00000000d9271b3a>] ip_local_deliver+0x128/0x168
          [<00000000bd4968ae>] xfrm_trans_reinject+0xb8/0xf8
          [<0000000071672a19>] tasklet_action_common.isra.16+0xc4/0x1b0
          [<0000000062e9c336>] __do_softirq+0x1fc/0x3e0
          [<00000000013d7914>] irq_exit+0xc4/0xe0
          [<00000000a4d73e90>] plat_irq_dispatch+0x7c/0x108
          [<000000000751eb8e>] handle_int+0x16c/0x178
          [<000000001668023b>] _raw_spin_unlock_irqrestore+0x1c/0x28
      
      The metadata dst is leaked when ip_route_input_mc() updates the dst for
      the skb. Commit f38a9eb1 ("dst: Metadata destinations") correctly
      handled dropping the dst in ip_route_input_slow() but missed the
      multicast case which is handled by ip_route_input_mc(). Drop the dst in
      ip_route_input_mc() avoiding the leak.
      
      Fixes: f38a9eb1 ("dst: Metadata destinations")
      Signed-off-by: default avatarLokesh Dhoundiyal <lokesh.dhoundiyal@alliedtelesis.co.nz>
      Signed-off-by: default avatarChris Packham <chris.packham@alliedtelesis.co.nz>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20220505020017.3111846-1-chris.packham@alliedtelesis.co.nzSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      9e6c6d17
    • Michal Michalik's avatar
      ice: fix PTP stale Tx timestamps cleanup · a11b6c1a
      Michal Michalik authored
      Read stale PTP Tx timestamps from PHY on cleanup.
      
      After running out of Tx timestamps request handlers, hardware (HW) stops
      reporting finished requests. Function ice_ptp_tx_tstamp_cleanup() used
      to only clean up stale handlers in driver and was leaving the hardware
      registers not read. Not reading stale PTP Tx timestamps prevents next
      interrupts from arriving and makes timestamping unusable.
      
      Fixes: ea9b847c ("ice: enable transmit timestamps for E810 devices")
      Signed-off-by: default avatarMichal Michalik <michal.michalik@intel.com>
      Reviewed-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: default avatarPaul Menzel <pmenzel@molgen.mpg.de>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      a11b6c1a
    • Anatolii Gerasymenko's avatar
      ice: clear stale Tx queue settings before configuring · 6096dae9
      Anatolii Gerasymenko authored
      The iAVF driver uses 3 virtchnl op codes to communicate with the PF
      regarding the VF Tx queues:
      
      * VIRTCHNL_OP_CONFIG_VSI_QUEUES configures the hardware and firmware
      logic for the Tx queues
      
      * VIRTCHNL_OP_ENABLE_QUEUES configures the queue interrupts
      
      * VIRTCHNL_OP_DISABLE_QUEUES disables the queue interrupts and Tx rings.
      
      There is a bug in the iAVF driver due to the race condition between VF
      reset request and shutdown being executed in parallel. This leads to a
      break in logic and VIRTCHNL_OP_DISABLE_QUEUES is not being sent.
      
      If this occurs, the PF driver never cleans up the Tx queues. This results
      in leaving behind stale Tx queue settings in the hardware and firmware.
      
      The most obvious outcome is that upon the next
      VIRTCHNL_OP_CONFIG_VSI_QUEUES, the PF will fail to program the Tx
      scheduler node due to a lack of space.
      
      We need to protect ICE driver against such situation.
      
      To fix this, make sure we clear existing stale settings out when
      handling VIRTCHNL_OP_CONFIG_VSI_QUEUES. This ensures we remove the
      previous settings.
      
      Calling ice_vf_vsi_dis_single_txq should be safe as it will do nothing if
      the queue is not configured. The function already handles the case when the
      Tx queue is not currently configured and exits with a 0 return in that
      case.
      
      Fixes: 7ad15440 ("ice: Refactor VIRTCHNL_OP_CONFIG_VSI_QUEUES handling")
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: default avatarAnatolii Gerasymenko <anatolii.gerasymenko@intel.com>
      Tested-by: default avatarKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      6096dae9
    • Ivan Vecera's avatar
      ice: Fix race during aux device (un)plugging · 486b9eee
      Ivan Vecera authored
      Function ice_plug_aux_dev() assigns pf->adev field too early prior
      aux device initialization and on other side ice_unplug_aux_dev()
      starts aux device deinit and at the end assigns NULL to pf->adev.
      This is wrong because pf->adev should always be non-NULL only when
      aux device is fully initialized and ready. This wrong order causes
      a crash when ice_send_event_to_aux() call occurs because that function
      depends on non-NULL value of pf->adev and does not assume that
      aux device is half-initialized or half-destroyed.
      After order correction the race window is tiny but it is still there,
      as Leon mentioned and manipulation with pf->adev needs to be protected
      by mutex.
      
      Fix (un-)plugging functions so pf->adev field is set after aux device
      init and prior aux device destroy and protect pf->adev assignment by
      new mutex. This mutex is also held during ice_send_event_to_aux()
      call to ensure that aux device is valid during that call.
      Note that device lock used ice_send_event_to_aux() needs to be kept
      to avoid race with aux drv unload.
      
      Reproducer:
      cycle=1
      while :;do
              echo "#### Cycle: $cycle"
      
              ip link set ens7f0 mtu 9000
              ip link add bond0 type bond mode 1 miimon 100
              ip link set bond0 up
              ifenslave bond0 ens7f0
              ip link set bond0 mtu 9000
              ethtool -L ens7f0 combined 1
              ip link del bond0
              ip link set ens7f0 mtu 1500
              sleep 1
      
              let cycle++
      done
      
      In short when the device is added/removed to/from bond the aux device
      is unplugged/plugged. When MTU of the device is changed an event is
      sent to aux device asynchronously. This can race with (un)plugging
      operation and because pf->adev is set too early (plug) or too late
      (unplug) the function ice_send_event_to_aux() can touch uninitialized
      or destroyed fields. In the case of crash below pf->adev->dev.mutex.
      
      Crash:
      [   53.372066] bond0: (slave ens7f0): making interface the new active one
      [   53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an u
      p link
      [   53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
      [   53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up
       link
      [   54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed inval
      idating tc mappings. Priority traffic classification disabled!
      [   54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed inval
      idating tc mappings. Priority traffic classification disabled!
      [   54.248204] bond0: (slave ens7f0): Releasing backup interface
      [   54.253955] bond0: (slave ens7f1): making interface the new active one
      [   54.274875] bond0: (slave ens7f1): Releasing backup interface
      [   54.289153] bond0 (unregistering): Released all slaves
      [   55.383179] MII link monitoring set to 100 ms
      [   55.398696] bond0: (slave ens7f0): making interface the new active one
      [   55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080
      [   55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an u
      p link
      [   55.412198] #PF: supervisor write access in kernel mode
      [   55.412200] #PF: error_code(0x0002) - not-present page
      [   55.412201] PGD 25d2ad067 P4D 0
      [   55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI
      [   55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S
                 5.17.0-13579-g57f2d6540f03 #1
      [   55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up
       link
      [   55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/
      2021
      [   55.430226] Workqueue: ice ice_service_task [ice]
      [   55.468169] RIP: 0010:mutex_unlock+0x10/0x20
      [   55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 <f0> 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48
      [   55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246
      [   55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001
      [   55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080
      [   55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041
      [   55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0
      [   55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000
      [   55.532076] FS:  0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000
      [   55.540163] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0
      [   55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   55.567305] PKRU: 55555554
      [   55.570018] Call Trace:
      [   55.572474]  <TASK>
      [   55.574579]  ice_service_task+0xaab/0xef0 [ice]
      [   55.579130]  process_one_work+0x1c5/0x390
      [   55.583141]  ? process_one_work+0x390/0x390
      [   55.587326]  worker_thread+0x30/0x360
      [   55.590994]  ? process_one_work+0x390/0x390
      [   55.595180]  kthread+0xe6/0x110
      [   55.598325]  ? kthread_complete_and_exit+0x20/0x20
      [   55.603116]  ret_from_fork+0x1f/0x30
      [   55.606698]  </TASK>
      
      Fixes: f9f5301e ("ice: Register auxiliary device to provide RDMA")
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarIvan Vecera <ivecera@redhat.com>
      Reviewed-by: default avatarDave Ertman <david.m.ertman@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      486b9eee
    • Jakub Kicinski's avatar
      Merge branch 'ocelot-vcap-fixes' · c88d3908
      Jakub Kicinski authored
      Vladimir Oltean says:
      
      ====================
      Ocelot VCAP fixes
      
      Changes in v2:
      fix the NPDs and UAFs caused by filter->trap_list in a more robust way
      that actually does not introduce bugs of its own (1/5)
      
      This series fixes issues found while running
      tools/testing/selftests/net/forwarding/tc_actions.sh on the ocelot
      switch:
      
      - NULL pointer dereference when failing to offload a filter
      - NULL pointer dereference after deleting a trap
      - filters still having effect after being deleted
      - dropped packets still being seen by software
      - statistics counters showing double the amount of hits
      - statistics counters showing inexistent hits
      - invalid configurations not rejected
      ====================
      
      Link: https://lore.kernel.org/r/20220504235503.4161890-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      c88d3908
    • Vladimir Oltean's avatar
      net: mscc: ocelot: avoid corrupting hardware counters when moving VCAP filters · 93a84170
      Vladimir Oltean authored
      Given the following order of operations:
      
      (1) we add filter A using tc-flower
      (2) we send a packet that matches it
      (3) we read the filter's statistics to find a hit count of 1
      (4) we add a second filter B with a higher preference than A, and A
          moves one position to the right to make room in the TCAM for it
      (5) we send another packet, and this matches the second filter B
      (6) we read the filter statistics again.
      
      When this happens, the hit count of filter A is 2 and of filter B is 1,
      despite a single packet having matched each filter.
      
      Furthermore, in an alternate history, reading the filter stats a second
      time between steps (3) and (4) makes the hit count of filter A remain at
      1 after step (6), as expected.
      
      The reason why this happens has to do with the filter->stats.pkts field,
      which is written to hardware through the call path below:
      
                     vcap_entry_set
                     /      |      \
                    /       |       \
                   /        |        \
                  /         |         \
      es0_entry_set   is1_entry_set   is2_entry_set
                  \         |         /
                   \        |        /
                    \       |       /
              vcap_data_set(data.counter, ...)
      
      The primary role of filter->stats.pkts is to transport the filter hit
      counters from the last readout all the way from vcap_entry_get() ->
      ocelot_vcap_filter_stats_update() -> ocelot_cls_flower_stats().
      The reason why vcap_entry_set() writes it to hardware is so that the
      counters (saturating and having a limited bit width) are cleared
      after each user space readout.
      
      The writing of filter->stats.pkts to hardware during the TCAM entry
      movement procedure is an unintentional consequence of the code design,
      because the hit count isn't up to date at this point.
      
      So at step (4), when filter A is moved by ocelot_vcap_filter_add() to
      make room for filter B, the hardware hit count is 0 (no packet matched
      on it in the meantime), but filter->stats.pkts is 1, because the last
      readout saw the earlier packet. The movement procedure programs the old
      hit count back to hardware, so this creates the impression to user space
      that more packets have been matched than they really were.
      
      The bug can be seen when running the gact_drop_and_ok_test() from the
      tc_actions.sh selftest.
      
      Fix the issue by reading back the hit count to tmp->stats.pkts before
      migrating the VCAP filter. Sure, this is a best-effort technique, since
      the packets that hit the rule between vcap_entry_get() and
      vcap_entry_set() won't be counted, but at least it allows the counters
      to be reliably used for selftests where the traffic is under control.
      
      The vcap_entry_get() name is a bit unintuitive, but it only reads back
      the counter portion of the TCAM entry, not the entire entry.
      
      The index from which we retrieve the counter is also a bit unintuitive
      (i - 1 during add, i + 1 during del), but this is the way in which TCAM
      entry movement works. The "entry index" isn't a stored integer for a
      TCAM filter, instead it is dynamically computed by
      ocelot_vcap_block_get_filter_index() based on the entry's position in
      the &block->rules list. That position (as well as block->count) is
      automatically updated by ocelot_vcap_filter_add_to_block() on add, and
      by ocelot_vcap_block_remove_filter() on del. So "i" is the new filter
      index, and "i - 1" or "i + 1" respectively are the old addresses of that
      TCAM entry (we only support installing/deleting one filter at a time).
      
      Fixes: b5962294 ("net: mscc: ocelot: Add support for tcam")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      93a84170