1. 05 Jun, 2024 17 commits
    • net/sched: taprio: always validate TCA_TAPRIO_ATTR_PRIOMAP · f921a58a
      Eric Dumazet authored
      If one TCA_TAPRIO_ATTR_PRIOMAP attribute has been provided,
      taprio_parse_mqprio_opt() must validate it, or userspace
      can inject arbitrary data to the kernel, the second time
      taprio_change() is called.
      
      First call (with valid attributes) sets dev->num_tc
      to a non zero value.
      
      Second call (with arbitrary mqprio attributes)
      returns early from taprio_parse_mqprio_opt()
      and bad things can happen.
      
      Fixes: a3d43c0d ("taprio: Add support adding an admin schedule")
      Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20240604181511.769870-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f921a58a
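      The failure mode described in this commit can be illustrated with a
      small user-space model. This is a sketch, not the kernel code: the
      function name, its arguments, and the always_validate switch are
      hypothetical. The only behaviour taken from the commit message is that
      the pre-fix parser returns early once dev->num_tc is non-zero.

```python
def parse_mqprio_opt(dev_num_tc, priomap, num_tc, always_validate):
    """Toy stand-in for the parser; returns True when the request is accepted."""
    if priomap is None:
        # No attribute supplied: only acceptable if a mapping already exists.
        return dev_num_tc != 0
    if dev_num_tc != 0 and not always_validate:
        # Pre-fix shape: early return on the second call, nothing is checked.
        return True
    # Fixed shape: every priority must map to an existing traffic class.
    return all(0 <= tc < num_tc for tc in priomap)


good = [0, 1, 1, 0]
bad = [0, 200, 1, 0]  # 200 is not a valid traffic class when num_tc == 2

first_call = parse_mqprio_opt(0, good, 2, always_validate=False)
second_call_buggy = parse_mqprio_opt(2, bad, 2, always_validate=False)
second_call_fixed = parse_mqprio_opt(2, bad, 2, always_validate=True)
```

      In this model the second, unvalidated call is accepted before the fix
      and rejected after it.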
    • net/mlx5: Fix tainted pointer delete is case of flow rules creation fail · 229bedbf
      Aleksandr Mishin authored
      If flow rule creation fails in mlx5_lag_create_port_sel_table(),
      the tainted pointer is deleted several times instead of the
      previously created rules.
      Fix this bug by using the correct flow rule pointers.
      
      Found by Linux Verification Center (linuxtesting.org) with SVACE.
      
      Fixes: 352899f3 ("net/mlx5: Lag, use buckets in hash mode")
      Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240604100552.25201-1-amishin@t-argos.ru
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      229bedbf
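      The error-unwind mistake described in this commit can be sketched as
      follows. This is a simplified model with hypothetical names; rules are
      plain tuples, and the unwind order (newest first) is an assumption made
      for illustration.

```python
def create_rules(count, fail_at, fixed):
    """Create `count` rules; return what the error path deletes on failure."""
    created = []
    for i in range(count):
        # A failed creation yields an error value instead of a usable rule.
        rule = ("ERR", i) if i == fail_at else ("rule", i)
        if rule[0] == "ERR":
            # Unwind: one delete per previously created rule.
            # Buggy variant deletes the error value itself each time.
            return [prev if fixed else rule for prev in reversed(created)]
        created.append(rule)
    return []  # no failure, nothing to unwind


deleted_buggy = create_rules(4, fail_at=2, fixed=False)
deleted_fixed = create_rules(4, fail_at=2, fixed=True)
```

      The buggy variant "deletes" the error value once per existing rule,
      while the fixed variant deletes the rules that were actually created.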
    • Merge branch 'mlx5-fixes' · f8f0de9d
      David S. Miller authored
      Tariq Toukan says:
      
      ====================
      mlx5 core fixes 20240603
      
      This small patchset provides two bug fixes from the team to the mlx5 core driver.
      
      Series generated against:
      commit 33700a0c ("net/tcp: Don't consider TCP_CLOSE in TCP_AO_ESTABLISHED")
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f8f0de9d
    • net/mlx5: Always stop health timer during driver removal · c8b3f38d
      Shay Drory authored
      Currently, if teardown_hca fails to execute during driver removal, mlx5
      does not stop the health timer. Afterwards, mlx5 continues with driver
      teardown. This may lead to a UAF bug, which results in a page fault
      Oops[1], since the health timer fires after resources were freed.
      
      Hence, stop the health monitor even if teardown_hca fails.
      
      [1]
      mlx5_core 0000:18:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
      mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
      mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
      mlx5_core 0000:18:00.0: E-Switch: cleanup
      mlx5_core 0000:18:00.0: wait_func:1155:(pid 1967079): TEARDOWN_HCA(0x103) timeout. Will cause a leak of a command resource
      mlx5_core 0000:18:00.0: mlx5_function_close:1288:(pid 1967079): tear_down_hca failed, skip cleanup
      BUG: unable to handle page fault for address: ffffa26487064230
      PGD 100c00067 P4D 100c00067 PUD 100e5a067 PMD 105ed7067 PTE 0
      Oops: 0000 [#1] PREEMPT SMP PTI
      CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           OE     -------  ---  6.7.0-68.fc38.x86_64 #1
      Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0013.121520200651 12/15/2020
      RIP: 0010:ioread32be+0x34/0x60
      RSP: 0018:ffffa26480003e58 EFLAGS: 00010292
      RAX: ffffa26487064200 RBX: ffff9042d08161a0 RCX: ffff904c108222c0
      RDX: 000000010bbf1b80 RSI: ffffffffc055ddb0 RDI: ffffa26487064230
      RBP: ffff9042d08161a0 R08: 0000000000000022 R09: ffff904c108222e8
      R10: 0000000000000004 R11: 0000000000000441 R12: ffffffffc055ddb0
      R13: ffffa26487064200 R14: ffffa26480003f00 R15: ffff904c108222c0
      FS:  0000000000000000(0000) GS:ffff904c10800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffa26487064230 CR3: 00000002c4420006 CR4: 00000000007706f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <IRQ>
       ? __die+0x23/0x70
       ? page_fault_oops+0x171/0x4e0
       ? exc_page_fault+0x175/0x180
       ? asm_exc_page_fault+0x26/0x30
       ? __pfx_poll_health+0x10/0x10 [mlx5_core]
       ? __pfx_poll_health+0x10/0x10 [mlx5_core]
       ? ioread32be+0x34/0x60
       mlx5_health_check_fatal_sensors+0x20/0x100 [mlx5_core]
       ? __pfx_poll_health+0x10/0x10 [mlx5_core]
       poll_health+0x42/0x230 [mlx5_core]
       ? __next_timer_interrupt+0xbc/0x110
       ? __pfx_poll_health+0x10/0x10 [mlx5_core]
       call_timer_fn+0x21/0x130
       ? __pfx_poll_health+0x10/0x10 [mlx5_core]
       __run_timers+0x222/0x2c0
       run_timer_softirq+0x1d/0x40
       __do_softirq+0xc9/0x2c8
       __irq_exit_rcu+0xa6/0xc0
       sysvec_apic_timer_interrupt+0x72/0x90
       </IRQ>
       <TASK>
       asm_sysvec_apic_timer_interrupt+0x1a/0x20
      RIP: 0010:cpuidle_enter_state+0xcc/0x440
       ? cpuidle_enter_state+0xbd/0x440
       cpuidle_enter+0x2d/0x40
       do_idle+0x20d/0x270
       cpu_startup_entry+0x2a/0x30
       rest_init+0xd0/0xd0
       arch_call_rest_init+0xe/0x30
       start_kernel+0x709/0xa90
       x86_64_start_reservations+0x18/0x30
       x86_64_start_kernel+0x96/0xa0
       secondary_startup_64_no_verify+0x18f/0x19b
      ---[ end trace 0000000000000000 ]---
      
      Fixes: 9b98d395 ("net/mlx5: Start health poll at earlier stage of driver load")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c8b3f38d
    • net/mlx5: Stop waiting for PCI if pci channel is offline · 33afbfcc
      Moshe Shemesh authored
      If the PCI channel goes offline, the driver should not wait for PCI
      reads during the health dump and recovery flows. The driver has a timeout
      for each of the loops that try to read PCI, so they would fail anyway.
      However, during recovery, waiting until the timeout may cause the PCI
      error_detected() callback to miss the pci_dpc_recovered() wait timeout.
      
      Fixes: b3bd076f ("net/mlx5: Report devlink health on FW fatal issues")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Shay Drori <shayd@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      33afbfcc
    • net: ethernet: mtk_eth_soc: handle dma buffer size soc specific · c57e5581
      Frank Wunderlich authored
      The mainline MTK ethernet driver has long suffered from rare but
      annoying TX queue timeouts. We think that this is caused by fixed
      DMA sizes hardcoded for all SoCs.

      We suspect this problem arises from a low level of free TX DMADs,
      with the TX ring almost full.
      
      The transmit timeout is caused by the Tx queue not waking up. The
      Tx queue stops when the free counter is less than ring->thres, and
      it will wake up once the free counter is greater than ring->thres.
      If the CPU is too late to wake up the Tx queues, it may cause a
      transmit timeout.
      Therefore, we increased the TX and RX DMADs to mitigate this error.

      Use the dma-size implementation from the SDK in a per-SoC manner.
      Unlike the SDK, we have no RSS feature yet, so all RX/TX sizes
      should be raised from 512 to 2048 byte except fqdma on mt7988 to
      avoid the tx timeout issue.
      
      Fixes: 656e7052 ("net-next: mediatek: add support for MT7623 ethernet")
      Suggested-by: Daniel Golle <daniel@makrotopia.org>
      Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c57e5581
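      The stop/wake rule described in this commit (stop when the free counter
      drops below ring->thres, wake once it rises above it) can be modelled in
      a few lines. This is a sketch under stated assumptions: the class and
      helper names are hypothetical, and the threshold of 32 is an arbitrary
      illustrative value. Only the ring sizes 512 and 2048 come from the
      commit message.

```python
class TxRing:
    """Toy TX ring that tracks free descriptors and a stop threshold."""

    def __init__(self, size, thres):
        self.free = size
        self.thres = thres
        self.stopped = False

    def xmit(self):
        """Queue one packet; returns False if the queue is stopped or full."""
        if self.stopped or self.free == 0:
            return False
        self.free -= 1
        if self.free < self.thres:
            self.stopped = True  # queue stops below the threshold
        return True

    def complete(self, n):
        """Reclaim n descriptors; wake the queue once free > thres."""
        self.free += n
        if self.stopped and self.free > self.thres:
            self.stopped = False


def burst_until_stop(size, thres):
    """Count how many packets a burst can queue before the ring stops."""
    ring = TxRing(size, thres)
    sent = 0
    while ring.xmit():
        sent += 1
    return sent


small = burst_until_stop(512, 32)
large = burst_until_stop(2048, 32)
```

      With the same threshold, the larger ring absorbs a much longer burst
      before the queue stops, which leaves the CPU more time to reclaim
      descriptors and wake the queue.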
    • rtnetlink: make the "split" NLM_DONE handling generic · 5b4b62a1
      Jakub Kicinski authored
      Jaroslav reports Dell's OMSA Systems Management Data Engine
      expects NLM_DONE in a separate recvmsg(), both for rtnl_dump_ifinfo()
      and inet_dump_ifaddr(). We already added a similar fix previously in
      commit 460b0d33 ("inet: bring NLM_DONE out to a separate recv() again")
      
      Instead of modifying all the dump handlers, and making them look
      different than modern for_each_netdev_dump()-based dump handlers -
      put the workaround in rtnetlink code. This will also help us move
      the custom rtnl-locking from af_netlink in the future (in net-next).
      
      Note that this change does not touch rtnl_dump_all(). rtnl_dump_all()
      is a different kettle of fish and a potential problem. We now mix families
      in a single recvmsg(), but NLM_DONE is not coalesced.
      
      Tested:
      
        ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_addr.yaml \
                 --dump getaddr --json '{"ifa-family": 2}'
      
        ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
                 --dump getroute --json '{"rtm-family": 2}'
      
        ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_link.yaml \
                 --dump getlink
      
      Fixes: 3e41af90 ("rtnetlink: use xarray iterator to implement rtnl_dump_ifinfo()")
      Fixes: cdb2f80f ("inet: use xa_array iterator to implement inet_dump_ifaddr()")
      Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
      Link: https://lore.kernel.org/all/CAK8fFZ7MKoFSEzMBDAOjoUt+vTZRRQgLDNXEOfdCCXSoXXKE0g@mail.gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5b4b62a1
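      The user-visible difference this commit addresses (NLM_DONE arriving in
      its own recvmsg() rather than appended to the last data batch) can be
      pictured with a toy model. Nothing here is the rtnetlink implementation:
      the function, the per_recv limit, and the list-of-batches representation
      are assumptions made for illustration.

```python
NLM_DONE = "NLM_DONE"


def dump_batches(records, per_recv, split_done):
    """Group dump records into recvmsg()-sized batches and terminate the dump."""
    batches = [records[i:i + per_recv] for i in range(0, len(records), per_recv)]
    if split_done or not batches or len(batches[-1]) == per_recv:
        # NLM_DONE is delivered in a batch of its own.
        batches.append([NLM_DONE])
    else:
        # NLM_DONE piggybacks on the last data batch when there is room.
        batches[-1] = batches[-1] + [NLM_DONE]
    return batches


coalesced = dump_batches(["a", "b", "c"], per_recv=2, split_done=False)
split = dump_batches(["a", "b", "c"], per_recv=2, split_done=True)
```

      A reader that expects the terminator in a separate receive only handles
      the second shape correctly.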
    • Merge branch 'tcp-mptcp-close-wait' · e137596e
      David S. Miller authored
      Jason Xing says:
      
      ====================
      tcp/mptcp: count CLOSE-WAIT for CurrEstab
      
      Taking CLOSE-WAIT sockets into CurrEstab counters is in accordance with RFC
      1213, as suggested by Eric and Neal.
      
      v5
      Link: https://lore.kernel.org/all/20240531091753.75930-1-kerneljasonxing@gmail.com/
      1. add more detailed comment (Matthieu)
      
      v4
      Link: https://lore.kernel.org/all/20240530131308.59737-1-kerneljasonxing@gmail.com/
      1. correct the Fixes: tag in patch [2/2]. (Eric)
      
      Previous discussion
      Link: https://lore.kernel.org/all/20240529033104.33882-1-kerneljasonxing@gmail.com/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e137596e
    • mptcp: count CLOSE-WAIT sockets for MPTCP_MIB_CURRESTAB · 9633e937
      Jason Xing authored
      As the previous patch does for TCP, we need to adhere to RFC 1213:
      
        "tcpCurrEstab OBJECT-TYPE
         ...
         The number of TCP connections for which the current state
         is either ESTABLISHED or CLOSE- WAIT."
      
      So let's consider CLOSE-WAIT sockets.
      
      The counting logic:
      When do we increment the counter?
      a) Only if we change the state to ESTABLISHED.

      When do we decrement the counter?
      a) if the socket leaves ESTABLISHED and will never go into CLOSE-WAIT,
      say, on the client side, changing from ESTABLISHED to FIN-WAIT-1.
      b) if the socket leaves CLOSE-WAIT, say, on the server side, changing
      from CLOSE-WAIT to LAST-ACK.
      
      Fixes: d9cd27b8 ("mptcp: add CurrEstab MIB counter support")
      Signed-off-by: Jason Xing <kernelxing@tencent.com>
      Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9633e937
    • tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB · a46d0ea5
      Jason Xing authored
      According to RFC 1213, we should also take CLOSE-WAIT sockets into
      consideration:
      
        "tcpCurrEstab OBJECT-TYPE
         ...
         The number of TCP connections for which the current state
         is either ESTABLISHED or CLOSE- WAIT."
      
      After this, CurrEstab counter will display the total number of
      ESTABLISHED and CLOSE-WAIT sockets.
      
      The counting logic:
      When do we increment the counter?
      a) if we change the state to ESTABLISHED.
      b) if we change the state from SYN-RECEIVED to CLOSE-WAIT.

      When do we decrement the counter?
      a) if the socket leaves ESTABLISHED and will never go into CLOSE-WAIT,
      say, on the client side, changing from ESTABLISHED to FIN-WAIT-1.
      b) if the socket leaves CLOSE-WAIT, say, on the server side, changing
      from CLOSE-WAIT to LAST-ACK.
      
      Please note: there are two old states from which a socket can move
      to CLOSE-WAIT in tcp_fin(). One is SYN-RECV, the other is ESTABLISHED.
      So we have to take care of the former case.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Jason Xing <kernelxing@tencent.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a46d0ea5
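      The increment and decrement rules listed in this commit can be restated
      as a single rule: the counter changes by the difference in membership of
      the set {ESTABLISHED, CLOSE-WAIT} between the old and the new state.
      The sketch below uses that restatement, which is an equivalent
      formulation assumed here for the transitions named in the commit
      message and not the kernel's code. State names are plain strings and
      the helper names are hypothetical.

```python
COUNTED = {"ESTABLISHED", "CLOSE_WAIT"}


def curr_estab_delta(old, new):
    """Change applied to CurrEstab when a socket moves from old to new."""
    return int(new in COUNTED) - int(old in COUNTED)


def run(states):
    """Replay a sequence of states and record the counter after each step."""
    counter = 0
    history = []
    for old, new in zip(states, states[1:]):
        counter += curr_estab_delta(old, new)
        history.append(counter)
    return history


# Server side: passive close through CLOSE-WAIT.
server = run(["SYN_RECV", "ESTABLISHED", "CLOSE_WAIT", "LAST_ACK", "CLOSE"])
# Client side: active close, never enters CLOSE-WAIT.
client = run(["SYN_SENT", "ESTABLISHED", "FIN_WAIT1", "FIN_WAIT2", "TIME_WAIT"])
# FIN received while still in SYN-RECV (the special case in tcp_fin()).
early_fin = run(["SYN_RECV", "CLOSE_WAIT", "LAST_ACK"])
```

      In each trace the counter returns to zero once the socket has left both
      counted states.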
    • selftests: hsr: add missing config for CONFIG_BRIDGE · 712115a2
      Hangbin Liu authored
      The hsr_redbox.sh test needs to create a bridge for testing. Add the
      missing CONFIG_BRIDGE option to the config file.
      
      Fixes: eafbf057 ("test: hsr: Extend the hsr_redbox.sh to have more SAN devices connected")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Tested-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      712115a2
    • vxlan: Fix regression when dropping packets due to invalid src addresses · 1cd4bc98
      Daniel Borkmann authored
      Commit f58f45c1 ("vxlan: drop packets from invalid src-address")
      has recently been added to vxlan mainly in the context of source
      address snooping/learning so that when it is enabled, an entry in the
      FDB is not being created for an invalid address for the corresponding
      tunnel endpoint.
      
      Before commit f58f45c1, vxlan behaved like geneve in that it passed
      through whichever MACs were set in the L2 header. It turns out that
      this change in behavior breaks setups. For example, Cilium with
      netkit in L3 mode for Pods as well as tunnel mode was passing
      before the change in f58f45c1 for both vxlan and geneve. After the
      mentioned change it only passes for geneve, because in the vxlan
      case packets are dropped due to vxlan_set_mac() returning false:
      source and destination MACs are zero, which for E/W traffic via
      tunnel is totally fine.

      Fix it by only opting into the is_valid_ether_addr() check in
      vxlan_set_mac() when source address snooping/learning is
      actually enabled in vxlan. This is done by moving the check into
      vxlan_snoop(). With this change, the Cilium connectivity test suite
      passes again for both tunnel flavors.
      
      Fixes: f58f45c1 ("vxlan: drop packets from invalid src-address")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Bauer <mail@david-bauer.net>
      Cc: Ido Schimmel <idosch@nvidia.com>
      Cc: Nikolay Aleksandrov <razor@blackwall.org>
      Cc: Martin KaFai Lau <martin.lau@kernel.org>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
      Reviewed-by: David Bauer <mail@david-bauer.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1cd4bc98
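      The behaviour change described in this commit can be sketched in user
      space. The validity test mirrors what is_valid_ether_addr() checks (the
      address is neither multicast nor all-zero). The accept() helper and its
      flags are hypothetical and only model where the check is applied, not
      the driver's actual control flow.

```python
def is_valid_ether_addr(mac: bytes) -> bool:
    """True for a 6-byte address that is neither multicast nor all-zero."""
    return len(mac) == 6 and not (mac[0] & 1) and any(mac)


def accept(src_mac: bytes, learning: bool, check_in_snoop: bool) -> bool:
    """Decide whether a received frame is passed on."""
    if check_in_snoop:
        # After the fix: the check only applies when learning is enabled.
        return (not learning) or is_valid_ether_addr(src_mac)
    # Before the fix: the check applied to every frame.
    return is_valid_ether_addr(src_mac)


zero_mac = bytes(6)
unicast = bytes([0x02, 0x00, 0x00, 0x00, 0x00, 0x01])

dropped_before_fix = not accept(zero_mac, learning=False, check_in_snoop=False)
passed_after_fix = accept(zero_mac, learning=False, check_in_snoop=True)
still_dropped_when_learning = not accept(zero_mac, learning=True, check_in_snoop=True)
```

      Frames with a zero source MAC pass again when learning is off, while
      learning-enabled setups keep rejecting invalid source addresses.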
    • net: sched: sch_multiq: fix possible OOB write in multiq_tune() · affc18fd
      Hangyu Hua authored
      q->bands is set to qopt->bands after the kmalloc, and the subsequent
      code logic runs with the new value. The old q->bands therefore must
      not be used to size the kmalloc; otherwise, an out-of-bounds write
      will occur.
      
      Fixes: c2999f7f ("net: sched: multiq: don't call qdisc_put() while holding tree lock")
      Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
      Acked-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      affc18fd
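      The sizing mistake in this commit can be shown with simple arithmetic.
      The sketch assumes, for illustration only, that the allocated array
      needs one slot for every band above the new active count, and that the
      buggy code sized it from the old count. The function and parameter
      names are hypothetical.

```python
def removed_slots(max_bands, old_bands, new_bands, fixed):
    """Return (allocated capacity, slots the later loop will write)."""
    bands_for_alloc = new_bands if fixed else old_bands
    capacity = max_bands - bands_for_alloc  # size passed to the allocator
    needed = max_bands - new_bands          # writes done after bands is updated
    return capacity, needed


buggy_capacity, buggy_needed = removed_slots(8, old_bands=6, new_bands=2, fixed=False)
fixed_capacity, fixed_needed = removed_slots(8, old_bands=6, new_bands=2, fixed=True)
```

      When the band count shrinks, the buggy sizing allocates fewer slots
      than the loop writes, which is the out-of-bounds write.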
    • ionic: fix kernel panic in XDP_TX action · 491aee89
      Taehee Yoo authored
      In the XDP_TX path, the ionic driver sends a packet to the TX path
      with the rx page and corresponding dma address.
      After tx is done, ionic_tx_clean() frees that page.
      But the RX ring buffer entry isn't reset to NULL.
      So, it uses a freed page, which causes a kernel panic.
      
      BUG: unable to handle page fault for address: ffff8881576c110c
      PGD 773801067 P4D 773801067 PUD 87f086067 PMD 87efca067 PTE 800ffffea893e060
      Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN NOPTI
      CPU: 1 PID: 25 Comm: ksoftirqd/1 Not tainted 6.9.0+ #11
      Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
      RIP: 0010:bpf_prog_f0b8caeac1068a55_balancer_ingress+0x3b/0x44f
      Code: 00 53 41 55 41 56 41 57 b8 01 00 00 00 48 8b 5f 08 4c 8b 77 00 4c 89 f7 48 83 c7 0e 48 39 d8
      RSP: 0018:ffff888104e6fa28 EFLAGS: 00010283
      RAX: 0000000000000002 RBX: ffff8881576c1140 RCX: 0000000000000002
      RDX: ffffffffc0051f64 RSI: ffffc90002d33048 RDI: ffff8881576c110e
      RBP: ffff888104e6fa88 R08: 0000000000000000 R09: ffffed1027a04a23
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881b03a21a8
      R13: ffff8881589f800f R14: ffff8881576c1100 R15: 00000001576c1100
      FS: 0000000000000000(0000) GS:ffff88881ae00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff8881576c110c CR3: 0000000767a90000 CR4: 00000000007506f0
      PKRU: 55555554
      Call Trace:
      <TASK>
      ? __die+0x20/0x70
      ? page_fault_oops+0x254/0x790
      ? __pfx_page_fault_oops+0x10/0x10
      ? __pfx_is_prefetch.constprop.0+0x10/0x10
      ? search_bpf_extables+0x165/0x260
      ? fixup_exception+0x4a/0x970
      ? exc_page_fault+0xcb/0xe0
      ? asm_exc_page_fault+0x22/0x30
      ? 0xffffffffc0051f64
      ? bpf_prog_f0b8caeac1068a55_balancer_ingress+0x3b/0x44f
      ? do_raw_spin_unlock+0x54/0x220
      ionic_rx_service+0x11ab/0x3010 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ? ionic_tx_clean+0x29b/0xc60 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ? __pfx_ionic_tx_clean+0x10/0x10 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ? __pfx_ionic_rx_service+0x10/0x10 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ? ionic_tx_cq_service+0x25d/0xa00 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ? __pfx_ionic_rx_service+0x10/0x10 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ionic_cq_service+0x69/0x150 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ionic_txrx_napi+0x11a/0x540 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      __napi_poll.constprop.0+0xa0/0x440
      net_rx_action+0x7e7/0xc30
      ? __pfx_net_rx_action+0x10/0x10
      
      Fixes: 8eeed837 ("ionic: Add XDP_TX support")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Reviewed-by: Brett Creeley <brett.creeley@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      491aee89
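      The ownership problem in this commit can be modelled as a ring slot that
      still points at a page after the page has been handed to TX and freed.
      The Page class, the xdp_tx() helper, and the reset_slot flag are
      hypothetical; the sketch only illustrates why the RX entry has to be
      cleared.

```python
class Page:
    """Stand-in for a receive page; `freed` marks it as returned."""

    def __init__(self):
        self.freed = False


def xdp_tx(rx_ring, idx, reset_slot):
    """Hand the page in rx_ring[idx] to TX and return what RX still sees."""
    page = rx_ring[idx]
    if reset_slot:
        rx_ring[idx] = None  # fixed behaviour: RX no longer references the page
    page.freed = True        # stands in for the free after TX completion
    return rx_ring[idx]


seen_by_rx_buggy = xdp_tx([Page()], 0, reset_slot=False)
seen_by_rx_fixed = xdp_tx([Page()], 0, reset_slot=True)
```

      Without the reset, the RX side is left holding a reference to a freed
      page; with it, the slot is empty and has to be refilled.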
    • net: phy: Micrel KSZ8061: fix errata solution not taking effect problem · 0a8d3f2e
      Tristram Ha authored
      KSZ8061 needs to write to an MMD register at driver initialization to
      work around an erratum.  This worked in the 5.0 kernel but not in newer
      kernels.  The issue is that the main phylib code no longer resets the
      PHY at the very beginning.  Calling the phy resume code later will
      reset the chip if it is already powered down at the beginning.  This
      wipes out the MMD register write.  The solution is to implement a phy
      resume function for KSZ8061 to take care of this problem.
      
      Fixes: 232ba3a5 ("net: phy: Micrel KSZ8061: link failure after cable connect")
      Signed-off-by: Tristram Ha <tristram.ha@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0a8d3f2e
    • net/smc: avoid overwriting when adjusting sock bufsizes · fb0aa078
      Wen Gu authored
      When copying smc settings to clcsock, avoid setting clcsock's sk_sndbuf
      to sysctl_tcp_wmem[1], since this may overwrite the value set by
      tcp_sndbuf_expand() in TCP connection establishment.
      
      The other places in smc_adjust_sock_bufsizes() that set
      sk_{snd|rcv}buf to the sysctl value can also be omitted, since the
      initialization of smc sock and clcsock has already set
      sk_{snd|rcv}buf to smc.sysctl_{w|r}mem or ipv4_sysctl_tcp_{w|r}mem[1].
      
      Fixes: 30c3c4a4 ("net/smc: Use correct buffer sizes when switching between TCP and SMC")
      Link: https://lore.kernel.org/r/5eaf3858-e7fd-4db8-83e8-3d7a3e0e9ae2@linux.alibaba.com
      Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>, too.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fb0aa078
    • octeontx2-af: Always allocate PF entries from low prioriy zone · 8b0f7410
      Subbaraya Sundeep authored
      PF mcam entries always have to be at low priority so that VFs
      can install longest prefix match rules at higher priority.
      This is currently taken care of, but when priority allocation
      relative to a reference entry is requested, entries are allocated
      from the mid-zone instead of the low priority zone. Fix this and
      always allocate entries from the low priority zone for PFs.
      
      Fixes: 7df5b4b2 ("octeontx2-af: Allocate low priority entries for PF")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8b0f7410
  2. 04 Jun, 2024 10 commits
    • net: tls: fix marking packets as decrypted · a535d594
      Jakub Kicinski authored
      For TLS offload we mark packets with skb->decrypted to make sure
      they don't escape the host without getting encrypted first.
      The crypto state lives in the socket, so it may get detached
      by a call to skb_orphan(). As a safety check - the egress path
      drops all packets with skb->decrypted and no "crypto-safe" socket.
      
      The skb marking was added to sendpage only (and not sendmsg),
      because tls_device injected data into the TCP stack using sendpage.
      This special case was missed when sendpage got folded into sendmsg.
      
      Fixes: c5c37af6 ("tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240530232607.82686-1-kuba@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      a535d594
    • Merge tag 'wireless-2024-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · d6301802
      Jakub Kicinski authored
      Kalle Valo says:
      
      ====================
      wireless fixes for v6.10-rc3
      
      The first fixes for v6.10. And we have a big one, I suspect the
      biggest wireless pull request we ever had. There are fixes all over,
      both in stack and drivers. Likely the most important here are mt76 not
      working on mt7615 devices, ath11k not being able to connect to 6 GHz
      networks and rtlwifi suffering from packet loss. But of course there's
      much more.
      
      * tag 'wireless-2024-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (37 commits)
        wifi: rtlwifi: Ignore IEEE80211_CONF_CHANGE_RETRY_LIMITS
        wifi: mt76: mt7615: add missing chanctx ops
        wifi: wilc1000: document SRCU usage instead of SRCU
        Revert "wifi: wilc1000: set atomic flag on kmemdup in srcu critical section"
        Revert "wifi: wilc1000: convert list management to RCU"
        wifi: mac80211: fix UBSAN noise in ieee80211_prep_hw_scan()
        wifi: mac80211: correctly parse Spatial Reuse Parameter Set element
        wifi: mac80211: fix Spatial Reuse element size check
        wifi: iwlwifi: mvm: don't read past the mfuart notifcation
        wifi: iwlwifi: mvm: Fix scan abort handling with HW rfkill
        wifi: iwlwifi: mvm: check n_ssids before accessing the ssids
        wifi: iwlwifi: mvm: properly set 6 GHz channel direct probe option
        wifi: iwlwifi: mvm: handle BA session teardown in RF-kill
        wifi: iwlwifi: mvm: Handle BIGTK cipher in kek_kck cmd
        wifi: iwlwifi: mvm: remove stale STA link data during restart
        wifi: iwlwifi: dbg_ini: move iwl_dbg_tlv_free outside of debugfs ifdef
        wifi: iwlwifi: mvm: set properly mac header
        wifi: iwlwifi: mvm: revert gen2 TX A-MPDU size to 64
        wifi: iwlwifi: mvm: d3: fix WoWLAN command version lookup
        wifi: iwlwifi: mvm: fix a crash on 7265
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20240603115129.9494CC2BD10@smtp.kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d6301802
    • lib/test_rhashtable: add missing MODULE_DESCRIPTION() macro · c6cab01d
      Jeff Johnson authored
      make allmodconfig && make W=1 C=1 reports:
      WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_rhashtable.o
      
      Add the missing invocation of the MODULE_DESCRIPTION() macro.
      Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
      Link: https://lore.kernel.org/r/20240531-md-lib-test_rhashtable-v1-1-cd6d4138f1b6@quicinc.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c6cab01d
    • Merge branch 'dst_cache-fix-possible-races' · d730a42c
      Jakub Kicinski authored
      Eric Dumazet says:
      
      ====================
      dst_cache: fix possible races
      
      This series is inspired by various undisclosed syzbot
      reports hinting at corruptions in dst_cache structures.
      
      It seems at least four users of dst_cache are racy against
      BH reentrancy.
      
      Last patch is adding a DEBUG_NET check to catch future misuses.
      ====================
      
      Link: https://lore.kernel.org/r/20240531132636.2637995-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d730a42c
    • net: dst_cache: add two DEBUG_NET warnings · 2fe6fb36
      Eric Dumazet authored
      After fixing four different bugs involving dst_cache
      users, it might be worth adding a check about BH being
      blocked by dst_cache callers.
      
      DEBUG_NET_WARN_ON_ONCE(!in_softirq());
      
      This is not fatal: if we missed a valid case where no
      BH deadlock is to be feared, we might change this.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/20240531132636.2637995-6-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2fe6fb36
    • ila: block BH in ila_output() · cf28ff8e
      Eric Dumazet authored
      As explained in commit 13788174 ("tipc: block BH
      before using dst_cache"), net/core/dst_cache.c
      helpers need to be called with BH disabled.
      
      ila_output() is called from lwtunnel_output()
      possibly from process context, and under rcu_read_lock().
      
      We might be interrupted by a softirq, re-enter ila_output()
      and corrupt dst_cache data structures.
      
      Fix the race by using local_bh_disable().
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/20240531132636.2637995-5-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      cf28ff8e
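      The re-entrancy hazard described in this commit can be illustrated with
      a single-threaded toy model in which a "softirq" callback runs in the
      middle of a two-step cache update. The class, its two fields, and the
      helpers are hypothetical simplifications; the real dst_cache helpers
      and data layout are not reproduced here.

```python
class DstCache:
    """Toy cache whose two fields must always be updated together."""

    def __init__(self):
        self.dst = None
        self.cookie = None


def cache_set(cache, dst, cookie, interrupt=None):
    """Two-step update; `interrupt` models a softirq firing between the steps."""
    cache.dst = dst
    if interrupt is not None:
        interrupt()
    cache.cookie = cookie


def output(cache, bh_disabled):
    """Process-context update that may be interrupted by a nested update."""
    def pending_softirq():
        cache_set(cache, "dst-B", 2)

    if bh_disabled:
        cache_set(cache, "dst-A", 1)
        pending_softirq()  # deferred until the update is complete
    else:
        cache_set(cache, "dst-A", 1, interrupt=pending_softirq)
    return (cache.dst, cache.cookie)


unprotected = output(DstCache(), bh_disabled=False)
protected = output(DstCache(), bh_disabled=True)
```

      In the unprotected run the cache ends up with a dst and a cookie that
      belong to different updates. Deferring the nested update, which is what
      disabling BH achieves, keeps the pair consistent.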
    • ipv6: sr: block BH in seg6_output_core() and seg6_input_core() · c0b98ac1
      Eric Dumazet authored
      As explained in commit 13788174 ("tipc: block BH
      before using dst_cache"), net/core/dst_cache.c
      helpers need to be called with BH disabled.
      
      Disabling preemption in seg6_output_core() is not good enough,
      because seg6_output_core() is called from process context,
      lwtunnel_output() only uses rcu_read_lock().
      
      We might be interrupted by a softirq, re-enter seg6_output_core()
      and corrupt dst_cache data structures.
      
      Fix the race by using local_bh_disable() instead of
      preempt_disable().
      
      Apply a similar change in seg6_input_core().
      
      Fixes: fa79581e ("ipv6: sr: fix several BUGs when preemption is enabled")
      Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: David Lebrun <dlebrun@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/20240531132636.2637995-4-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c0b98ac1
    • net: ipv6: rpl_iptunnel: block BH in rpl_output() and rpl_input() · db0090c6
      Eric Dumazet authored
      As explained in commit 13788174 ("tipc: block BH
      before using dst_cache"), net/core/dst_cache.c
      helpers need to be called with BH disabled.
      
      Disabling preemption in rpl_output() is not good enough,
      because rpl_output() is called from process context,
      lwtunnel_output() only uses rcu_read_lock().
      
      We might be interrupted by a softirq, re-enter rpl_output()
      and corrupt dst_cache data structures.
      
      Fix the race by using local_bh_disable() instead of
      preempt_disable().
      
      Apply a similar change in rpl_input().
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Aring <aahringo@redhat.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/20240531132636.2637995-3-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6: ioam: block BH from ioam6_output() · 2fe40483
      Eric Dumazet authored
      As explained in commit 13788174 ("tipc: block BH
      before using dst_cache"), net/core/dst_cache.c
      helpers need to be called with BH disabled.
      
      Disabling preemption in ioam6_output() is not good enough,
      because ioam6_output() is called from process context and
      lwtunnel_output() only uses rcu_read_lock().
      
      We might be interrupted by a softirq, re-enter ioam6_output()
      and corrupt dst_cache data structures.
      
      Fix the race by using local_bh_disable() instead of
      preempt_disable().
      
      Fixes: 8cb3bf8b ("ipv6: ioam: Add support for the ip6ip6 encapsulation")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Justin Iurman <justin.iurman@uliege.be>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/20240531132636.2637995-2-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • vmxnet3: disable rx data ring on dma allocation failure · ffbe335b
      Matthias Stocker authored
      When vmxnet3_rq_create() fails to allocate memory for rq->data_ring.base,
      the subsequent call to vmxnet3_rq_destroy_all_rxdataring() does not reset
      rq->data_ring.desc_size for the data ring that failed, which presumably
      causes the hypervisor to reference it on packet reception.
      
      To fix this bug, rq->data_ring.desc_size needs to be set to 0 to tell
      the hypervisor to disable this feature.
      
      [   95.436876] kernel BUG at net/core/skbuff.c:207!
      [   95.439074] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      [   95.440411] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 6.9.3-dirty #1
      [   95.441558] Hardware name: VMware, Inc. VMware Virtual
      Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
      [   95.443481] RIP: 0010:skb_panic+0x4d/0x4f
      [   95.444404] Code: 4f 70 50 8b 87 c0 00 00 00 50 8b 87 bc 00 00 00 50
      ff b7 d0 00 00 00 4c 8b 8f c8 00 00 00 48 c7 c7 68 e8 be 9f e8 63 58 f9
      ff <0f> 0b 48 8b 14 24 48 c7 c1 d0 73 65 9f e8 a1 ff ff ff 48 8b 14 24
      [   95.447684] RSP: 0018:ffffa13340274dd0 EFLAGS: 00010246
      [   95.448762] RAX: 0000000000000089 RBX: ffff8fbbc72b02d0 RCX: 000000000000083f
      [   95.450148] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f
      [   95.451520] RBP: 000000000000002d R08: 0000000000000000 R09: ffffa13340274c60
      [   95.452886] R10: ffffffffa04ed468 R11: 0000000000000002 R12: 0000000000000000
      [   95.454293] R13: ffff8fbbdab3c2d0 R14: ffff8fbbdbd829e0 R15: ffff8fbbdbd809e0
      [   95.455682] FS:  0000000000000000(0000) GS:ffff8fbeefd80000(0000) knlGS:0000000000000000
      [   95.457178] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   95.458340] CR2: 00007fd0d1f650c8 CR3: 0000000115f28000 CR4: 00000000000406f0
      [   95.459791] Call Trace:
      [   95.460515]  <IRQ>
      [   95.461180]  ? __die_body.cold+0x19/0x27
      [   95.462150]  ? die+0x2e/0x50
      [   95.462976]  ? do_trap+0xca/0x110
      [   95.463973]  ? do_error_trap+0x6a/0x90
      [   95.464966]  ? skb_panic+0x4d/0x4f
      [   95.465901]  ? exc_invalid_op+0x50/0x70
      [   95.466849]  ? skb_panic+0x4d/0x4f
      [   95.467718]  ? asm_exc_invalid_op+0x1a/0x20
      [   95.468758]  ? skb_panic+0x4d/0x4f
      [   95.469655]  skb_put.cold+0x10/0x10
      [   95.470573]  vmxnet3_rq_rx_complete+0x862/0x11e0 [vmxnet3]
      [   95.471853]  vmxnet3_poll_rx_only+0x36/0xb0 [vmxnet3]
      [   95.473185]  __napi_poll+0x2b/0x160
      [   95.474145]  net_rx_action+0x2c6/0x3b0
      [   95.475115]  handle_softirqs+0xe7/0x2a0
      [   95.476122]  __irq_exit_rcu+0x97/0xb0
      [   95.477109]  common_interrupt+0x85/0xa0
      [   95.478102]  </IRQ>
      [   95.478846]  <TASK>
      [   95.479603]  asm_common_interrupt+0x26/0x40
      [   95.480657] RIP: 0010:pv_native_safe_halt+0xf/0x20
      [   95.481801] Code: 22 d7 e9 54 87 01 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 93 ba 3b 00 fb f4 <e9> 2c 87 01 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
      [   95.485563] RSP: 0018:ffffa133400ffe58 EFLAGS: 00000246
      [   95.486882] RAX: 0000000000004000 RBX: ffff8fbbc1d14064 RCX: 0000000000000000
      [   95.488477] RDX: ffff8fbeefd80000 RSI: ffff8fbbc1d14000 RDI: 0000000000000001
      [   95.490067] RBP: ffff8fbbc1d14064 R08: ffffffffa0652260 R09: 00000000000010d3
      [   95.491683] R10: 0000000000000018 R11: ffff8fbeefdb4764 R12: ffffffffa0652260
      [   95.493389] R13: ffffffffa06522e0 R14: 0000000000000001 R15: 0000000000000000
      [   95.495035]  acpi_safe_halt+0x14/0x20
      [   95.496127]  acpi_idle_do_entry+0x2f/0x50
      [   95.497221]  acpi_idle_enter+0x7f/0xd0
      [   95.498272]  cpuidle_enter_state+0x81/0x420
      [   95.499375]  cpuidle_enter+0x2d/0x40
      [   95.500400]  do_idle+0x1e5/0x240
      [   95.501385]  cpu_startup_entry+0x29/0x30
      [   95.502422]  start_secondary+0x11c/0x140
      [   95.503454]  common_startup_64+0x13e/0x141
      [   95.504466]  </TASK>
      [   95.505197] Modules linked in: nft_fib_inet nft_fib_ipv4
      nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
      nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
      nf_defrag_ipv4 rfkill ip_set nf_tables vsock_loopback
      vmw_vsock_virtio_transport_common qrtr vmw_vsock_vmci_transport vsock
      sunrpc binfmt_misc pktcdvd vmw_balloon pcspkr vmw_vmci i2c_piix4 joydev
      loop dm_multipath nfnetlink zram crct10dif_pclmul crc32_pclmul vmwgfx
      crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel
      sha512_ssse3 sha256_ssse3 vmxnet3 sha1_ssse3 drm_ttm_helper vmw_pvscsi
      ttm ata_generic pata_acpi serio_raw scsi_dh_rdac scsi_dh_emc
      scsi_dh_alua ip6_tables ip_tables fuse
      [   95.516536] ---[ end trace 0000000000000000 ]---
      
      Fixes: 6f483338 ("net: vmxnet3: Fix NULL pointer dereference in vmxnet3_rq_rx_complete()")
      Signed-off-by: Matthias Stocker <mstocker@barracuda.com>
      Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Reviewed-by: Ronak Doshi <ronak.doshi@broadcom.com>
      Link: https://lore.kernel.org/r/20240531103711.101961-1-mstocker@barracuda.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 03 Jun, 2024 1 commit
  4. 01 Jun, 2024 12 commits