1. 26 Jun, 2024 1 commit
    • jfs: fix null ptr deref in dtInsertEntry · ce6dede9
      Edward Adam Davis authored
      [syzbot reported]
      general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI
      KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
      CPU: 0 PID: 5061 Comm: syz-executor404 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      RIP: 0010:dtInsertEntry+0xd0c/0x1780 fs/jfs/jfs_dtree.c:3713
      ...
      [Analyze]
      In dtInsertEntry(), when the pointer h has the same value as p, writing the
      name in UniStrncpy_to_le() clears p->header.flag. The test
      "p->header.flag & BT_LEAF", which was true before the name was written, then
      evaluates to false. When this condition is checked a second time, the code
      takes the wrong branch and accesses the uninitialized object ih.
      
      [Fix]
      After getting the page, check freelist first; if freelist == 0, exit
      dtInsert() and return -EINVAL.
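
      The guard described in [Fix] can be sketched in user space as follows. This
      is only an illustration: the struct, its field layout, and the function name
      demo_check_freelist() are hypothetical, and the comment about slot 0
      overlaying the header is an assumption drawn from the analysis above, not
      the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for a dtree page header. */
struct demo_dtpage_header {
	uint8_t flag;     /* page type bits, e.g. a leaf marker */
	int8_t freelist;  /* index of the first free slot */
};

/*
 * Assumption: slot 0 overlays the header, so freelist == 0 would make the
 * new entry alias the header. Reject such a page before inserting.
 */
static int demo_check_freelist(const struct demo_dtpage_header *hdr)
{
	assert(hdr != NULL);
	if (hdr->freelist == 0)
		return -EINVAL;
	return 0;
}
```

      A caller would run this check right after obtaining the page and bail out
      on a non-zero result before touching any slot.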
      
      Reported-by: syzbot+bba84aef3a26fb93deb9@syzkaller.appspotmail.com
      Signed-off-by: Edward Adam Davis <eadavis@qq.com>
      Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
  2. 28 May, 2024 8 commits
  3. 24 May, 2024 6 commits
  4. 23 May, 2024 25 commits
    • Merge tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 6d69b6c1
      Linus Torvalds authored
      Pull NFS client updates from Trond Myklebust:
       "Stable fixes:
         - nfs: fix undefined behavior in nfs_block_bits()
         - NFSv4.2: Fix READ_PLUS when server doesn't support OP_READ_PLUS
      
        Bugfixes:
         - Fix mixing of the lock/nolock and local_lock mount options
         - NFSv4: Fixup smatch warning for ambiguous return
         - NFSv3: Fix remount when using the legacy binary mount api
         - SUNRPC: Fix the handling of expired RPCSEC_GSS contexts
         - SUNRPC: fix the NFSACL RPC retries when soft mounts are enabled
         - rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
      
        Features and cleanups:
         - NFSv3: Use the atomic_open API to fix open(O_CREAT|O_TRUNC)
         - pNFS/filelayout: Specify the layout segment range in LAYOUTGET
         - pNFS: rework pnfs_generic_pg_check_layout to check IO range
         - NFSv2: Turn off enabling of NFS v2 by default"
      
      * tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        nfs: fix undefined behavior in nfs_block_bits()
        pNFS: rework pnfs_generic_pg_check_layout to check IO range
        pNFS/filelayout: check layout segment range
        pNFS/filelayout: fixup pNfs allocation modes
        rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
        NFS: Don't enable NFS v2 by default
        NFS: Fix READ_PLUS when server doesn't support OP_READ_PLUS
        sunrpc: fix NFSACL RPC retry on soft mount
        SUNRPC: fix handling expired GSS context
        nfs: keep server info for remounts
        NFSv4: Fixup smatch warning for ambiguous return
        NFS: make sure lock/nolock overriding local_lock mount option
        NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.
        pNFS/filelayout: Specify the layout segment range in LAYOUTGET
        pNFS/filelayout: Remove the whole file layout requirement
    • Merge tag 'block-6.10-20240523' of git://git.kernel.dk/linux · b4d88a60
      Linus Torvalds authored
      Pull more block updates from Jens Axboe:
       "Followup block updates, mostly due to NVMe being a bit late to the
        party. But nothing major in there, so not a big deal.
      
        In detail, this contains:
      
         - NVMe pull request via Keith:
             - Fabrics connection retries (Daniel, Hannes)
             - Fabrics logging enhancements (Tokunori)
             - RDMA delete optimization (Sagi)
      
         - ublk DMA alignment fix (me)
      
         - null_blk sparse warning fixes (Bart)
      
         - Discard support for brd (Keith)
      
         - blk-cgroup list corruption fixes (Ming)
      
         - blk-cgroup stat propagation fix (Waiman)
      
         - Regression fix for plugging stall with md (Yu)
      
         - Misc fixes or cleanups (David, Jeff, Justin)"
      
      * tag 'block-6.10-20240523' of git://git.kernel.dk/linux: (24 commits)
        null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues'
        blk-throttle: remove unused struct 'avg_latency_bucket'
        block: fix lost bio for plug enabled bio based device
        block: t10-pi: add MODULE_DESCRIPTION()
        blk-mq: add helper for checking if one CPU is mapped to specified hctx
        blk-cgroup: Properly propagate the iostat update up the hierarchy
        blk-cgroup: fix list corruption from reorder of WRITE ->lqueued
        blk-cgroup: fix list corruption from resetting io stat
        cdrom: rearrange last_media_change check to avoid unintentional overflow
        nbd: Fix signal handling
        nbd: Remove a local variable from nbd_send_cmd()
        nbd: Improve the documentation of the locking assumptions
        nbd: Remove superfluous casts
        nbd: Use NULL to represent a pointer
        brd: implement discard support
        null_blk: Fix two sparse warnings
        ublk_drv: set DMA alignment mask to 3
        nvme-rdma, nvme-tcp: include max reconnects for reconnect logging
        nvmet-rdma: Avoid o(n^2) loop in delete_ctrl
        nvme: do not retry authentication failures
        ...
    • Merge tag 'io_uring-6.10-20240523' of git://git.kernel.dk/linux · 483a351e
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "Single fix here for a regression in 6.9, and then a simple cleanup
        removing some dead code"
      
      * tag 'io_uring-6.10-20240523' of git://git.kernel.dk/linux:
        io_uring: remove checks for NULL 'sq_offset'
        io_uring/sqpoll: ensure that normal task_work is also run timely
    • Merge tag 'regulator-fix-v6.10-merge-window' of... · c2c80ecd
      Linus Torvalds authored
      Merge tag 'regulator-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
      
      Pull regulator fixes from Mark Brown:
       "A bunch of fixes that came in during the merge window.
      
        Matti found several issues with some of the more complexly configured
        Rohm regulators and the helpers they use and there were some errors in
        the specification of tps6594 when regulators are grouped together"
      
      * tag 'regulator-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: tps6594-regulator: Correct multi-phase configuration
        regulator: tps6287x: Force writing VSEL bit
        regulator: pickable ranges: don't always cache vsel
        regulator: rohm-regulator: warn if unsupported voltage is set
        regulator: bd71828: Don't overwrite runtime voltages
    • Merge tag 'regmap-fix-v6.10-merge-window' of... · 09f8f2c4
      Linus Torvalds authored
      Merge tag 'regmap-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
      
      Pull regmap fix from Mark Brown:
       "Guenter ran with memory sanitisers and found an issue in the new KUnit
        tests that Richard added where an assumption in older test code was
        exposed, this was fixed quickly by Richard"
      
      * tag 'regmap-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: kunit: Fix array overflow in stride() test
    • Merge tag 'net-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 66ad4829
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Quite smaller than usual. Notably it includes the fix for the unix
        regression from the past weeks. The TCP window fix will require some
        follow-up, already queued.
      
        Current release - regressions:
      
         - af_unix: fix garbage collection of embryos
      
        Previous releases - regressions:
      
         - af_unix: fix race between GC and receive path
      
         - ipv6: sr: fix missing sk_buff release in seg6_input_core
      
         - tcp: remove 64 KByte limit for initial tp->rcv_wnd value
      
         - eth: r8169: fix rx hangup
      
         - eth: lan966x: remove ptp traps in case the ptp is not enabled
      
         - eth: ixgbe: fix link breakage vs cisco switches
      
         - eth: ice: prevent ethtool from corrupting the channels
      
        Previous releases - always broken:
      
         - openvswitch: set the skbuff pkt_type for proper pmtud support
      
         - tcp: Fix shift-out-of-bounds in dctcp_update_alpha()
      
        Misc:
      
         - a bunch of selftests stabilization patches"
      
      * tag 'net-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (25 commits)
        r8169: Fix possible ring buffer corruption on fragmented Tx packets.
        idpf: Interpret .set_channels() input differently
        ice: Interpret .set_channels() input differently
        nfc: nci: Fix handling of zero-length payload packets in nci_rx_work()
        net: relax socket state check at accept time.
        tcp: remove 64 KByte limit for initial tp->rcv_wnd value
        net: ti: icssg_prueth: Fix NULL pointer dereference in prueth_probe()
        tls: fix missing memory barrier in tls_init
        net: fec: avoid lock evasion when reading pps_enable
        Revert "ixgbe: Manual AN-37 for troublesome link partners for X550 SFI"
        testing: net-drv: use stats64 for testing
        net: mana: Fix the extra HZ in mana_hwc_send_request
        net: lan966x: Remove ptp traps in case the ptp is not enabled.
        openvswitch: Set the skbuff pkt_type for proper pmtud support.
        selftest: af_unix: Make SCM_RIGHTS into OOB data.
        af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS
        tcp: Fix shift-out-of-bounds in dctcp_update_alpha().
        selftests/net: use tc rule to filter the na packet
        ipv6: sr: fix memleak in seg6_hmac_init_algo
        af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock.
        ...
    • Merge tag 'trace-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 404001dd
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "Minor last minute fixes:
      
         - Fix a very tight race between the ring buffer readers and resizing
           the ring buffer
      
         - Correct some stale comments in the ring buffer code
      
         - Fix kernel-doc in the rv code
      
         - Add a MODULE_DESCRIPTION to preemptirq_delay_test"
      
      * tag 'trace-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        rv: Update rv_en(dis)able_monitor doc to match kernel-doc
        tracing: Add MODULE_DESCRIPTION() to preemptirq_delay_test
        ring-buffer: Fix a race between readers and resize checks
        ring-buffer: Correct stale comments related to non-consuming readers
    • Merge tag 'trace-tools-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · e82d2af5
      Linus Torvalds authored
      Pull tracing tool fix from Steven Rostedt:
       "Fix printf format warnings in latency-collector.
      
        Use the printf format string with %s to take a string instead of
        taking in a string directly"
      
      * tag 'trace-tools-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tools/latency-collector: Fix -Wformat-security compile warns
    • Merge tag 'trace-assign-str-v6.10' of... · d6a326d6
      Linus Torvalds authored
      Merge tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
      
      Pull tracing cleanup from Steven Rostedt:
       "Remove second argument of __assign_str()
      
        The __assign_str() macro logic of the TRACE_EVENT() macro was
        optimized so that it no longer needs the second argument. The
        __assign_str() is always matched with __string() field that takes a
        field name and the source for that field:
      
          __string(field, source)
      
        The TRACE_EVENT() macro logic will save off the source value and then
        use that value to copy into the ring buffer via the __assign_str().
      
        Before commit c1fa617c ("tracing: Rework __assign_str() and
        __string() to not duplicate getting the string"), the __assign_str()
        needed the second argument which would perform the same logic as the
        __string() source parameter did. Not only would this add overhead, but
        it was error prone as if the __assign_str() source produced something
        different, it may not have allocated enough for the string in the ring
        buffer (as the __string() source was used to determine how much to
        allocate)
      
        Now that the __assign_str() just uses the same string that was used in
        __string() it no longer needs the source parameter. It can now be
        removed"
      
      * tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing/treewide: Remove second parameter of __assign_str()
    • Merge tag 'sparc-for-6.10-tag1' of... · bca2a25d
      Linus Torvalds authored
      Merge tag 'sparc-for-6.10-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc
      
      Pull sparc updates from Andreas Larsson:
      
       - Avoid on-stack cpumask variables in a number of places
      
       - Move struct termio to asm/termios.h, matching other architectures and
         allowing certain user space applications to build also for sparc
      
       - Fix missing prototype warnings for sparc64
      
       - Fix version generation warnings for sparc32
      
       - Fix bug where non-consecutive CPU IDs lead to some CPUs not starting
      
       - Simplification using swap and cleanup using NULL for pointer
      
       - Convert sparc parport and chmc drivers to use remove callbacks
         returning void
      
      * tag 'sparc-for-6.10-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc:
        sparc/leon: Remove on-stack cpumask var
        sparc/pci_msi: Remove on-stack cpumask var
        sparc/of: Remove on-stack cpumask var
        sparc/irq: Remove on-stack cpumask var
        sparc/srmmu: Remove on-stack cpumask var
        sparc: chmc: Convert to platform remove callback returning void
        sparc: parport: Convert to platform remove callback returning void
        sparc: Compare pointers to NULL instead of 0
        sparc: Use swap() to fix Coccinelle warning
        sparc32: Fix version generation failed warnings
        sparc64: Fix number of online CPUs
        sparc64: Fix prototype warning for sched_clock
        sparc64: Fix prototype warnings in adi_64.c
        sparc64: Fix prototype warning for dma_4v_iotsb_bind
        sparc64: Fix prototype warning for uprobe_trap
        sparc64: Fix prototype warning for alloc_irqstack_bootmem
        sparc64: Fix prototype warning for vmemmap_free
        sparc64: Fix prototype warnings in traps_64.c
        sparc64: Fix prototype warning for init_vdso_image
        sparc: move struct termio to asm/termios.h
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 2b7ced10
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "The major fix here is for a filesystem corruption issue reported on
        Apple M1 as a result of buggy management of the floating point
        register state introduced in 6.8. I initially reverted one of the
        offending patches, but in the end Ard cooked a proper fix so there's a
        revert+reapply in the series.
      
        Aside from that, we've got some CPU errata workarounds and misc other
        fixes.
      
         - Fix broken FP register state tracking which resulted in filesystem
           corruption when dm-crypt is used
      
         - Workarounds for Arm CPU errata affecting the SSBS Spectre
           mitigation
      
         - Fix lockdep assertion in DMC620 memory controller PMU driver
      
         - Fix alignment of BUG table when CONFIG_DEBUG_BUGVERBOSE is
           disabled"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64/fpsimd: Avoid erroneous elide of user state reload
        Reapply "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD"
        arm64: asm-bug: Add .align 2 to the end of __BUG_ENTRY
        perf/arm-dmc620: Fix lockdep assert in ->event_init()
        Revert "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD"
        arm64: errata: Add workaround for Arm errata 3194386 and 3312417
        arm64: cputype: Add Neoverse-V3 definitions
        arm64: cputype: Add Cortex-X4 definitions
        arm64: barrier: Restore spec_bar() macro
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 2ef32ad2
      Linus Torvalds authored
      Pull virtio updates from Michael Tsirkin:
       "Several new features here:
      
         - virtio-net is finally supported in vduse
      
         - virtio (balloon and mem) interaction with suspend is improved
      
         - vhost-scsi now handles signals better/faster
      
        And fixes, cleanups all over the place"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits)
        virtio-pci: Check if is_avq is NULL
        virtio: delete vq in vp_find_vqs_msix() when request_irq() fails
        MAINTAINERS: add Eugenio Pérez as reviewer
        vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API
        vp_vdpa: don't allocate unused msix vectors
        sound: virtio: drop owner assignment
        fuse: virtio: drop owner assignment
        scsi: virtio: drop owner assignment
        rpmsg: virtio: drop owner assignment
        nvdimm: virtio_pmem: drop owner assignment
        wifi: mac80211_hwsim: drop owner assignment
        vsock/virtio: drop owner assignment
        net: 9p: virtio: drop owner assignment
        net: virtio: drop owner assignment
        net: caif: virtio: drop owner assignment
        misc: nsm: drop owner assignment
        iommu: virtio: drop owner assignment
        drm/virtio: drop owner assignment
        gpio: virtio: drop owner assignment
        firmware: arm_scmi: virtio: drop owner assignment
        ...
    • tools/latency-collector: Fix -Wformat-security compile warns · df73757c
      Shuah Khan authored
      Fix the following -Wformat-security compile warnings adding missing
      format arguments:
      
      latency-collector.c: In function ‘show_available’:
      latency-collector.c:938:17: warning: format not a string literal and
      no format arguments [-Wformat-security]
        938 |                 warnx(no_tracer_msg);
            |                 ^~~~~
      
      latency-collector.c:943:17: warning: format not a string literal and
      no format arguments [-Wformat-security]
        943 |                 warnx(no_latency_tr_msg);
            |                 ^~~~~
      
      latency-collector.c: In function ‘find_default_tracer’:
      latency-collector.c:986:25: warning: format not a string literal and
      no format arguments [-Wformat-security]
        986 |                         errx(EXIT_FAILURE, no_tracer_msg);
            |                         ^~~~
      latency-collector.c: In function ‘scan_arguments’:
      latency-collector.c:1881:33: warning: format not a string literal and
      no format arguments [-Wformat-security]
       1881 |                                 errx(EXIT_FAILURE, no_tracer_msg);
            |                                 ^~~~
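
      The pattern behind these warnings can be sketched as follows. The helper
      name demo_format_msg() is hypothetical, and snprintf() is used here instead
      of warnx()/errx() only so that the result can be inspected; the point is
      that the message travels as an argument while "%s" is the sole format
      string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a message held in a variable. The message is passed as data and
 * "%s" is the only format string, which is the same pattern as
 * warnx("%s", msg) or errx(EXIT_FAILURE, "%s", msg).
 */
static int demo_format_msg(char *buf, size_t len, const char *msg)
{
	assert(buf != NULL && msg != NULL);
	return snprintf(buf, len, "%s", msg);
}
```

      With this form, any '%' characters inside the message are copied verbatim
      rather than being parsed as conversion specifiers.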
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240404011009.32945-1-skhan@linuxfoundation.org
      
      Cc: stable@vger.kernel.org
      Fixes: e23db805 ("tracing/tools: Add the latency-collector to tools directory")
      Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • r8169: Fix possible ring buffer corruption on fragmented Tx packets. · c71e3a5c
      Ken Milmore authored
      An issue was found on the RTL8125b when transmitting small fragmented
      packets, whereby invalid entries were inserted into the transmit ring
      buffer, subsequently leading to calls to dma_unmap_single() with a null
      address.
      
      This was caused by rtl8169_start_xmit() not noticing changes to nr_frags
      which may occur when small packets are padded (to work around hardware
      quirks) in rtl8169_tso_csum_v2().
      
      To fix this, postpone inspecting nr_frags until after any padding has been
      applied.
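
      The ordering problem can be illustrated with a small user-space model. All
      names here (struct demo_pkt, DEMO_MIN_LEN, the demo_* functions) are
      hypothetical, and the assumption that padding leaves a packet with zero
      fragments is a simplification chosen for the sketch, not a statement about
      the driver's exact behavior.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MIN_LEN 60u  /* hypothetical padding threshold */

struct demo_pkt {
	unsigned int len;
	unsigned int nr_frags;
};

/* Assumed model: padding a short packet linearizes it, leaving no fragments. */
static void demo_pad(struct demo_pkt *p)
{
	if (p->len < DEMO_MIN_LEN) {
		p->len = DEMO_MIN_LEN;
		p->nr_frags = 0;
	}
}

/* Buggy order: the fragment count is sampled before padding and may go stale. */
static unsigned int demo_frags_sampled_early(struct demo_pkt *p)
{
	unsigned int frags = p->nr_frags;

	demo_pad(p);
	return frags;
}

/* Fixed order: pad first, then read the fragment count. */
static unsigned int demo_frags_sampled_late(struct demo_pkt *p)
{
	assert(p != NULL);
	demo_pad(p);
	return p->nr_frags;
}
```

      In the early-sampling variant the caller would go on to fill ring entries
      for fragments that no longer exist, which matches the invalid entries
      described above.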
      
      Fixes: 9020845f ("r8169: improve rtl8169_start_xmit")
      Cc: stable@vger.kernel.org
      Signed-off-by: Ken Milmore <ken.milmore@gmail.com>
      Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
      Link: https://lore.kernel.org/r/27ead18b-c23d-4f49-a020-1fc482c5ac95@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues' · a2db328b
      Yu Kuai authored
      Writing 'power' and 'submit_queues' concurrently will trigger kernel
      panic:
      
      Test script:
      
      modprobe null_blk nr_devices=0
      mkdir -p /sys/kernel/config/nullb/nullb0
      while true; do echo 1 > submit_queues; echo 4 > submit_queues; done &
      while true; do echo 1 > power; echo 0 > power; done
      
      Test result:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000148
      Oops: 0000 [#1] PREEMPT SMP
      RIP: 0010:__lock_acquire+0x41d/0x28f0
      Call Trace:
       <TASK>
       lock_acquire+0x121/0x450
       down_write+0x5f/0x1d0
       simple_recursive_removal+0x12f/0x5c0
       blk_mq_debugfs_unregister_hctxs+0x7c/0x100
       blk_mq_update_nr_hw_queues+0x4a3/0x720
       nullb_update_nr_hw_queues+0x71/0xf0 [null_blk]
       nullb_device_submit_queues_store+0x79/0xf0 [null_blk]
       configfs_write_iter+0x119/0x1e0
       vfs_write+0x326/0x730
       ksys_write+0x74/0x150
      
      This is because del_gendisk() can run concurrently with
      blk_mq_update_nr_hw_queues():
      
      nullb_device_power_store	nullb_apply_submit_queues
       null_del_dev
       del_gendisk
      				 nullb_update_nr_hw_queues
      				  if (!dev->nullb)
      				  // still set while gendisk is deleted
      				   return 0
      				  blk_mq_update_nr_hw_queues
       dev->nullb = NULL
      
      Fix this problem by reusing the global mutex to protect
      nullb_device_power_store() and nullb_update_nr_hw_queues() from configfs.
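
      The locking idea can be sketched in user space with a pthread mutex. This
      is a simplified model under stated assumptions: struct demo_dev, demo_lock
      and both handler names are hypothetical, and the "powered" flag merely
      stands in for the device instance existing.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct demo_dev {
	int submit_queues;
	int powered;  /* stands in for the device instance being present */
};

/* One global lock shared by both store handlers (hypothetical name). */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

static void demo_power_store(struct demo_dev *dev, int on)
{
	pthread_mutex_lock(&demo_lock);
	dev->powered = on;  /* setup or teardown happens entirely under the lock */
	pthread_mutex_unlock(&demo_lock);
}

/* Returns 1 if the live queue count was updated, 0 if the device was off. */
static int demo_submit_queues_store(struct demo_dev *dev, int nr)
{
	int updated = 0;

	assert(dev != NULL);
	pthread_mutex_lock(&demo_lock);
	if (dev->powered) {
		dev->submit_queues = nr;  /* cannot overlap a power-off */
		updated = 1;
	}
	pthread_mutex_unlock(&demo_lock);
	return updated;
}
```

      Because both handlers serialize on the same lock, the presence check and
      the queue update can no longer interleave with a teardown.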
      
      Fixes: 45919fbf ("null_blk: Enable modifying 'submit_queues' after an instance has been configured")
      Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com>
      Closes: https://lore.kernel.org/all/CAHj4cs9LgsHLnjg8z06LQ3Pr5cax-+Ps+xT7AP7TPnEjStuwZA@mail.gmail.com/
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
      Link: https://lore.kernel.org/r/20240523153934.1937851-1-yukuai1@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Merge branch 'intel-interpret-set_channels-input-differently' · 3d8597d8
      Paolo Abeni authored
      Jacob Keller says:
      
      ====================
      intel: Interpret .set_channels() input differently
      
      The ice and idpf drivers can trigger a crash with AF_XDP due to incorrect
      interpretation of the asymmetric Tx and Rx parameters in their
      .set_channels() implementations:
      
      1. ethtool -l <IFNAME> -> combined: 40
      2. Attach AF_XDP to queue 30
      3. ethtool -L <IFNAME> rx 15 tx 15
         combined number is not specified, so command becomes {rx_count = 15,
         tx_count = 15, combined_count = 40}.
      4. ethnl_set_channels checks whether AF_XDP is attached to any queue between
         the new count (combined_count + rx_count) and the old one, so from 55 to
         40; the check does not trigger.
      5. the driver interprets `rx 15 tx 15` as 15 combined channels and deletes
         the queue that AF_XDP is attached to.
      
      This is fundamentally a problem with interpreting a request for asymmetric
      queues as symmetric combined queues.
      
      Fix the ice and idpf drivers to stop interpreting such requests as a
      request for combined queues. Due to current driver design for both ice and
      idpf, it is not possible to support requests of the same count of Tx and Rx
      queues with independent interrupts, (i.e. ethtool -L <IFNAME> rx 15 tx 15)
      so such requests are now rejected.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      ====================
      
      Link: https://lore.kernel.org/r/20240521-iwl-net-2024-05-14-set-channels-fixes-v2-0-7aa39e2e99f1@intel.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • idpf: Interpret .set_channels() input differently · 5e7695e0
      Larysa Zaremba authored
      Unlike ice, idpf does not check whether the user has requested at least 1
      combined channel. Instead, it relies on a check in the core code.
      Unfortunately, the check does not trigger for us because of the hacky
      .set_channels() interpretation logic that is not consistent with the core
      code.
      
      As a result, a user can trigger a crash with invalid input, as follows:
      
      1. ethtool -l <IFNAME> -> combined: 40
      2. ethtool -L <IFNAME> rx 0 tx 0
         combined number is not specified, so command becomes {rx_count = 0,
         tx_count = 0, combined_count = 40}.
      3. ethnl_set_channels checks whether there is at least 1 RX and 1 TX channel
         by comparing (combined_count + rx_count) and (combined_count + tx_count)
         to zero. Obviously, (40 + 0) is greater than zero, so the core code
         deems the input OK.
      4. idpf interprets `rx 0 tx 0` as 0 channels and tries to proceed with such
         configuration.
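
      The mismatch in steps 3 and 4 can be worked through with a small model. The
      struct and function names are hypothetical, and the "old driver" rule
      (ignore a combined_count that equals the current value) is an assumption
      chosen only to reproduce the outcome described in step 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_channels {
	unsigned int rx_count;
	unsigned int tx_count;
	unsigned int combined_count;
};

/* Core-side check from step 3: at least one Rx and one Tx channel. */
static bool demo_core_accepts(const struct demo_channels *ch)
{
	assert(ch != NULL);
	return (ch->combined_count + ch->rx_count) > 0 &&
	       (ch->combined_count + ch->tx_count) > 0;
}

/*
 * Hypothetical model of the old driver-side reading in step 4: an unchanged
 * combined_count is ignored, leaving only the explicit rx value.
 */
static unsigned int demo_old_driver_rx_total(const struct demo_channels *ch,
					     unsigned int cur_combined)
{
	unsigned int combined = ch->combined_count;

	if (combined == cur_combined)
		combined = 0;
	return combined + ch->rx_count;
}
```

      For the request {rx_count = 0, tx_count = 0, combined_count = 40} the core
      check passes, while the modeled driver reading ends up with zero queues.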
      
      The issue has to be solved fundamentally, as current logic is also known to
      cause AF_XDP problems in ice [0].
      
      Interpret the command in a way that is more consistent with ethtool
      manual [1] (--show-channels and --set-channels) and new ice logic.
      
      Considering that in the idpf driver only the difference between RX and TX
      queues forms dedicated channels, the correct way to set the number of
      channels becomes:
      
      ethtool -L <IFNAME> combined 10 /* For symmetric queues */
      ethtool -L <IFNAME> combined 8 tx 2 rx 0 /* For asymmetric queues */
      
      [0] https://lore.kernel.org/netdev/20240418095857.2827-1-larysa.zaremba@intel.com/
      [1] https://man7.org/linux/man-pages/man8/ethtool.8.html
      
      Fixes: 02cbfba1 ("idpf: add ethtool callbacks")
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
      Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ice: Interpret .set_channels() input differently · 05d6f442
      Larysa Zaremba authored
      A bug occurs because a safety check guarding AF_XDP-related queues in
      ethnl_set_channels() does not trigger. This happens because the kernel and
      the ice driver interpret the ethtool command differently.
      
      How the bug occurs:
      1. ethtool -l <IFNAME> -> combined: 40
      2. Attach AF_XDP to queue 30
      3. ethtool -L <IFNAME> rx 15 tx 15
         combined number is not specified, so command becomes {rx_count = 15,
         tx_count = 15, combined_count = 40}.
      4. ethnl_set_channels checks whether AF_XDP is attached to any queue between
         the new count (combined_count + rx_count) and the old one, so from 55 to
         40; the check does not trigger.
      5. ice interprets `rx 15 tx 15` as 15 combined channels and deletes the
         queue that AF_XDP is attached to.
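
      The arithmetic in steps 3 to 5 can be checked with a small model. Every
      name below is hypothetical, and the "old driver" rule is an assumption
      picked only because it reproduces the result described in step 5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_channels {
	unsigned int rx_count;
	unsigned int tx_count;
	unsigned int combined_count;
};

/* Core-side view from step 4: the Rx queue total is combined + rx. */
static unsigned int demo_core_rx_total(const struct demo_channels *ch)
{
	assert(ch != NULL);
	return ch->combined_count + ch->rx_count;
}

/* Queues are only inspected for AF_XDP when the total shrinks. */
static bool demo_core_inspects(unsigned int old_total, unsigned int new_total)
{
	return new_total < old_total;
}

/*
 * Hypothetical model of the old driver-side reading in step 5: a
 * combined_count equal to the current value is treated as "not given",
 * so only rx_count remains.
 */
static unsigned int demo_old_driver_rx_total(const struct demo_channels *ch,
					     unsigned int cur_combined)
{
	unsigned int combined = ch->combined_count;

	if (combined == cur_combined)
		combined = 0;
	return combined + ch->rx_count;
}
```

      With rx 15, tx 15 and combined 40, the core sees 55 queues and skips the
      inspection, while the modeled driver keeps only 15, so queue 30 is gone.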
      
      Interpret the command in a way that is more consistent with ethtool
      manual [0] (--show-channels and --set-channels).
      
      Considering that in the ice driver only the difference between RX and TX
      queues forms dedicated channels, the correct way to set the number of
      channels becomes:
      
      ethtool -L <IFNAME> combined 10 /* For symmetric queues */
      ethtool -L <IFNAME> combined 8 tx 2 rx 0 /* For asymmetric queues */
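
      The resulting queue totals for the two commands above can be computed as
      follows. This is a sketch: the names are hypothetical, and the validity
      rule (at least one combined channel, dedicated queues in at most one
      direction) is an assumption inferred from the descriptions in this series
      rather than a copy of the driver logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_channels {
	unsigned int rx_count;
	unsigned int tx_count;
	unsigned int combined_count;
};

struct demo_queues {
	unsigned int rx;
	unsigned int tx;
};

/* Totals under the "combined plus dedicated" reading. */
static struct demo_queues demo_queue_totals(const struct demo_channels *ch)
{
	struct demo_queues q;

	assert(ch != NULL);
	q.rx = ch->combined_count + ch->rx_count;
	q.tx = ch->combined_count + ch->tx_count;
	return q;
}

/*
 * Assumed validity rule: at least one combined channel, and dedicated
 * queues in at most one direction.
 */
static bool demo_request_valid(const struct demo_channels *ch)
{
	return ch->combined_count > 0 &&
	       (ch->rx_count == 0 || ch->tx_count == 0);
}
```

      Under this reading, "combined 8 tx 2 rx 0" yields 8 Rx and 10 Tx queues,
      and a request such as "rx 15 tx 15" on top of 40 combined is rejected.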
      
      [0] https://man7.org/linux/man-pages/man8/ethtool.8.html
      
      Fixes: 87324e74 ("ice: Implement ethtool ops for channels")
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
      Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • nfc: nci: Fix handling of zero-length payload packets in nci_rx_work() · 6671e352
      Ryosuke Yasuoka authored
      When nci_rx_work() receives a zero-length payload packet, it should not
      discard the packet and exit the loop. Instead, it should continue
      processing subsequent packets.
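
      The control-flow difference can be shown with a minimal loop. The function
      demo_rx_work() and its array-of-lengths input are hypothetical stand-ins
      for the receive queue, and the sketch assumes the zero-length packet itself
      is still dropped.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the packets that get processed when zero-length payloads are
 * skipped instead of ending the loop.
 */
static size_t demo_rx_work(const size_t *payload_len, size_t n)
{
	size_t processed = 0;

	assert(payload_len != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (payload_len[i] == 0)
			continue;  /* drop this packet only; a break would strand the rest */
		processed++;
	}
	return processed;
}
```

      Given payload lengths {4, 0, 7, 0, 1}, three packets are processed; with a
      break in place of the continue, only the first one would be.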
      
      Fixes: d24b0353 ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet")
      Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Link: https://lore.kernel.org/r/20240521153444.535399-1-ryasuoka@redhat.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: relax socket state check at accept time. · 26afda78
      Paolo Abeni authored
      Christoph reported the following splat:
      
      WARNING: CPU: 1 PID: 772 at net/ipv4/af_inet.c:761 __inet_accept+0x1f4/0x4a0
      Modules linked in:
      CPU: 1 PID: 772 Comm: syz-executor510 Not tainted 6.9.0-rc7-g7da7119fe22b #56
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      RIP: 0010:__inet_accept+0x1f4/0x4a0 net/ipv4/af_inet.c:759
      Code: 04 38 84 c0 0f 85 87 00 00 00 41 c7 04 24 03 00 00 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 ec b7 da fd <0f> 0b e9 7f fe ff ff e8 e0 b7 da fd 0f 0b e9 fe fe ff ff 89 d9 80
      RSP: 0018:ffffc90000c2fc58 EFLAGS: 00010293
      RAX: ffffffff836bdd14 RBX: 0000000000000000 RCX: ffff888104668000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: dffffc0000000000 R08: ffffffff836bdb89 R09: fffff52000185f64
      R10: dffffc0000000000 R11: fffff52000185f64 R12: dffffc0000000000
      R13: 1ffff92000185f98 R14: ffff88810754d880 R15: ffff8881007b7800
      FS:  000000001c772880(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fb9fcf2e178 CR3: 00000001045d2002 CR4: 0000000000770ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <TASK>
       inet_accept+0x138/0x1d0 net/ipv4/af_inet.c:786
       do_accept+0x435/0x620 net/socket.c:1929
       __sys_accept4_file net/socket.c:1969 [inline]
       __sys_accept4+0x9b/0x110 net/socket.c:1999
       __do_sys_accept net/socket.c:2016 [inline]
       __se_sys_accept net/socket.c:2013 [inline]
       __x64_sys_accept+0x7d/0x90 net/socket.c:2013
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0x58/0x100 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x76/0x7e
      RIP: 0033:0x4315f9
      Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 ab b4 fd ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffdb26d9c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002b
      RAX: ffffffffffffffda RBX: 0000000000400300 RCX: 00000000004315f9
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
      RBP: 00000000006e1018 R08: 0000000000400300 R09: 0000000000400300
      R10: 0000000000400300 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000040cdf0 R14: 000000000040ce80 R15: 0000000000000055
       </TASK>
      
      The reproducer invokes shutdown() before entering the listener status.
      After commit 94062790 ("tcp: defer shutdown(SEND_SHUTDOWN) for
      TCP_SYN_RECV sockets"), the above causes the child to reach the accept
      syscall in FIN_WAIT1 status.
      
      Eric noted we can relax the existing assertion in __inet_accept().

      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/490
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Fixes: 94062790 ("tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets")
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/23ab880a44d8cfd967e84de8b93dbf48848e3d8c.1716299669.git.pabeni@redhat.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      26afda78
    • Jason Xing's avatar
      tcp: remove 64 KByte limit for initial tp->rcv_wnd value · 378979e9
      Jason Xing authored
      Recently, we upgraded some servers to the latest kernel and noticed that
      user-side metrics were worse than before. The cause is the limit on
      tp->rcv_wnd.
      
      In 2018, commit a337531b ("tcp: up initial rmem to 128KB and SYN rwin
      to around 64KB") limited the initial value of tp->rcv_wnd to 65535. Most
      CDN teams do not benefit from that change, because they cannot use a
      large window to receive a big packet, and transfers slow down, especially
      over long-RTT paths. A small rcv_wnd means a slow transfer speed, to some
      extent. This is a side effect for latency-sensitive and time-sensitive
      users.
      
      To avoid future confusion: this change does not affect the initial
      receive window on the wire in a SYN or SYN+ACK packet, which stays within
      65535 bytes as RFC 7323 requires, because of the limit in
      __tcp_transmit_skb():
      
          th->window      = htons(min(tp->rcv_wnd, 65535U));
      
      In short, __tcp_transmit_skb() already ensures that the constraint is
      respected, no matter how large tp->rcv_wnd is. The change does not violate
      the RFC.
      
      Here is one example, with and without the patch:
      Before:
      client   --- SYN: rwindow=65535 ---> server
      client   <--- SYN+ACK: rwindow=65535 ----  server
      client   --- ACK: rwindow=65536 ---> server
      Note: for the last ACK, the calculation is 512 << 7.
      
      After:
      client   --- SYN: rwindow=65535 ---> server
      client   <--- SYN+ACK: rwindow=65535 ----  server
      client   --- ACK: rwindow=175232 ---> server
      Note: I use the following command to make it work:
      ip route change default via [ip] dev eth0 metric 100 initrwnd 120
      For the last ACK, the calculation is 1369 << 7.
      
      With this patch applied, a user who tunes this knob gets a larger rcv_wnd,
      which can help transfer data more rapidly and save some RTTs.
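
      The window arithmetic in the example above can be checked with a small
      user-space C sketch. The helper names are hypothetical, and the window
      scale of 7 is taken from the "<< 7" in the notes above; this is not the
      kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit window field of a SYN or SYN+ACK: never scaled, so clamp to 65535. */
uint16_t syn_window_field(uint32_t rcv_wnd)
{
    return (uint16_t)(rcv_wnd < 65535U ? rcv_wnd : 65535U);
}

/* Window field of a later segment: the window right-shifted by the scale. */
uint16_t scaled_window_field(uint32_t rcv_wnd, unsigned int wscale)
{
    uint32_t field = rcv_wnd >> wscale;

    assert(wscale <= 14);  /* RFC 7323 caps the shift count at 14 */
    return (uint16_t)(field < 65535U ? field : 65535U);
}

/* Window the peer derives from the field it received. */
uint32_t effective_window(uint16_t field, unsigned int wscale)
{
    return (uint32_t)field << wscale;
}
```

      With a scale of 7, a 65536-byte window is carried as 512 and a
      175232-byte window as 1369, while the SYN field stays at 65535 in both
      cases.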
      
      Fixes: a337531b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
      Signed-off-by: Jason Xing <kernelxing@tencent.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Link: https://lore.kernel.org/r/20240521134220.12510-1-kerneljasonxing@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      378979e9
    • Romain Gantois's avatar
      net: ti: icssg_prueth: Fix NULL pointer dereference in prueth_probe() · b31c7e78
      Romain Gantois authored
      In the prueth_probe() function, if one of the calls to emac_phy_connect()
      fails due to of_phy_connect() returning NULL, then the subsequent call to
      phy_attached_info() will dereference a NULL pointer.
      
      Check the return code of emac_phy_connect() and fail cleanly if there is
      an error.
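
      The pattern can be sketched in plain C as follows. All types and
      functions here (struct port, port_phy_connect(), port_phy_report(),
      port_probe()) are hypothetical stand-ins, not the driver's code, and
      -ENODEV is an assumed error value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real driver types are not shown here. */
struct phy  { int attached_info_calls; };
struct port { struct phy *phy; };

/* Returns 0 on success, or -ENODEV when no PHY could be attached. */
int port_phy_connect(struct port *port, struct phy *found)
{
    port->phy = found;
    return found ? 0 : -ENODEV;
}

/* Dereferences the PHY, so it must only run after a successful connect. */
void port_phy_report(struct port *port)
{
    assert(port->phy != NULL);
    port->phy->attached_info_calls++;
}

/* Probe path: propagate the error instead of touching a NULL PHY. */
int port_probe(struct port *port, struct phy *found)
{
    int ret = port_phy_connect(port, found);

    if (ret)
        return ret;   /* fail cleanly; the caller unwinds */
    port_phy_report(port);
    return 0;
}
```

      Calling port_probe() with a NULL PHY returns the error without ever
      reaching port_phy_report().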
      
      Fixes: 128d5874 ("net: ti: icssg-prueth: Add ICSSG ethernet driver")
      Cc: stable@vger.kernel.org
      Signed-off-by: Romain Gantois <romain.gantois@bootlin.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
      Link: https://lore.kernel.org/r/20240521-icssg-prueth-fix-v1-1-b4b17b1433e9@bootlin.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      b31c7e78
    • Dae R. Jeong's avatar
      tls: fix missing memory barrier in tls_init · 91e61dd7
      Dae R. Jeong authored
      In tls_init(), a write memory barrier is missing, and store-store
      reordering may cause NULL dereference in tls_{setsockopt,getsockopt}.
      
      CPU0                               CPU1
      -----                              -----
      // In tls_init()
      // In tls_ctx_create()
      ctx = kzalloc()
      ctx->sk_proto = READ_ONCE(sk->sk_prot) -(1)
      
      // In update_sk_prot()
      WRITE_ONCE(sk->sk_prot, tls_prots)     -(2)
      
                                         // In sock_common_setsockopt()
                                         READ_ONCE(sk->sk_prot)->setsockopt()
      
                                         // In tls_{setsockopt,getsockopt}()
                                         ctx->sk_proto->setsockopt()    -(3)
      
      In the above scenario, when (1) and (2) are reordered, (3) can observe
      the NULL value of ctx->sk_proto, causing NULL dereference.
      
      To fix it, we rely on rcu_assign_pointer(), which implies release-barrier
      semantics. By moving rcu_assign_pointer() after ctx->sk_proto is
      initialized, we can ensure that ctx->sk_proto is visible when
      changing sk->sk_prot.
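
      The ordering requirement can be illustrated with a user-space analogue
      built on C11 atomics. This is a simplified sketch, not the kernel code:
      the store-release stands in for rcu_assign_pointer(), a single
      hypothetical slot (published_ctx) replaces the kernel's pointers, and the
      struct layouts are invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the structures named in the commit message. */
struct proto { int id; };
struct ctx   { const struct proto *sk_proto; };

/* Shared slot through which a reader finds the context. */
static _Atomic(struct ctx *) published_ctx;

/* Writer: fill in every field first, then publish with release ordering. */
void publish_ctx(struct ctx *ctx, const struct proto *proto)
{
    assert(ctx != NULL && proto != NULL);
    ctx->sk_proto = proto;                               /* (1) initialize */
    atomic_store_explicit(&published_ctx, ctx,
                          memory_order_release);         /* (2) publish */
}

/* Reader: the acquire load pairs with the release store above. */
const struct proto *read_proto(void)
{
    struct ctx *ctx = atomic_load_explicit(&published_ctx,
                                           memory_order_acquire);

    return ctx ? ctx->sk_proto : NULL;
}
```

      Because the publish step is a release store, a reader that observes the
      published pointer also observes the earlier write to sk_proto. With a
      plain store, the two writes could become visible in the opposite order.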
      
      Fixes: d5bee737 ("net/tls: Annotate access to sk_prot with READ_ONCE/WRITE_ONCE")
      Signed-off-by: Yewon Choi <woni9911@gmail.com>
      Signed-off-by: Dae R. Jeong <threeearcat@gmail.com>
      Link: https://lore.kernel.org/netdev/ZU4OJG56g2V9z_H7@dragonet/T/
      Link: https://lore.kernel.org/r/Zkx4vjSFp0mfpjQ2@libra05
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      91e61dd7
    • Wei Fang's avatar
      net: fec: avoid lock evasion when reading pps_enable · 3b1c92f8
      Wei Fang authored
      The assignment of pps_enable is protected by tmreg_lock, but the read
      of pps_enable is not. Coverity therefore reports a lock evasion
      warning, since the unprotected read may cause a data race in a
      multithreaded environment. Although the race is almost impossible to
      hit in practice, it is better to fix it: the code becomes more logically
      consistent, and Coverity stops issuing the warning.
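
      A user-space sketch of the idea, assuming a pthread mutex in place of the
      driver's tmreg_lock. struct dev_state and set_pps() are hypothetical
      names, and the real driver code differs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical device state; `lock` plays the role of tmreg_lock. */
struct dev_state {
    pthread_mutex_t lock;
    bool pps_enable;
    int reconfigurations;
};

/*
 * Set the PPS state. The "already in that state?" check reads pps_enable
 * under the same lock that protects the write, so the read cannot race
 * with a concurrent update. Returns true if the state changed.
 */
bool set_pps(struct dev_state *dev, bool enable)
{
    bool changed = false;

    assert(dev != NULL);
    pthread_mutex_lock(&dev->lock);
    if (dev->pps_enable != enable) {
        dev->pps_enable = enable;
        dev->reconfigurations++;   /* stand-in for reprogramming hardware */
        changed = true;
    }
    pthread_mutex_unlock(&dev->lock);
    return changed;
}
```

      Keeping the check and the update in one critical section means a second
      request for the same state is a no-op, whichever thread issues it.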
      
      Fixes: 278d2404 ("net: fec: ptp: Enable PPS output based on ptp clock")
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Link: https://lore.kernel.org/r/20240521023800.17102-1-wei.fang@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      3b1c92f8
    • Jacob Keller's avatar
      Revert "ixgbe: Manual AN-37 for troublesome link partners for X550 SFI" · b35b1c0b
      Jacob Keller authored
      This reverts commit 56573604.
      
      According to the commit, it implements a manual AN-37 for some
      "troublesome" Juniper MX5 switches. This appears to be a workaround for a
      particular switch.
      
      It has been reported that this causes a severe breakage for other switches,
      including a Cisco 3560CX-12PD-S.
      
      The code appears to be a workaround for a specific switch which fails to
      link in SFI mode. It expects to see AN-37 auto negotiation in order to
      link. The Cisco switch is not expecting AN-37 auto negotiation. When the
      device starts the manual AN-37, the Cisco switch decides that the port is
      confused and stops attempting to link with it. This persists until a power
      cycle. A simple driver unload and reload does not resolve the issue, even
      if loading with a version of the driver which lacks this workaround.
      
      The authors of the workaround commit have not responded with
      clarifications, and the result of the workaround is complete failure to
      connect with other switches.
      
      This appears to be a case where the driver can either "correctly" link with
      the Juniper MX5 switch, at the cost of bricking the link with the Cisco
      switch, or it can behave properly for the Cisco switch, but fail to link
      with the Juniper MX5 switch. I do not know enough about the standards
      involved to clearly determine whether either switch is at fault or behaving
      incorrectly. Nor do I know whether there exists some alternative fix which
      corrects behavior with both switches.
      
      Revert the workaround for the Juniper switch.
      
      Fixes: 56573604 ("ixgbe: Manual AN-37 for troublesome link partners for X550 SFI")
      Link: https://lore.kernel.org/netdev/cbe874db-9ac9-42b8-afa0-88ea910e1e99@intel.com/T/
      Link: https://forum.proxmox.com/threads/intel-x553-sfp-ixgbe-no-go-on-pve8.135129/#post-612291
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Cc: Jeff Daly <jeffd@silicom-usa.com>
      Cc: kernel.org-fo5k2w@ycharbi.fr
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240520-net-2024-05-20-revert-silicom-switch-workaround-v1-1-50f80f261c94@intel.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      b35b1c0b