- 28 May, 2024 6 commits
-
-
Matthew Wilcox (Oracle) authored
Convert all the users to operate on a folio. Saves sixteen calls to compound_head(). We still use sizeof(struct page) in print_hex_dump, otherwise it will go into the second and third pages of the folio which won't exist for jfs folios (since they are not large). This needs a better solution, but finding it can be postponed. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
-
Matthew Wilcox (Oracle) authored
Convert the mp->page to a folio and operate on it. That lets us convert metapage_write_one() to take a folio. Replaces five calls to compound_head() with one. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
-
Matthew Wilcox (Oracle) authored
All their callers now have a folio, so pass it in. Remove mp_anchor() as inc_io() was the last user. No savings here, just cleaning up some remnants. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
-
Matthew Wilcox (Oracle) authored
Access folio->private directly instead of testing the page private flag. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
-
Matthew Wilcox (Oracle) authored
Retrieve a folio from the page cache instead of a page. Saves a couple of calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
-
Matthew Wilcox (Oracle) authored
This means also converting the two handlers to take a folio. Saves four calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
-
- 24 May, 2024 6 commits
-
-
Matthew Wilcox (Oracle) authored
All callers now have a folio, so pass it in instead of the page. Removes a couple of calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
-
Matthew Wilcox (Oracle) authored
Convert mp->page to a folio and remove 7 hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
-
Matthew Wilcox (Oracle) authored
Both of its callers now have a folio, so convert this function. Use folio_attach_private() instead of manually setting folio->private. This also gets the expected refcount of the folio correct. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
-
Matthew Wilcox (Oracle) authored
Remove four hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
-
Matthew Wilcox (Oracle) authored
Implement writepages rather than writepage by using write_cache_pages() to call metapage_write_folio(). Use bio_add_folio_nofail() as we know we just allocated the bio. Replace the call to SetPageError (which is never checked) with a call to mapping_set_error (which ... might be checked somewhere?) Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
-
Matthew Wilcox (Oracle) authored
Use bio_add_folio_nofail() as we just allocated the bio and know it cannot fail. Other than that, this is a 1:1 conversion from page APIs to folio APIs. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
-
- 23 May, 2024 28 commits
-
-
git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds authored
Pull NFS client updates from Trond Myklebust: "Stable fixes: - nfs: fix undefined behavior in nfs_block_bits() - NFSv4.2: Fix READ_PLUS when server doesn't support OP_READ_PLUS Bugfixes: - Fix mixing of the lock/nolock and local_lock mount options - NFSv4: Fixup smatch warning for ambiguous return - NFSv3: Fix remount when using the legacy binary mount api - SUNRPC: Fix the handling of expired RPCSEC_GSS contexts - SUNRPC: fix the NFSACL RPC retries when soft mounts are enabled - rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL Features and cleanups: - NFSv3: Use the atomic_open API to fix open(O_CREAT|O_TRUNC) - pNFS/filelayout: S layout segment range in LAYOUTGET - pNFS: rework pnfs_generic_pg_check_layout to check IO range - NFSv2: Turn off enabling of NFS v2 by default" * tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: fix undefined behavior in nfs_block_bits() pNFS: rework pnfs_generic_pg_check_layout to check IO range pNFS/filelayout: check layout segment range pNFS/filelayout: fixup pNfs allocation modes rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL NFS: Don't enable NFS v2 by default NFS: Fix READ_PLUS when server doesn't support OP_READ_PLUS sunrpc: fix NFSACL RPC retry on soft mount SUNRPC: fix handling expired GSS context nfs: keep server info for remounts NFSv4: Fixup smatch warning for ambiguous return NFS: make sure lock/nolock overriding local_lock mount option NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly. pNFS/filelayout: Specify the layout segment range in LAYOUTGET pNFS/filelayout: Remove the whole file layout requirement
-
git://git.kernel.dk/linuxLinus Torvalds authored
Pull more block updates from Jens Axboe: "Followup block updates, mostly due to NVMe being a bit late to the party. But nothing major in there, so not a big deal. In detail, this contains: - NVMe pull request via Keith: - Fabrics connection retries (Daniel, Hannes) - Fabrics logging enhancements (Tokunori) - RDMA delete optimization (Sagi) - ublk DMA alignment fix (me) - null_blk sparse warning fixes (Bart) - Discard support for brd (Keith) - blk-cgroup list corruption fixes (Ming) - blk-cgroup stat propagation fix (Waiman) - Regression fix for plugging stall with md (Yu) - Misc fixes or cleanups (David, Jeff, Justin)" * tag 'block-6.10-20240523' of git://git.kernel.dk/linux: (24 commits) null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues' blk-throttle: remove unused struct 'avg_latency_bucket' block: fix lost bio for plug enabled bio based device block: t10-pi: add MODULE_DESCRIPTION() blk-mq: add helper for checking if one CPU is mapped to specified hctx blk-cgroup: Properly propagate the iostat update up the hierarchy blk-cgroup: fix list corruption from reorder of WRITE ->lqueued blk-cgroup: fix list corruption from resetting io stat cdrom: rearrange last_media_change check to avoid unintentional overflow nbd: Fix signal handling nbd: Remove a local variable from nbd_send_cmd() nbd: Improve the documentation of the locking assumptions nbd: Remove superfluous casts nbd: Use NULL to represent a pointer brd: implement discard support null_blk: Fix two sparse warnings ublk_drv: set DMA alignment mask to 3 nvme-rdma, nvme-tcp: include max reconnects for reconnect logging nvmet-rdma: Avoid o(n^2) loop in delete_ctrl nvme: do not retry authentication failures ...
-
git://git.kernel.dk/linuxLinus Torvalds authored
Pull io_uring fixes from Jens Axboe: "Single fix here for a regression in 6.9, and then a simple cleanup removing some dead code" * tag 'io_uring-6.10-20240523' of git://git.kernel.dk/linux: io_uring: remove checks for NULL 'sq_offset' io_uring/sqpoll: ensure that normal task_work is also run timely
-
Linus Torvalds authored
Merge tag 'regulator-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "A bunch of fixes that came in during the merge window. Matti found several issues with some of the more complexly configured Rohm regulators and the helpers they use and there were some errors in the specification of tps6594 when regulators are grouped together" * tag 'regulator-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: tps6594-regulator: Correct multi-phase configuration regulator: tps6287x: Force writing VSEL bit regulator: pickable ranges: don't always cache vsel regulator: rohm-regulator: warn if unsupported voltage is set regulator: bd71828: Don't overwrite runtime voltages
-
Linus Torvalds authored
Merge tag 'regmap-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fix from Mark Brown: "Guenter ran with memory sanitisers and found an issue in the new KUnit tests that Richard added where an assumption in older test code was exposed, this was fixed quickly by Richard" * tag 'regmap-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: kunit: Fix array overflow in stride() test
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from Paolo Abeni: "Quite smaller than usual. Notably it includes the fix for the unix regression from the past weeks. The TCP window fix will require some follow-up, already queued. Current release - regressions: - af_unix: fix garbage collection of embryos Previous releases - regressions: - af_unix: fix race between GC and receive path - ipv6: sr: fix missing sk_buff release in seg6_input_core - tcp: remove 64 KByte limit for initial tp->rcv_wnd value - eth: r8169: fix rx hangup - eth: lan966x: remove ptp traps in case the ptp is not enabled - eth: ixgbe: fix link breakage vs cisco switches - eth: ice: prevent ethtool from corrupting the channels Previous releases - always broken: - openvswitch: set the skbuff pkt_type for proper pmtud support - tcp: Fix shift-out-of-bounds in dctcp_update_alpha() Misc: - a bunch of selftests stabilization patches" * tag 'net-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (25 commits) r8169: Fix possible ring buffer corruption on fragmented Tx packets. idpf: Interpret .set_channels() input differently ice: Interpret .set_channels() input differently nfc: nci: Fix handling of zero-length payload packets in nci_rx_work() net: relax socket state check at accept time. tcp: remove 64 KByte limit for initial tp->rcv_wnd value net: ti: icssg_prueth: Fix NULL pointer dereference in prueth_probe() tls: fix missing memory barrier in tls_init net: fec: avoid lock evasion when reading pps_enable Revert "ixgbe: Manual AN-37 for troublesome link partners for X550 SFI" testing: net-drv: use stats64 for testing net: mana: Fix the extra HZ in mana_hwc_send_request net: lan966x: Remove ptp traps in case the ptp is not enabled. openvswitch: Set the skbuff pkt_type for proper pmtud support. selftest: af_unix: Make SCM_RIGHTS into OOB data. af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS tcp: Fix shift-out-of-bounds in dctcp_update_alpha(). selftests/net: use tc rule to filter the na packet ipv6: sr: fix memleak in seg6_hmac_init_algo af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock. ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-traceLinus Torvalds authored
Pull tracing fixes from Steven Rostedt: "Minor last minute fixes: - Fix a very tight race between the ring buffer readers and resizing the ring buffer - Correct some stale comments in the ring buffer code - Fix kernel-doc in the rv code - Add a MODULE_DESCRIPTION to preemptirq_delay_test" * tag 'trace-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rv: Update rv_en(dis)able_monitor doc to match kernel-doc tracing: Add MODULE_DESCRIPTION() to preemptirq_delay_test ring-buffer: Fix a race between readers and resize checks ring-buffer: Correct stale comments related to non-consuming readers
-
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-traceLinus Torvalds authored
Pull tracing tool fix from Steven Rostedt: "Fix printf format warnings in latency-collector. Use the printf format string with %s to take a string instead of taking in a string directly" * tag 'trace-tools-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tools/latency-collector: Fix -Wformat-security compile warns
-
Linus Torvalds authored
Merge tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing cleanup from Steven Rostedt: "Remove second argument of __assign_str() The __assign_str() macro logic of the TRACE_EVENT() macro was optimized so that it no longer needs the second argument. The __assign_str() is always matched with __string() field that takes a field name and the source for that field: __string(field, source) The TRACE_EVENT() macro logic will save off the source value and then use that value to copy into the ring buffer via the __assign_str(). Before commit c1fa617c ("tracing: Rework __assign_str() and __string() to not duplicate getting the string"), the __assign_str() needed the second argument which would perform the same logic as the __string() source parameter did. Not only would this add overhead, but it was error prone as if the __assign_str() source produced something different, it may not have allocated enough for the string in the ring buffer (as the __string() source was used to determine how much to allocate) Now that the __assign_str() just uses the same string that was used in __string() it no longer needs the source parameter. It can now be removed" * tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/treewide: Remove second parameter of __assign_str()
-
Linus Torvalds authored
Merge tag 'sparc-for-6.10-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc Pull sparc updates from Andreas Larsson: - Avoid on-stack cpumask variables in a number of places - Move struct termio to asm/termios.h, matching other architectures and allowing certain user space applications to build also for sparc - Fix missing prototype warnings for sparc64 - Fix version generation warnings for sparc32 - Fix bug where non-consecutive CPU IDs lead to some CPUs not starting - Simplification using swap and cleanup using NULL for pointer - Convert sparc parport and chmc drivers to use remove callbacks returning void * tag 'sparc-for-6.10-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc: sparc/leon: Remove on-stack cpumask var sparc/pci_msi: Remove on-stack cpumask var sparc/of: Remove on-stack cpumask var sparc/irq: Remove on-stack cpumask var sparc/srmmu: Remove on-stack cpumask var sparc: chmc: Convert to platform remove callback returning void sparc: parport: Convert to platform remove callback returning void sparc: Compare pointers to NULL instead of 0 sparc: Use swap() to fix Coccinelle warning sparc32: Fix version generation failed warnings sparc64: Fix number of online CPUs sparc64: Fix prototype warning for sched_clock sparc64: Fix prototype warnings in adi_64.c sparc64: Fix prototype warning for dma_4v_iotsb_bind sparc64: Fix prototype warning for uprobe_trap sparc64: Fix prototype warning for alloc_irqstack_bootmem sparc64: Fix prototype warning for vmemmap_free sparc64: Fix prototype warnings in traps_64.c sparc64: Fix prototype warning for init_vdso_image sparc: move struct termio to asm/termios.h
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds authored
Pull arm64 fixes from Will Deacon: "The major fix here is for a filesystem corruption issue reported on Apple M1 as a result of buggy management of the floating point register state introduced in 6.8. I initially reverted one of the offending patches, but in the end Ard cooked a proper fix so there's a revert+reapply in the series. Aside from that, we've got some CPU errata workarounds and misc other fixes. - Fix broken FP register state tracking which resulted in filesystem corruption when dm-crypt is used - Workarounds for Arm CPU errata affecting the SSBS Spectre mitigation - Fix lockdep assertion in DMC620 memory controller PMU driver - Fix alignment of BUG table when CONFIG_DEBUG_BUGVERBOSE is disabled" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64/fpsimd: Avoid erroneous elide of user state reload Reapply "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD" arm64: asm-bug: Add .align 2 to the end of __BUG_ENTRY perf/arm-dmc620: Fix lockdep assert in ->event_init() Revert "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD" arm64: errata: Add workaround for Arm errata 3194386 and 3312417 arm64: cputype: Add Neoverse-V3 definitions arm64: cputype: Add Cortex-X4 definitions arm64: barrier: Restore spec_bar() macro
-
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds authored
Pull virtio updates from Michael Tsirkin: "Several new features here: - virtio-net is finally supported in vduse - virtio (balloon and mem) interaction with suspend is improved - vhost-scsi now handles signals better/faster And fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits) virtio-pci: Check if is_avq is NULL virtio: delete vq in vp_find_vqs_msix() when request_irq() fails MAINTAINERS: add Eugenio Pérez as reviewer vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API vp_vdpa: don't allocate unused msix vectors sound: virtio: drop owner assignment fuse: virtio: drop owner assignment scsi: virtio: drop owner assignment rpmsg: virtio: drop owner assignment nvdimm: virtio_pmem: drop owner assignment wifi: mac80211_hwsim: drop owner assignment vsock/virtio: drop owner assignment net: 9p: virtio: drop owner assignment net: virtio: drop owner assignment net: caif: virtio: drop owner assignment misc: nsm: drop owner assignment iommu: virtio: drop owner assignment drm/virtio: drop owner assignment gpio: virtio: drop owner assignment firmware: arm_scmi: virtio: drop owner assignment ...
-
Shuah Khan authored
Fix the following -Wformat-security compile warnings adding missing format arguments: latency-collector.c: In function ‘show_available’: latency-collector.c:938:17: warning: format not a string literal and no format arguments [-Wformat-security] 938 | warnx(no_tracer_msg); | ^~~~~ latency-collector.c:943:17: warning: format not a string literal and no format arguments [-Wformat-security] 943 | warnx(no_latency_tr_msg); | ^~~~~ latency-collector.c: In function ‘find_default_tracer’: latency-collector.c:986:25: warning: format not a string literal and no format arguments [-Wformat-security] 986 | errx(EXIT_FAILURE, no_tracer_msg); | ^~~~ latency-collector.c: In function ‘scan_arguments’: latency-collector.c:1881:33: warning: format not a string literal and no format arguments [-Wformat-security] 1881 | errx(EXIT_FAILURE, no_tracer_msg); | ^~~~ Link: https://lore.kernel.org/linux-trace-kernel/20240404011009.32945-1-skhan@linuxfoundation.org Cc: stable@vger.kernel.org Fixes: e23db805 ("tracing/tools: Add the latency-collector to tools directory") Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Ken Milmore authored
An issue was found on the RTL8125b when transmitting small fragmented packets, whereby invalid entries were inserted into the transmit ring buffer, subsequently leading to calls to dma_unmap_single() with a null address. This was caused by rtl8169_start_xmit() not noticing changes to nr_frags which may occur when small packets are padded (to work around hardware quirks) in rtl8169_tso_csum_v2(). To fix this, postpone inspecting nr_frags until after any padding has been applied. Fixes: 9020845f ("r8169: improve rtl8169_start_xmit") Cc: stable@vger.kernel.org Signed-off-by: Ken Milmore <ken.milmore@gmail.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/27ead18b-c23d-4f49-a020-1fc482c5ac95@gmail.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Yu Kuai authored
Writing 'power' and 'submit_queues' concurrently will trigger kernel panic: Test script: modprobe null_blk nr_devices=0 mkdir -p /sys/kernel/config/nullb/nullb0 while true; do echo 1 > submit_queues; echo 4 > submit_queues; done & while true; do echo 1 > power; echo 0 > power; done Test result: BUG: kernel NULL pointer dereference, address: 0000000000000148 Oops: 0000 [#1] PREEMPT SMP RIP: 0010:__lock_acquire+0x41d/0x28f0 Call Trace: <TASK> lock_acquire+0x121/0x450 down_write+0x5f/0x1d0 simple_recursive_removal+0x12f/0x5c0 blk_mq_debugfs_unregister_hctxs+0x7c/0x100 blk_mq_update_nr_hw_queues+0x4a3/0x720 nullb_update_nr_hw_queues+0x71/0xf0 [null_blk] nullb_device_submit_queues_store+0x79/0xf0 [null_blk] configfs_write_iter+0x119/0x1e0 vfs_write+0x326/0x730 ksys_write+0x74/0x150 This is because del_gendisk() can concurrent with blk_mq_update_nr_hw_queues(): nullb_device_power_store nullb_apply_submit_queues null_del_dev del_gendisk nullb_update_nr_hw_queues if (!dev->nullb) // still set while gendisk is deleted return 0 blk_mq_update_nr_hw_queues dev->nullb = NULL Fix this problem by resuing the global mutex to protect nullb_device_power_store() and nullb_update_nr_hw_queues() from configfs. Fixes: 45919fbf ("null_blk: Enable modifying 'submit_queues' after an instance has been configured") Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/all/CAHj4cs9LgsHLnjg8z06LQ3Pr5cax-+Ps+xT7AP7TPnEjStuwZA@mail.gmail.com/Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20240523153934.1937851-1-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Paolo Abeni authored
Jacob Keller says: ==================== intel: Interpret .set_channels() input differently The ice and idpf drivers can trigger a crash with AF_XDP due to incorrect interpretation of the asymmetric Tx and Rx parameters in their .set_channels() implementations: 1. ethtool -l <IFNAME> -> combined: 40 2. Attach AF_XDP to queue 30 3. ethtool -L <IFNAME> rx 15 tx 15 combined number is not specified, so command becomes {rx_count = 15, tx_count = 15, combined_count = 40}. 4. ethnl_set_channels checks, if there are any AF_XDP of queues from the new (combined_count + rx_count) to the old one, so from 55 to 40, check does not trigger. 5. the driver interprets `rx 15 tx 15` as 15 combined channels and deletes the queue that AF_XDP is attached to. This is fundamentally a problem with interpreting a request for asymmetric queues as symmetric combined queues. Fix the ice and idpf drivers to stop interpreting such requests as a request for combined queues. Due to current driver design for both ice and idpf, it is not possible to support requests of the same count of Tx and Rx queues with independent interrupts, (i.e. ethtool -L <IFNAME> rx 15 tx 15) so such requests are now rejected. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> ==================== Link: https://lore.kernel.org/r/20240521-iwl-net-2024-05-14-set-channels-fixes-v2-0-7aa39e2e99f1@intel.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Larysa Zaremba authored
Unlike ice, idpf does not check, if user has requested at least 1 combined channel. Instead, it relies on a check in the core code. Unfortunately, the check does not trigger for us because of the hacky .set_channels() interpretation logic that is not consistent with the core code. This naturally leads to user being able to trigger a crash with an invalid input. This is how: 1. ethtool -l <IFNAME> -> combined: 40 2. ethtool -L <IFNAME> rx 0 tx 0 combined number is not specified, so command becomes {rx_count = 0, tx_count = 0, combined_count = 40}. 3. ethnl_set_channels checks, if there is at least 1 RX and 1 TX channel, comparing (combined_count + rx_count) and (combined_count + tx_count) to zero. Obviously, (40 + 0) is greater than zero, so the core code deems the input OK. 4. idpf interprets `rx 0 tx 0` as 0 channels and tries to proceed with such configuration. The issue has to be solved fundamentally, as current logic is also known to cause AF_XDP problems in ice [0]. Interpret the command in a way that is more consistent with ethtool manual [1] (--show-channels and --set-channels) and new ice logic. Considering that in the idpf driver only the difference between RX and TX queues forms dedicated channels, change the correct way to set number of channels to: ethtool -L <IFNAME> combined 10 /* For symmetric queues */ ethtool -L <IFNAME> combined 8 tx 2 rx 0 /* For asymmetric queues */ [0] https://lore.kernel.org/netdev/20240418095857.2827-1-larysa.zaremba@intel.com/ [1] https://man7.org/linux/man-pages/man8/ethtool.8.html Fixes: 02cbfba1 ("idpf: add ethtool callbacks") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Larysa Zaremba authored
A bug occurs because a safety check guarding AF_XDP-related queues in ethnl_set_channels(), does not trigger. This happens, because kernel and ice driver interpret the ethtool command differently. How the bug occurs: 1. ethtool -l <IFNAME> -> combined: 40 2. Attach AF_XDP to queue 30 3. ethtool -L <IFNAME> rx 15 tx 15 combined number is not specified, so command becomes {rx_count = 15, tx_count = 15, combined_count = 40}. 4. ethnl_set_channels checks, if there are any AF_XDP of queues from the new (combined_count + rx_count) to the old one, so from 55 to 40, check does not trigger. 5. ice interprets `rx 15 tx 15` as 15 combined channels and deletes the queue that AF_XDP is attached to. Interpret the command in a way that is more consistent with ethtool manual [0] (--show-channels and --set-channels). Considering that in the ice driver only the difference between RX and TX queues forms dedicated channels, change the correct way to set number of channels to: ethtool -L <IFNAME> combined 10 /* For symmetric queues */ ethtool -L <IFNAME> combined 8 tx 2 rx 0 /* For asymmetric queues */ [0] https://man7.org/linux/man-pages/man8/ethtool.8.html Fixes: 87324e74 ("ice: Implement ethtool ops for channels") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Ryosuke Yasuoka authored
When nci_rx_work() receives a zero-length payload packet, it should not discard the packet and exit the loop. Instead, it should continue processing subsequent packets. Fixes: d24b0353 ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet") Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240521153444.535399-1-ryasuoka@redhat.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Paolo Abeni authored
Christoph reported the following splat: WARNING: CPU: 1 PID: 772 at net/ipv4/af_inet.c:761 __inet_accept+0x1f4/0x4a0 Modules linked in: CPU: 1 PID: 772 Comm: syz-executor510 Not tainted 6.9.0-rc7-g7da7119fe22b #56 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 RIP: 0010:__inet_accept+0x1f4/0x4a0 net/ipv4/af_inet.c:759 Code: 04 38 84 c0 0f 85 87 00 00 00 41 c7 04 24 03 00 00 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 ec b7 da fd <0f> 0b e9 7f fe ff ff e8 e0 b7 da fd 0f 0b e9 fe fe ff ff 89 d9 80 RSP: 0018:ffffc90000c2fc58 EFLAGS: 00010293 RAX: ffffffff836bdd14 RBX: 0000000000000000 RCX: ffff888104668000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: dffffc0000000000 R08: ffffffff836bdb89 R09: fffff52000185f64 R10: dffffc0000000000 R11: fffff52000185f64 R12: dffffc0000000000 R13: 1ffff92000185f98 R14: ffff88810754d880 R15: ffff8881007b7800 FS: 000000001c772880(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb9fcf2e178 CR3: 00000001045d2002 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> inet_accept+0x138/0x1d0 net/ipv4/af_inet.c:786 do_accept+0x435/0x620 net/socket.c:1929 __sys_accept4_file net/socket.c:1969 [inline] __sys_accept4+0x9b/0x110 net/socket.c:1999 __do_sys_accept net/socket.c:2016 [inline] __se_sys_accept net/socket.c:2013 [inline] __x64_sys_accept+0x7d/0x90 net/socket.c:2013 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x58/0x100 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x4315f9 Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 ab b4 fd ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffdb26d9c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002b RAX: ffffffffffffffda RBX: 0000000000400300 RCX: 00000000004315f9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004 RBP: 00000000006e1018 R08: 0000000000400300 R09: 0000000000400300 R10: 0000000000400300 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000040cdf0 R14: 000000000040ce80 R15: 0000000000000055 </TASK> The reproducer invokes shutdown() before entering the listener status. After commit 94062790 ("tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets"), the above causes the child to reach the accept syscall in FIN_WAIT1 status. Eric noted we can relax the existing assertion in __inet_accept() Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/490Suggested-by: Eric Dumazet <edumazet@google.com> Fixes: 94062790 ("tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/23ab880a44d8cfd967e84de8b93dbf48848e3d8c.1716299669.git.pabeni@redhat.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Jason Xing authored
Recently, we had some servers upgraded to the latest kernel and noticed the indicator from the user side showed worse results than before. It is caused by the limitation of tp->rcv_wnd. In 2018 commit a337531b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") limited the initial value of tp->rcv_wnd to 65535, most CDN teams would not benefit from this change because they cannot have a large window to receive a big packet, which will be slowed down especially in long RTT. Small rcv_wnd means slow transfer speed, to some extent. It's the side effect for the latency/time-sensitive users. To avoid future confusion, current change doesn't affect the initial receive window on the wire in a SYN or SYN+ACK packet which are set within 65535 bytes according to RFC 7323 also due to the limit in __tcp_transmit_skb(): th->window = htons(min(tp->rcv_wnd, 65535U)); In one word, __tcp_transmit_skb() already ensures that constraint is respected, no matter how large tp->rcv_wnd is. The change doesn't violate RFC. Let me provide one example if with or without the patch: Before: client --- SYN: rwindow=65535 ---> server client <--- SYN+ACK: rwindow=65535 ---- server client --- ACK: rwindow=65536 ---> server Note: for the last ACK, the calculation is 512 << 7. After: client --- SYN: rwindow=65535 ---> server client <--- SYN+ACK: rwindow=65535 ---- server client --- ACK: rwindow=175232 ---> server Note: I use the following command to make it work: ip route change default via [ip] dev eth0 metric 100 initrwnd 120 For the last ACK, the calculation is 1369 << 7. When we apply such a patch, having a large rcv_wnd if the user tweak this knob can help transfer data more rapidly and save some rtts. Fixes: a337531b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20240521134220.12510-1-kerneljasonxing@gmail.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Romain Gantois authored
In the prueth_probe() function, if one of the calls to emac_phy_connect() fails due to of_phy_connect() returning NULL, then the subsequent call to phy_attached_info() will dereference a NULL pointer. Check the return code of emac_phy_connect and fail cleanly if there is an error. Fixes: 128d5874 ("net: ti: icssg-prueth: Add ICSSG ethernet driver") Cc: stable@vger.kernel.org Signed-off-by: Romain Gantois <romain.gantois@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Link: https://lore.kernel.org/r/20240521-icssg-prueth-fix-v1-1-b4b17b1433e9@bootlin.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Dae R. Jeong authored
In tls_init(), a write memory barrier is missing, and store-store reordering may cause NULL dereference in tls_{setsockopt,getsockopt}. CPU0 CPU1 ----- ----- // In tls_init() // In tls_ctx_create() ctx = kzalloc() ctx->sk_proto = READ_ONCE(sk->sk_prot) -(1) // In update_sk_prot() WRITE_ONCE(sk->sk_prot, tls_prots) -(2) // In sock_common_setsockopt() READ_ONCE(sk->sk_prot)->setsockopt() // In tls_{setsockopt,getsockopt}() ctx->sk_proto->setsockopt() -(3) In the above scenario, when (1) and (2) are reordered, (3) can observe the NULL value of ctx->sk_proto, causing NULL dereference. To fix it, we rely on rcu_assign_pointer() which implies the release barrier semantic. By moving rcu_assign_pointer() after ctx->sk_proto is initialized, we can ensure that ctx->sk_proto are visible when changing sk->sk_prot. Fixes: d5bee737 ("net/tls: Annotate access to sk_prot with READ_ONCE/WRITE_ONCE") Signed-off-by: Yewon Choi <woni9911@gmail.com> Signed-off-by: Dae R. Jeong <threeearcat@gmail.com> Link: https://lore.kernel.org/netdev/ZU4OJG56g2V9z_H7@dragonet/T/ Link: https://lore.kernel.org/r/Zkx4vjSFp0mfpjQ2@libra05Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Wei Fang authored
The assignment of pps_enable is protected by tmreg_lock, but the read operation of pps_enable is not. So the Coverity tool reports a lock evasion warning which may cause data race to occur when running in a multithread environment. Although this issue is almost impossible to occur, we'd better fix it, at least it seems more logically reasonable, and it also prevents Coverity from continuing to issue warnings. Fixes: 278d2404 ("net: fec: ptp: Enable PPS output based on ptp clock") Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://lore.kernel.org/r/20240521023800.17102-1-wei.fang@nxp.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Jacob Keller authored
This reverts commit 56573604. According to the commit, it implements a manual AN-37 for some "troublesome" Juniper MX5 switches. This appears to be a workaround for a particular switch. It has been reported that this causes a severe breakage for other switches, including a Cisco 3560CX-12PD-S. The code appears to be a workaround for a specific switch which fails to link in SFI mode. It expects to see AN-37 auto negotiation in order to link. The Cisco switch is not expecting AN-37 auto negotiation. When the device starts the manual AN-37, the Cisco switch decides that the port is confused and stops attempting to link with it. This persists until a power cycle. A simple driver unload and reload does not resolve the issue, even if loading with a version of the driver which lacks this workaround. The authors of the workaround commit have not responded with clarifications, and the result of the workaround is complete failure to connect with other switches. This appears to be a case where the driver can either "correctly" link with the Juniper MX5 switch, at the cost of bricking the link with the Cisco switch, or it can behave properly for the Cisco switch, but fail to link with the Junipir MX5 switch. I do not know enough about the standards involved to clearly determine whether either switch is at fault or behaving incorrectly. Nor do I know whether there exists some alternative fix which corrects behavior with both switches. Revert the workaround for the Juniper switch. Fixes: 56573604 ("ixgbe: Manual AN-37 for troublesome link partners for X550 SFI") Link: https://lore.kernel.org/netdev/cbe874db-9ac9-42b8-afa0-88ea910e1e99@intel.com/T/ Link: https://forum.proxmox.com/threads/intel-x553-sfp-ixgbe-no-go-on-pve8.135129/#post-612291Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Cc: Jeff Daly <jeffd@silicom-usa.com> Cc: kernel.org-fo5k2w@ycharbi.fr Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240520-net-2024-05-20-revert-silicom-switch-workaround-v1-1-50f80f261c94@intel.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Joe Damato authored
Testing a network device that has large numbers of bytes/packets may overflow. Using stats64 when comparing fixes this problem. I tripped on this while iterating on a qstats patch for mlx5. See below for confirmation without my added code that this is a bug. Before this patch (with added debugging output): $ NETIF=eth0 tools/testing/selftests/drivers/net/stats.py KTAP version 1 1..4 ok 1 stats.check_pause ok 2 stats.check_fec rstat: 481708634 qstat: 666201639514 key: tx-bytes not ok 3 stats.pkt_byte_sum ok 4 stats.qstat_by_ifindex Note the huge delta above ^^^ in the rtnl vs qstats. After this patch: $ NETIF=eth0 tools/testing/selftests/drivers/net/stats.py KTAP version 1 1..4 ok 1 stats.check_pause ok 2 stats.check_fec ok 3 stats.pkt_byte_sum ok 4 stats.qstat_by_ifindex It looks like rtnl_fill_stats in net/core/rtnetlink.c will attempt to copy the 64bit stats into a 32bit structure which is probably why this behavior is occurring. To show this is happening, you can get the underlying stats that the stats.py test uses like this: $ ./cli.py --spec ../../../Documentation/netlink/specs/rt_link.yaml \ --do getlink --json '{"ifi-index": 7}' And examine the output (heavily snipped to show relevant fields): 'stats': { 'multicast': 3739197, 'rx-bytes': 1201525399, 'rx-packets': 56807158, 'tx-bytes': 492404458, 'tx-packets': 1200285371, 'stats64': { 'multicast': 3739197, 'rx-bytes': 35561263767, 'rx-packets': 56807158, 'tx-bytes': 666212335338, 'tx-packets': 1200285371, The stats.py test prior to this patch was using the 'stats' structure above, which matches the failure output on my system. Comparing side by side, rx-bytes and tx-bytes, and getting ethtool -S output: rx-bytes stats: 1201525399 rx-bytes stats64: 35561263767 rx-bytes ethtool: 36203402638 tx-bytes stats: 492404458 tx-bytes stats64: 666212335338 tx-bytes ethtool: 666215360113 Note that the above was taken from a system with an mlx5 NIC, which only exposes ndo_get_stats64. Based on the ethtool output and qstat output, it appears that stats.py should be updated to use the 'stats64' structure for accurate comparisons when packet/byte counters get very large. To confirm that this was not related to the qstats code I was iterating on, I booted a kernel without my driver changes and re-ran the test which shows the qstats are skipped (as they don't exist for mlx5): NETIF=eth0 tools/testing/selftests/drivers/net/stats.py KTAP version 1 1..4 ok 1 stats.check_pause ok 2 stats.check_fec ok 3 stats.pkt_byte_sum # SKIP qstats not supported by the device ok 4 stats.qstat_by_ifindex # SKIP No ifindex supports qstats But, fetching the stats using the CLI $ ./cli.py --spec ../../../Documentation/netlink/specs/rt_link.yaml \ --do getlink --json '{"ifi-index": 7}' Shows the same issue (heavily snipped for relevant fields only): 'stats': { 'multicast': 105489, 'rx-bytes': 530879526, 'rx-packets': 751415, 'tx-bytes': 2510191396, 'tx-packets': 27700323, 'stats64': { 'multicast': 105489, 'rx-bytes': 530879526, 'rx-packets': 751415, 'tx-bytes': 15395093284, 'tx-packets': 27700323, Comparing side by side with ethtool -S on the unmodified mlx5 driver: tx-bytes stats: 2510191396 tx-bytes stats64: 15395093284 tx-bytes ethtool: 17718435810 Fixes: f0e6c86e ("testing: net-drv: add a driver test for stats reporting") Signed-off-by: Joe Damato <jdamato@fastly.com> Link: https://lore.kernel.org/r/20240520235850.190041-1-jdamato@fastly.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Linus Torvalds authored
Merge tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more non-mm updates from Andrew Morton: - A series ("kbuild: enable more warnings by default") from Arnd Bergmann which enables a number of additional build-time warnings. We fixed all the fallout which we could find, there may still be a few stragglers. - Samuel Holland has developed the series "Unified cross-architecture kernel-mode FPU API". This does a lot of consolidation of per-architecture kernel-mode FPU usage and enables the use of newer AMD GPUs on RISC-V. - Tao Su has fixed some selftests build warnings in the series "Selftests: Fix compilation warnings due to missing _GNU_SOURCE definition". - This pull also includes a nilfs2 fixup from Ryusuke Konishi. * tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits) nilfs2: make block erasure safe in nilfs_finish_roll_forward() selftests/harness: use 1024 in place of LINE_MAX Revert "selftests/harness: remove use of LINE_MAX" selftests/fpu: allow building on other architectures selftests/fpu: move FP code to a separate translation unit drm/amd/display: use ARCH_HAS_KERNEL_FPU_SUPPORT drm/amd/display: only use hard-float, not altivec on powerpc riscv: add support for kernel-mode FPU x86: implement ARCH_HAS_KERNEL_FPU_SUPPORT powerpc: implement ARCH_HAS_KERNEL_FPU_SUPPORT LoongArch: implement ARCH_HAS_KERNEL_FPU_SUPPORT lib/raid6: use CC_FLAGS_FPU for NEON CFLAGS arm64: crypto: use CC_FLAGS_FPU for NEON CFLAGS arm64: implement ARCH_HAS_KERNEL_FPU_SUPPORT ARM: crypto: use CC_FLAGS_FPU for NEON CFLAGS ARM: implement ARCH_HAS_KERNEL_FPU_SUPPORT arch: add ARCH_HAS_KERNEL_FPU_SUPPORT x86/fpu: fix asm/fpu/types.h include guard kbuild: enable -Wcast-function-type-strict unconditionally kbuild: enable -Wformat-truncation on clang ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mmLinus Torvalds authored
Pull more mm updates from Andrew Morton: "A series from Dave Chinner which cleans up and fixes the handling of nested allocations within stackdepot and page-owner" * tag 'mm-stable-2024-05-22-17-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/page-owner: use gfp_nested_mask() instead of open coded masking stackdepot: use gfp_nested_mask() instead of open coded masking mm: lift gfp_kmemleak_mask() to gfp.h
-