1. 28 Nov, 2023 1 commit
  2. 27 Nov, 2023 1 commit
  3. 26 Nov, 2023 1 commit
    • mm/page_pool: catch page_pool memory leaks · dba1b8a7
      Jesper Dangaard Brouer authored
      Pages belonging to a page_pool (PP) instance must be freed through the
      PP APIs in order to correctly release any DMA mappings and release the
      refcnt on the DMA device when freeing the PP instance. When PP releases a
      page (page_pool_release_page), the page->pp_magic value is cleared.
      
      This patch detects a leaked PP page in free_page_is_bad() via the
      unexpected state of the page->pp_magic value being PP_SIGNATURE.
      
      We choose to report and treat it as a bad page. It would be possible
      to release the page via returning it to the PP instance as the
      page->pp pointer is likely still valid.
      
      Note that this check is only active when the kernel is built with
      CONFIG_PAGE_POOL and is either compiled with CONFIG_DEBUG_VM or booted
      with debug_pagealloc=on on the command line.
      
      Reduced example output of leak with PP_SIGNATURE = dead000000000040:
      
       BUG: Bad page state in process swapper/4  pfn:141fa6
       page:000000006dbf8062 refcount:0 mapcount:0 mapping:0000000000000000 index:0x141fa6000 pfn:0x141fa6
       flags: 0x2fffff80000000(node=0|zone=2|lastcpupid=0x1fffff)
       page_type: 0xffffffff()
       raw: 002fffff80000000 dead000000000040 ffff88814888a000 0000000000000000
       raw: 0000000141fa6000 0000000000000001 00000000ffffffff 0000000000000000
       page dumped because: page_pool leak
       [...]
       Call Trace:
        <IRQ>
        dump_stack_lvl+0x32/0x50
        bad_page+0x70/0xf0
        free_unref_page_prepare+0x263/0x430
        free_unref_page+0x34/0x130
        mlx5e_free_rx_mpwqe+0x190/0x1c0 [mlx5_core]
        mlx5e_post_rx_mpwqes+0x1ac/0x280 [mlx5_core]
        mlx5e_napi_poll+0x12b/0x710 [mlx5_core]
        ? skb_free_head+0x4f/0x90
        __napi_poll+0x2b/0x1c0
        net_rx_action+0x27b/0x360
      
      The advantage is that the Call Trace points directly to the function
      leaking the PP page, which in this case is a bug deliberately
      introduced into the mlx5 driver to test this code change.
      
      Currently PP periodically prints the warning "stalled pool shutdown"
      from page_pool_release_retry(). That warning cannot be directly
      correlated with a leak and might just as well be a false positive
      caused by SKBs being stuck on a socket for an extended period.
      After this patch we should be able to remove this printk.
      
      Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
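      
      The check described in this commit can be modelled in user space. The
      following is a minimal sketch, not the kernel code: the demo_* names are
      hypothetical, the signature value is the one quoted in the example
      output, and masking the two low bits of pp_magic is an assumption of
      this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value quoted in the example output of the commit message. */
#define DEMO_PP_SIGNATURE 0xdead000000000040ULL

/* Assumption: the two low bits of pp_magic may carry flags, so mask them. */
#define DEMO_PP_MAGIC_MASK (~0x3ULL)

/* Hypothetical stand-in for the one struct page field the check reads. */
struct demo_page {
	uint64_t pp_magic;
};

/* Return a reason string when the page looks leaked, NULL otherwise. */
const char *demo_page_bad_reason(const struct demo_page *page)
{
	assert(page != NULL);
	if ((page->pp_magic & DEMO_PP_MAGIC_MASK) == DEMO_PP_SIGNATURE)
		return "page_pool leak";
	return NULL;
}

/* Convenience wrapper so callers can test a raw pp_magic value. */
const char *demo_magic_bad_reason(uint64_t pp_magic)
{
	struct demo_page page = { .pp_magic = pp_magic };

	return demo_page_bad_reason(&page);
}
```

      A page whose pp_magic was cleared on release yields NULL here, while a
      page freed with the signature still in place yields the reason string
      seen in the "page dumped because" line.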
  4. 25 Nov, 2023 4 commits
  5. 24 Nov, 2023 21 commits
  6. 23 Nov, 2023 12 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 45c226dd
      Jakub Kicinski authored
      Cross-merge networking fixes after downstream PR.
      
      Conflicts:
      
      drivers/net/ethernet/intel/ice/ice_main.c
        c9663f79 ("ice: adjust switchdev rebuild path")
        77580179 ("ice: restore timestamp configuration after device reset")
      https://lore.kernel.org/all/20231121211259.3348630-1-anthony.l.nguyen@intel.com/
      
      Adjacent changes:
      
      kernel/bpf/verifier.c
        bb124da6 ("bpf: keep track of max number of bpf_loop callback iterations")
        5f99f312 ("bpf: add register bounds sanity checks and sanitization")
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'net-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · d3fa86b1
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bpf.
      
        Current release - regressions:
      
         - Revert "net: r8169: Disable multicast filter for RTL8168H and
           RTL8107E"
      
         - kselftest: rtnetlink: fix ip route command typo
      
        Current release - new code bugs:
      
         - s390/ism: make sure ism driver implies smc protocol in kconfig
      
         - two build fixes for tools/net
      
        Previous releases - regressions:
      
         - rxrpc: couple of ACK/PING/RTT handling fixes
      
        Previous releases - always broken:
      
         - bpf: verify bpf_loop() callbacks as if they are called unknown
           number of times
      
         - improve stability of auto-bonding with Hyper-V
      
         - account BPF-neigh-redirected traffic in interface statistics
      
        Misc:
      
         - net: fill in some more MODULE_DESCRIPTION()s"
      
      * tag 'net-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits)
        tools: ynl: fix duplicate op name in devlink
        tools: ynl: fix header path for nfsd
        net: ipa: fix one GSI register field width
        tls: fix NULL deref on tls_sw_splice_eof() with empty record
        net: axienet: Fix check for partial TX checksum
        vsock/test: fix SEQPACKET message bounds test
        i40e: Fix adding unsupported cloud filters
        ice: restore timestamp configuration after device reset
        ice: unify logic for programming PFINT_TSYN_MSK
        ice: remove ptp_tx ring parameter flag
        amd-xgbe: propagate the correct speed and duplex status
        amd-xgbe: handle the corner-case during tx completion
        amd-xgbe: handle corner-case during sfp hotplug
        net: veth: fix ethtool stats reporting
        octeontx2-pf: Fix ntuple rule creation to direct packet to VF with higher Rx queue than its PF
        net: usb: qmi_wwan: claim interface 4 for ZTE MF290
        Revert "net: r8169: Disable multicast filter for RTL8168H and RTL8107E"
        net/smc: avoid data corruption caused by decline
        nfc: virtual_ncidev: Add variable to check if ndev is running
        dpll: Fix potential msg memleak when genlmsg_put_reply failed
        ...
    • tools: ynl: fix duplicate op name in devlink · 39f04b14
      Jakub Kicinski authored
      We don't support CRUD-inspired message types in YNL too well.
      One aspect that currently trips us up is the fact that a single
      message ID can be used in multiple commands (as the response).
      This leads to duplicate entries in the id-to-string tables:
      
      devlink-user.c:19:34: warning: initialized field overwritten [-Woverride-init]
         19 |         [DEVLINK_CMD_PORT_NEW] = "port-new",
            |                                  ^~~~~~~~~~
      devlink-user.c:19:34: note: (near initialization for ‘devlink_op_strmap[7]’)
      
      The Fixes tag points at where the code was generated; the "real" problem
      is that the code generator does not support CRUD.
      
      Fixes: f2f9dd16 ("netlink: specs: devlink: add the remaining command to generate complete split_ops")
      Link: https://lore.kernel.org/r/20231123030558.1611831-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
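      
      To illustrate the warning, here is a small sketch of an id-to-string
      table built with designated initializers. All demo_* names and the
      PORT_GET value are hypothetical; only index 7 is taken from the
      compiler note quoted in the commit message.

```c
#include <stddef.h>

/* Hypothetical message IDs; 7 matches the index in the quoted warning. */
enum demo_cmd {
	DEMO_CMD_PORT_GET = 5,
	DEMO_CMD_PORT_NEW = 7,
};

/*
 * One entry per message ID. Adding a second
 * `[DEMO_CMD_PORT_NEW] = "port-new",` line for the response would
 * overwrite the first entry and trigger -Woverride-init.
 */
static const char *const demo_op_strmap[] = {
	[DEMO_CMD_PORT_GET] = "port-get",
	[DEMO_CMD_PORT_NEW] = "port-new",
};

#define DEMO_STRMAP_LEN (sizeof(demo_op_strmap) / sizeof(demo_op_strmap[0]))

/* Look up an op name; returns NULL for unknown or unnamed IDs. */
const char *demo_op_str(unsigned int id)
{
	if (id >= DEMO_STRMAP_LEN)
		return NULL;
	return demo_op_strmap[id];
}
```

      Gaps in the table (ID 6 in this sketch) are zero-initialized, so the
      lookup returns NULL for them.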
    • tools: ynl: fix header path for nfsd · 2be35a61
      Jakub Kicinski authored
      The makefile dependency is trying to include the wrong header:
      
      <command-line>: fatal error: ../../../../include/uapi//linux/nfsd.h: No such file or directory
      
      The guard also looks wrong.
      
      Fixes: f14122b2 ("tools: ynl: Add source files for nfsd netlink protocol")
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
      Link: https://lore.kernel.org/r/20231123030624.1611925-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipa: fix one GSI register field width · 37f02055
      Alex Elder authored
      The width of the R_LENGTH field of the EV_CH_E_CNTXT_1 GSI register
      is 24 bits (not 20 bits) starting with IPA v5.0.  Fix this.
      
      Fixes: faf0678e ("net: ipa: add IPA v5.0 GSI register definitions")
      Signed-off-by: Alex Elder <elder@linaro.org>
      Link: https://lore.kernel.org/r/20231122231708.896632-1-elder@linaro.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
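      
      The effect of a field that is declared too narrow can be shown with
      plain bit masks. This is a generic sketch with hypothetical demo_*
      helpers; it does not reproduce the driver's register definitions or
      the field's position within the register.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the low `width` bits; valid for widths 1..32. */
uint32_t demo_field_mask(unsigned int width)
{
	assert(width >= 1 && width <= 32);
	return width == 32 ? UINT32_MAX : (UINT32_C(1) << width) - 1;
}

/* Keep only the bits of `val` that fit in a field of `width` bits. */
uint32_t demo_field_fit(uint32_t val, unsigned int width)
{
	return val & demo_field_mask(width);
}
```

      A value such as 0x100000 needs 21 bits, so it is truncated to zero by
      a 20-bit field but passes through a 24-bit field unchanged.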
    • tls: fix NULL deref on tls_sw_splice_eof() with empty record · 53f2cb49
      Jann Horn authored
      syzkaller discovered that if tls_sw_splice_eof() is executed as part of
      sendfile() when the plaintext/ciphertext sk_msg are empty, the send path
      gets confused because the empty ciphertext buffer does not have enough
      space for the encryption overhead. This causes tls_push_record() to go on
      the `split = true` path (which is only supposed to be used when interacting
      with an attached BPF program), and then get further confused and hit the
      tls_merge_open_record() path, which then assumes that there must be at
      least one populated buffer element, leading to a NULL deref.
      
      It is possible to have empty plaintext/ciphertext buffers if we previously
      bailed from tls_sw_sendmsg_locked() via the tls_trim_both_msgs() path.
      tls_sw_push_pending_record() already handles this case correctly; let's do
      the same check in tls_sw_splice_eof().
      
      Fixes: df720d28 ("tls/sw: Use splice_eof() to flush")
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+40d43509a099ea756317@syzkaller.appspotmail.com
      Signed-off-by: Jann Horn <jannh@google.com>
      Link: https://lore.kernel.org/r/20231122214447.675768-1-jannh@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: axienet: Fix check for partial TX checksum · fd0413bb
      Samuel Holland authored
      Due to a typo, the code checked the RX checksum feature in the TX path.
      
      Fixes: 8a3b7a25 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
      Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
      Link: https://lore.kernel.org/r/20231122004219.3504219-1-samuel.holland@sifive.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • vsock/test: fix SEQPACKET message bounds test · f0863888
      Arseniy Krasnov authored
      Tune the message length calculation so that this test works on machines
      where 'getpagesize()' returns more than 32KB. The maximum message length
      is no longer hardcoded (on such machines it was smaller than the
      'getpagesize()' return value, which produced a negative value and made
      the test fail); it is now calculated at runtime and is always bigger
      than the 'getpagesize()' result. Reproduced on aarch64 with 64KB page
      size.
      
      Fixes: 5c338112 ("test/vsock: rework message bounds test")
      Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
      Reported-by: Bogdan Marcynkov <bmarcynk@redhat.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Link: https://lore.kernel.org/r/20231121211642.163474-1-avkrasnov@salutedevices.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
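      
      The arithmetic behind this failure can be sketched as follows. This is
      an illustration only: the demo_* names and the multiplier are
      hypothetical, and it assumes the test draws message lengths from the
      range between the page size and the maximum length.

```c
#include <assert.h>

#define DEMO_MAX_MSG_PAGES 4	/* hypothetical multiplier */

/* Upper bound for the message length, derived from the page size. */
long demo_max_msg_len(long page_size)
{
	assert(page_size > 0);
	return DEMO_MAX_MSG_PAGES * page_size;
}

/* Width of the range a length is drawn from; it must stay positive. */
long demo_len_range(long max_len, long page_size)
{
	return max_len - page_size;
}
```

      With a fixed 32KB maximum and a 64KB page, demo_len_range() goes
      negative, which matches the failure described above. Deriving the
      maximum from the page size keeps the range positive for any page size.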
    • i40e: Fix adding unsupported cloud filters · 4e20655e
      Ivan Vecera authored
      If a VF tries to add an unsupported cloud filter through virtchnl,
      i40e_add_del_cloud_filter(_big_buf) returns -ENOTSUPP, but
      this error code is stored in 'ret' instead of 'aq_ret', which
      is the error code sent back to the VF. When one of these
      functions fails, the value of 'aq_ret' is zero, so the VF
      incorrectly receives a 'success'.
      
      Use 'aq_ret' to store the return value and remove the 'ret' local
      variable. Additionally, fix the case where filter allocation
      fails: previously no notification was sent back to the VF.
      
      Fixes: e284fc28 ("i40e: Add and delete cloud filter")
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20231121211338.3348677-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
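      
      The 'ret' versus 'aq_ret' mix-up has a simple shape that can be
      reproduced outside the driver. The sketch below uses hypothetical
      demo_* functions and a locally defined stand-in for the kernel's
      ENOTSUPP value; it is not the i40e code.

```c
#define DEMO_ENOTSUPP 524	/* hypothetical stand-in for the kernel's ENOTSUPP */

/* Stand-in for the filter helper: fails for unsupported filters. */
int demo_add_filter(int supported)
{
	return supported ? 0 : -DEMO_ENOTSUPP;
}

/* Buggy shape: the error lands in `ret`, but `aq_ret` is what gets sent. */
int demo_reply_buggy(int supported)
{
	int aq_ret = 0;
	int ret = demo_add_filter(supported);

	(void)ret;	/* the error is dropped here */
	return aq_ret;
}

/* Fixed shape: a single variable carries the status into the reply. */
int demo_reply_fixed(int supported)
{
	int aq_ret = demo_add_filter(supported);

	return aq_ret;
}
```

      In the buggy shape an unsupported filter still produces a reply of 0,
      which is how the VF ends up seeing 'success'.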
    • Merge branch 'ice-restore-timestamp-config-after-reset' · e50a8061
      Paolo Abeni authored
      Tony Nguyen says:
      
      ====================
      ice: restore timestamp config after reset
      
      Jake Keller says:
      
      We recently discovered during internal validation that the ice driver has
      not been properly restoring Tx timestamp configuration after a device reset,
      which resulted in application failures after a device reset.
      
      After some digging, it turned out this problem is two-fold. Since the
      introduction of the PTP support the driver has been clobbering the storage
      of the current timestamp configuration during reset. Thus after a reset, the
      driver will no longer perform Tx or Rx timestamps, and will report
      timestamp configuration as disabled if SIOCGHWTSTAMP ioctl is issued.
      
      In addition, the recently merged auxiliary bus support code missed that
      PFINT_TSYN_MSK must be reprogrammed on the clock owner for E822 devices.
      Failure to restore this register configuration results in the driver no
      longer responding to interrupts from other ports. Depending on the traffic
      pattern, this can either result in increased latency responding to
      timestamps on the non-owner ports, or it can result in the driver never
      reporting any timestamps. The configuration of PFINT_TSYN_MSK was only done
      during initialization. Due to this, the Tx timestamp issue persists even if
      userspace reconfigures timestamping.
      
      This series fixes both issues, as well as removes a redundant Tx ring field
      since we can rely on the skb flag as the primary detector for a Tx timestamp
      request.
      
      Note that I don't think this series will directly apply to older stable
      releases (even v6.6) as we recently refactored a lot of the PTP code to
      support auxiliary bus. Patch 2/3 only matters for the post-auxiliary bus
      implementation. The principle of patch 1/3 and 3/3 could apply as far back
      as the initial PTP support, but I don't think it will apply cleanly as-is.
      ====================
      
      Link: https://lore.kernel.org/r/20231121211259.3348630-1-anthony.l.nguyen@intel.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ice: restore timestamp configuration after device reset · 77580179
      Jacob Keller authored
      The driver calls ice_ptp_cfg_timestamp() during ice_ptp_prepare_for_reset()
      to disable timestamping while the device is resetting. This operation
      destroys the user requested configuration. While the driver does call
      ice_ptp_cfg_timestamp in ice_rebuild() to restore some hardware settings
      after a reset, it unconditionally passes true or false, resulting in
      failure to restore previous user space configuration.
      
      This results in a device reset forcibly disabling timestamp configuration
      regardless of current user settings.
      
      This was not detected previously due to a quirk of the LinuxPTP ptp4l
      application. If ptp4l detects a missing timestamp, it enters a fault state
      and performs recovery logic which includes executing SIOCSHWTSTAMP again,
      restoring the now accidentally cleared configuration.
      
      Not every application does this, and for these applications, timestamps
      will mysteriously stop after a PF reset, without being restored until an
      application restart.
      
      Fix this by replacing ice_ptp_cfg_timestamp() with two new functions:
      
      1) ice_ptp_disable_timestamp_mode() which unconditionally disables the
         timestamping logic in ice_ptp_prepare_for_reset() and ice_ptp_release()
      
      2) ice_ptp_restore_timestamp_mode() which calls
         ice_ptp_restore_tx_interrupt() to restore Tx timestamping configuration,
         calls ice_set_rx_tstamp() to restore Rx timestamping configuration, and
         issues an immediate TSYN_TX interrupt to ensure that timestamps which
         may have occurred during the device reset get processed.
      
      Modify ice_ptp_set_timestamp_mode() to directly save the user
      configuration and then call ice_ptp_restore_timestamp_mode(). This way, a
      reset no longer destroys the saved user configuration.
      
      This obsoletes the ice_set_tx_tstamp() function which can now be safely
      removed.
      
      With this change, all devices should now restore Tx and Rx timestamping
      functionality correctly after a PF reset without application intervention.
      
      Fixes: 77a78115 ("ice: enable receive hardware timestamping")
      Fixes: ea9b847c ("ice: enable transmit timestamps for E810 devices")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
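      
      The save, disable and restore split described in this commit can be
      sketched as a small state model. The struct and the demo_* functions
      are hypothetical and only mirror the idea of keeping the user request
      separate from the hardware state; they are not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: what the user asked for vs. what hardware does. */
struct demo_tstamp {
	bool user_tx, user_rx;	/* saved user request, survives a reset */
	bool hw_tx, hw_rx;	/* current hardware state */
};

/* Re-apply the saved user request to the hardware state. */
void demo_restore_mode(struct demo_tstamp *t)
{
	assert(t != NULL);
	t->hw_tx = t->user_tx;
	t->hw_rx = t->user_rx;
}

/* User-facing setter: save the request first, then apply it. */
void demo_set_mode(struct demo_tstamp *t, bool tx, bool rx)
{
	assert(t != NULL);
	t->user_tx = tx;
	t->user_rx = rx;
	demo_restore_mode(t);
}

/* Reset path: turn timestamping off, leave the saved request alone. */
void demo_disable_mode(struct demo_tstamp *t)
{
	assert(t != NULL);
	t->hw_tx = false;
	t->hw_rx = false;
}
```

      Because the reset path never touches the saved request, calling
      demo_restore_mode() after a reset brings back whatever the user last
      configured without the application having to ask again.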
    • ice: unify logic for programming PFINT_TSYN_MSK · 7d606a1e
      Jacob Keller authored
      Commit d938a8cc ("ice: Auxbus devices & driver for E822 TS") modified
      how Tx timestamps are handled for E822 devices. On these devices, only the
      clock owner handles reading the Tx timestamp data from firmware. To do
      this, the PFINT_TSYN_MSK register is modified from the default value to one
      which enables reacting to a Tx timestamp on all PHY ports.
      
      The driver currently programs PFINT_TSYN_MSK in different places depending
      on whether the port is the clock owner or not. For the clock owner, the
      PFINT_TSYN_MSK value is programmed during ice_ptp_init_owner just before
      calling ice_ptp_tx_ena_intr to program the PHY ports.
      
      For the non-clock owner ports, the PFINT_TSYN_MSK is programmed during
      ice_ptp_init_port.
      
      If a large enough device reset occurs, the PFINT_TSYN_MSK register will be
      reset to the default value in which only the PHY associated directly with
      the PF will cause the Tx timestamp interrupt to trigger.
      
      The driver lacks logic to reprogram the PFINT_TSYN_MSK register after a
      device reset. For the E822 device, this results in the PF no longer
      responding to interrupts for other ports. This results in failure to
      deliver Tx timestamps to user space applications.
      
      Rename ice_ptp_configure_tx_tstamp to ice_ptp_cfg_tx_interrupt, and unify
      the logic for programming PFINT_TSYN_MSK and PFINT_OICR_ENA into one place.
      This function will program both registers according to the combination of
      user configuration and device requirements.
      
      This ensures that PFINT_TSYN_MSK is always restored when we configure the
      Tx timestamp interrupt.
      
      Fixes: d938a8cc ("ice: Auxbus devices & driver for E822 TS")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>