1. 05 Feb, 2022 3 commits
  2. 04 Feb, 2022 24 commits
    • Merge branch 'ipa-RX-replenish' · c531adaf
      David S. Miller authored
      Alex Elder says:
      
      ====================
      net: ipa: improve RX buffer replenishing
      
      This series revises the algorithm used for replenishing receive
      buffers on RX endpoints.  Currently there are two atomic variables
      that track how many receive buffers can be sent to the hardware.
      The new algorithm obviates the need for those, by just assuming we
      always want to provide the hardware with buffers until it can hold
      no more.
      
      The first patch eliminates an atomic variable that's not required.
      The next moves some code into the main replenish function's caller,
      making one of the called function's arguments unnecessary.  The
      next six refactor things a bit more, adding a new helper function
      that allows us to eliminate an additional atomic variable.  And the
      final two implement two more minor improvements.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: determine replenish doorbell differently · 9654d8c4
      Alex Elder authored
      Rather than tracking the number of receive buffer transactions that
      have been submitted without a doorbell, just track the total number
      of transactions that have been issued.  Then ring the doorbell when
      that number modulo the replenish batch size is 0.
      
      The effect is roughly the same, but the new count is slightly more
      interesting, and this approach will someday allow the replenish
      batch size to be tuned at runtime.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
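
A minimal user-space sketch of the counting scheme this commit message describes: count every issued transaction and ring the doorbell whenever the running total is a multiple of the batch size. The names `rx_endpoint`, `replenish_issue()`, `doorbells_after()` and the batch size of 16 are assumptions made for illustration; they are not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical batch size; the commit message does not state the real value. */
#define REPLENISH_BATCH 16u

/* Hypothetical per-endpoint state: total replenish transactions issued. */
struct rx_endpoint {
	uint64_t replenish_count;
};

/*
 * Count one more issued transaction and report whether the doorbell
 * should be rung, i.e. whether the running total is now a multiple of
 * the batch size.
 */
static bool replenish_issue(struct rx_endpoint *ep)
{
	assert(ep);
	ep->replenish_count++;
	return ep->replenish_count % REPLENISH_BATCH == 0;
}

/* Issue n transactions and return how many of them rang the doorbell. */
static unsigned int doorbells_after(struct rx_endpoint *ep, unsigned int n)
{
	unsigned int rings = 0;

	for (unsigned int i = 0; i < n; i++)
		if (replenish_issue(ep))
			rings++;
	return rings;
}
```

Because the decision depends only on the total and the batch size, the batch size could in principle become a runtime variable rather than a constant, which is the tuning possibility the message mentions.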
    • net: ipa: replenish after delivering payload · 5d6ac24f
      Alex Elder authored
      Replenishing is now solely driven by whether transactions are
      available for a channel, and it doesn't really matter whether
      we replenish before or after we deliver received packets to the
      network stack.
      
      Replenishing before delivering the payload adds a little latency.
      Eliminate that by requesting a replenish after the payload is
      delivered.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: kill replenish_backlog · 09b337de
      Alex Elder authored
      We no longer use the replenish_backlog atomic variable to decide
      when we've got work to do providing receive buffers to hardware.
      Basically, we try to keep the hardware as full as possible, all the
      time.  We keep supplying buffers until the hardware has no more
      space for them.
      
      As a result, we can get rid of the replenish_backlog field and the
      atomic operations performed on it.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: introduce gsi_channel_trans_idle() · 5fc7f9ba
      Alex Elder authored
      Create a new function that returns true if all transactions for a
      channel are available for use.
      
      Use it in ipa_endpoint_replenish_enable() to see whether to start
      replenishing, and in ipa_endpoint_replenish() to determine whether
      it's necessary after a failure to schedule delayed work to ensure a
      future replenish attempt occurs.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: don't use replenish_backlog · d0ac30e7
      Alex Elder authored
      Rather than determining when to stop replenishing using the
      replenish backlog, just stop when we have exhausted all available
      transactions.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: allocate transaction in replenish loop · 6a606b90
      Alex Elder authored
      When replenishing, have ipa_endpoint_replenish() allocate a
      transaction, and pass that to ipa_endpoint_replenish_one() to fill.
      Then, if that produces no error, commit the transaction within the
      replenish loop as well.  In this way we can distinguish between
      transaction failures and buffer allocation/mapping failures.
      
      Failure to allocate a transaction simply means the hardware already
      has as many receive buffers as it can hold.  In that case we can
      break out of the replenish loop because there's nothing more to do.
      
      If we fail to allocate or map pages for the receive buffer, just
      try again later.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
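
The loop structure described in this commit message can be modelled in user space roughly as follows. This is only a sketch: `toy_channel` and every `toy_*` helper are hypothetical stand-ins, not the driver's functions, and a simple counter (`alloc_budget`) stands in for page allocation and mapping. The point is that the two failure modes get different outcomes: no transaction left means the hardware is full, while a buffer failure means a retry should be scheduled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a channel; not the driver's data structure. */
struct toy_channel {
	unsigned int trans_avail;  /* transactions still available */
	unsigned int buffers_hw;   /* buffers committed to the "hardware" */
	unsigned int alloc_budget; /* buffer allocations that will still succeed */
};

enum replenish_result {
	REPLENISH_FULL,        /* hardware holds all the buffers it can */
	REPLENISH_RETRY_LATER, /* buffer allocation failed; try again later */
};

/* Take one transaction; failing only means none are available. */
static bool toy_trans_alloc(struct toy_channel *ch)
{
	if (ch->trans_avail == 0)
		return false;
	ch->trans_avail--;
	return true;
}

/* Give back a transaction that was allocated but never committed. */
static void toy_trans_release(struct toy_channel *ch)
{
	ch->trans_avail++;
}

/* Stand-in for allocating and mapping pages for one receive buffer. */
static bool toy_buffer_alloc(struct toy_channel *ch)
{
	if (ch->alloc_budget == 0)
		return false;
	ch->alloc_budget--;
	return true;
}

static enum replenish_result toy_replenish(struct toy_channel *ch)
{
	assert(ch);
	for (;;) {
		/* Cheap test first: stop once transactions are exhausted. */
		if (!toy_trans_alloc(ch))
			return REPLENISH_FULL;

		/* Only now spend effort on the buffer itself. */
		if (!toy_buffer_alloc(ch)) {
			toy_trans_release(ch);
			return REPLENISH_RETRY_LATER;
		}

		/* "Commit" the filled transaction. */
		ch->buffers_hw++;
	}
}
```

In this model a caller would schedule delayed work only on `REPLENISH_RETRY_LATER`; on `REPLENISH_FULL` there is nothing more to do.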
    • net: ipa: decide on doorbell in replenish loop · b9dbabc5
      Alex Elder authored
      Decide whether the doorbell should be signaled when committing a
      replenish transaction in the main replenish loop, rather than in
      ipa_endpoint_replenish_one().  This is a step to facilitate the
      next patch.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: increment backlog in replenish caller · 4b22d841
      Alex Elder authored
      Three spots call ipa_endpoint_replenish(), and just one of those
      requests that the backlog be incremented after completing the
      replenish operation.
      
      Instead, have the caller increment the backlog, and get rid of the
      add_one argument to ipa_endpoint_replenish().
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: allocate transaction before pages when replenishing · b4061c13
      Alex Elder authored
      A transaction failure only occurs if no more transactions are
      available for an endpoint.  It's a very cheap test.
      
      When replenishing an RX endpoint buffer, there's no point in
      allocating pages if transactions are exhausted.  So don't bother
      doing so unless the transaction allocation succeeds.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: kill replenish_saved · a9bec7ae
      Alex Elder authored
      The replenish_saved field keeps track of the number of times a new
      buffer is added to the backlog when replenishing is disabled.  We
      don't really use it though, so there's no need for us to track it
      separately.  Whether replenishing is enabled or not, we can simply
      increment the backlog.
      
      Get rid of replenish_saved, and initialize and increment the backlog
      where it would have otherwise been used.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tls: cap the output scatter list to something reasonable · b93235e6
      Jakub Kicinski authored
      TLS recvmsg() passes user pages as destination for decrypt.
      The decrypt operation is repeated record by record, each
      record being 16kB, max. TLS allocates an sg_table and uses
      iov_iter_get_pages() to populate it with enough pages to
      fit the decrypted record.
      
      Even though we decrypt a single message at a time we size
      the sg_table based on the entire length of the iovec.
      This leads to unnecessarily large allocations, risking
      triggering OOM conditions.
      
      Use iov_iter_truncate() / iov_iter_reexpand() to construct
      a "capped" version of iov_iter_npages(). Alternatively we
      could parametrize iov_iter_npages() to take the size as
      arg instead of using i->count, or do something else.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
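
The truncate / count / re-expand idea from this commit message can be illustrated with a small user-space model. Everything here is hypothetical: `toy_iter` is a single contiguous range rather than the kernel's `struct iov_iter`, and the `toy_*` functions only mimic the roles of `iov_iter_truncate()`, `iov_iter_npages()` and `iov_iter_reexpand()`. A 4096-byte page is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size for the model */

/* Hypothetical iterator over one contiguous user buffer. */
struct toy_iter {
	size_t offset; /* byte offset of the first remaining byte */
	size_t count;  /* bytes remaining */
};

/* Number of pages touched by the remaining bytes. */
static size_t toy_npages(const struct toy_iter *it)
{
	size_t first, last;

	if (it->count == 0)
		return 0;
	first = it->offset / TOY_PAGE_SIZE;
	last = (it->offset + it->count - 1) / TOY_PAGE_SIZE;
	return last - first + 1;
}

/*
 * Page count for at most cap bytes: temporarily shrink the iterator,
 * count the pages, then restore the original length.
 */
static size_t toy_npages_capped(struct toy_iter *it, size_t cap)
{
	size_t saved, n;

	assert(it);
	saved = it->count;
	if (it->count > cap)
		it->count = cap; /* "truncate" */
	n = toy_npages(it);
	it->count = saved;       /* "re-expand" */
	return n;
}
```

With a 1 MiB destination buffer and a 16 kB cap (the maximum record size the message cites), the model sizes the table for a handful of pages instead of 256.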
    • net: dsa: realtek: convert to phylink_generic_validate() · 6ff60646
      Russell King (Oracle) authored
      Populate the supported interfaces and MAC capabilities for the Realtek
      rtl8365 DSA switch and remove the old validate implementation to allow
      DSA to use phylink_generic_validate() for this switch driver.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · eace555b
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2022-02-03
      
      This series contains updates to the i40e client header file and driver.
      
      Mateusz disables HW TC offload by default.
      
      Joe Damato removes a no longer used statistic.
      
      Jakub Kicinski removes an unused enum from the client header file.
      
      Jedrzej changes some admin queue commands to occur under atomic context
      and adds new functions for admin queue MAC VLAN filters to avoid a
      potential race that could occur due to storing results in a structure that
      could be overwritten by the next admin queue call.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: lan966x: use .mac_select_pcs() interface · 41414c9b
      Horatiu Vultur authored
      Convert lan966x to use the mac_select_pcs() interface instead of
      phylink_set_pcs().
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/20220202114949.833075-1-horatiu.vultur@microchip.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: rtnetlink: Use more sensible tos values · 95eb6ef8
      Guillaume Nault authored
      Using tos 0x1 with 'ip route get <IPv4 address> ...' doesn't test much
      of the tos option handling: 0x1 just sets an ECN bit, which is cleared
      by inet_rtm_getroute() before doing the fib lookup. Let's use 0x10
      instead, which is actually taken into account in the route lookup (and
      is less surprising for the reader).
      
      For consistency, use 0x10 for the IPv6 route lookup too (IPv6 currently
      doesn't clear ECN bits, but might do so in the future).
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Link: https://lore.kernel.org/r/d61119e68d01ba7ef3ba50c1345a5123a11de123.1643815297.git.gnault@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
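
To see why 0x1 exercises so little, here is a small sketch that clears the two low-order ECN bits of a tos value before it would be used for a lookup. `TOY_ECN_MASK` and both helper names are assumptions for illustration; the mask the kernel actually applies may be stricter, and this only models the ECN part.

```c
#include <assert.h>
#include <stdint.h>

/* The two low-order bits of the tos byte carry ECN. */
#define TOY_ECN_MASK 0x03u

/* The tos value that remains once the ECN bits are cleared. */
static uint8_t tos_without_ecn(uint8_t tos)
{
	return (uint8_t)(tos & (uint8_t)~TOY_ECN_MASK);
}

/* True for a non-zero tos that sets nothing but ECN bits. */
static int tos_only_sets_ecn(uint8_t tos)
{
	return tos != 0 && tos_without_ecn(tos) == 0;
}
```

Under this model 0x1 and 0x2 collapse to 0 and so behave like an unset tos, while 0x10 survives unchanged and actually takes part in the lookup.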
    • selftests: fib offload: use sensible tos values · bafe517a
      Guillaume Nault authored
      Although both iproute2 and the kernel accept 1 and 2 as tos values for
      new routes, those are invalid. These values only set ECN bits, which
      are ignored during IPv4 fib lookups. Therefore, no packet can actually
      match such routes. This selftest therefore only succeeds because it
      doesn't verify that the new routes do actually work in practice (it
      just checks if the routes are offloaded or not).
      
      It makes more sense to use tos values that don't conflict with ECN.
      This way, the selftest won't be affected if we later decide to warn or
      even reject invalid tos configurations for new routes.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/5e43b343720360a1c0e4f5947d9e917b26f30fbf.1643826556.git.gnault@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: minor __dev_alloc_name() optimization · 25ee1660
      Eric Dumazet authored
      __dev_alloc_name() allocates a private zeroed page,
      then sets bits in it while iterating through net devices.
      
      It can use __set_bit() to avoid unnecessary locked operations.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20220203064609.3242863-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
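
A user-space sketch of the reasoning: when a bitmap is private to one caller, a plain read-modify-write is enough to set a bit, and no locked or atomic instruction is needed. The `toy_*` names are hypothetical; in the kernel, `__set_bit()` is the non-atomic counterpart of `set_bit()`. The second helper, which looks for the lowest clear bit, is included only to make the example usable and is an assumption about how such a bitmap might be consumed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Plain (non-atomic) read-modify-write. Safe only while no other
 * thread can touch the bitmap, e.g. a freshly allocated private page.
 */
static void toy_set_bit_nonatomic(unsigned long *map, size_t nr)
{
	assert(map);
	map[nr / TOY_BITS_PER_LONG] |= 1UL << (nr % TOY_BITS_PER_LONG);
}

/* Index of the lowest clear bit, or nbits if every bit is set. */
static size_t toy_first_zero_bit(const unsigned long *map, size_t nbits)
{
	for (size_t i = 0; i < nbits; i++)
		if (!(map[i / TOY_BITS_PER_LONG] &
		      (1UL << (i % TOY_BITS_PER_LONG))))
			return i;
	return nbits;
}
```

Marking indices 0, 1 and 3 as in use and then asking for the first zero bit yields 2, the lowest free index.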
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · c59400a6
      Jakub Kicinski authored
      No conflicts.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • gcc-plugins/stackleak: Use noinstr in favor of notrace · dcb85f85
      Kees Cook authored
      While the stackleak plugin was already using notrace, objtool is now a
      bit more picky.  Update the notrace uses to noinstr.  Silences the
      following objtool warnings when building with:
      
      CONFIG_DEBUG_ENTRY=y
      CONFIG_STACK_VALIDATION=y
      CONFIG_VMLINUX_VALIDATION=y
      CONFIG_GCC_PLUGIN_STACKLEAK=y
      
        vmlinux.o: warning: objtool: do_syscall_64()+0x9: call to stackleak_track_stack() leaves .noinstr.text section
        vmlinux.o: warning: objtool: do_int80_syscall_32()+0x9: call to stackleak_track_stack() leaves .noinstr.text section
        vmlinux.o: warning: objtool: exc_general_protection()+0x22: call to stackleak_track_stack() leaves .noinstr.text section
        vmlinux.o: warning: objtool: fixup_bad_iret()+0x20: call to stackleak_track_stack() leaves .noinstr.text section
        vmlinux.o: warning: objtool: do_machine_check()+0x27: call to stackleak_track_stack() leaves .noinstr.text section
        vmlinux.o: warning: objtool: .text+0x5346e: call to stackleak_erase() leaves .noinstr.text section
        vmlinux.o: warning: objtool: .entry.text+0x143: call to stackleak_erase() leaves .noinstr.text section
        vmlinux.o: warning: objtool: .entry.text+0x10eb: call to stackleak_erase() leaves .noinstr.text section
        vmlinux.o: warning: objtool: .entry.text+0x17f9: call to stackleak_erase() leaves .noinstr.text section
      
      Note that the plugin's addition of calls to stackleak_track_stack() from
      noinstr functions is expected to be safe, as it isn't runtime
      instrumentation and is self-contained.
      
      Cc: Alexander Popov <alex.popov@linux.com>
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'net-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · eb2eb516
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bpf, netfilter, and ieee802154.
      
        Current release - regressions:
      
         - Partially revert "net/smc: Add netlink net namespace support", fix
           uABI breakage
      
         - netfilter:
            - nft_ct: fix use after free when attaching zone template
            - nft_byteorder: track register operations
      
        Previous releases - regressions:
      
         - ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback
      
         - phy: qca8081: fix speeds lower than 2.5Gb/s
      
         - sched: fix use-after-free in tc_new_tfilter()
      
        Previous releases - always broken:
      
         - tcp: fix mem under-charging with zerocopy sendmsg()
      
         - tcp: add missing tcp_skb_can_collapse() test in
           tcp_shift_skb_data()
      
         - neigh: do not trigger immediate probes on NUD_FAILED from
           neigh_managed_work, avoid a deadlock
      
         - bpf: use VM_MAP instead of VM_ALLOC for ringbuf, avoid KASAN
           false-positives
      
         - netfilter: nft_reject_bridge: fix for missing reply from prerouting
      
         - smc: forward wakeup to smc socket waitqueue after fallback
      
         - ieee802154:
            - return meaningful error codes from the netlink helpers
            - mcr20a: fix lifs/sifs periods
            - at86rf230, ca8210: stop leaking skbs on error paths
      
         - macsec: add missing un-offload call for NETDEV_UNREGISTER of parent
      
         - ax25: add refcount in ax25_dev to avoid UAF bugs
      
         - eth: mlx5e:
            - fix SFP module EEPROM query
            - fix broken SKB allocation in HW-GRO
            - IPsec offload: fix tunnel mode crypto for non-TCP/UDP flows
      
         - eth: amd-xgbe:
            - fix skb data length underflow
            - ensure reset of the tx_timer_active flag, avoid Tx timeouts
      
         - eth: stmmac: fix runtime pm use in stmmac_dvr_remove()
      
         - eth: e1000e: handshake with CSME starts from Alder Lake platforms"
      
      * tag 'net-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
        ax25: fix reference count leaks of ax25_dev
        net: stmmac: ensure PTP time register reads are consistent
        net: ipa: request IPA register values be retained
        dt-bindings: net: qcom,ipa: add optional qcom,qmp property
        tools/resolve_btfids: Do not print any commands when building silently
        bpf: Use VM_MAP instead of VM_ALLOC for ringbuf
        net, neigh: Do not trigger immediate probes on NUD_FAILED from neigh_managed_work
        tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()
        net: sparx5: do not refer to skb after passing it on
        Partially revert "net/smc: Add netlink net namespace support"
        net/mlx5e: Avoid field-overflowing memcpy()
        net/mlx5e: Use struct_group() for memcpy() region
        net/mlx5e: Avoid implicit modify hdr for decap drop rule
        net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic
        net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic
        net/mlx5e: Don't treat small ceil values as unlimited in HTB offload
        net/mlx5: E-Switch, Fix uninitialized variable modact
        net/mlx5e: Fix handling of wrong devices during bond netevent
        net/mlx5e: Fix broken SKB allocation in HW-GRO
        net/mlx5e: Fix wrong calculation of header index in HW_GRO
        ...
    • Merge tag 'selinux-pr-20220203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 551007a8
      Linus Torvalds authored
      Pull selinux fix from Paul Moore:
       "One small SELinux patch to ensure that a policy structure field is
        properly reset after freeing so that we don't inadvertently do a
        double-free on certain error conditions"
      
      * tag 'selinux-pr-20220203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        selinux: fix double free of cond_list on error paths
    • Merge tag 'linux-kselftest-fixes-5.17-rc3' of... · 25b20ae8
      Linus Torvalds authored
      Merge tag 'linux-kselftest-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull Kselftest fixes from Shuah Khan:
       "Important fixes to several tests and documentation clarification on
        running mainline kselftest on stable releases. A few notable fixes:
      
         - fix kselftest run hang due to child processes that haven't been
           terminated; the fix signals all child processes
      
         - fix false pass/fail results from vdso_test_abi, openat2, mincore
      
         - build failures when using -j (multiple jobs) option
      
         - exec test build failure due to incorrect build rule for a run-time
           created "pipe"
      
         - zram test fixes related to interaction with zram-generator to make
           sure zram test to coordinate deleted with zram-generator
      
         - zram test compression ratio calculation fix and skipping
           max_comp_streams.
      
         - increasing rtc test timeout
      
         - cpufreq test to write test results to stdout, which will be
           necessary on automated test systems"
      
      * tag 'linux-kselftest-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        kselftest: Fix vdso_test_abi return status
        selftests: skip mincore.check_file_mmap when fs lacks needed support
        selftests: openat2: Skip testcases that fail with EOPNOTSUPP
        selftests: openat2: Add missing dependency in Makefile
        selftests: openat2: Print also errno in failure messages
        selftests: futex: Use variable MAKE instead of make
        selftests/exec: Remove pipe from TEST_GEN_FILES
        selftests/zram: Adapt the situation that /dev/zram0 is being used
        selftests/zram01.sh: Fix compression ratio calculation
        selftests/zram: Skip max_comp_streams interface on newer kernel
        docs/kselftest: clarify running mainline tests on stables
        kselftest: signal all child processes
        selftests: cpufreq: Write test output to stdout as well
        selftests: rtc: Increase test timeout so that all tests run
  3. 03 Feb, 2022 13 commits