1. 08 Apr, 2022 1 commit
    • Linus Torvalds's avatar
      Merge tag 'net-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 73b193f2
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from bpf and netfilter.
      
        Current release - new code bugs:
      
         - mctp: correct mctp_i2c_header_create result
      
         - eth: fungible: fix reference to __udivdi3 on 32b builds
      
         - eth: micrel: remove latencies support lan8814
      
        Previous releases - regressions:
      
         - bpf: resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT
      
         - vrf: fix packet sniffing for traffic originating from ip tunnels
      
         - rxrpc: fix a race in rxrpc_exit_net()
      
         - dsa: revert "net: dsa: stop updating master MTU from master.c"
      
         - eth: ice: fix MAC address setting
      
        Previous releases - always broken:
      
         - tls: fix slab-out-of-bounds bug in decrypt_internal
      
         - bpf: support dual-stack sockets in bpf_tcp_check_syncookie
      
         - xdp: fix coalescing for page_pool fragment recycling
      
         - ovs: fix leak of nested actions
      
         - eth: sfc:
            - add missing xdp queue reinitialization
            - fix using uninitialized xdp tx_queue
      
         - eth: ice:
            - clear default forwarding VSI during VSI release
            - fix broken IFF_ALLMULTI handling
            - synchronize_rcu() when terminating rings
      
         - eth: qede: confirm skb is allocated before using
      
         - eth: aqc111: fix out-of-bounds accesses in RX fixup
      
         - eth: slip: fix NPD bug in sl_tx_timeout()"
      
      * tag 'net-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
        drivers: net: slip: fix NPD bug in sl_tx_timeout()
        bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets
        bpf: Support dual-stack sockets in bpf_tcp_check_syncookie
        myri10ge: fix an incorrect free for skb in myri10ge_sw_tso
        net: usb: aqc111: Fix out-of-bounds accesses in RX fixup
        qede: confirm skb is allocated before using
        net: ipv6mr: fix unused variable warning with CONFIG_IPV6_PIMSM_V2=n
        net: phy: mscc-miim: reject clause 45 register accesses
        net: axiemac: use a phandle to reference pcs_phy
        dt-bindings: net: add pcs-handle attribute
        net: axienet: factor out phy_node in struct axienet_local
        net: axienet: setup mdio unconditionally
        net: sfc: fix using uninitialized xdp tx_queue
        rxrpc: fix a race in rxrpc_exit_net()
        net: openvswitch: fix leak of nested actions
        net: ethernet: mv643xx: Fix over zealous checking of_get_mac_address()
        net: openvswitch: don't send internal clone attribute to the userspace.
        net: micrel: Fix KS8851 Kconfig
        ice: clear cmd_type_offset_bsz for TX rings
        ice: xsk: fix VSI state check in ice_xsk_wakeup()
        ...
      73b193f2
  2. 07 Apr, 2022 5 commits
    • Linus Torvalds's avatar
      Merge tag 'hyperv-fixes-signed-20220407' of... · 42e7a03d
      Linus Torvalds authored
      Merge tag 'hyperv-fixes-signed-20220407' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
      
      Pull hyperv fixes from Wei Liu:
      
       - Correctly propagate coherence information for VMbus devices (Michael
         Kelley)
      
       - Disable balloon and memory hot-add on ARM64 temporarily (Boqun Feng)
      
       - Use barrier to prevent reording when reading ring buffer (Michael
         Kelley)
      
       - Use virt_store_mb in favour of smp_store_mb (Andrea Parri)
      
       - Fix VMbus device object initialization (Andrea Parri)
      
       - Deactivate sysctl_record_panic_msg on isolated guest (Andrea Parri)
      
       - Fix a crash when unloading VMbus module (Guilherme G. Piccoli)
      
      * tag 'hyperv-fixes-signed-20220407' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
        Drivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb()
        Drivers: hv: balloon: Disable balloon and hot-add accordingly
        Drivers: hv: balloon: Support status report for larger page sizes
        Drivers: hv: vmbus: Prevent load re-ordering when reading ring buffer
        PCI: hv: Propagate coherence from VMbus device to PCI device
        Drivers: hv: vmbus: Propagate VMbus coherence to each VMbus device
        Drivers: hv: vmbus: Fix potential crash on module unload
        Drivers: hv: vmbus: Fix initialization of device object in vmbus_device_register()
        Drivers: hv: vmbus: Deactivate sysctl_record_panic_msg by default in isolated guests
      42e7a03d
    • Linus Torvalds's avatar
      Merge tag 'random-5.18-rc2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random · 3638bd90
      Linus Torvalds authored
      Pull random number generator fixes from Jason Donenfeld:
      
       - Another fixup to the fast_init/crng_init split, this time in how much
         entropy is being credited, from Jan Varho.
      
       - As discussed, we now opportunistically call try_to_generate_entropy()
         in /dev/urandom reads, as a replacement for the reverted commit. I
         opted to not do the more invasive wait_for_random_bytes() change at
         least for now, preferring to do something smaller and more obvious
         for the time being, but maybe that can be revisited as things evolve
         later.
      
       - Userspace can use FUSE or userfaultfd or simply move a process to
         idle priority in order to make a read from the random device never
         complete, which breaks forward secrecy, fixed by overwriting
         sensitive bytes early on in the function.
      
       - Jann Horn noticed that /dev/urandom reads were only checking for
         pending signals if need_resched() was true, a bug going back to the
         genesis commit, now fixed by always checking for signal_pending() and
         calling cond_resched(). This explains various noticeable signal
         delivery delays I've seen in programs over the years that do long
         reads from /dev/urandom.
      
       - In order to be more like other devices (e.g. /dev/zero) and to
         mitigate the impact of fixing the above bug, which has been around
         forever (users have never really needed to check the return value of
         read() for medium-sized reads and so perhaps many didn't), we now
         move signal checking to the bottom part of the loop, and do so every
         PAGE_SIZE-bytes.
      
      * tag 'random-5.18-rc2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
        random: check for signals every PAGE_SIZE chunk of /dev/[u]random
        random: check for signal_pending() outside of need_resched() check
        random: do not allow user to keep crng key around on stack
        random: opportunistically initialize on /dev/urandom reads
        random: do not split fast init input in add_hwgenerator_randomness()
      3638bd90
    • Linus Torvalds's avatar
      Merge tag 'ata-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · 640b5037
      Linus Torvalds authored
      Pull ata fixes from Damien Le Moal:
      
       - Fix a compilation warning due to an uninitialized variable in
         ata_sff_lost_interrupt(), from me.
      
       - Fix invalid internal command tag handling in the sata_dwc_460ex
         driver, from Christian.
      
       - Disable READ LOG DMA EXT with Samsung 840 EVO SSDs as this command
         causes the drives to hang, from Christian.
      
       - Change the config option CONFIG_SATA_LPM_POLICY back to its original
         name CONFIG_SATA_LPM_MOBILE_POLICY to avoid potential problems with
         users losing their configuration (as discussed during the merge
         window), from Mario.
      
      * tag 'ata-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
        ata: ahci: Rename CONFIG_SATA_LPM_POLICY configuration item back
        ata: libata-core: Disable READ LOG DMA EXT for Samsung 840 EVOs
        ata: sata_dwc_460ex: Fix crash due to OOB write
        ata: libata-sff: Fix compilation warning in ata_sff_lost_interrupt()
      640b5037
    • Duoming Zhou's avatar
      drivers: net: slip: fix NPD bug in sl_tx_timeout() · ec4eb8a8
      Duoming Zhou authored
      When a slip driver is detaching, the slip_close() will act to
      cleanup necessary resources and sl->tty is set to NULL in
      slip_close(). Meanwhile, the packet we transmit is blocked,
      sl_tx_timeout() will be called. Although slip_close() and
      sl_tx_timeout() use sl->lock to synchronize, we don`t judge
      whether sl->tty equals to NULL in sl_tx_timeout() and the
      null pointer dereference bug will happen.
      
         (Thread 1)                 |      (Thread 2)
                                    | slip_close()
                                    |   spin_lock_bh(&sl->lock)
                                    |   ...
      ...                           |   sl->tty = NULL //(1)
      sl_tx_timeout()               |   spin_unlock_bh(&sl->lock)
        spin_lock(&sl->lock);       |
        ...                         |   ...
        tty_chars_in_buffer(sl->tty)|
          if (tty->ops->..) //(2)   |
          ...                       |   synchronize_rcu()
      
      We set NULL to sl->tty in position (1) and dereference sl->tty
      in position (2).
      
      This patch adds check in sl_tx_timeout(). If sl->tty equals to
      NULL, sl_tx_timeout() will goto out.
      Signed-off-by: default avatarDuoming Zhou <duoming@zju.edu.cn>
      Reviewed-by: default avatarJiri Slaby <jirislaby@kernel.org>
      Link: https://lore.kernel.org/r/20220405132206.55291-1-duoming@zju.edu.cnSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ec4eb8a8
    • Jakub Kicinski's avatar
      Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 8e9d0d7a
      Jakub Kicinski authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2022-04-06
      
      We've added 8 non-merge commits during the last 8 day(s) which contain
      a total of 9 files changed, 139 insertions(+), 36 deletions(-).
      
      The main changes are:
      
      1) rethook related fixes, from Jiri and Masami.
      
      2) Fix the case when tracing bpf prog is attached to struct_ops, from Martin.
      
      3) Support dual-stack sockets in bpf_tcp_check_syncookie, from Maxim.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets
        bpf: Support dual-stack sockets in bpf_tcp_check_syncookie
        bpf: selftests: Test fentry tracing a struct_ops program
        bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT
        rethook: Fix to use WRITE_ONCE() for rethook:: Handler
        selftests/bpf: Fix warning comparing pointer to 0
        bpf: Fix sparse warnings in kprobe_multi_resolve_syms
        bpftool: Explicit errno handling in skeletons
      ====================
      
      Link: https://lore.kernel.org/r/20220407031245.73026-1-alexei.starovoitov@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      8e9d0d7a
  3. 06 Apr, 2022 26 commits
    • Jason A. Donenfeld's avatar
      random: check for signals every PAGE_SIZE chunk of /dev/[u]random · e3c1c4fd
      Jason A. Donenfeld authored
      In 1448769c ("random: check for signal_pending() outside of
      need_resched() check"), Jann pointed out that we previously were only
      checking the TIF_NOTIFY_SIGNAL and TIF_SIGPENDING flags if the process
      had TIF_NEED_RESCHED set, which meant in practice, super long reads to
      /dev/[u]random would delay signal handling by a long time. I tried this
      using the below program, and indeed I wasn't able to interrupt a
      /dev/urandom read until after several megabytes had been read. The bug
      he fixed has always been there, and so code that reads from /dev/urandom
      without checking the return value of read() has mostly worked for a long
      time, for most sizes, not just for <= 256.
      
      Maybe it makes sense to keep that code working. The reason it was so
      small prior, ignoring the fact that it didn't work anyway, was likely
      because /dev/random used to block, and that could happen for pretty
      large lengths of time while entropy was gathered. But now, it's just a
      chacha20 call, which is extremely fast and is just operating on pure
      data, without having to wait for some external event. In that sense,
      /dev/[u]random is a lot more like /dev/zero.
      
      Taking a page out of /dev/zero's read_zero() function, it always returns
      at least one chunk, and then checks for signals after each chunk. Chunk
      sizes there are of length PAGE_SIZE. Let's just copy the same thing for
      /dev/[u]random, and check for signals and cond_resched() for every
      PAGE_SIZE amount of data. This makes the behavior more consistent with
      expectations, and should mitigate the impact of Jann's fix for the
      age-old signal check bug.
      
      ---- test program ----
      
        #include <unistd.h>
        #include <signal.h>
        #include <stdio.h>
        #include <sys/random.h>
      
        static unsigned char x[~0U];
      
        static void handle(int) { }
      
        int main(int argc, char *argv[])
        {
          pid_t pid = getpid(), child;
          signal(SIGUSR1, handle);
          if (!(child = fork())) {
            for (;;)
              kill(pid, SIGUSR1);
          }
          pause();
          printf("interrupted after reading %zd bytes\n", getrandom(x, sizeof(x), 0));
          kill(child, SIGTERM);
          return 0;
        }
      
      Cc: Jann Horn <jannh@google.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      e3c1c4fd
    • Maxim Mikityanskiy's avatar
      bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets · 53968daf
      Maxim Mikityanskiy authored
      The previous commit fixed support for dual-stack sockets in
      bpf_tcp_check_syncookie. This commit adjusts the selftest to verify the
      fixed functionality.
      Signed-off-by: default avatarMaxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarArthur Fabre <afabre@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20220406124113.2795730-2-maximmi@nvidia.com
      53968daf
    • Maxim Mikityanskiy's avatar
      bpf: Support dual-stack sockets in bpf_tcp_check_syncookie · 2e8702cc
      Maxim Mikityanskiy authored
      bpf_tcp_gen_syncookie looks at the IP version in the IP header and
      validates the address family of the socket. It supports IPv4 packets in
      AF_INET6 dual-stack sockets.
      
      On the other hand, bpf_tcp_check_syncookie looks only at the address
      family of the socket, ignoring the real IP version in headers, and
      validates only the packet size. This implementation has some drawbacks:
      
      1. Packets are not validated properly, allowing a BPF program to trick
         bpf_tcp_check_syncookie into handling an IPv6 packet on an IPv4
         socket.
      
      2. Dual-stack sockets fail the checks on IPv4 packets. IPv4 clients end
         up receiving a SYNACK with the cookie, but the following ACK gets
         dropped.
      
      This patch fixes these issues by changing the checks in
      bpf_tcp_check_syncookie to match the ones in bpf_tcp_gen_syncookie. IP
      version from the header is taken into account, and it is validated
      properly with address family.
      
      Fixes: 39904084 ("bpf: add helper to check for a valid SYN cookie")
      Signed-off-by: default avatarMaxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Acked-by: default avatarArthur Fabre <afabre@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20220406124113.2795730-1-maximmi@nvidia.com
      2e8702cc
    • Xiaomeng Tong's avatar
      myri10ge: fix an incorrect free for skb in myri10ge_sw_tso · b423e54b
      Xiaomeng Tong authored
      All remaining skbs should be released when myri10ge_xmit fails to
      transmit a packet. Fix it within another skb_list_walk_safe.
      Signed-off-by: default avatarXiaomeng Tong <xiam0nd.tong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b423e54b
    • Marcin Kozlowski's avatar
      net: usb: aqc111: Fix out-of-bounds accesses in RX fixup · afb8e246
      Marcin Kozlowski authored
      aqc111_rx_fixup() contains several out-of-bounds accesses that can be
      triggered by a malicious (or defective) USB device, in particular:
      
       - The metadata array (desc_offset..desc_offset+2*pkt_count) can be out of bounds,
         causing OOB reads and (on big-endian systems) OOB endianness flips.
       - A packet can overlap the metadata array, causing a later OOB
         endianness flip to corrupt data used by a cloned SKB that has already
         been handed off into the network stack.
       - A packet SKB can be constructed whose tail is far beyond its end,
         causing out-of-bounds heap data to be considered part of the SKB's
         data.
      
      Found doing variant analysis. Tested it with another driver (ax88179_178a), since
      I don't have a aqc111 device to test it, but the code looks very similar.
      Signed-off-by: default avatarMarcin Kozlowski <marcinguy@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      afb8e246
    • Jamie Bainbridge's avatar
      qede: confirm skb is allocated before using · 4e910dbe
      Jamie Bainbridge authored
      qede_build_skb() assumes build_skb() always works and goes straight
      to skb_reserve(). However, build_skb() can fail under memory pressure.
      This results in a kernel panic because the skb to reserve is NULL.
      
      Add a check in case build_skb() failed to allocate and return NULL.
      
      The NULL return is handled correctly in callers to qede_build_skb().
      
      Fixes: 8a863397 ("qede: Add build_skb() support.")
      Signed-off-by: default avatarJamie Bainbridge <jamie.bainbridge@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4e910dbe
    • Florian Westphal's avatar
      net: ipv6mr: fix unused variable warning with CONFIG_IPV6_PIMSM_V2=n · a3ebe92a
      Florian Westphal authored
      net/ipv6/ip6mr.c:1656:14: warning: unused variable 'do_wrmifwhole'
      
      Move it to the CONFIG_IPV6_PIMSM_V2 scope where its used.
      
      Fixes: 4b340a5a ("net: ip6mr: add support for passing full packet on wrong mif")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a3ebe92a
    • David S. Miller's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 74edbe9e
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-04-05
      
      Maciej Fijalkowski says:
      
      We were solving issues around AF_XDP busy poll's not-so-usual scenarios,
      such as very big busy poll budgets applied to very small HW rings. This
      set carries the things that were found during that work that apply to
      net tree.
      
      One thing that was fixed for all in-tree ZC drivers was missing on ice
      side all the time - it's about syncing RCU before destroying XDP
      resources. Next one fixes the bit that is checked in ice_xsk_wakeup and
      third one avoids false setting of DD bits on Tx descriptors.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      74edbe9e
    • Andrea Parri (Microsoft)'s avatar
      Drivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb() · eaa03d34
      Andrea Parri (Microsoft) authored
      Following the recommendation in Documentation/memory-barriers.txt for
      virtual machine guests.
      
      Fixes: 8b6a877c ("Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels")
      Signed-off-by: default avatarAndrea Parri (Microsoft) <parri.andrea@gmail.com>
      Link: https://lore.kernel.org/r/20220328154457.100872-1-parri.andrea@gmail.comSigned-off-by: default avatarWei Liu <wei.liu@kernel.org>
      eaa03d34
    • Boqun Feng's avatar
      Drivers: hv: balloon: Disable balloon and hot-add accordingly · be580279
      Boqun Feng authored
      Currently there are known potential issues for balloon and hot-add on
      ARM64:
      
      *	Unballoon requests from Hyper-V should only unballoon ranges
      	that are guest page size aligned, otherwise guests cannot handle
      	because it's impossible to partially free a page. This is a
      	problem when guest page size > 4096 bytes.
      
      *	Memory hot-add requests from Hyper-V should provide the NUMA
      	node id of the added ranges or ARM64 should have a functional
      	memory_add_physaddr_to_nid(), otherwise the node id is missing
      	for add_memory().
      
      These issues require discussions on design and implementation. In the
      meanwhile, post_status() is working and essential to guest monitoring.
      Therefore instead of disabling the entire hv_balloon driver, the
      ballooning (when page size > 4096 bytes) and hot-add are disabled
      accordingly for now. Once the issues are fixed, they can be re-enable in
      these cases.
      Signed-off-by: default avatarBoqun Feng <boqun.feng@gmail.com>
      Reviewed-by: default avatarMichael Kelley <mikelley@microsoft.com>
      Link: https://lore.kernel.org/r/20220325023212.1570049-3-boqun.feng@gmail.comSigned-off-by: default avatarWei Liu <wei.liu@kernel.org>
      be580279
    • Boqun Feng's avatar
      Drivers: hv: balloon: Support status report for larger page sizes · b3d6dd09
      Boqun Feng authored
      DM_STATUS_REPORT expects the numbers of pages in the unit of 4k pages
      (HV_HYP_PAGE) instead of guest pages, so to make it work when guest page
      sizes are larger than 4k, convert the numbers of guest pages into the
      numbers of HV_HYP_PAGEs.
      
      Note that the numbers of guest pages are still used for tracing because
      tracing is internal to the guest kernel.
      Reported-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarBoqun Feng <boqun.feng@gmail.com>
      Reviewed-by: default avatarMichael Kelley <mikelley@microsoft.com>
      Link: https://lore.kernel.org/r/20220325023212.1570049-2-boqun.feng@gmail.comSigned-off-by: default avatarWei Liu <wei.liu@kernel.org>
      b3d6dd09
    • Jann Horn's avatar
      random: check for signal_pending() outside of need_resched() check · 1448769c
      Jann Horn authored
      signal_pending() checks TIF_NOTIFY_SIGNAL and TIF_SIGPENDING, which
      signal that the task should bail out of the syscall when possible. This
      is a separate concept from need_resched(), which checks
      TIF_NEED_RESCHED, signaling that the task should preempt.
      
      In particular, with the current code, the signal_pending() bailout
      probably won't work reliably.
      
      Change this to look like other functions that read lots of data, such as
      read_zero().
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      1448769c
    • Jason A. Donenfeld's avatar
      random: do not allow user to keep crng key around on stack · aba120cc
      Jason A. Donenfeld authored
      The fast key erasure RNG design relies on the key that's used to be used
      and then discarded. We do this, making judicious use of
      memzero_explicit().  However, reads to /dev/urandom and calls to
      getrandom() involve a copy_to_user(), and userspace can use FUSE or
      userfaultfd, or make a massive call, dynamically remap memory addresses
      as it goes, and set the process priority to idle, in order to keep a
      kernel stack alive indefinitely. By probing
      /proc/sys/kernel/random/entropy_avail to learn when the crng key is
      refreshed, a malicious userspace could mount this attack every 5 minutes
      thereafter, breaking the crng's forward secrecy.
      
      In order to fix this, we just overwrite the stack's key with the first
      32 bytes of the "free" fast key erasure output. If we're returning <= 32
      bytes to the user, then we can still return those bytes directly, so
      that short reads don't become slower. And for long reads, the difference
      is hopefully lost in the amortization, so it doesn't change much, with
      that amortization helping variously for medium reads.
      
      We don't need to do this for get_random_bytes() and the various
      kernel-space callers, and later, if we ever switch to always batching,
      this won't be necessary either, so there's no need to change the API of
      these functions.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJann Horn <jannh@google.com>
      Fixes: c92e040d ("random: add backtracking protection to the CRNG")
      Fixes: 186873c5 ("random: use simpler fast key erasure flow on per-cpu keys")
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      aba120cc
    • Michael Walle's avatar
      net: phy: mscc-miim: reject clause 45 register accesses · 8d90991e
      Michael Walle authored
      The driver doesn't support clause 45 register access yet, but doesn't
      check if the access is a c45 one either. This leads to spurious register
      reads and writes. Add the check.
      
      Fixes: 542671fe ("net: phy: mscc-miim: Add MDIO driver")
      Signed-off-by: default avatarMichael Walle <michael@walle.cc>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8d90991e
    • David S. Miller's avatar
      Merge branch 'axienet-broken-link' · 9386d181
      David S. Miller authored
      Andy Chiu says:
      
      ====================
      Fix broken link on Xilinx's AXI Ethernet in SGMII mode
      
      The Ethernet driver use phy-handle to reference the PCS/PMA PHY. This
      could be a problem if one wants to configure an external PHY via phylink,
      since it use the same phandle to get the PHY. To fix this, introduce a
      dedicated pcs-handle to point to the PCS/PMA PHY and deprecate the use
      of pointing it with phy-handle. A similar use case of pcs-handle can be
      seen on dpaa2 as well.
      
      --- patch v5 ---
       - Re-apply the v4 patch on the net tree.
       - Describe the pcs-handle DT binding at ethernet-controller level.
      --- patch v6 ---
       - Remove "preferrably" to clearify usage of pcs_handle.
      --- patch v7 ---
       - Rebase the patch on latest net/master
      --- patch v8 ---
       - Rebase the patch on net-next/master
       - Add "reviewed-by" tag in PATCH 3/4: dt-bindings: net: add pcs-handle
         attribute
       - Remove "fix" tag in last commit message since this is not a critical
         bug and will not be back ported to stable.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9386d181
    • Andy Chiu's avatar
      net: axiemac: use a phandle to reference pcs_phy · 19c7a439
      Andy Chiu authored
      In some SGMII use cases where both a fixed link external PHY and the
      internal PCS/PMA PHY need to be configured, we should explicitly use a
      phandle "pcs-phy" to get the reference to the PCS/PMA PHY. Otherwise, the
      driver would use "phy-handle" in the DT as the reference to both the
      external and the internal PCS/PMA PHY.
      
      In other cases where the core is connected to a SFP cage, we could still
      point phy-handle to the intenal PCS/PMA PHY, and let the driver connect
      to the SFP module, if exist, via phylink.
      Signed-off-by: default avatarAndy Chiu <andy.chiu@sifive.com>
      Reviewed-by: default avatarGreentime Hu <greentime.hu@sifive.com>
      Reviewed-by: default avatarRobert Hancock <robert.hancock@calian.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: default avatarRadhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      19c7a439
    • Andy Chiu's avatar
      dt-bindings: net: add pcs-handle attribute · dc48f04f
      Andy Chiu authored
      Document the new pcs-handle attribute to support connecting to an
      external PHY. For Xilinx's AXI Ethernet, this is used when the core
      operates in SGMII or 1000Base-X modes and links through the internal
      PCS/PMA PHY.
      Signed-off-by: default avatarAndy Chiu <andy.chiu@sifive.com>
      Reviewed-by: default avatarGreentime Hu <greentime.hu@sifive.com>
      Reviewed-by: default avatarRob Herring <robh@kernel.org>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dc48f04f
    • Andy Chiu's avatar
      net: axienet: factor out phy_node in struct axienet_local · ab3a5d4c
      Andy Chiu authored
      the struct member `phy_node` of struct axienet_local is not used by the
      driver anymore after initialization. It might be a remnent of old code
      and could be removed.
      Signed-off-by: default avatarAndy Chiu <andy.chiu@sifive.com>
      Reviewed-by: default avatarGreentime Hu <greentime.hu@sifive.com>
      Reviewed-by: default avatarRobert Hancock <robert.hancock@calian.com>
      Reviewed-by: default avatarRadhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ab3a5d4c
    • Andy Chiu's avatar
      net: axienet: setup mdio unconditionally · d1c4f93e
      Andy Chiu authored
      The call to axienet_mdio_setup should not depend on whether "phy-node"
      pressents on the DT. Besides, since `lp->phy_node` is used if PHY is in
      SGMII or 100Base-X modes, move it into the if statement. And the next patch
      will remove `lp->phy_node` from driver's private structure and do an
      of_node_put on it right away after use since it is not used elsewhere.
      Signed-off-by: default avatarAndy Chiu <andy.chiu@sifive.com>
      Reviewed-by: default avatarGreentime Hu <greentime.hu@sifive.com>
      Reviewed-by: default avatarRobert Hancock <robert.hancock@calian.com>
      Reviewed-by: default avatarRadhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d1c4f93e
    • Taehee Yoo's avatar
      net: sfc: fix using uninitialized xdp tx_queue · fb5833d8
      Taehee Yoo authored
      In some cases, xdp tx_queue can get used before initialization.
      1. interface up/down
      2. ring buffer size change
      
      When CPU cores are lower than maximum number of channels of sfc driver,
      it creates new channels only for XDP.
      
      When an interface is up or ring buffer size is changed, all channels
      are initialized.
      But xdp channels are always initialized later.
      So, the below scenario is possible.
      Packets are received to rx queue of normal channels and it is acted
      XDP_TX and tx_queue of xdp channels get used.
      But these tx_queues are not initialized yet.
      If so, TX DMA or queue error occurs.
      
      In order to avoid this problem.
      1. initializes xdp tx_queues earlier than other rx_queue in
      efx_start_channels().
      2. checks whether tx_queue is initialized or not in efx_xdp_tx_buffers().
      
      Splat looks like:
         sfc 0000:08:00.1 enp8s0f1np1: TX queue 10 spurious TX completion id 250
         sfc 0000:08:00.1 enp8s0f1np1: resetting (RECOVER_OR_ALL)
         sfc 0000:08:00.1 enp8s0f1np1: MC command 0x80 inlen 100 failed rc=-22
         (raw=22) arg=789
         sfc 0000:08:00.1 enp8s0f1np1: has been disabled
      
      Fixes: f28100cb ("sfc: fix lack of XDP TX queues - error XDP TX failed (-22)")
      Acked-by: default avatarMartin Habets <habetsm.xilinx@gmail.com>
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fb5833d8
    • Eric Dumazet's avatar
      rxrpc: fix a race in rxrpc_exit_net() · 1946014c
      Eric Dumazet authored
      Current code can lead to the following race:
      
      CPU0                                                 CPU1
      
      rxrpc_exit_net()
                                                           rxrpc_peer_keepalive_worker()
                                                             if (rxnet->live)
      
        rxnet->live = false;
        del_timer_sync(&rxnet->peer_keepalive_timer);
      
                                                                   timer_reduce(&rxnet->peer_keepalive_timer, jiffies + delay);
      
        cancel_work_sync(&rxnet->peer_keepalive_work);
      
      rxrpc_exit_net() exits while peer_keepalive_timer is still armed,
      leading to use-after-free.
      
      syzbot report was:
      
      ODEBUG: free active (active state 0) object type: timer_list hint: rxrpc_peer_keepalive_timeout+0x0/0xb0
      WARNING: CPU: 0 PID: 3660 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505
      Modules linked in:
      CPU: 0 PID: 3660 Comm: kworker/u4:6 Not tainted 5.17.0-syzkaller-13993-g88e6c020 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: netns cleanup_net
      RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505
      Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd 00 1c 26 8a 4c 89 ee 48 c7 c7 00 10 26 8a e8 b1 e7 28 05 <0f> 0b 83 05 15 eb c5 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3
      RSP: 0018:ffffc9000353fb00 EFLAGS: 00010082
      RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
      RDX: ffff888029196140 RSI: ffffffff815efad8 RDI: fffff520006a7f52
      RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff815ea4ae R11: 0000000000000000 R12: ffffffff89ce23e0
      R13: ffffffff8a2614e0 R14: ffffffff816628c0 R15: dffffc0000000000
      FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fe1f2908924 CR3: 0000000043720000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       __debug_check_no_obj_freed lib/debugobjects.c:992 [inline]
       debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1023
       kfree+0xd6/0x310 mm/slab.c:3809
       ops_free_list.part.0+0x119/0x370 net/core/net_namespace.c:176
       ops_free_list net/core/net_namespace.c:174 [inline]
       cleanup_net+0x591/0xb00 net/core/net_namespace.c:598
       process_one_work+0x996/0x1610 kernel/workqueue.c:2289
       worker_thread+0x665/0x1080 kernel/workqueue.c:2436
       kthread+0x2e9/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
       </TASK>
      
      Fixes: ace45bec ("rxrpc: Fix firewall route keepalive")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Marc Dionne <marc.dionne@auristor.com>
      Cc: linux-afs@lists.infradead.org
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1946014c
    • Ilya Maximets's avatar
      net: openvswitch: fix leak of nested actions · 1f30fb91
      Ilya Maximets authored
      While parsing user-provided actions, openvswitch module may dynamically
      allocate memory and store pointers in the internal copy of the actions.
      So this memory has to be freed while destroying the actions.
      
      Currently there are only two such actions: ct() and set().  However,
      there are many actions that can hold nested lists of actions and
      ovs_nla_free_flow_actions() just jumps over them leaking the memory.
      
      For example, removal of the flow with the following actions will lead
      to a leak of the memory allocated by nf_ct_tmpl_alloc():
      
        actions:clone(ct(commit),0)
      
      Non-freed set() action may also leak the 'dst' structure for the
      tunnel info including device references.
      
      Under certain conditions with a high rate of flow rotation that may
      cause significant memory leak problem (2MB per second in reporter's
      case).  The problem is also hard to mitigate, because the user doesn't
      have direct control over the datapath flows generated by OVS.
      
      Fix that by iterating over all the nested actions and freeing
      everything that needs to be freed recursively.
      
      New build time assertion should protect us from this problem if new
      actions will be added in the future.
      
      Unfortunately, openvswitch module doesn't use NLA_F_NESTED, so all
      attributes has to be explicitly checked.  sample() and clone() actions
      are mixing extra attributes into the user-provided action list.  That
      prevents some code generalization too.
      
      Fixes: 34ae932a ("openvswitch: Make tunnel set action attach a metadata dst")
      Link: https://mail.openvswitch.org/pipermail/ovs-dev/2022-March/392922.htmlReported-by: default avatarStéphane Graber <stgraber@ubuntu.com>
      Signed-off-by: default avatarIlya Maximets <i.maximets@ovn.org>
      Acked-by: default avatarAaron Conole <aconole@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1f30fb91
    • Mario Limonciello's avatar
      ata: ahci: Rename CONFIG_SATA_LPM_POLICY configuration item back · 55b01415
      Mario Limonciello authored
      CONFIG_SATA_LPM_MOBILE_POLICY was renamed to CONFIG_SATA_LPM_POLICY in
      commit 4dd4d3de ("ata: ahci: Rename CONFIG_SATA_LPM_MOBILE_POLICY
      configuration item").
      
      This can potentially cause problems as users would invisibly lose
      configuration policy defaults when they built the new kernel. To
      avoid such problems, switch back to the old name (even if it's wrong).
      Suggested-by: default avatarChristoph Hellwig <hch@infradead.org>
      Suggested-by: default avatarDamien Le Moal <damien.lemoal@opensource.wdc.com>
      Signed-off-by: default avatarMario Limonciello <mario.limonciello@amd.com>
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@opensource.wdc.com>
      55b01415
    • Andrew Lunn's avatar
      net: ethernet: mv643xx: Fix over zealous checking of_get_mac_address() · 11f8e7c1
      Andrew Lunn authored
      There is often not a MAC address available in an EEPROM accessible by
      Linux with Marvell devices. Instead the bootload has the MAC address
      and directly programs it into the hardware. So don't consider an error
      from of_get_mac_address() has fatal. However, the check was added for
      the case where there is a MAC address in an the EEPROM, but the EEPROM
      has not probed yet, and -EPROBE_DEFER is returned. In that case the
      error should be returned. So make the check specific to this error
      code.
      
      Cc: Mauri Sandberg <maukka@ext.kapsi.fi>
      Reported-by: default avatarThomas Walther <walther-it@gmx.de>
      Fixes: 42404d8f ("net: mv643xx_eth: process retval from of_get_mac_address")
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20220405000404.3374734-1-andrew@lunn.chSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      11f8e7c1
    • Ilya Maximets's avatar
      net: openvswitch: don't send internal clone attribute to the userspace. · 3f2a3050
      Ilya Maximets authored
      'OVS_CLONE_ATTR_EXEC' is an internal attribute that is used for
      performance optimization inside the kernel.  It's added by the kernel
      while parsing user-provided actions and should not be sent during the
      flow dump as it's not part of the uAPI.
      
      The issue doesn't cause any significant problems to the ovs-vswitchd
      process, because reported actions are not really used in the
      application lifecycle and only supposed to be shown to a human via
      ovs-dpctl flow dump.  However, the action list is still incorrect
      and causes the following error if the user wants to look at the
      datapath flows:
      
        # ovs-dpctl add-dp system@ovs-system
        # ovs-dpctl add-flow "<flow match>" "clone(ct(commit),0)"
        # ovs-dpctl dump-flows
        <flow match>, packets:0, bytes:0, used:never,
          actions:clone(bad length 4, expected -1 for: action0(01 00 00 00),
                        ct(commit),0)
      
      With the fix:
      
        # ovs-dpctl dump-flows
        <flow match>, packets:0, bytes:0, used:never,
          actions:clone(ct(commit),0)
      
      Additionally fixed an incorrect attribute name in the comment.
      
      Fixes: b2335040 ("openvswitch: kernel datapath clone action")
      Signed-off-by: default avatarIlya Maximets <i.maximets@ovn.org>
      Acked-by: default avatarAaron Conole <aconole@redhat.com>
      Link: https://lore.kernel.org/r/20220404104150.2865736-1-i.maximets@ovn.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      3f2a3050
    • Horatiu Vultur's avatar
      net: micrel: Fix KS8851 Kconfig · 1d7e4fd7
      Horatiu Vultur authored
      KS8851 selects MICREL_PHY, which depends on PTP_1588_CLOCK_OPTIONAL, so
      make KS8851 also depend on PTP_1588_CLOCK_OPTIONAL.
      
      Fixes kconfig warning and build errors:
      
      WARNING: unmet direct dependencies detected for MICREL_PHY
        Depends on [m]: NETDEVICES [=y] && PHYLIB [=y] && PTP_1588_CLOCK_OPTIONAL [=m]
          Selected by [y]:
            - KS8851 [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_MICREL [=y] && SPI [=y]
      
      ld.lld: error: undefined symbol: ptp_clock_register referenced by micrel.c
      net/phy/micrel.o:(lan8814_probe) in archive drivers/built-in.a
      ld.lld: error: undefined symbol: ptp_clock_index referenced by micrel.c
      net/phy/micrel.o:(lan8814_ts_info) in archive drivers/built-in.a
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Fixes: ece19502 ("net: phy: micrel: 1588 support for LAN8814 phy")
      Signed-off-by: default avatarHoratiu Vultur <horatiu.vultur@microchip.com>
      Tested-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Acked-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Link: https://lore.kernel.org/r/20220405065936.4105272-1-horatiu.vultur@microchip.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      1d7e4fd7
  4. 05 Apr, 2022 8 commits