1. 17 Jun, 2023 5 commits
    • devlink: report devlink_port_type_warn source device · a52305a8
      Petr Oros authored
      devlink_port_type_warn is scheduled for a devlink port and warns
      when the port type is not set. However, the warning does not make
      it easy to find out which device (driver) did not set the port type.
      
      [ 3709.975552] Type was not set for devlink port.
      [ 3709.975579] WARNING: CPU: 1 PID: 13092 at net/devlink/leftover.c:6775 devlink_port_type_warn+0x11/0x20
      [ 3709.993967] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nfnetlink bluetooth rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs vhost_net vhost vhost_iotlb tap tun bridge stp llc qrtr intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm x86_pkg_temp_thermal mlx5_ib intel_powerclamp coretemp dell_wmi ledtrig_audio sparse_keymap ipmi_ssif kvm_intel ib_uverbs rfkill ib_core video kvm iTCO_wdt acpi_ipmi intel_vsec irqbypass ipmi_si iTCO_vendor_support dcdbas ipmi_devintf mei_me ipmi_msghandler rapl mei intel_cstate isst_if_mmio isst_if_mbox_pci dell_smbios intel_uncore isst_if_common i2c_i801 dell_wmi_descriptor wmi_bmof i2c_smbus intel_pch_thermal pcspkr acpi_power_meter xfs libcrc32c sd_mod sg nvme_tcp mgag200 i2c_algo_bit nvme_fabrics drm_shmem_helper drm_kms_helper nvme syscopyarea ahci sysfillrect sysimgblt nvme_core fb_sys_fops crct10dif_pclmul libahci mlx5_core sfc crc32_pclmul nvme_common drm
      [ 3709.994030]  crc32c_intel mtd t10_pi mlxfw libata tg3 mdio megaraid_sas psample ghash_clmulni_intel pci_hyperv_intf wmi dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse
      [ 3710.108431] CPU: 1 PID: 13092 Comm: kworker/1:1 Kdump: loaded Not tainted 5.14.0-319.el9.x86_64 #1
      [ 3710.108435] Hardware name: Dell Inc. PowerEdge R750/0PJ80M, BIOS 1.8.2 09/14/2022
      [ 3710.108437] Workqueue: events devlink_port_type_warn
      [ 3710.108440] RIP: 0010:devlink_port_type_warn+0x11/0x20
      [ 3710.108443] Code: 84 76 fe ff ff 48 c7 03 20 0e 1a ad 31 c0 e9 96 fd ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 48 c7 c7 18 24 4e ad e8 ef 71 62 ff <0f> 0b c3 cc cc cc cc 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f6 87
      [ 3710.108445] RSP: 0018:ff3b6d2e8b3c7e90 EFLAGS: 00010282
      [ 3710.108447] RAX: 0000000000000000 RBX: ff366d6580127080 RCX: 0000000000000027
      [ 3710.108448] RDX: 0000000000000027 RSI: 00000000ffff86de RDI: ff366d753f41f8c8
      [ 3710.108449] RBP: ff366d658ff5a0c0 R08: ff366d753f41f8c0 R09: ff3b6d2e8b3c7e18
      [ 3710.108450] R10: 0000000000000001 R11: 0000000000000023 R12: ff366d753f430600
      [ 3710.108451] R13: ff366d753f436900 R14: 0000000000000000 R15: ff366d753f436905
      [ 3710.108452] FS:  0000000000000000(0000) GS:ff366d753f400000(0000) knlGS:0000000000000000
      [ 3710.108453] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3710.108454] CR2: 00007f1c57bc74e0 CR3: 000000111d26a001 CR4: 0000000000773ee0
      [ 3710.108456] PKRU: 55555554
      [ 3710.108457] Call Trace:
      [ 3710.108458]  <TASK>
      [ 3710.108459]  process_one_work+0x1e2/0x3b0
      [ 3710.108466]  ? rescuer_thread+0x390/0x390
      [ 3710.108468]  worker_thread+0x50/0x3a0
      [ 3710.108471]  ? rescuer_thread+0x390/0x390
      [ 3710.108473]  kthread+0xdd/0x100
      [ 3710.108477]  ? kthread_complete_and_exit+0x20/0x20
      [ 3710.108479]  ret_from_fork+0x1f/0x30
      [ 3710.108485]  </TASK>
      [ 3710.108486] ---[ end trace 1b4b23cd0c65d6a0 ]---
      
      After patch:
      [  402.473064] ice 0000:41:00.0: Type was not set for devlink port.
      [  402.473064] ice 0000:41:00.1: Type was not set for devlink port.
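      The "After patch" lines carry the "<driver> <device>: " prefix that
      the kernel's dev_warn() family adds. The userspace sketch below only
      mimics that prefix to show why the new output identifies the device;
      struct mock_device and mock_dev_warn_fmt() are hypothetical stand-ins,
      not kernel APIs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct device: only the two strings that
 * dev_warn()-style output prints before the message. */
struct mock_device {
	const char *driver_name; /* e.g. "ice" */
	const char *dev_name;    /* e.g. the PCI address "0000:41:00.0" */
};

/* Format "<driver> <device>: <msg>" into buf, mimicking the prefix that
 * dev_warn() adds. Returns the snprintf() result. */
int mock_dev_warn_fmt(char *buf, size_t size,
		      const struct mock_device *dev, const char *msg)
{
	return snprintf(buf, size, "%s %s: %s",
			dev->driver_name, dev->dev_name, msg);
}
```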
      Signed-off-by: Petr Oros <poros@redhat.com>
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/r/20230615095447.8259-1-poros@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mctp: remove redundant RTN_UNICAST check · f60ce8a4
      Lin Ma authored
      The current mctp_newroute() contains two identical checks against
      rtm->rtm_type:
      
      static int mctp_newroute(...)
      {
      ...
          if (rtm->rtm_type != RTN_UNICAST) { // (1)
              NL_SET_ERR_MSG(extack, "rtm_type must be RTN_UNICAST");
              return -EINVAL;
          }
      ...
          if (rtm->rtm_type != RTN_UNICAST) // (2)
              return -EINVAL;
      ...
      }
      
      This commit removes check (2), as it is redundant.
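      A minimal userspace sketch of the remaining validation, assuming
      simplified types: check (1) already returns on every non-unicast
      type, so by the time control reaches (2) its condition can never be
      true. MOCK_RTN_UNICAST, struct mock_rtmsg and mock_validate_route()
      are hypothetical; the real constant is RTN_UNICAST from
      <linux/rtnetlink.h>.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the rtnetlink unicast route type. */
enum { MOCK_RTN_UNICAST = 1 };

/* Hypothetical mini-message carrying only the field being validated. */
struct mock_rtmsg {
	unsigned char rtm_type;
};

/* rtm_type is validated exactly once; a second identical comparison
 * after this point would be unreachable. */
int mock_validate_route(const struct mock_rtmsg *rtm, const char **errmsg)
{
	if (rtm->rtm_type != MOCK_RTN_UNICAST) {
		*errmsg = "rtm_type must be RTN_UNICAST";
		return -EINVAL;
	}
	*errmsg = NULL;
	return 0;
}
```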
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Acked-by: Jeremy Kerr <jk@codeconstruct.com.au>
      Link: https://lore.kernel.org/r/20230615152240.1749428-1-linma@zju.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netlink: specs: fixup openvswitch specs for code generation · 6907217a
      Donald Hunter authored
      Refine the ovs_* specs to align exactly with the ovs netlink UAPI
      definitions to enable code generation.
      Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
      Link: https://lore.kernel.org/r/20230615151405.77649-1-donald.hunter@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: sched: Remove unused qdisc_l2t() · e16ad981
      YueHaibing authored
      This has been unused since the switch to psched_l2t_ns().
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230615124810.34020-1-yuehaibing@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • kcm: Fix unnecessary psock unreservation. · 9f8d0dc0
      David Howells authored
      kcm_write_msgs() calls unreserve_psock() to release its hold on the
      underlying TCP socket if it has run out of things to transmit, but if we
      have nothing in the write queue on entry (e.g. because someone did a
      zero-length sendmsg), we don't actually go into the transmission loop and
      as a consequence don't call reserve_psock().
      
      Fix this by skipping the call to unreserve_psock() if we didn't reserve a
      psock.
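      A simplified model of the fix, assuming the hold is a plain counter:
      the psock is reserved only when the transmit loop actually runs, and
      a flag records that, so an empty write queue no longer leads to an
      unmatched release. struct mock_psock and the mock_* functions are
      hypothetical; the real kcm_write_msgs() may track this differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lower socket with a reservation count. */
struct mock_psock {
	int reservations;
};

static void mock_reserve(struct mock_psock *p)   { p->reservations++; }
static void mock_unreserve(struct mock_psock *p) { p->reservations--; }

/* Transmit `queued` messages, reserving the psock lazily. */
void mock_write_msgs(struct mock_psock *p, size_t queued)
{
	bool reserved = false;

	while (queued > 0) {
		if (!reserved) {
			mock_reserve(p);
			reserved = true;
		}
		queued--; /* "send" one message */
	}

	/* Out of things to transmit: drop the hold only if one was taken
	 * (an empty queue, e.g. after a zero-length sendmsg(), took none). */
	if (reserved)
		mock_unreserve(p);
}
```

      Without the `reserved` check, calling mock_write_msgs() with an
      empty queue would leave the count at -1.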
      
      Fixes: c31a25e1 ("kcm: Send multiple frags in one sendmsg()")
      Reported-by: syzbot+dd1339599f1840e4cc65@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/r/000000000000a61ffe05fe0c3d08@google.com/
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: syzbot+dd1339599f1840e4cc65@syzkaller.appspotmail.com
      cc: Tom Herbert <tom@herbertland.com>
      cc: Tom Herbert <tom@quantonium.net>
      cc: Jens Axboe <axboe@kernel.dk>
      cc: Matthew Wilcox <willy@infradead.org>
      Link: https://lore.kernel.org/r/20787.1686828722@warthog.procyon.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 16 Jun, 2023 22 commits
    • ip, ip6: Fix splice to raw and ping sockets · 5a6f6873
      David Howells authored
      Splicing to SOCK_RAW sockets may set MSG_SPLICE_PAGES, but in such a case,
      __ip_append_data() will call skb_splice_from_iter() to access the 'from'
      data, assuming it to point to a msghdr struct with an iter, instead of
      using the provided getfrag function to access it.
      
      In the case of raw_sendmsg(), however, this is not the case and 'from' will
      point to a raw_frag_vec struct and raw_getfrag() will be the frag-getting
      function.  A similar issue may occur with rawv6_sendmsg().
      
      Fix this by ignoring MSG_SPLICE_PAGES if getfrag != ip_generic_getfrag as
      ip_generic_getfrag() expects "from" to be a msghdr*, but the other getfrags
      don't.  Note that this will prevent MSG_SPLICE_PAGES from being effective
      for udplite.
      
      This likely affects ping sockets too.  udplite looks like it should be okay
      as it expects "from" to be a msghdr.
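      A sketch of the guard described above, under simplified assumptions:
      only the generic getfrag may take the splice path, and any other
      getfrag has the splice flag masked off so it falls back to copying
      through its own callback. The flag value, the getfrag signature and
      all mock_* names are hypothetical, not the kernel's definitions.

```c
#include <stddef.h>

#define MOCK_MSG_SPLICE_PAGES 0x1u /* hypothetical flag value */

/* Hypothetical getfrag signature. */
typedef int (*mock_getfrag_t)(void *from, char *to, size_t len);

/* Stands in for ip_generic_getfrag(): "from" is a msghdr with an iter. */
int mock_generic_getfrag(void *from, char *to, size_t len)
{
	(void)from; (void)to;
	return (int)len;
}

/* Stands in for raw_getfrag(): "from" is a different private struct. */
int mock_raw_getfrag(void *from, char *to, size_t len)
{
	(void)from; (void)to; (void)len;
	return 0;
}

/* Return the flags actually honoured for this getfrag. */
unsigned int mock_effective_flags(unsigned int flags, mock_getfrag_t getfrag)
{
	if ((flags & MOCK_MSG_SPLICE_PAGES) && getfrag != mock_generic_getfrag)
		flags &= ~MOCK_MSG_SPLICE_PAGES;
	return flags;
}
```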
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reported-by: syzbot+d8486855ef44506fd675@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/r/000000000000ae4cbf05fdeb8349@google.com/
      Fixes: 2dc334f1 ("splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()")
      Tested-by: syzbot+d8486855ef44506fd675@syzkaller.appspotmail.com
      cc: David Ahern <dsahern@kernel.org>
      cc: Jens Axboe <axboe@kernel.dk>
      cc: Matthew Wilcox <willy@infradead.org>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/1410156.1686729856@warthog.procyon.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • sfc: do not try to call tc functions when CONFIG_SFC_SRIOV=n · c08afcdc
      Edward Cree authored
      The functions efx_tc_netdev_event and efx_tc_netevent_event do not
      exist in that case, as the object files tc_bindings.o and
      tc_encap_actions.o are not built, so the calls to them from
      ef100_netdev_event and ef100_netevent_event cause link errors.
      Wrap the corresponding header files (tc_bindings.h, tc_encap_actions.h)
      with #if IS_ENABLED(CONFIG_SFC_SRIOV), and add an #else branch with
      static inline stubs for these two functions.
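      The header pattern can be sketched outside the kernel as follows.
      mock_feature_event(), mock_netdev_event() and the MOCK_FEATURE symbol
      are hypothetical; the kernel header would test
      IS_ENABLED(CONFIG_SFC_SRIOV) instead of a plain #ifdef, and the real
      stubs' return values are not shown in this message.

```c
/* When MOCK_FEATURE is off, its object file is not linked, so a plain
 * prototype would leave callers with undefined symbols at link time.
 * The static inline stub in the #else branch avoids that. */
#ifdef MOCK_FEATURE
int mock_feature_event(unsigned long event); /* defined elsewhere */
#else
static inline int mock_feature_event(unsigned long event)
{
	(void)event;
	return 0; /* feature compiled out: nothing to do */
}
#endif

/* A caller (think ef100_netdev_event()) needs no #ifdef of its own. */
int mock_netdev_event(unsigned long event)
{
	return mock_feature_event(event);
}
```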
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202306102026.ISK5JfUQ-lkp@intel.com/
      Fixes: 7e5e7d80 ("sfc: neighbour lookup for TC encap action offload")
      Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
      Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/net: lcs: use IS_ENABLED() for kconfig detection · 12827233
      Randy Dunlap authored
      When CONFIG_ETHERNET=m or CONFIG_FDDI=m, lcs.c has build errors or
      warnings:
      
      ../drivers/s390/net/lcs.c:40:2: error: #error Cannot compile lcs.c without some net devices switched on.
         40 | #error Cannot compile lcs.c without some net devices switched on.
      ../drivers/s390/net/lcs.c: In function 'lcs_startlan_auto':
      ../drivers/s390/net/lcs.c:1601:13: warning: unused variable 'rc' [-Wunused-variable]
       1601 |         int rc;
      
      Solve this by using IS_ENABLED(CONFIG_symbol) instead of #ifdef
      CONFIG_symbol. The latter only works for builtin (=y) values,
      while IS_ENABLED() works for both builtin and modular values.
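      The difference can be reproduced in userspace. The macros below are
      adapted from the kernel's include/linux/kconfig.h, with the helper
      names changed to avoid reserved identifiers. For a symbol set to =m,
      the generated autoconf.h defines CONFIG_<SYM>_MODULE as 1 and leaves
      CONFIG_<SYM> undefined; the sketch simulates that by hand for
      CONFIG_ETHERNET.

```c
/* Placeholder trick: KC_PLACEHOLDER_1 expands to "0," which shifts the
 * argument list so KC_SECOND() picks 1; any other token picks the
 * fallback. */
#define KC_PLACEHOLDER_1 0,
#define KC_SECOND(ignored, val, ...) val

#define KC_IS_DEFINED(x)              KC_IS_DEFINED_(x)
#define KC_IS_DEFINED_(val)           KC_IS_DEFINED__(KC_PLACEHOLDER_##val)
#define KC_IS_DEFINED__(arg1_or_junk) KC_SECOND(arg1_or_junk 1, 0)

#define KC_OR(x, y)              KC_OR_(x, y)
#define KC_OR_(x, y)             KC_OR__(KC_PLACEHOLDER_##x, y)
#define KC_OR__(arg1_or_junk, y) KC_SECOND(arg1_or_junk 1, y)

#define IS_BUILTIN(option) KC_IS_DEFINED(option)
#define IS_MODULE(option)  KC_IS_DEFINED(option##_MODULE)
#define IS_ENABLED(option) KC_OR(IS_BUILTIN(option), IS_MODULE(option))

/* Simulate CONFIG_ETHERNET=m. */
#define CONFIG_ETHERNET_MODULE 1

int seen_by_ifdef(void)
{
#ifdef CONFIG_ETHERNET
	return 1;
#else
	return 0; /* taken: CONFIG_ETHERNET itself is not defined */
#endif
}

int seen_by_is_enabled(void)
{
#if IS_ENABLED(CONFIG_ETHERNET)
	return 1; /* taken: CONFIG_ETHERNET_MODULE is 1 */
#else
	return 0;
#endif
}
```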
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Alexandra Winter <wintera@linux.ibm.com>
      Cc: Wenjia Zhang <wenjia@linux.ibm.com>
      Cc: linux-s390@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: litex: add support for 64 bit stats · 18da174d
      Jisheng Zhang authored
      Implement 64-bit per-CPU stats to fix the overflow of netdev->stats
      on 32-bit platforms. To simplify the code, use the net core
      pcpu_sw_netstats infrastructure. One small drawback is some memory
      overhead: litex uses just one queue, but the counters are allocated
      per CPU.
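      A userspace sketch of the idea, with hypothetical names and a fixed
      CPU count: each CPU bumps only its own 64-bit slot and readers sum
      the slots, while a 32-bit counter (the width of unsigned long on
      32-bit platforms) wraps after 4 GiB. The kernel's pcpu_sw_netstats
      also uses u64_stats_sync so that 32-bit readers see consistent
      64-bit values; that part is not modelled here.

```c
#include <stdint.h>

#define MOCK_NR_CPUS 4 /* hypothetical CPU count */

struct mock_pcpu_stats {
	uint64_t rx_packets;
	uint64_t rx_bytes;
};

/* One slot per CPU: the hot path never touches another CPU's slot. */
static struct mock_pcpu_stats stats[MOCK_NR_CPUS];

void mock_rx(int cpu, uint64_t len)
{
	stats[cpu].rx_packets++;
	stats[cpu].rx_bytes += len;
}

/* Reader side: fold all per-CPU slots into one total. */
uint64_t mock_total_rx_bytes(void)
{
	uint64_t sum = 0;

	for (int cpu = 0; cpu < MOCK_NR_CPUS; cpu++)
		sum += stats[cpu].rx_bytes;
	return sum;
}

/* For contrast: a 32-bit byte counter, which wraps modulo 2^32. */
uint32_t mock_add32(uint32_t counter, uint32_t len)
{
	return counter + len;
}
```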
      Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Acked-by: Gabriel Somlo <gsomlo@gmail.com>
      Link: https://lore.kernel.org/r/20230614162035.300-1-jszhang@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'optimize-procedure-of-changing-mac-address-on-interface' · 7deb0c3c
      Jakub Kicinski authored
      Piotr Gardocki says:
      
      ====================
      optimize procedure of changing MAC address on interface
      
      The first patch adds an if statement in core to return early when
      the MAC address is not being changed.
      The remaining patches remove such checks from Intel drivers
      as they're redundant at this point.
      ====================
      
      Link: https://lore.kernel.org/r/20230614145302.902301-1-piotrx.gardocki@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ice: remove unnecessary check for old MAC == new MAC · 96868cca
      Piotr Gardocki authored
      The check has been moved to core. The ndo_set_mac_address callback
      is no longer called with a new MAC address equal to the old one.
      Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • i40e: remove unnecessary check for old MAC == new MAC · c45a6d1a
      Piotr Gardocki authored
      The check has been moved to core. The ndo_set_mac_address callback
      is no longer called with a new MAC address equal to the old one.
      Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: add check for current MAC address in dev_set_mac_address · ad72c4a0
      Piotr Gardocki authored
      In some cases, the kernel may request to change the primary MAC
      address to the address that is already set on the given interface.

      Add a check to return early from the function in these cases.
      
      An example of such a case is adding an interface to a bonding
      channel in balance-alb mode:
      modprobe bonding mode=balance-alb miimon=100 max_bonds=1
      ip link set bond0 up
      ifenslave bond0 <eth>
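      A userspace sketch of the early return, assuming a fixed 6-byte
      address; struct mock_netdev and the mock_* functions are hypothetical
      stand-ins for net_device, dev_set_mac_address() and the driver's
      ndo_set_mac_address callback. It shows why the per-driver old == new
      checks become redundant: the driver callback is simply never reached
      with an unchanged address.

```c
#include <string.h>

#define MOCK_ETH_ALEN 6

/* Hypothetical device: a MAC address plus a count of driver calls. */
struct mock_netdev {
	unsigned char dev_addr[MOCK_ETH_ALEN];
	int driver_calls;
};

/* Stands in for the driver's ndo_set_mac_address callback. */
static int mock_driver_set_mac(struct mock_netdev *dev,
			       const unsigned char *addr)
{
	dev->driver_calls++;
	memcpy(dev->dev_addr, addr, MOCK_ETH_ALEN);
	return 0;
}

/* Core-level wrapper: nothing to do if the address is already set. */
int mock_set_mac_address(struct mock_netdev *dev, const unsigned char *addr)
{
	if (memcmp(dev->dev_addr, addr, MOCK_ETH_ALEN) == 0)
		return 0;
	return mock_driver_set_mac(dev, addr);
}
```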
      Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • eth: fs_enet: fix print format for resource size · 8f72fb15
      Jakub Kicinski authored
      Randy reported that linux-next build warns on PowerPC:
      
      drivers/net/ethernet/freescale/fs_enet/mii-fec.c: In function 'fs_enet_mdio_probe':
      drivers/net/ethernet/freescale/fs_enet/mii-fec.c:130:50: warning: format '%x' expects argument of type 'unsigned int', but argument 4 has type 'resource_size_t' {aka 'long long unsigned int'} [-Wformat=]
        130 |         snprintf(new_bus->id, MII_BUS_ID_SIZE, "%x", res.start);
            |                                                 ~^   ~~~~~~~~~
            |                                                  |      |
            |                                                  |      resource_size_t {aka long long unsigned int}
            |                                                  unsigned int
            |                                                 %llx
      
      Use the right print format.
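      resource_size_t is 64 bits wide in some configurations and 32 bits in
      others, so no single length modifier fits every build. One portable
      option, sketched below with a hypothetical mock_resource_size_t, is
      to cast to unsigned long long and print with %llx. Inside the
      kernel, Documentation/core-api/printk-formats.rst also describes %pa
      for phys_addr_t and derived types such as resource_size_t (passed by
      reference); this message does not say which form the fix uses.

```c
#include <stdio.h>

/* Hypothetical stand-in for resource_size_t. */
typedef unsigned long long mock_resource_size_t;

/* Format the bus id as hex; the explicit cast keeps the argument type
 * in line with %llx whatever width the real typedef has. */
int mock_format_bus_id(char *id, size_t size, mock_resource_size_t start)
{
	return snprintf(id, size, "%llx", (unsigned long long)start);
}
```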
      
      Link: https://lore.kernel.org/all/8f9f8d38-d9c7-9f1b-feb0-103d76902d14@infradead.org/
      Reported-by: Randy Dunlap <rdunlap@infradead.org>
      Acked-by: Randy Dunlap <rdunlap@infradead.org>
      Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
      Link: https://lore.kernel.org/r/20230615035231.2184880-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • splice, net: Fix splice_to_socket() to handle pipe bufs larger than a page · ca2d49f7
      David Howells authored
      splice_to_socket() assumes that a pipe_buffer won't hold more than a single
      page of data - but this assumption can be violated by skb_splice_bits()
      when it splices from a socket into a pipe.
      
      The problem is that splice_to_socket() doesn't advance the pipe_buffer
      length and offset when transcribing from the pipe buf into a bio_vec, so if
      the buf is >PAGE_SIZE, it keeps repeating the same initial chunk and
      doesn't advance the tail index.  It then subtracts this from "remain" and
      overcounts the amount of data to be sent.
      
      The cleanup phase then tries to overclean the pipe, hits an unused pipe buf
      and a NULL-pointer dereference occurs.
      
      Fix this by not restricting the bio_vec size to PAGE_SIZE and instead
      transcribing the entirety of each pipe_buffer into a single bio_vec and
      advancing the tail index if remain hasn't hit zero yet.
      
      Large bio_vecs will then be split up by iterator functions such as
      iov_iter_extract_pages().
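      A simplified model of the fixed transcription loop: each pipe buffer
      becomes exactly one bio_vec of its full length (no PAGE_SIZE clamp),
      and the scan index advances only while data remains, so no chunk is
      emitted twice and "remain" is never overcounted. The structs and
      mock_fill_bvecs() are hypothetical; buffer release and the later
      splitting of large bio_vecs are not modelled.

```c
#include <stddef.h>

/* Hypothetical pipe buffer and bio_vec: just (offset, len) pairs. */
struct mock_pipe_buf { size_t offset, len; };
struct mock_bvec     { size_t offset, len; };

/* Fill up to max_bv bvecs covering `remain` bytes; returns the count. */
size_t mock_fill_bvecs(const struct mock_pipe_buf *bufs, size_t nbufs,
		       size_t remain, struct mock_bvec *bv, size_t max_bv)
{
	size_t n = 0, i = 0;

	while (remain > 0 && i < nbufs && n < max_bv) {
		size_t seg = bufs[i].len < remain ? bufs[i].len : remain;

		bv[n].offset = bufs[i].offset;
		bv[n].len = seg; /* may exceed one page */
		n++;
		remain -= seg;
		if (remain)
			i++;     /* more data wanted: move to the next buf */
	}
	return n;
}
```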
      
      This resulted in a KASAN report looking like:
      
      general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
      ...
      RIP: 0010:pipe_buf_release include/linux/pipe_fs_i.h:203 [inline]
      RIP: 0010:splice_to_socket+0xa91/0xe30 fs/splice.c:933
      
      Fixes: 2dc334f1 ("splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()")
      Reported-by: syzbot+f9e28a23426ac3b24f20@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/r/0000000000000900e905fdeb8e39@google.com/
      Tested-by: syzbot+f9e28a23426ac3b24f20@syzkaller.appspotmail.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      cc: David Ahern <dsahern@kernel.org>
      cc: Jens Axboe <axboe@kernel.dk>
      cc: Matthew Wilcox <willy@infradead.org>
      cc: Christian Brauner <brauner@kernel.org>
      cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Link: https://lore.kernel.org/r/1428985.1686737388@warthog.procyon.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • sunvnet: fix sparc64 build error after gso code split · d9ffa069
      Stephen Rothwell authored
      After merging the net-next tree, today's linux-next build (sparc64
      defconfig) failed like this:
      
      drivers/net/ethernet/sun/sunvnet_common.c: In function 'vnet_handle_offloads':
      drivers/net/ethernet/sun/sunvnet_common.c:1277:16: error: implicit declaration of function 'skb_gso_segment'; did you mean 'skb_gso_reset'? [-Werror=implicit-function-declaration]
       1277 |         segs = skb_gso_segment(skb, dev->features & ~NETIF_F_TSO);
            |                ^~~~~~~~~~~~~~~
            |                skb_gso_reset
      drivers/net/ethernet/sun/sunvnet_common.c:1277:14: warning: assignment to 'struct sk_buff *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
       1277 |         segs = skb_gso_segment(skb, dev->features & ~NETIF_F_TSO);
            |              ^
      
      Fixes: d457a0e3 ("net: move gso declarations and functions to their own files")
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230613164639.164b2991@canb.auug.org.au
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ena: Add dynamic recycling mechanism for rx buffers · f7d625ad
      David Arinzon authored
      The current implementation allocates page-sized rx buffers.
      As traffic may consist of different types and sizes of packets,
      in various cases, buffers are not fully used.
      
      This change (Dynamic RX Buffers - DRB) uses only the part of the
      allocated rx page that the incoming packet needs, and returns the
      unused rest of the page to be used again as an rx buffer for future
      packets. A threshold of 2K of unused space decides whether the
      remainder of the page can be reused as an rx buffer.

      As a page may be reused, dma_sync_single_for_cpu() is added in order
      to sync the memory to the CPU side after it was owned by the HW.
      In addition, when the rx page can no longer be reused, it is
      unmapped using dma_unmap_page(), which implicitly syncs and then
      unmaps the entire page. If the kernel is still handling skbs that
      point to earlier buffers from that rx page, the implicit sync can
      overwrite them, and the kernel may then access garbage pointers.
      The implicit DMA sync is avoided by replacing dma_unmap_page() with
      dma_unmap_page_attrs() with the DMA_ATTR_SKIP_CPU_SYNC flag.
      
      The functionality is disabled for XDP traffic to avoid handling
      several descriptors per packet.
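      A simplified model of the reuse decision, assuming 4 KiB pages and
      reading the 2K threshold as "reuse the remainder if at least 2048
      bytes are left". DMA mapping and syncing are not modelled, and
      struct mock_rx_page and mock_consume() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MOCK_PAGE_SIZE 4096u
#define MOCK_REUSE_MIN 2048u /* the 2K threshold from the description */

/* Hypothetical rx page state: bytes already handed out to packets. */
struct mock_rx_page {
	size_t used;
};

/* Take pkt_len bytes (assumed <= the posted remainder). Returns true if
 * the rest of the page stays posted as an rx buffer, false if the page
 * is retired and a fresh one is modelled in its place. */
bool mock_consume(struct mock_rx_page *pg, size_t pkt_len)
{
	pg->used += pkt_len;
	if (MOCK_PAGE_SIZE - pg->used >= MOCK_REUSE_MIN)
		return true;  /* enough room left: reuse the remainder */
	pg->used = 0;         /* retire: next packet gets a new page */
	return false;
}
```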
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Link: https://lore.kernel.org/r/20230612121448.28829-1-darinzon@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ioctl: Use kernel memory on protocol ioctl callbacks · e1d001fa
      Breno Leitao authored
      Most of the ioctls to net protocols operate directly on the userspace
      argument (arg), usually doing get_user()/put_user() directly in the
      ioctl callback. This is not flexible, because it is hard to reuse these
      functions without passing userspace buffers.

      Change the "struct proto" ioctls to avoid touching userspace memory and
      operate on kernel buffers, i.e., all protocols' ioctl callbacks are
      adapted to operate on kernel memory rather than on userspace (so, no
      more {put,get}_user() and friends being called in the ioctl callback).
      
      This changes the "struct proto" ioctl format in the following way:
      
          int                     (*ioctl)(struct sock *sk, int cmd,
      -                                        unsigned long arg);
      +                                        int *karg);
      
      (Note that this patch does not touch the "struct proto_ops"
      protocols.)
      
      So, the "karg" argument, which is passed to the ioctl callback, is a
      pointer to kernel memory (allocated inside a function wrapper).
      This buffer (karg) may contain an input argument (copied from userspace
      in a prep function), and it might return a value/buffer, which is copied
      back to userspace if necessary. There is no one-size-fits-all format
      (that is why I am using 'may' above), but basically, there are three
      types of ioctls:

      1) Do not read from userspace; return a result to userspace.
      2) Read an input parameter from userspace; do not return anything
         to userspace.
      3) Read an input from userspace, and return a buffer to userspace.
      
      The default case (1) (where no input parameter is given, and an "int" is
      returned to userspace) encompasses more than 90% of the cases, but there
      are two other exceptions. Here is a list of exceptions:
      
      * Protocol RAW:
         * cmd = SIOCGETVIFCNT:
           * input and output = struct sioc_vif_req
         * cmd = SIOCGETSGCNT
           * input and output = struct sioc_sg_req
         * Explanation: for the SIOCGETVIFCNT case, userspace passes the input
           argument, which is struct sioc_vif_req. Then the callback populates
           the struct, which is copied back to userspace.
      
      * Protocol RAW6:
         * cmd = SIOCGETMIFCNT_IN6
           * input and output = struct sioc_mif_req6
         * cmd = SIOCGETSGCNT_IN6
           * input and output = struct sioc_sg_req6
      
      * Protocol PHONET:
         * cmd = SIOCPNADDRESOURCE | SIOCPNDELRESOURCE
           * input int (4 bytes)
         * Nothing is copied back to userspace.
      
      For the exception cases, the function sock_sk_ioctl_inout() copies
      the userspace input into kernel memory and copies the result back to
      userspace.

      The wrapper that prepares the buffer and copies it back to the user is
      sk_ioctl(). So, instead of calling sk->sk_prot->ioctl(), callers now
      call sk_ioctl(), which handles all cases.
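      A userspace sketch of the wrapper idea for int-sized arguments: the
      callback only ever sees kernel memory through karg, and the wrapper
      does the optional copy-in and copy-out. "User" memory is an ordinary
      buffer here, and every mock_* name, including the example callback,
      is hypothetical rather than the kernel's sk_ioctl().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for copy_from_user()/copy_to_user(): 0 on success. */
static int mock_copy_from_user(void *dst, const void *usrc, size_t n)
{
	memcpy(dst, usrc, n);
	return 0;
}

static int mock_copy_to_user(void *udst, const void *src, size_t n)
{
	memcpy(udst, src, n);
	return 0;
}

/* New-style protocol callback: works on kernel memory only. */
typedef int (*mock_proto_ioctl_t)(int cmd, int *karg);

/* Example callback: cmd 1 reports a value (case 1), cmd 2 doubles its
 * input (case 3); anything else is rejected. */
int mock_example_ioctl(int cmd, int *karg)
{
	switch (cmd) {
	case 1: *karg = 42; return 0;
	case 2: *karg *= 2; return 0;
	default: return -EINVAL;
	}
}

int mock_sk_ioctl(mock_proto_ioctl_t ioctl, int cmd, void *uarg,
		  bool copy_in, bool copy_out)
{
	int karg = 0;
	int ret;

	if (copy_in && mock_copy_from_user(&karg, uarg, sizeof(karg)))
		return -EFAULT;  /* cases (2) and (3) */
	ret = ioctl(cmd, &karg);
	if (ret)
		return ret;
	if (copy_out && mock_copy_to_user(uarg, &karg, sizeof(karg)))
		return -EFAULT;  /* cases (1) and (3) */
	return 0;
}
```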
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230609152800.830401-1-leitao@debian.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 173780ff
      Jakub Kicinski authored
      Cross-merge networking fixes after downstream PR.
      
      Conflicts:
      
      include/linux/mlx5/driver.h
        617f5db1 ("RDMA/mlx5: Fix affinity assignment")
        dc131808 ("net/mlx5: Enable devlink port for embedded cpu VF vports")
      https://lore.kernel.org/all/20230613125939.595e50b8@canb.auug.org.au/
      
      tools/testing/selftests/net/mptcp/mptcp_join.sh
        47867f0a ("selftests: mptcp: join: skip check if MIB counter not supported")
        425ba803 ("selftests: mptcp: join: support RM_ADDR for used endpoints or not")
        45b1a122 ("mptcp: introduces more address related mibs")
        0639fa23 ("selftests: mptcp: add explicit check for new mibs")
      https://lore.kernel.org/netdev/20230609-upstream-net-20230610-mptcp-selftests-support-old-kernels-part-3-v1-0-2896fe2ee8a3@tessares.net/
      
      No adjacent changes.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'net-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 40f71e7c
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from wireless, and netfilter.
      
        Selftests excluded - we have 58 patches and diff of +442/-199, which
        isn't really small but perhaps with the exception of the WiFi locking
        change it's old(ish) bugs.
      
        We have no known problems with v6.4.
      
        The selftest changes are rather large as MPTCP folks try to apply
        Greg's guidance that selftest from torvalds/linux should be able to
        run against stable kernels.
      
        Last thing I should call out is the DCCP/UDP-lite deprecation notices.
        We are fairly sure those are dead, but if we're wrong reverting them
        back in won't be fun.
      
        Current release - regressions:
      
         - wifi:
            - cfg80211: fix double lock bug in reg_wdev_chan_valid()
            - iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
      
        Current release - new code bugs:
      
         - handshake: remove fput() that causes use-after-free
      
        Previous releases - regressions:
      
         - sched: cls_u32: fix reference counter leak leading to overflow
      
         - sched: cls_api: fix lockup on flushing explicitly created chain
      
        Previous releases - always broken:
      
         - nf_tables: integrate pipapo into commit protocol
      
         - nf_tables: incorrect error path handling with NFT_MSG_NEWRULE, fix
           dangling pointer on failure
      
         - ping6: fix send to link-local addresses with VRF
      
         - sched: act_pedit: parse L3 header for L4 offset, the skb may not
           have the offset saved
      
         - sched: act_ct: fix promotion of offloaded unreplied tuple
      
         - sched: refuse to destroy an ingress and clsact Qdiscs if there are
           lockless change operations in flight
      
         - wifi: mac80211: fix handful of bugs in multi-link operation
      
         - ipvlan: fix bound dev checking for IPv6 l3s mode
      
         - eth: enetc: correct the indexes of highest and 2nd highest TCs
      
         - eth: ice: fix XDP memory leak when NIC is brought up and down
      
        Misc:
      
         - add deprecation notices for UDP-lite and DCCP
      
         - selftests: mptcp: skip tests not supported by old kernels
      
         - sctp: handle invalid error codes without calling BUG()"
      
      * tag 'net-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
        dccp: Print deprecation notice.
        udplite: Print deprecation notice.
        octeon_ep: Add missing check for ioremap
        selftests/ptp: Fix timestamp printf format for PTP_SYS_OFFSET
        net: ethernet: stmicro: stmmac: fix possible memory leak in __stmmac_open
        net: tipc: resize nlattr array to correct size
        sfc: fix XDP queues mode with legacy IRQ
        net: macsec: fix double free of percpu stats
        net: lapbether: only support ethernet devices
        MAINTAINERS: add reviewers for SMC Sockets
        s390/ism: Fix trying to free already-freed IRQ by repeated ism_dev_exit()
        net: dsa: felix: fix taprio guard band overflow at 10Mbps with jumbo frames
        net/sched: cls_api: Fix lockup on flushing explicitly created chain
        ice: Fix ice module unload
        net/handshake: remove fput() that causes use-after-free
        selftests: forwarding: hw_stats_l3: Set addrgenmode in a separate step
        net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
        net/sched: Refactor qdisc_graft() for ingress and clsact Qdiscs
        net/sched: act_ct: Fix promotion of offloaded unreplied tuple
        wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
        ...
    • Merge tag 'loongarch-fixes-6.4-1' of... · 627d8586
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen:
       "Some trivial bug fixes for v6.4-rc7"
      
      * tag 'loongarch-fixes-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        LoongArch: Fix debugfs_create_dir() error checking
        LoongArch: Avoid uninitialized alignment_mask
        LoongArch: Fix perf event id calculation
        LoongArch: Fix the write_fcsr() macro
        LoongArch: Let pmd_present() return true when splitting pmd
    • Merge tag 'for-6.4/dm-fixes' of... · 0e306952
      Linus Torvalds authored
      Merge tag 'for-6.4/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper fixes from Mike Snitzer:
      
       - Fix DM thinp discard performance regression introduced during this
         merge window where DM core was splitting large discards every 128K
         (max_sectors_kb) rather than every 64M (discard_max_bytes).
      
       - Extend DM core LOCKFS fix, made during 6.4 merge, to also fix race
         between do_mount and dm's do_suspend (in addition to the earlier
         fix's do_mount race with dm's do_resume).
      
       - Fix DM thin metadata operations to first check if the thin-pool is in
         "fail_io" mode; otherwise UAF can occur.
      
       - Fix DM thinp's call to __blkdev_issue_discard to use GFP_NOIO rather
         than GFP_NOWAIT (__blkdev_issue_discard cannot handle NULL return
         from bio_alloc).
      
      * tag 'for-6.4/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm: use op specific max_sectors when splitting abnormal io
        dm thin: fix issue_discard to pass GFP_NOIO to __blkdev_issue_discard
        dm thin metadata: check fail_io before using data_sm
        dm: don't lock fs when the map is NULL during suspend or resume
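The following is a rough worked example of why the split size mattered for the DM thinp discard regression. It assumes a single 1 GiB discard, a size chosen only for illustration, and uses the two split limits quoted in the pull message (128K and 64M). Real limits come from each device's queue settings. The helper `split_count` is hypothetical and is not DM code.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def split_count(discard_bytes: int, max_split_bytes: int) -> int:
    """Number of pieces a discard is cut into when no piece may exceed
    max_split_bytes (ceiling division; illustrative helper only)."""
    return -(-discard_bytes // max_split_bytes)


# 128K: the max_sectors_kb-sized split the regression applied to discards.
regressed = split_count(1 * GIB, 128 * KIB)
# 64M: the discard_max_bytes-sized split the fix restores.
fixed = split_count(1 * GIB, 64 * MIB)
print(regressed, fixed)  # 8192 16
```

For this assumed 1 GiB discard, DM core issues 512 times as many bios under the regressed limit, which is consistent with the performance drop the message describes.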
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 93fd8eb0
      Linus Torvalds authored
      Pull rdma fixes from Jason Gunthorpe:
       "This is an unusually large bunch of bug fixes for the later rc cycle;
        rxe and mlx5 both dumped a lot of things at once. rxe continues to fix
        itself, and mlx5 is fixing a bunch of "queue counters" related bugs.
      
        There is one highly notable bug fix regarding the qkey. This small
        security check was missed in the original 2005 implementation and it
        allows some significant issues.
      
        Summary:
      
         - Two rtrs bug fixes for error unwind bugs
      
         - Several rxe bug fixes:
            * Incorrect Rx packet validation
            * Using memory without a refcount
            * Use before initialization found by syzkaller
            * Regression fix for missing locking with the tasklet conversion
              from this merge window
      
         - Have bnxt report the correct link properties to userspace; this was
           a regression in v6.3
      
         - Several mlx5 bug fixes:
            * Kernel crash triggerable by userspace for the RAW ethernet
              profile
            * Defend against steering refcounting issues created by userspace
            * Incorrect change of QP port affinity parameters in some LAG
              configurations
      
         - Fix mlx5 Q counters:
            * Do not over-allocate Q counters to allow userspace to use the
              full port capacity
            * Kernel crash triggered by eswitch due to misuse of Q counters
            * Incorrect mlx5_device for Q counters in some LAG configurations
      
         - Properly implement the IBA spec restricting privileged qkeys to
           root
      
         - Always return an error when reading from a disassociated device's
           event queue
      
         - isert bug fixes:
            * Avoid a deadlock with the CM handler and CM ID destruction
            * Correct list corruption due to incorrect locking
            * Fix a use-after-free around connection teardown"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        RDMA/rxe: Fix rxe_cq_post
        IB/isert: Fix incorrect release of isert connection
        IB/isert: Fix possible list corruption in CMA handler
        IB/isert: Fix dead lock in ib_isert
        RDMA/mlx5: Fix affinity assignment
        IB/uverbs: Fix to consider event queue closing also upon non-blocking mode
        RDMA/uverbs: Restrict usage of privileged QKEYs
        RDMA/cma: Always set static rate to 0 for RoCE
        RDMA/mlx5: Fix Q-counters query in LAG mode
        RDMA/mlx5: Remove vport Q-counters dependency on normal Q-counters
        RDMA/mlx5: Fix Q-counters per vport allocation
        RDMA/mlx5: Create an indirect flow table for steering anchor
        RDMA/mlx5: Initiate dropless RQ for RAW Ethernet functions
        RDMA/rxe: Fix the use-before-initialization error of resp_pkts
        RDMA/bnxt_re: Fix reporting active_{speed,width} attributes
        RDMA/rxe: Fix ref count error in check_rkey()
        RDMA/rxe: Fix packet length checks
        RDMA/rtrs: Fix rxe_dealloc_pd warning
        RDMA/rtrs: Fix the last iu->buf leak in err path
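Below is a hedged sketch of the kind of check behind "RDMA/uverbs: Restrict usage of privileged QKEYs". It assumes the InfiniBand convention that a Q_Key with its most significant bit set (0x80000000) is a controlled, privileged Q_Key. `may_set_qkey` and `caller_is_privileged` are hypothetical names. The flag stands in for whatever privilege test the kernel actually performs, and this is not the uverbs code.

```python
# Assumption: the most significant bit of the 32-bit Q_Key marks a
# controlled (privileged) Q_Key.
PRIVILEGED_QKEY_BIT = 0x80000000


def may_set_qkey(qkey: int, caller_is_privileged: bool) -> bool:
    """Allow a Q_Key change unless it names a privileged Q_Key and the
    caller lacks privilege (hypothetical helper)."""
    if not 0 <= qkey <= 0xFFFFFFFF:
        raise ValueError("Q_Key is a 32-bit value")
    if qkey & PRIVILEGED_QKEY_BIT:
        # Privileged Q_Keys are reserved for privileged callers only.
        return caller_is_privileged
    # Ordinary Q_Keys may be set by anyone.
    return True


print(may_set_qkey(0x00001234, False))  # True
print(may_set_qkey(0x80010000, False))  # False
print(may_set_qkey(0x80010000, True))   # True
```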
    • Linus Torvalds's avatar
      Merge tag 'spi-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · b7feaa49
      Linus Torvalds authored
      Pull spi fixes from Mark Brown:
       "A few more driver specific fixes.
      
        The DesignWare fix is for an issue introduced by conversion to the
        chip select accessor functions and is pretty important but the other
        two are less severe"
      
      * tag 'spi-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: dw: Replace incorrect spi_get_chipselect with set
        spi: fsl-dspi: avoid SCK glitches with continuous transfers
        spi: cadence-quadspi: Add missing check for dma_set_mask
    • Linus Torvalds's avatar
      Merge tag 'regulator-fix-v6.4-rc6' of... · eee71c34
      Linus Torvalds authored
      Merge tag 'regulator-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
      
      Pull regulator fix from Mark Brown:
       "The set of regulators described for the Qualcomm PM8550 just seems to
        have been completely wrong and would likely not have worked at all if
        anything tried to actually configure anything except for enabling and
        disabling at runtime"
      
      * tag 'regulator-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: qcom-rpmh: Fix regulators for PM8550
    • Linus Torvalds's avatar
      Merge tag 'regmap-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · 231a1e31
      Linus Torvalds authored
      Pull regmap fix from Mark Brown:
       "Another fix for the maple tree cache: Takashi noticed that, unlike
        other caches, the maple tree cache didn't check for read-only
        registers before trying to sync, which would result in spurious
        syncs for read-only registers where we don't have a default.

        This was due to the check being open coded in the caches; we now
        check in the shared 'does this register need sync' function, so
        that is fixed for this and future caches"
      
      * tag 'regmap-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: regcache: Don't sync read-only registers
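Here is a minimal sketch, in Python rather than kernel C, of the pattern described in the regmap pull message. Every cache back-end asks one shared predicate whether a register needs syncing, so a read-only check added there applies to all caches at once. All names (`RegMap`, `reg_needs_sync`, `sync`) are hypothetical, and a plain dict stands in for the maple tree.

```python
from dataclasses import dataclass, field


@dataclass
class RegMap:
    """Toy register map: cached values, known defaults, read-only set."""
    cache: dict[int, int]
    defaults: dict[int, int] = field(default_factory=dict)
    read_only: set[int] = field(default_factory=set)


def reg_needs_sync(m: RegMap, reg: int, val: int) -> bool:
    """Shared predicate that every cache back-end consults before writing."""
    if reg in m.read_only:
        # Read-only registers can never be written back to hardware.
        return False
    # Skip registers whose cached value equals the known default.
    return m.defaults.get(reg) != val


def sync(m: RegMap) -> list[int]:
    """One back-end's sync loop (a dict stands in for e.g. a maple tree).
    Returns the registers it would write, in ascending order."""
    return [reg for reg, val in sorted(m.cache.items())
            if reg_needs_sync(m, reg, val)]


m = RegMap(cache={0x00: 0x1, 0x04: 0x0, 0x08: 0x7},
           defaults={0x04: 0x0},
           read_only={0x08})
print([hex(r) for r in sync(m)])  # ['0x0']
```

Register 0x08 is read-only and has no default, which is the case the message describes: without the read-only test in the shared predicate, its missing default would make it look "dirty" and it would be written during sync.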
    • Linus Torvalds's avatar
      Merge tag 'media/v6.4-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · c926a55f
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "A fix for dvb-core to avoid a race condition during DVB board
        registration"
      
      * tag 'media/v6.4-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        Revert "media: dvb-core: Fix use-after-free on race condition at dvb_frontend"
  3. 15 Jun, 2023 13 commits