1. 22 Feb, 2024 14 commits
  2. 21 Feb, 2024 12 commits
    • Merge branch 'tls-fixes-for-record-type-handling-with-peek' · f76d5f65
      Jakub Kicinski authored
      Sabrina Dubroca says:
      
      ====================
      tls: fixes for record type handling with PEEK
      
      There are multiple bugs in tls_sw_recvmsg's handling of record types
      when MSG_PEEK flag is used, which can lead to incorrectly merging two
      records:
       - consecutive non-DATA records shouldn't be merged, even if they're
         the same type (partly handled by the test at the end of the main
         loop)
       - records of the same type (even DATA) shouldn't be merged if one
         record of a different type comes in between
      ====================
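The two merging rules above can be modelled in userspace. This is a hypothetical, flattened sketch: it treats all pending records as one array and ignores the split between the rx_list (records already decrypted by an earlier MSG_PEEK) and the socket queue. `struct record` and `model_recv()` are illustrative names, not kernel symbols. Only the type values come from TLS itself: 21 is an alert (a control record) and 23 is application data, which the kernel calls TLS_RECORD_TYPE_DATA.

```c
#include <stddef.h>
#include <string.h>

/* TLS content types: 21 = alert (a control record), 23 = application data. */
enum { REC_ALERT = 21, REC_DATA = 23 };

/* Hypothetical model of one decrypted record waiting to be read. */
struct record {
	unsigned char type;
	const char *payload;
	size_t len;
};

/*
 * Model of a single recv(): copy from consecutive records into buf while
 * following both rules from the cover letter:
 *  - a record of a different type ends the read, even if buf has room;
 *  - a non-DATA record is always returned on its own, never merged.
 * Returns the number of bytes copied; *type is the type of the returned
 * data and *used is how many records contributed to it.
 */
size_t model_recv(const struct record *recs, size_t nrecs,
		  char *buf, size_t buflen,
		  unsigned char *type, size_t *used)
{
	size_t copied = 0;

	*used = 0;
	if (nrecs == 0)
		return 0;
	*type = recs[0].type;

	for (size_t i = 0; i < nrecs && copied < buflen; i++) {
		size_t n = recs[i].len;

		if (recs[i].type != *type)
			break;			/* never look past a different type */
		if (n > buflen - copied)
			n = buflen - copied;	/* partial copy of the last record */
		memcpy(buf + copied, recs[i].payload, n);
		copied += n;
		(*used)++;
		if (*type != REC_DATA)
			break;			/* control records are never merged */
	}
	return copied;
}
```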
      
      Link: https://lore.kernel.org/r/cover.1708007371.git.sd@queasysnail.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: tls: add test for peeking past a record of a different type · 2bf61726
      Sabrina Dubroca authored
      If we queue 3 records:
       - record 1, type DATA
       - record 2, some other type
       - record 3, type DATA
      the current code can look past the 2nd record and merge the 2 data
      records.
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Link: https://lore.kernel.org/r/4623550f8617c239581030c13402d3262f2bd14f.1708007371.git.sd@queasysnail.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: tls: add test for merging of same-type control messages · 7b2a4c2a
      Sabrina Dubroca authored
      Two consecutive control messages of the same type should never be
      merged into one large received blob of data.
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Link: https://lore.kernel.org/r/018f1633d5471684c65def5fe390de3b15c3d683.1708007371.git.sd@queasysnail.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tls: don't skip over different type records from the rx_list · ec823bf3
      Sabrina Dubroca authored
      If we queue 3 records:
       - record 1, type DATA
       - record 2, some other type
       - record 3, type DATA
      and do a recv(PEEK), the rx_list will contain the first two records.
      
      The next large recv will walk through the rx_list and copy data from
      record 1, then stop because record 2 is a different type. Since we
      haven't filled up our buffer, we will process the next available
      record. It's also DATA, so we can merge it with the current read.
      
      We shouldn't do that, since there was a record in between that we
      ignored.
      
      Add a flag to let process_rx_list inform tls_sw_recvmsg that it had
      more data available.
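A hypothetical model of the flag described above: `walk_rx_list()` stands in for process_rx_list and reports whether it stopped early with records still on the rx_list, and `data_records_merged()` stands in for the rest of tls_sw_recvmsg. Only record types are modelled, and none of these names are kernel symbols; the sketch only shows why the extra bit of information is needed.

```c
#include <stdbool.h>
#include <stddef.h>

#define REC_DATA 23	/* TLS application data (TLS_RECORD_TYPE_DATA) */

/* A run of record types: either the rx_list or the not-yet-decrypted queue. */
struct stage {
	const unsigned char *types;
	size_t n;
};

/* Consume matching records from the rx_list; *more reports whether a record
 * of another type was left behind (the information the new flag carries). */
static size_t walk_rx_list(struct stage rx, unsigned char want, bool *more)
{
	size_t i = 0;

	while (i < rx.n && rx.types[i] == want)
		i++;
	*more = i < rx.n;
	return i;
}

/* How many DATA records one large read merges. With use_flag == false this
 * mimics the buggy flow: it keeps reading DATA from the queue even though a
 * different-type record was skipped on the rx_list. */
size_t data_records_merged(struct stage rx, struct stage queue, bool use_flag)
{
	bool more;
	size_t merged = walk_rx_list(rx, REC_DATA, &more);

	if (use_flag && more)
		return merged;	/* stop: an unread record sits in between */

	for (size_t i = 0; i < queue.n && queue.types[i] == REC_DATA; i++)
		merged++;
	return merged;
}
```

With record 1 (DATA) and record 2 (another type) on the rx_list and record 3 (DATA) on the queue, the unflagged path merges two records and the flagged path returns only the first.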
      
      Fixes: 692d7b5d ("tls: Fix recvmsg() to be able to peek across multiple records")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Link: https://lore.kernel.org/r/f00c0c0afa080c60f016df1471158c1caf983c34.1708007371.git.sd@queasysnail.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tls: stop recv() if initial process_rx_list gave us non-DATA · fdfbaec5
      Sabrina Dubroca authored
      If we have a non-DATA record on the rx_list and another record of the
      same type still on the queue, we will end up merging them:
       - process_rx_list copies the non-DATA record
       - we start the loop and process the first available record since it's
         of the same type
       - we break out of the loop since the record was not DATA
      
      Just check the record type and jump to the end in case process_rx_list
      did some work.
      
      Fixes: 692d7b5d ("tls: Fix recvmsg() to be able to peek across multiple records")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Link: https://lore.kernel.org/r/bd31449e43bd4b6ff546f5c51cf958c31c511deb.1708007371.git.sd@queasysnail.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tls: break out of main loop when PEEK gets a non-data record · 10f41d07
      Sabrina Dubroca authored
      PEEK needs to leave decrypted records on the rx_list so that we can
      receive them later on, so it jumps back into the async code that
      queues the skb. Unfortunately that makes us skip the
      TLS_RECORD_TYPE_DATA check at the bottom of the main loop, so if two
      records of the same (non-DATA) type are queued, we end up merging
      them.
      
      Add the same record type check, and mark it unlikely so as not to
      penalize the async fast path. Async decrypt only applies to DATA
      records, so this check is only needed for PEEK.
      
      process_rx_list also has similar issues.
      
      Fixes: 692d7b5d ("tls: Fix recvmsg() to be able to peek across multiple records")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Link: https://lore.kernel.org/r/3df2eef4fdae720c55e69472b5bea668772b45a2.1708007371.git.sd@queasysnail.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • gtp: fix use-after-free and null-ptr-deref in gtp_genl_dump_pdp() · 136cfaca
      Vasiliy Kovalev authored
      The gtp_net_ops pernet operations structure for the subsystem must be
      registered before registering the generic netlink family.
      
      Syzkaller hit 'general protection fault in gtp_genl_dump_pdp' bug:
      
      general protection fault, probably for non-canonical address
      0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN NOPTI
      KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
      CPU: 1 PID: 5826 Comm: gtp Not tainted 6.8.0-rc3-std-def-alt1 #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-alt1 04/01/2014
      RIP: 0010:gtp_genl_dump_pdp+0x1be/0x800 [gtp]
      Code: c6 89 c6 e8 64 e9 86 df 58 45 85 f6 0f 85 4e 04 00 00 e8 c5 ee 86
            df 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80>
            3c 02 00 0f 85 de 05 00 00 48 8b 44 24 18 4c 8b 30 4c 39 f0 74
      RSP: 0018:ffff888014107220 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: ffff88800fcda588 R14: 0000000000000001 R15: 0000000000000000
      FS:  00007f1be4eb05c0(0000) GS:ffff88806ce80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f1be4e766cf CR3: 000000000c33e000 CR4: 0000000000750ef0
      PKRU: 55555554
      Call Trace:
       <TASK>
       ? show_regs+0x90/0xa0
       ? die_addr+0x50/0xd0
       ? exc_general_protection+0x148/0x220
       ? asm_exc_general_protection+0x22/0x30
       ? gtp_genl_dump_pdp+0x1be/0x800 [gtp]
       ? __alloc_skb+0x1dd/0x350
       ? __pfx___alloc_skb+0x10/0x10
       genl_dumpit+0x11d/0x230
       netlink_dump+0x5b9/0xce0
       ? lockdep_hardirqs_on_prepare+0x253/0x430
       ? __pfx_netlink_dump+0x10/0x10
       ? kasan_save_track+0x10/0x40
       ? __kasan_kmalloc+0x9b/0xa0
       ? genl_start+0x675/0x970
       __netlink_dump_start+0x6fc/0x9f0
       genl_family_rcv_msg_dumpit+0x1bb/0x2d0
       ? __pfx_genl_family_rcv_msg_dumpit+0x10/0x10
       ? genl_op_from_small+0x2a/0x440
       ? cap_capable+0x1d0/0x240
       ? __pfx_genl_start+0x10/0x10
       ? __pfx_genl_dumpit+0x10/0x10
       ? __pfx_genl_done+0x10/0x10
       ? security_capable+0x9d/0xe0
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
      Fixes: 459aa660 ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
      Link: https://lore.kernel.org/r/20240214162733.34214-1-kovalev@altlinux.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • MAINTAINERS: Add framer headers to NETWORKING [GENERAL] · 14dec56f
      Simon Horman authored
      The cited commit [1] added framer support under drivers/net/wan,
      which is covered by NETWORKING [GENERAL]. It is implied that
      framer-provider.h and framer.h, which were also added by the same
      patch, are also maintained as part of NETWORKING [GENERAL].
      
      Make this explicit by adding these files to the corresponding
      section in MAINTAINERS.
      
      [1] 82c944d0 ("net: wan: Add framer framework support")
      Signed-off-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Herve Codina <herve.codina@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • af_unix: Drop oob_skb ref before purging queue in GC. · aa82ac51
      Kuniyuki Iwashima authored
      syzbot reported another hung task in __unix_gc(). [0]
      
      The current while loop assumes that all of the remaining candidates
      have oob_skb and that calling kfree_skb(oob_skb) releases the
      remaining candidates.
      
      However, I missed a case where oob_skb has a self-referencing fd and
      another fd, and the latter sk is placed before the former in the
      candidate list. Then the while loop never makes progress, resulting
      in a hung task.
      
      __unix_gc() has the same loop just before purging the collected skb,
      so we can call kfree_skb(oob_skb) there and let __skb_queue_purge()
      release all inflight sockets.
      
      [0]:
      Sending NMI from CPU 0 to CPUs 1:
      NMI backtrace for cpu 1
      CPU: 1 PID: 2784 Comm: kworker/u4:8 Not tainted 6.8.0-rc4-syzkaller-01028-g71b605d3 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
      Workqueue: events_unbound __unix_gc
      RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x70 kernel/kcov.c:200
      Code: 89 fb e8 23 00 00 00 48 8b 3d 84 f5 1a 0c 48 89 de 5b e9 43 26 57 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <f3> 0f 1e fa 48 8b 04 24 65 48 8b 0d 90 52 70 7e 65 8b 15 91 52 70
      RSP: 0018:ffffc9000a17fa78 EFLAGS: 00000287
      RAX: ffffffff8a0a6108 RBX: ffff88802b6c2640 RCX: ffff88802c0b3b80
      RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
      RBP: ffffc9000a17fbf0 R08: ffffffff89383f1d R09: 1ffff1100ee5ff84
      R10: dffffc0000000000 R11: ffffed100ee5ff85 R12: 1ffff110056d84ee
      R13: ffffc9000a17fae0 R14: 0000000000000000 R15: ffffffff8f47b840
      FS:  0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffef5687ff8 CR3: 0000000029b34000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <NMI>
       </NMI>
       <TASK>
       __unix_gc+0xe69/0xf40 net/unix/garbage.c:343
       process_one_work kernel/workqueue.c:2633 [inline]
       process_scheduled_works+0x913/0x1420 kernel/workqueue.c:2706
       worker_thread+0xa5f/0x1000 kernel/workqueue.c:2787
       kthread+0x2ef/0x390 kernel/kthread.c:388
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242
       </TASK>
      
      Reported-and-tested-by: syzbot+ecab4d36f920c3574bf9@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=ecab4d36f920c3574bf9
      Fixes: 25236c91 ("af_unix: Fix task hung while purging oob_skb in GC.")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: don't overrun IPA suspend interrupt registers · d80f8e96
      Alex Elder authored
      In newer hardware, IPA supports more than 32 endpoints.  Some
      registers--such as IPA interrupt registers--represent endpoints
      as bits in a 4-byte register, and such registers are repeated as
      needed to represent endpoints beyond the first 32.
      
      In ipa_interrupt_suspend_clear_all(), we clear all pending IPA
      suspend interrupts by reading all status register(s) and writing
      corresponding registers to clear interrupt conditions.
      
      Unfortunately the number of registers to read/write is calculated
      incorrectly, and as a result we access *many* more registers than
      intended.  This bug occurs only when the IPA hardware signals a
      SUSPEND interrupt, which happens when a packet is received for an
      endpoint (or its underlying GSI channel) that is suspended.  This
      situation is difficult to reproduce, but possible.
      
      Fix this by correctly computing the number of interrupt registers to
      read and write.  This is the only place in the code where registers
      that map endpoints or channels this way perform this calculation.
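The commit does not show the incorrect expression, so the following only sketches the intended arithmetic under the layout described above: one bit per endpoint in 32-bit registers, repeated as needed. The helper names are hypothetical; in kernel code the same ceiling division is usually written with DIV_ROUND_UP().

```c
#include <stdint.h>

#define ENDPOINTS_PER_REG 32u	/* one bit per endpoint in a 4-byte register */

/* Registers needed to cover endpoint_count endpoints (ceiling division). */
static inline uint32_t suspend_reg_count(uint32_t endpoint_count)
{
	return (endpoint_count + ENDPOINTS_PER_REG - 1) / ENDPOINTS_PER_REG;
}

/* Index of the register holding endpoint ep's bit. */
static inline uint32_t ep_reg_index(uint32_t ep)
{
	return ep / ENDPOINTS_PER_REG;
}

/* Mask of endpoint ep's bit within that register. */
static inline uint32_t ep_reg_mask(uint32_t ep)
{
	return UINT32_C(1) << (ep % ENDPOINTS_PER_REG);
}
```

For example, hardware with 36 endpoints needs two registers, and endpoint 35 is bit 3 of the second one; a read/clear loop should run exactly `suspend_reg_count()` times.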
      
      Fixes: f298ba78 ("net: ipa: add a parameter to suspend registers")
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: implement lockless setsockopt(SO_PEEK_OFF) · 56667da7
      Eric Dumazet authored
      syzbot reported a lockdep violation [1] involving af_unix
      support of SO_PEEK_OFF.
      
      Since SO_PEEK_OFF is inherently not thread safe (it uses a per-socket
      sk_peek_off field), there is really no point in enforcing pointless
      thread safety in the kernel.
      
      After this patch:
      
      - setsockopt(SO_PEEK_OFF) no longer acquires the socket lock.
      
      - skb_consume_udp() no longer has to acquire the socket lock.
      
      - af_unix no longer needs a special version of sk_set_peek_off(),
        because it does not lock u->iolock anymore.
      
      As a followup, we could replace prot->set_peek_off with a boolean
      and avoid an indirect call, since we always use sk_set_peek_off().
      
      [1]
      
      WARNING: possible circular locking dependency detected
      6.8.0-rc4-syzkaller-00267-g0f1dd5e9 #0 Not tainted
      
      syz-executor.2/30025 is trying to acquire lock:
       ffff8880765e7d80 (&u->iolock){+.+.}-{3:3}, at: unix_set_peek_off+0x26/0xa0 net/unix/af_unix.c:789
      
      but task is already holding lock:
       ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
       ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: sockopt_lock_sock net/core/sock.c:1060 [inline]
       ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: sk_setsockopt+0xe52/0x3360 net/core/sock.c:1193
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (sk_lock-AF_UNIX){+.+.}-{0:0}:
              lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
              lock_sock_nested+0x48/0x100 net/core/sock.c:3524
              lock_sock include/net/sock.h:1691 [inline]
              __unix_dgram_recvmsg+0x1275/0x12c0 net/unix/af_unix.c:2415
              sock_recvmsg_nosec+0x18e/0x1d0 net/socket.c:1046
              ____sys_recvmsg+0x3c0/0x470 net/socket.c:2801
              ___sys_recvmsg net/socket.c:2845 [inline]
              do_recvmmsg+0x474/0xae0 net/socket.c:2939
              __sys_recvmmsg net/socket.c:3018 [inline]
              __do_sys_recvmmsg net/socket.c:3041 [inline]
              __se_sys_recvmmsg net/socket.c:3034 [inline]
              __x64_sys_recvmmsg+0x199/0x250 net/socket.c:3034
             do_syscall_64+0xf9/0x240
             entry_SYSCALL_64_after_hwframe+0x6f/0x77
      
      -> #0 (&u->iolock){+.+.}-{3:3}:
              check_prev_add kernel/locking/lockdep.c:3134 [inline]
              check_prevs_add kernel/locking/lockdep.c:3253 [inline]
              validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869
              __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
              lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
              __mutex_lock_common kernel/locking/mutex.c:608 [inline]
              __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
              unix_set_peek_off+0x26/0xa0 net/unix/af_unix.c:789
             sk_setsockopt+0x207e/0x3360
              do_sock_setsockopt+0x2fb/0x720 net/socket.c:2307
              __sys_setsockopt+0x1ad/0x250 net/socket.c:2334
              __do_sys_setsockopt net/socket.c:2343 [inline]
              __se_sys_setsockopt net/socket.c:2340 [inline]
              __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
             do_syscall_64+0xf9/0x240
             entry_SYSCALL_64_after_hwframe+0x6f/0x77
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(sk_lock-AF_UNIX);
                                     lock(&u->iolock);
                                     lock(sk_lock-AF_UNIX);
        lock(&u->iolock);
      
       *** DEADLOCK ***
      
      1 lock held by syz-executor.2/30025:
        #0: ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
        #0: ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: sockopt_lock_sock net/core/sock.c:1060 [inline]
        #0: ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: sk_setsockopt+0xe52/0x3360 net/core/sock.c:1193
      
      stack backtrace:
      CPU: 0 PID: 30025 Comm: syz-executor.2 Not tainted 6.8.0-rc4-syzkaller-00267-g0f1dd5e9 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106
        check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
        check_prev_add kernel/locking/lockdep.c:3134 [inline]
        check_prevs_add kernel/locking/lockdep.c:3253 [inline]
        validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869
        __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
        lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
        __mutex_lock_common kernel/locking/mutex.c:608 [inline]
        __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
        unix_set_peek_off+0x26/0xa0 net/unix/af_unix.c:789
       sk_setsockopt+0x207e/0x3360
        do_sock_setsockopt+0x2fb/0x720 net/socket.c:2307
        __sys_setsockopt+0x1ad/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xf9/0x240
       entry_SYSCALL_64_after_hwframe+0x6f/0x77
      RIP: 0033:0x7f78a1c7dda9
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f78a0fde0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 00007f78a1dac050 RCX: 00007f78a1c7dda9
      RDX: 000000000000002a RSI: 0000000000000001 RDI: 0000000000000006
      RBP: 00007f78a1cca47a R08: 0000000000000004 R09: 0000000000000000
      R10: 0000000020000180 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000006e R14: 00007f78a1dac050 R15: 00007ffe5cd81ae8
      
      Fixes: 859051dd ("bpf: Implement cgroup sockaddr hooks for unix sockets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Cc: Daan De Meyer <daan.j.demeyer@gmail.com>
      Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
      Cc: Martin KaFai Lau <martin.lau@kernel.org>
      Cc: David Ahern <dsahern@kernel.org>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-af: Consider the action set by PF · 3b1ae9b7
      Subbaraya Sundeep authored
      AF reserves MCAM entries for each PF and VF present in the
      system and populates each entry with the DMAC and an action with
      default RSS so that basic packet I/O works. Since the PF/VF is
      not aware of the RSS action installed by AF, AF fixes up the
      actions of the rules installed by the PF/VF with the corresponding
      default RSS action. This worked well for rules installed by the
      PF/VF for features like RX VLAN offload and DMAC filters, but
      rules with actions like drop or forward-to-queue were also
      getting modified by AF. Fix it by setting the default RSS action
      only if requested by the PF/VF.
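A hypothetical model of the behaviour change, not the octeontx2 API: the enum values and the `want_default_rss` field are invented for illustration, since the commit does not say how a PF/VF signals its request.

```c
#include <stdbool.h>

/* Hypothetical model of an MCAM rule's RX action. */
enum rx_action { RX_ACT_UNICAST, RX_ACT_DROP, RX_ACT_FWD_QUEUE, RX_ACT_RSS };

struct mcam_rule {
	enum rx_action action;	/* action the PF/VF installed */
	bool want_default_rss;	/* did the PF/VF ask AF for its default RSS? */
};

/* Before the fix: AF replaced every PF/VF action with its default RSS action. */
enum rx_action af_fixup_old(const struct mcam_rule *r, enum rx_action af_rss)
{
	(void)r;	/* the installed action was ignored */
	return af_rss;
}

/* After the fix: only rules that requested it get the default RSS action. */
enum rx_action af_fixup_new(const struct mcam_rule *r, enum rx_action af_rss)
{
	return r->want_default_rss ? af_rss : r->action;
}
```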
      
      Fixes: 967db352 ("octeontx2-af: add support for multicast/promisc packet replication feature")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 20 Feb, 2024 4 commits
    • docs: netdev: update the link to the CI repo · 23f9c2c0
      Jakub Kicinski authored
      Netronome graciously transferred the original NIPA repo
      to our new netdev umbrella org. Link to that instead of
      my private fork.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240216161945.2208842-1-kuba@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • arp: Prevent overflow in arp_req_get(). · a7d60277
      Kuniyuki Iwashima authored
      syzkaller reported an overflowing write in arp_req_get(). [0]
      
      When ioctl(SIOCGARP) is issued, arp_req_get() looks up a neighbour
      entry and copies neigh->ha to struct arpreq.arp_ha.sa_data.
      
      The arp_ha here is struct sockaddr, not struct sockaddr_storage, so
      the sa_data buffer is just 14 bytes.
      
      In the splat below, 2 bytes overflow into the next int field,
      arp_flags. We initialise that field just after the memcpy(), so it's
      not a problem.
      
      However, when dev->addr_len is greater than 22 (e.g. MAX_ADDR_LEN),
      arp_netmask is overwritten, which could be set as htonl(0xFFFFFFFFUL)
      in arp_ioctl() before calling arp_req_get().
      
      To avoid the overflow, let's limit the max length of memcpy().
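A userspace sketch of the bounded copy, assuming glibc's struct sockaddr (whose sa_data is 14 bytes, as the commit notes). `copy_hw_addr()` is a hypothetical helper illustrating the idea, not the kernel's actual fix.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>	/* struct sockaddr, whose sa_data is 14 bytes */

/* Hypothetical helper: copy a hardware address of addr_len bytes into
 * sa->sa_data, truncating instead of writing past the field. Returns the
 * number of bytes copied. */
size_t copy_hw_addr(struct sockaddr *sa, const unsigned char *ha,
		    size_t addr_len)
{
	size_t room = sizeof(sa->sa_data);
	size_t n = addr_len < room ? addr_len : room;

	memcpy(sa->sa_data, ha, n);
	return n;
}
```

A 6-byte Ethernet address is copied whole, while a 32-byte address (MAX_ADDR_LEN) is truncated to 14 bytes instead of spilling into arp_flags and arp_netmask.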
      
      Note that commit b5f0de6d ("net: dev: Convert sa_data to flexible
      array in struct sockaddr") just silenced syzkaller.
      
      [0]:
      memcpy: detected field-spanning write (size 16) of single field "r->arp_ha.sa_data" at net/ipv4/arp.c:1128 (size 14)
      WARNING: CPU: 0 PID: 144638 at net/ipv4/arp.c:1128 arp_req_get+0x411/0x4a0 net/ipv4/arp.c:1128
      Modules linked in:
      CPU: 0 PID: 144638 Comm: syz-executor.4 Not tainted 6.1.74 #31
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
      RIP: 0010:arp_req_get+0x411/0x4a0 net/ipv4/arp.c:1128
      Code: fd ff ff e8 41 42 de fb b9 0e 00 00 00 4c 89 fe 48 c7 c2 20 6d ab 87 48 c7 c7 80 6d ab 87 c6 05 25 af 72 04 01 e8 5f 8d ad fb <0f> 0b e9 6c fd ff ff e8 13 42 de fb be 03 00 00 00 4c 89 e7 e8 a6
      RSP: 0018:ffffc900050b7998 EFLAGS: 00010286
      RAX: 0000000000000000 RBX: ffff88803a815000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff8641a44a RDI: 0000000000000001
      RBP: ffffc900050b7a98 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 203a7970636d656d R12: ffff888039c54000
      R13: 1ffff92000a16f37 R14: ffff88803a815084 R15: 0000000000000010
      FS:  00007f172bf306c0(0000) GS:ffff88805aa00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f172b3569f0 CR3: 0000000057f12005 CR4: 0000000000770ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <TASK>
       arp_ioctl+0x33f/0x4b0 net/ipv4/arp.c:1261
       inet_ioctl+0x314/0x3a0 net/ipv4/af_inet.c:981
       sock_do_ioctl+0xdf/0x260 net/socket.c:1204
       sock_ioctl+0x3ef/0x650 net/socket.c:1321
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:870 [inline]
       __se_sys_ioctl fs/ioctl.c:856 [inline]
       __x64_sys_ioctl+0x18e/0x220 fs/ioctl.c:856
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x37/0x90 arch/x86/entry/common.c:81
       entry_SYSCALL_64_after_hwframe+0x64/0xce
      RIP: 0033:0x7f172b262b8d
      Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f172bf300b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00007f172b3abf80 RCX: 00007f172b262b8d
      RDX: 0000000020000000 RSI: 0000000000008954 RDI: 0000000000000003
      RBP: 00007f172b2d3493 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007f172b3abf80 R15: 00007f172bf10000
       </TASK>
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Reported-by: Bjoern Doebel <doebel@amazon.de>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240215230516.31330-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • devlink: fix possible use-after-free and memory leaks in devlink_init() · def689fc
      Vasiliy Kovalev authored
      The pernet operations structure for the subsystem must be registered
      before registering the generic netlink family.
      
      Add the missing unregister call for the case where registration fails.
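The ordering required here (and in the gtp and ipv6 sr fixes) can be sketched in userspace with fake registration calls. `fake_register_pernet()`, `fake_register_family()` and `subsys_init_sketch()` are hypothetical stand-ins for register_pernet_subsys(), genl_register_family() and a module init function; they only log the order of calls and touch no kernel state.

```c
#include <errno.h>
#include <string.h>

/* Records the order of the (fake) registration calls for inspection. */
char init_trace[64];
/* Set to non-zero to make the fake family registration fail. */
int fail_family_registration;

static void trace_step(const char *step)
{
	strncat(init_trace, step, sizeof(init_trace) - strlen(init_trace) - 1);
}

static int fake_register_pernet(void)    { trace_step("pernet+ "); return 0; }
static void fake_unregister_pernet(void) { trace_step("pernet- "); }
static int fake_register_family(void)
{
	trace_step("family+ ");
	return fail_family_registration ? -ENOMEM : 0;
}

int subsys_init_sketch(void)
{
	int err;

	init_trace[0] = '\0';
	/* Per-netns state first, so netlink handlers never run before it
	 * exists. */
	err = fake_register_pernet();
	if (err)
		return err;
	/* Only now expose the generic netlink family to userspace. */
	err = fake_register_family();
	if (err)
		goto err_unregister_pernet;	/* undo step one, no leak */
	return 0;

err_unregister_pernet:
	fake_unregister_pernet();
	return err;
}
```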
      
      Fixes: 687125b5 ("devlink: split out core code")
      Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
      Link: https://lore.kernel.org/r/20240215203400.29976-1-kovalev@altlinux.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ipv6: sr: fix possible use-after-free and null-ptr-deref · 5559cea2
      Vasiliy Kovalev authored
      The pernet operations structure for the subsystem must be registered
      before registering the generic netlink family.
      
      Fixes: 915d7e5e ("ipv6: sr: add code base for control plane support of SR-IPv6")
      Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
      Link: https://lore.kernel.org/r/20240215202717.29815-1-kovalev@altlinux.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  4. 19 Feb, 2024 3 commits
    • enic: Avoid false positive under FORTIFY_SOURCE · 40b9385d
      Kees Cook authored
      FORTIFY_SOURCE has been ignoring 0-sized destinations while the kernel
      code base has been converted to flexible arrays. In order to enforce
      the 0-sized destinations (e.g. with __counted_by), the remaining 0-sized
      destinations need to be handled. Unfortunately, struct vic_provinfo
      resists full conversion, as it contains a flexible array of flexible
      arrays, which is only possible with the 0-sized fake flexible array.
      
      Use unsafe_memcpy() to avoid future false positives under
      CONFIG_FORTIFY_SOURCE.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ionic: use pci_is_enabled not open code · 121e4dcb
      Shannon Nelson authored
      Since there is a helper available for this, use pci_is_enabled()
      rather than open-coding the check.
      
      Fixes: 13943d6c ("ionic: prevent pci disable of already disabled device")
      Reviewed-by: Brett Creeley <brett.creeley@amd.com>
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: bonding: set active slave to primary eth1 specifically · cd65c48d
      Hangbin Liu authored
      In bond priority testing, we set the primary interface to eth1 and add
      eth0, eth1 and eth2 to the bond one after another. This normally works.
      But on a debug kernel, the bridge ports that eth0, eth1 and eth2 are
      connected to come up slowly (entering the blocking, then forwarding
      state), which caused the primary interface to go down for a while
      after enslaving and the active slave to change.
      Here is a test log from Jakub's debug test [1].
      
       [  400.399070][   T50] br0: port 1(s0) entered disabled state
       [  400.400168][   T50] br0: port 4(s2) entered disabled state
       [  400.941504][ T2791] bond0: (slave eth0): making interface the new active one
       [  400.942603][ T2791] bond0: (slave eth0): Enslaving as an active interface with an up link
       [  400.943633][ T2766] br0: port 1(s0) entered blocking state
       [  400.944119][ T2766] br0: port 1(s0) entered forwarding state
       [  401.128792][ T2792] bond0: (slave eth1): making interface the new active one
       [  401.130771][ T2792] bond0: (slave eth1): Enslaving as an active interface with an up link
       [  401.131643][   T69] br0: port 2(s1) entered blocking state
       [  401.132067][   T69] br0: port 2(s1) entered forwarding state
       [  401.346201][ T2793] bond0: (slave eth2): Enslaving as a backup interface with an up link
       [  401.348414][   T50] br0: port 4(s2) entered blocking state
       [  401.348857][   T50] br0: port 4(s2) entered forwarding state
       [  401.519669][  T250] bond0: (slave eth0): link status definitely down, disabling slave
       [  401.526522][  T250] bond0: (slave eth1): link status definitely down, disabling slave
       [  401.526986][  T250] bond0: (slave eth2): making interface the new active one
       [  401.629470][  T250] bond0: (slave eth0): link status definitely up
       [  401.630089][  T250] bond0: (slave eth1): link status definitely up
       [...]
       # TEST: prio (active-backup ns_ip6_target primary_reselect 1)         [FAIL]
       # Current active slave is eth2 but not eth1
      
      Fix it by explicitly setting the active slave to the primary slave
      before testing.
      
      [1] https://netdev-3.bots.linux.dev/vmksft-bonding-dbg/results/464301/1-bond-options-sh/stdout
      
      Fixes: 481b56e0 ("selftests: bonding: re-format bond option tests")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 18 Feb, 2024 7 commits
    • Merge branch 'bcmasp-fixes' · ee710bbc
      David S. Miller authored
      Justin Chen says:
      
      ====================
      net: bcmasp: bug fixes for bcmasp
      
      Fix two bugs.
      
      - Indicate that PM is managed by the MAC to prevent double PM calls. This
        doesn't lead to a crash, but wastes a noticeable amount of time
        suspending/resuming.
      
      - The sanity check for an OOB write was off by one, leading to a false
        error when the full array was used.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Justin Chen
      net: bcmasp: Sanity check is off by one · f120e62e
      Justin Chen authored
      A sanity check for an OOB write is off by one, leading to a false
      positive when the array is full.
      
      Fixes: 9b90aca9 ("net: ethernet: bcmasp: fix possible OOB write in bcmasp_netfilt_get_all_active()")
      Signed-off-by: Justin Chen <justin.chen@broadcom.com>
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
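      The entry above describes an off-by-one in a bounds check that guards against an out-of-bounds (OOB) write. The following is a minimal user-space sketch of that class of bug, assuming a simple "copy active entries into a fixed-size array" loop; `collect_active()` and its parameters are hypothetical and are not the bcmasp driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not the bcmasp driver: copy the indices of active
 * entries into out[0..cap-1]. Returns the number copied, or -EMSGSIZE if
 * there are more active entries than slots.
 */
static int collect_active(const bool *active, size_t n,
			  unsigned int *out, size_t cap)
{
	size_t count = 0;

	assert(active != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!active[i])
			continue;
		/*
		 * Guard against an OOB write. An off-by-one variant such as
		 * "count >= cap - 1" would reject the write into the last
		 * valid slot, out[cap - 1], so a completely full array would
		 * be reported as an error. "count >= cap" only rejects a
		 * write past the end.
		 */
		if (count >= cap)
			return -EMSGSIZE;
		out[count++] = (unsigned int)i;
	}
	return (int)count;
}
```

      With `cap == 3` and exactly three active entries this returns 3; the `cap - 1` variant would instead fail on the third write.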
    • Florian Fainelli
      net: bcmasp: Indicate MAC is in charge of PHY PM · 5b76d928
      Florian Fainelli authored
      Avoid having the PHY library call unnecessarily into the suspend/resume
      functions by setting phydev->mac_managed_pm to true. The ASP driver
      essentially does exactly what mdio_bus_phy_resume() does.
      
      Fixes: 490cb412 ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
      Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Signed-off-by: Justin Chen <justin.chen@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller
      Merge branch 'mptcp-fixes' · 398b7c37
      David S. Miller authored
      Matthieu Baerts says:
      
      ====================
      mptcp: misc. fixes for v6.8
      
      This series includes 4 types of fixes:
      
      Patches 1 and 2 force the path-managers not to allocate a new address
      entry when dealing with the "special" ID 0, reserved for the address of
      the initial subflow. These patches can be backported up to v5.19 and
      v5.12 respectively.
      
      Patches 3 to 6 prevent the in-kernel path-manager from creating duplicated
      subflows. Patch 6 is the main fix, but patches 3 to 5 are prerequisites
      of sorts: they fix some data races that could also lead to the
      creation of unexpected subflows. These patches can be backported up to
      v5.7, v5.10, v6.0, and v5.15 respectively.
      
      Note that patch 3 modifies the existing ULP API. No better solutions
      have been found for -net, and there is some similar prior art, see
      commit 0df48c26 ("tcp: add tcpi_bytes_acked to tcp_info"). Please
      also note that TLS ULP Diag likely has the same issue.
      
      Patches 7 to 9 fix issues in the selftests when executing them on older
      kernels, e.g. when testing the latest version of these kselftests on the
      v5.15.148 kernel, as LKFT does when validating stable kernels.
      These patches only avoid printing expected errors to the console and
      marking some tests as "OK" when they have actually been skipped. Patches
      7 and 8 can be backported up to v6.6.
      
      Patches 10 to 13 make sure all MPTCP selftests subtests have a unique
      name. It is important to have a unique (sub)test name in TAP, because
      that's the test identifier. Some CI environments might drop tests with
      duplicated names. Patches 10 to 12 can be backported up to v6.6.
      ====================
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
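      Because a TAP (sub)test name is the test identifier, a harness may want to reject duplicates before publishing results. A minimal sketch of such a check follows; `has_duplicate_names()` is hypothetical and not part of the MPTCP selftests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if any two subtest names in
 * names[0..n-1] are identical. A TAP consumer that keys results by name
 * could merge or drop such entries. Quadratic, which is fine for a few
 * hundred subtests.
 */
static bool has_duplicate_names(const char *const *names, size_t n)
{
	assert(names != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (strcmp(names[i], names[j]) == 0)
				return true;
	return false;
}
```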
    • Matthieu Baerts (NGI0)
      selftests: mptcp: diag: unique 'cestab' subtest names · 4103d848
      Matthieu Baerts (NGI0) authored
      It is important to have a unique (sub)test name in TAP, because some CI
      environments drop tests with duplicated names.
      
      Some 'cestab' subtests from the diag selftest had the same names, e.g.:
      
          ....chk 0 cestab
      
      Now the previous value is included to make the names different, e.g.:
      
          ....chk 2->0 cestab after flush
      
      While at it, the 'after flush' info is added, similar to what is done
      with the 'in use' subtests. Also inspired by these 'in use' subtests,
      'many' is displayed instead of a large number:
      
          many msk socket present                           [  ok  ]
          ....chk many msk in use                           [  ok  ]
          ....chk many cestab                               [  ok  ]
          ....chk many->0 msk in use after flush            [  ok  ]
          ....chk many->0 cestab after flush                [  ok  ]
      
      Fixes: 81ab7728 ("selftests: mptcp: diag: check CURRESTAB counters")
      Cc: stable@vger.kernel.org
      Reviewed-by: Geliang Tang <geliang@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
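      The naming scheme described above (previous value, then `->`, then the current value, with "many" in place of large counts) can be sketched as a small formatting helper. This is a hypothetical C illustration, not the selftest's implementation; in particular `MANY_THRESHOLD` is an assumed cut-off, since the commit message does not state one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cut-off at or above which a count is shown as "many". */
#define MANY_THRESHOLD 100

/* Render a count, or "many" when it reaches the cut-off. */
static void fmt_count(char *buf, size_t len, int n)
{
	if (n >= MANY_THRESHOLD)
		snprintf(buf, len, "many");
	else
		snprintf(buf, len, "%d", n);
}

/*
 * Build a subtest name such as "chk 2->0 cestab after flush". A negative
 * prev means there is no previous value, giving e.g. "chk 2 cestab".
 * The message ("cestab", "msk in use") is passed once to the helper.
 */
static void subtest_name(char *buf, size_t len, int prev, int cur,
			 const char *what, const char *suffix)
{
	char p[16], c[16];

	assert(buf != NULL && what != NULL);
	fmt_count(c, sizeof(c), cur);
	if (prev < 0) {
		snprintf(buf, len, "chk %s %s%s", c, what, suffix ? suffix : "");
	} else {
		fmt_count(p, sizeof(p), prev);
		snprintf(buf, len, "chk %s->%s %s%s", p, c, what,
			 suffix ? suffix : "");
	}
}
```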
    • Matthieu Baerts (NGI0)
      selftests: mptcp: diag: unique 'in use' subtest names · 645c1dc9
      Matthieu Baerts (NGI0) authored
      It is important to have a unique (sub)test name in TAP, because some CI
      environments drop tests with duplicated names.
      
      Some 'in use' subtests from the diag selftest had the same names, e.g.:
      
          chk 0 msk in use after flush
      
      Now the previous value is included to make the names different, e.g.:
      
          chk 2->0 msk in use after flush
      
      While at it, avoid repeating the full message: declare it once in the
      helper.
      
      Fixes: ce990257 ("selftests: mptcp: diag: format subtests results in TAP")
      Cc: stable@vger.kernel.org
      Reviewed-by: Geliang Tang <geliang@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Matthieu Baerts (NGI0)
      selftests: mptcp: userspace_pm: unique subtest names · 2ef0d804
      Matthieu Baerts (NGI0) authored
      It is important to have a unique (sub)test name in TAP, because some CI
      environments drop tests with duplicated names.
      
      Some subtests from the userspace_pm selftest had the same names. That's
      because different subflows are created (and deleted) between the same
      pair of IP addresses.
      
      Simply adding the destination port to the name is then enough to make
      the names unique, because the destination port is always different.
      
      Note that adding such info takes a bit more space, so the width used to
      print the name needs to be increased slightly, simply to keep all the
      '[ OK ]' markers aligned as before.
      
      Fixes: f589234e ("selftests: mptcp: userspace_pm: format subtests results in TAP")
      Cc: stable@vger.kernel.org
      Reviewed-by: Geliang Tang <geliang@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
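      The fix above relies on two things: the destination port makes each name unique, and a wider fixed-width name column keeps the result markers aligned. A minimal C sketch, assuming a `printf`-style `%-*s` left-aligned column; `print_subtest()`, `NAME_WIDTH` and the name layout are hypothetical, not taken from the selftest.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical column width for the subtest name. */
#define NAME_WIDTH 60

/*
 * Format one subtest line into buf: the name (which includes the
 * destination port, so subflows between the same address pair get
 * distinct names), left-aligned and padded to NAME_WIDTH, then the
 * result marker. Returns snprintf()'s result.
 */
static int print_subtest(char *buf, size_t len, const char *desc,
			 const char *saddr, const char *daddr,
			 unsigned int dport, int ok)
{
	char name[128];

	assert(buf != NULL && desc != NULL && saddr != NULL && daddr != NULL);
	snprintf(name, sizeof(name), "%s %s -> %s:%u", desc, saddr, daddr, dport);
	return snprintf(buf, len, "%-*s[%s]", NAME_WIDTH, name,
			ok ? " OK " : "FAIL");
}
```

      `%-*s` pads but never truncates, so a name longer than `NAME_WIDTH` would push its marker to the right; that is why the column has to grow when the names get longer.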