1. 15 Jul, 2021 33 commits
  2. 14 Jul, 2021 7 commits
    • Linus Torvalds's avatar
      Merge tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 8096acd7
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski.
       "Including fixes from bpf and netfilter.
      
        Current release - regressions:
      
         - sock: fix parameter order in sock_setsockopt()
      
        Current release - new code bugs:
      
         - netfilter: nft_last:
             - fix incorrect arithmetic when restoring last used
             - honor NFTA_LAST_SET on restoration
      
        Previous releases - regressions:
      
         - udp: properly flush normal packet at GRO time
      
         - sfc: ensure correct number of XDP queues; don't allow enabling the
           feature if there isn't sufficient resources to Tx from any CPU
      
         - dsa: sja1105: fix address learning getting disabled on the CPU port
      
         - mptcp: addresses a rmem accounting issue that could keep packets in
           subflow receive buffers longer than necessary, delaying MPTCP-level
           ACKs
      
         - ip_tunnel: fix mtu calculation for ETHER tunnel devices
      
         - do not reuse skbs allocated from skbuff_fclone_cache in the napi
           skb cache, we'd try to return them to the wrong slab cache
      
         - tcp: consistently disable header prediction for mptcp
      
        Previous releases - always broken:
      
         - bpf: fix subprog poke descriptor tracking use-after-free
      
         - ipv6:
             - allocate enough headroom in ip6_finish_output2() in case
               iptables TEE is used
             - tcp: drop silly ICMPv6 packet too big messages to avoid
               expensive and pointless lookups (which may serve as a DDOS
               vector)
             - make sure fwmark is copied in SYNACK packets
             - fix 'disable_policy' for forwarded packets (align with IPv4)
      
         - netfilter: conntrack:
             - do not renew entry stuck in tcp SYN_SENT state
             - do not mark RST in the reply direction coming after SYN packet
               for an out-of-sync entry
      
         - mptcp: cleanly handle error conditions with MP_JOIN and syncookies
      
         - mptcp: fix double free when rejecting a join due to port mismatch
      
         - validate lwtstate->data before returning from skb_tunnel_info()
      
         - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
      
         - mt76: mt7921: continue to probe driver when fw already downloaded
      
         - bonding: fix multiple issues with offloading IPsec to (thru?) bond
      
         - stmmac: ptp: fix issues around Qbv support and setting time back
      
         - bcmgenet: always clear wake-up based on energy detection
      
        Misc:
      
         - sctp: move 198 addresses from unusable to private scope
      
         - ptp: support virtual clocks and timestamping
      
         - openvswitch: optimize operation for key comparison"
      
      * tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits)
        net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave()
        sfc: add logs explaining XDP_TX/REDIRECT is not available
        sfc: ensure correct number of XDP queues
        sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
        net: fddi: fix UAF in fza_probe
        net: dsa: sja1105: fix address learning getting disabled on the CPU port
        net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload
        net: Use nlmsg_unicast() instead of netlink_unicast()
        octeontx2-pf: Fix uninitialized boolean variable pps
        ipv6: allocate enough headroom in ip6_finish_output2()
        net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific
        net: bridge: multicast: fix MRD advertisement router port marking race
        net: bridge: multicast: fix PIM hello router port marking race
        net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340
        dsa: fix for_each_child.cocci warnings
        virtio_net: check virtqueue_add_sgs() return value
        mptcp: properly account bulk freed memory
        selftests: mptcp: fix case multiple subflows limited by server
        mptcp: avoid processing packet if a subflow reset
        mptcp: fix syncookie process if mptcp can not_accept new subflow
        ...
      8096acd7
    • Christian Brauner's avatar
      fs: add vfs_parse_fs_param_source() helper · d1d488d8
      Christian Brauner authored
      Add a simple helper that filesystems can use in their parameter parser
      to parse the "source" parameter. A few places open-coded this function
      and that already caused a bug in the cgroup v1 parser that we fixed.
      Let's make it harder to get this wrong by introducing a helper which
      performs all necessary checks.
      
      Link: https://syzkaller.appspot.com/bug?id=6312526aba5beae046fdae8f00399f87aab48b12
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d1d488d8
    • Christian Brauner's avatar
      cgroup: verify that source is a string · 3b046272
      Christian Brauner authored
      The following sequence can be used to trigger a UAF:
      
          int fscontext_fd = fsopen("cgroup");
          int fd_null = open("/dev/null, O_RDONLY);
          int fsconfig(fscontext_fd, FSCONFIG_SET_FD, "source", fd_null);
          close_range(3, ~0U, 0);
      
      The cgroup v1 specific fs parser expects a string for the "source"
      parameter.  However, it is perfectly legitimate to e.g.  specify a file
      descriptor for the "source" parameter.  The fs parser doesn't know what
      a filesystem allows there.  So it's a bug to assume that "source" is
      always of type fs_value_is_string when it can reasonably also be
      fs_value_is_file.
      
      This assumption in the cgroup code causes a UAF because struct
      fs_parameter uses a union for the actual value.  Access to that union is
      guarded by the param->type member.  Since the cgroup paramter parser
      didn't check param->type but unconditionally moved param->string into
      fc->source a close on the fscontext_fd would trigger a UAF during
      put_fs_context() which frees fc->source thereby freeing the file stashed
      in param->file causing a UAF during a close of the fd_null.
      
      Fix this by verifying that param->type is actually a string and report
      an error if not.
      
      In follow up patches I'll add a new generic helper that can be used here
      and by other filesystems instead of this error-prone copy-pasta fix.
      But fixing it in here first makes backporting a it to stable a lot
      easier.
      
      Fixes: 8d2451f4 ("cgroup1: switch to option-by-option parsing")
      Reported-by: syzbot+283ce5a46486d6acdbaf@syzkaller.appspotmail.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: <stable@kernel.org>
      Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com>
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3b046272
    • Like Xu's avatar
      KVM: x86/pmu: Clear anythread deprecated bit when 0xa leaf is unsupported on the SVM · 7234c362
      Like Xu authored
      The AMD platform does not support the functions Ah CPUID leaf. The returned
      results for this entry should all remain zero just like the native does:
      
      AMD host:
         0x0000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
      (uncanny) AMD guest:
         0x0000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00008000
      
      Fixes: cadbaa03 ("perf/x86/intel: Make anythread filter support conditional")
      Signed-off-by: default avatarLike Xu <likexu@tencent.com>
      Message-Id: <20210628074354.33848-1-likexu@tencent.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      7234c362
    • Kefeng Wang's avatar
      KVM: mmio: Fix use-after-free Read in kvm_vm_ioctl_unregister_coalesced_mmio · 23fa2e46
      Kefeng Wang authored
      BUG: KASAN: use-after-free in kvm_vm_ioctl_unregister_coalesced_mmio+0x7c/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:183
      Read of size 8 at addr ffff0000c03a2500 by task syz-executor083/4269
      
      CPU: 5 PID: 4269 Comm: syz-executor083 Not tainted 5.10.0 #7
      Hardware name: linux,dummy-virt (DT)
      Call trace:
       dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132
       show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x110/0x164 lib/dump_stack.c:118
       print_address_description+0x78/0x5c8 mm/kasan/report.c:385
       __kasan_report mm/kasan/report.c:545 [inline]
       kasan_report+0x148/0x1e4 mm/kasan/report.c:562
       check_memory_region_inline mm/kasan/generic.c:183 [inline]
       __asan_load8+0xb4/0xbc mm/kasan/generic.c:252
       kvm_vm_ioctl_unregister_coalesced_mmio+0x7c/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:183
       kvm_vm_ioctl+0xe30/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3755
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
       do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220
       el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
       el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
       el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
      
      Allocated by task 4269:
       stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
       kasan_save_stack mm/kasan/common.c:48 [inline]
       kasan_set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461
       kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475
       kmem_cache_alloc_trace include/linux/slab.h:450 [inline]
       kmalloc include/linux/slab.h:552 [inline]
       kzalloc include/linux/slab.h:664 [inline]
       kvm_vm_ioctl_register_coalesced_mmio+0x78/0x1cc arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:146
       kvm_vm_ioctl+0x7e8/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3746
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
       do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220
       el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
       el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
       el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
      
      Freed by task 4269:
       stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
       kasan_save_stack mm/kasan/common.c:48 [inline]
       kasan_set_track+0x38/0x6c mm/kasan/common.c:56
       kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355
       __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422
       kasan_slab_free+0x10/0x1c mm/kasan/common.c:431
       slab_free_hook mm/slub.c:1544 [inline]
       slab_free_freelist_hook mm/slub.c:1577 [inline]
       slab_free mm/slub.c:3142 [inline]
       kfree+0x104/0x38c mm/slub.c:4124
       coalesced_mmio_destructor+0x94/0xa4 arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:102
       kvm_iodevice_destructor include/kvm/iodev.h:61 [inline]
       kvm_io_bus_unregister_dev+0x248/0x280 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:4374
       kvm_vm_ioctl_unregister_coalesced_mmio+0x158/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:186
       kvm_vm_ioctl+0xe30/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3755
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
       do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220
       el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
       el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
       el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
      
      If kvm_io_bus_unregister_dev() return -ENOMEM, we already call kvm_iodevice_destructor()
      inside this function to delete 'struct kvm_coalesced_mmio_dev *dev' from list
      and free the dev, but kvm_iodevice_destructor() is called again, it will lead
      the above issue.
      
      Let's check the the return value of kvm_io_bus_unregister_dev(), only call
      kvm_iodevice_destructor() if the return value is 0.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: kvm@vger.kernel.org
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Message-Id: <20210626070304.143456-1-wangkefeng.wang@huawei.com>
      Cc: stable@vger.kernel.org
      Fixes: 5d3c4c79 ("KVM: Stop looking for coalesced MMIO zones if the bus is destroyed", 2021-04-20)
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      23fa2e46
    • Sean Christopherson's avatar
      KVM: SVM: Revert clearing of C-bit on GPA in #NPF handler · 76ff371b
      Sean Christopherson authored
      Don't clear the C-bit in the #NPF handler, as it is a legal GPA bit for
      non-SEV guests, and for SEV guests the C-bit is dropped before the GPA
      hits the NPT in hardware.  Clearing the bit for non-SEV guests causes KVM
      to mishandle #NPFs with that collide with the host's C-bit.
      
      Although the APM doesn't explicitly state that the C-bit is not reserved
      for non-SEV, Tom Lendacky confirmed that the following snippet about the
      effective reduction due to the C-bit does indeed apply only to SEV guests.
      
        Note that because guest physical addresses are always translated
        through the nested page tables, the size of the guest physical address
        space is not impacted by any physical address space reduction indicated
        in CPUID 8000_001F[EBX]. If the C-bit is a physical address bit however,
        the guest physical address space is effectively reduced by 1 bit.
      
      And for SEV guests, the APM clearly states that the bit is dropped before
      walking the nested page tables.
      
        If the C-bit is an address bit, this bit is masked from the guest
        physical address when it is translated through the nested page tables.
        Consequently, the hypervisor does not need to be aware of which pages
        the guest has chosen to mark private.
      
      Note, the bogus C-bit clearing was removed from legacy #PF handler in
      commit 6d1b867d ("KVM: SVM: Don't strip the C-bit from CR2 on #PF
      interception").
      
      Fixes: 0ede79e1 ("KVM: SVM: Clear C-bit from the page fault address")
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210625020354.431829-3-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      76ff371b
    • Sean Christopherson's avatar
      KVM: x86/mmu: Do not apply HPA (memory encryption) mask to GPAs · fc9bf2e0
      Sean Christopherson authored
      Ignore "dynamic" host adjustments to the physical address mask when
      generating the masks for guest PTEs, i.e. the guest PA masks.  The host
      physical address space and guest physical address space are two different
      beasts, e.g. even though SEV's C-bit is the same bit location for both
      host and guest, disabling SME in the host (which clears shadow_me_mask)
      does not affect the guest PTE->GPA "translation".
      
      For non-SEV guests, not dropping bits is the correct behavior.  Assuming
      KVM and userspace correctly enumerate/configure guest MAXPHYADDR, bits
      that are lost as collateral damage from memory encryption are treated as
      reserved bits, i.e. KVM will never get to the point where it attempts to
      generate a gfn using the affected bits.  And if userspace wants to create
      a bogus vCPU, then userspace gets to deal with the fallout of hardware
      doing odd things with bad GPAs.
      
      For SEV guests, not dropping the C-bit is technically wrong, but it's a
      moot point because KVM can't read SEV guest's page tables in any case
      since they're always encrypted.  Not to mention that the current KVM code
      is also broken since sme_me_mask does not have to be non-zero for SEV to
      be supported by KVM.  The proper fix would be to teach all of KVM to
      correctly handle guest private memory, but that's a task for the future.
      
      Fixes: d0ec49d4 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
      Cc: stable@vger.kernel.org
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210623230552.4027702-5-seanjc@google.com>
      [Use a new header instead of adding header guards to paging_tmpl.h. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fc9bf2e0