1. 17 Apr, 2021 1 commit
      netlink: don't call ->netlink_bind with table lock held · f2764bd4
      Florian Westphal authored
      When I added support to allow generic netlink multicast groups to be
      restricted to subscribers with CAP_NET_ADMIN I was unaware that a
      genl_bind implementation already existed in the past.
      
      It was reverted due to ABBA deadlock:
      
      1. ->netlink_bind gets called with the table lock held.
      2. genetlink bind callback is invoked, it grabs the genl lock.
      
      But when a new genl subsystem is (un)registered, these two locks are
      taken in reverse order.
      
      One solution would be to revert again and add a comment in genl
      referring to 1e82a62f ("genetlink: remove genl_bind").
      
      This would need a second change in mptcp to not expose the raw token
      value anymore, e.g. by hashing the token with a secret key so userspace
      can still associate subflow events with the correct mptcp connection.
      
      However, Paolo Abeni reminded me to double-check why the netlink table is
      locked in the first place.
      
      I can't find one.  netlink_bind() is already called without this lock
      when userspace joins a group via NETLINK_ADD_MEMBERSHIP setsockopt.
      Same holds for the netlink_unbind operation.
      
      Digging through the history, commit f7736080
      ("netlink: access nlk groups safely in netlink bind and getname")
      expanded the lock scope.
      
      commit 3a20773b ("net: netlink: cap max groups which will be considered in netlink_bind()")
      ... removed the nlk->ngroups access that the lock scope
      extension was all about.
      
      Reduce the lock scope again and always call ->netlink_bind without
      the table lock.
      
      The Fixes tag should be vs. the patch mentioned in the link below,
      but that one got squash-merged into the patch that came earlier in the
      series.
      
      Fixes: 4d54cc32 ("mptcp: avoid lock_fast usage in accept path")
      Link: https://lore.kernel.org/mptcp/20210213000001.379332-8-mathew.j.martineau@linux.intel.com/T/#u
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Johannes Berg <johannes.berg@intel.com>
      Cc: Sean Tranchetti <stranche@codeaurora.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
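The lock-order problem described in this commit can be modelled in user space. The sketch below is not kernel code: the tracker, the lock identifiers and the three path functions are hypothetical names chosen for illustration. It records which lock was held when another one was taken, flags a reversed order, and shows that invoking the bind callback after the table lock has been dropped leaves no ordering between the two locks.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock identifiers standing in for the netlink table lock
 * and the genl lock. */
enum { LOCK_NL_TABLE, LOCK_GENL, LOCK_COUNT };

struct order_tracker {
    bool before[LOCK_COUNT][LOCK_COUNT]; /* before[a][b]: a was held when b was taken */
    bool held[LOCK_COUNT];
    bool inversion;                      /* set once both a->b and b->a were observed */
};

void tracker_init(struct order_tracker *t)
{
    memset(t, 0, sizeof(*t));
}

void tracker_acquire(struct order_tracker *t, int lock)
{
    for (int h = 0; h < LOCK_COUNT; h++) {
        if (!t->held[h] || h == lock)
            continue;
        if (t->before[lock][h])
            t->inversion = true;         /* the reverse order was seen earlier */
        t->before[h][lock] = true;
    }
    t->held[lock] = true;
}

void tracker_release(struct order_tracker *t, int lock)
{
    t->held[lock] = false;
}

/* Old behaviour: the bind callback runs while the table lock is held. */
void bind_with_table_lock(struct order_tracker *t)
{
    tracker_acquire(t, LOCK_NL_TABLE);
    tracker_acquire(t, LOCK_GENL);       /* genetlink bind callback */
    tracker_release(t, LOCK_GENL);
    tracker_release(t, LOCK_NL_TABLE);
}

/* Family (un)registration takes the same two locks in reverse order. */
void genl_family_register(struct order_tracker *t)
{
    tracker_acquire(t, LOCK_GENL);
    tracker_acquire(t, LOCK_NL_TABLE);
    tracker_release(t, LOCK_NL_TABLE);
    tracker_release(t, LOCK_GENL);
}

/* Reduced lock scope: the table lock is dropped before the callback. */
void bind_without_table_lock(struct order_tracker *t)
{
    tracker_acquire(t, LOCK_NL_TABLE);
    tracker_release(t, LOCK_NL_TABLE);
    tracker_acquire(t, LOCK_GENL);       /* genetlink bind callback */
    tracker_release(t, LOCK_GENL);
}
```

Running `bind_with_table_lock()` followed by `genl_family_register()` on one tracker sets `inversion`; replacing the first call with `bind_without_table_lock()` leaves it clear.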
  2. 16 Apr, 2021 11 commits
  3. 15 Apr, 2021 7 commits
  4. 14 Apr, 2021 15 commits
  5. 13 Apr, 2021 6 commits
      xen-netback: Check for hotplug-status existence before watching · 2afeec08
      Michael Brown authored
      The logic in connect() is currently written with the assumption that
      xenbus_watch_pathfmt() will return an error for a node that does not
      exist.  This assumption is incorrect: xenstore does allow a watch to
      be registered for a nonexistent node (and will send notifications
      should the node be subsequently created).
      
      As of commit 1f256578 ("xen-netback: remove 'hotplug-status' once it
      has served its purpose"), this leads to a failure when a domU
      transitions into XenbusStateConnected more than once.  On the first
      domU transition into Connected state, the "hotplug-status" node will
      be deleted by the hotplug_status_changed() callback in dom0.  On the
      second or subsequent domU transition into Connected state, the
      hotplug_status_changed() callback will therefore never be invoked, and
      so the backend will remain stuck in InitWait.
      
      This failure prevents scenarios such as reloading the xen-netfront
      module within a domU, or booting a domU via iPXE.  There is
      unfortunately no way for the domU to work around this dom0 bug.
      
      Fix by explicitly checking for existence of the "hotplug-status" node,
      thereby creating the behaviour that was previously assumed to exist.
      Signed-off-by: Michael Brown <mbrown@fensystems.co.uk>
      Reviewed-by: Paul Durrant <paul@xen.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
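The faulty assumption in this commit can be illustrated with a small user-space model. Everything here is hypothetical (the `model_store` type, `model_watch()`, and both connect variants); it only assumes, as the commit message states, that registering a watch succeeds for a nonexistent node and that a backend which believes it has a watch waits for a callback that never arrives.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_NODES 8

/* Hypothetical stand-in for a xenstore directory. */
struct model_store {
    const char *nodes[MODEL_MAX_NODES];
    size_t count;
};

struct model_backend {
    bool have_hotplug_status_watch; /* true: wait for the watch callback */
};

bool model_exists(const struct model_store *s, const char *path)
{
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->nodes[i], path) == 0)
            return true;
    return false;
}

/* Registering a watch succeeds whether or not the node exists. */
int model_watch(const struct model_store *s, const char *path)
{
    (void)s;
    (void)path;
    return 0;
}

/* Old logic: treats a successful watch as proof that the node exists. */
void connect_assuming_watch_fails(const struct model_store *s,
                                  struct model_backend *be, const char *path)
{
    be->have_hotplug_status_watch = (model_watch(s, path) == 0);
}

/* Fixed logic: only watch (and wait) when the node is actually present. */
void connect_checking_existence(const struct model_store *s,
                                struct model_backend *be, const char *path)
{
    be->have_hotplug_status_watch = false;
    if (model_exists(s, path) && model_watch(s, path) == 0)
        be->have_hotplug_status_watch = true;
}
```

With an empty store, the first variant leaves the backend waiting on a node that was already deleted, while the second lets it proceed without a watch.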
      gro: ensure frag0 meets IP header alignment · 38ec4944
      Eric Dumazet authored
      After commit 0f6925b3 ("virtio_net: Do not pull payload in skb->head"),
      Guenter Roeck reported one failure in his tests on the sh architecture.
      
      After much debugging, we have been able to spot silent unaligned accesses
      in inet_gro_receive().
      
      The issue at hand is that upper networking stacks assume their header
      is word-aligned. Low level drivers are supposed to reserve NET_IP_ALIGN
      bytes before the Ethernet header to make that happen.
      
      This patch hardens skb_gro_reset_offset() to not allow frag0 fast-path
      if the fragment is not properly aligned.
      
      Some arches like x86, arm64 and powerpc do not care and define NET_IP_ALIGN
      as 0; this extra check will be a NOP for them.
      
      Note that if frag0 is not used, GRO will call pskb_may_pull()
      as many times as needed to pull network and transport headers.
      
      Fixes: 0f6925b3 ("virtio_net: Do not pull payload in skb->head")
      Fixes: 78a478d0 ("gro: Inline skb_gro_header and cache frag0 virtual address")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
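The alignment rule described in this commit can be written as a small predicate. This is a user-space sketch, not the kernel's `skb_gro_reset_offset()`: the function name and its parameters are hypothetical, and it assumes the check is "fragment offset plus network header offset must be a multiple of 4 whenever the architecture sets a non-zero `NET_IP_ALIGN`".

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * net_ip_align: the architecture's NET_IP_ALIGN value (0 on arches that
 *               tolerate unaligned accesses, per the commit message).
 * frag_off:     byte offset of frag0 within its page.
 * nhoff:        byte offset of the network header within the packet.
 *
 * Returns true when the frag0 fast path may be used.
 */
bool frag0_fast_path_ok(unsigned int net_ip_align, uint32_t frag_off,
                        uint32_t nhoff)
{
    /* With NET_IP_ALIGN == 0 the second test is skipped entirely. */
    return !net_ip_align || !((frag_off + nhoff) & 3);
}
```

For a 14-byte Ethernet header, a fragment starting at offset 0 puts the IP header at byte 14, which is not word-aligned, so the fast path is refused when `net_ip_align` is 2; a fragment starting at offset 2 passes. When the predicate is false, the slower `pskb_may_pull()` path mentioned in the commit message takes over.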
      net/sctp: fix race condition in sctp_destroy_sock · b166a20b
      Or Cohen authored
      If sctp_destroy_sock is called without sock_net(sk)->sctp.addr_wq_lock
      held and sp->do_auto_asconf is true, then an element is removed
      from the auto_asconf_splist without any proper locking.
      
      This can happen in the following functions:
      1. In sctp_accept, if sctp_sock_migrate fails.
      2. In inet_create or inet6_create, if there is a bpf program
         attached to BPF_CGROUP_INET_SOCK_CREATE which denies
         creation of the sctp socket.
      
      The bug is fixed by acquiring addr_wq_lock in sctp_destroy_sock
      instead of sctp_close.
      
      This addresses CVE-2021-23133.
      Reported-by: Or Cohen <orcohen@paloaltonetworks.com>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Fixes: 61023658 ("bpf: Add new cgroup attach type to enable sock modifications")
      Signed-off-by: Or Cohen <orcohen@paloaltonetworks.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
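The idea of moving the lock acquisition into the destroy path, so that every caller is covered, can be sketched in user space. All types and function names below are hypothetical, and the "lock" is only a flag used to count removals performed without it; this is a model of the reasoning, not the SCTP implementation.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the per-netns lock and list. */
struct model_net {
    bool addr_wq_lock_held;
    int auto_asconf_entries; /* length of the modelled list */
    int unlocked_removals;   /* removals done without the lock */
};

struct model_sock {
    struct model_net *net;
    bool do_auto_asconf;
};

static void model_list_del(struct model_net *net)
{
    if (!net->addr_wq_lock_held)
        net->unlocked_removals++; /* this is the race being modelled */
    net->auto_asconf_entries--;
}

/* Before: relies on the close path having taken the lock already. */
void destroy_sock_caller_locks(struct model_sock *sk)
{
    if (sk->do_auto_asconf) {
        sk->do_auto_asconf = false;
        model_list_del(sk->net);
    }
}

/* After: the destroy path takes the lock itself, whoever called it. */
void destroy_sock_self_locks(struct model_sock *sk)
{
    if (sk->do_auto_asconf) {
        sk->do_auto_asconf = false;
        sk->net->addr_wq_lock_held = true;
        model_list_del(sk->net);
        sk->net->addr_wq_lock_held = false;
    }
}
```

Calling the first variant from a path that never took the lock, such as a failed accept or a denied socket creation, is counted as an unlocked removal; the second variant is safe from any caller.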
      ibmvnic: correctly use dev_consume/free_skb_irq · ca09bf7b
      Lijun Pan authored
      It is more correct to use dev_kfree_skb_irq when packets are dropped,
      and to use dev_consume_skb_irq when packets are consumed.
      
      Fixes: 0d973388 ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls")
      Suggested-by: Thomas Falcon <tlfalcon@linux.ibm.com>
      Signed-off-by: Lijun Pan <lijunp213@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
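The distinction drawn in this commit matters because dropped and consumed packets are accounted differently. The sketch below is a user-space model with hypothetical names (`model_free_stats`, `model_complete_tx`), where a non-zero completion code is assumed to mean the packet was not transmitted.

```c
/* Hypothetical counters standing in for drop-monitoring visibility. */
struct model_free_stats {
    int consumed; /* freed after successful transmission */
    int dropped;  /* freed because the packet was discarded */
};

/* Models dev_consume_skb_irq(): a normal end of life for the packet. */
static void model_consume_skb(struct model_free_stats *st)
{
    st->consumed++;
}

/* Models dev_kfree_skb_irq(): the packet is reported as a drop. */
static void model_kfree_skb(struct model_free_stats *st)
{
    st->dropped++;
}

/* Pick the free routine according to the completion result. */
void model_complete_tx(struct model_free_stats *st, int rc)
{
    if (rc)
        model_kfree_skb(st);
    else
        model_consume_skb(st);
}
```

Using the drop variant for every free would make healthy traffic look like packet loss to anyone watching drop counters, which is the kind of confusion the commit avoids.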
      net: Make tcp_allowed_congestion_control readonly in non-init netns · 97684f09
      Jonathon Reinhart authored
      Currently, tcp_allowed_congestion_control is global and writable;
      writing to it in any net namespace will leak into all other net
      namespaces.
      
      tcp_available_congestion_control and tcp_allowed_congestion_control are
      the only sysctls in ipv4_net_table (the per-netns sysctl table) with a
      NULL data pointer; their handlers (proc_tcp_available_congestion_control
      and proc_allowed_congestion_control) have no other way of referencing a
      struct net. Thus, they operate globally.
      
      Because ipv4_net_table does not use designated initializers, there is no
      easy way to fix up this one "bad" table entry. However, the data pointer
      updating logic shouldn't be applied to NULL pointers anyway, so we
      instead force these entries to be read-only.
      
      These sysctls used to exist in ipv4_table (init-net only), but they were
      moved to the per-net ipv4_net_table, presumably without realizing that
      tcp_allowed_congestion_control was writable and thus introduced a leak.
      
      Because the intent of that commit was only to know (i.e. read) "which
      congestion algorithms are available or allowed", this read-only solution
      should be sufficient.
      
      The logic added in recent commit
      31c4d2f1 ("net: Ensure net namespace isolation of sysctls")
      does not and cannot check for NULL data pointers, because
      other table entries (e.g. /proc/sys/net/netfilter/nf_log/) have
      .data=NULL but use other methods (.extra2) to access the struct net.
      
      Fixes: 9cb8e048 ("net/ipv4/sysctl: show tcp_{allowed, available}_congestion_control in non-initial netns")
      Signed-off-by: Jonathon Reinhart <jonathon.reinhart@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
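The fix-up described in this commit, rebasing data pointers for a new namespace and forcing entries without a data pointer to be read-only, can be sketched as follows. This is a user-space model under stated assumptions: `model_ctl` and `fixup_table_for_netns()` are hypothetical names, and the namespace is represented only by a byte offset applied to each data pointer.

```c
#include <stddef.h>

/* Hypothetical, simplified sysctl table entry. */
struct model_ctl {
    const char *name;
    char *data;        /* NULL: the handler operates on global state */
    unsigned int mode; /* octal permission bits, e.g. 0644 */
};

/*
 * Prepare a copy of the table for a non-initial namespace.
 * netns_offset is the byte distance from the initial namespace's
 * storage to the new namespace's storage.
 */
void fixup_table_for_netns(struct model_ctl *table, size_t n,
                           ptrdiff_t netns_offset)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].data) {
            /* Point the entry at this namespace's own variable. */
            table[i].data += netns_offset;
        } else {
            /* Global entry: strip all write bits in this namespace. */
            table[i].mode &= ~0222u;
        }
    }
}
```

An entry with mode 0644 and a NULL data pointer ends up as 0444, while entries that carry a data pointer keep their mode and are simply rebased.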
      Merge branch 'catch-all-devices' · 61aaa1aa
      David S. Miller authored
      Hristo Venev says:
      
      ====================
      net: Fix two use-after-free bugs
      
      The two patches fix two use-after-free bugs related to cleaning up
      network namespaces, one in sit and one in ip6_tunnel. They are easy to
      trigger if the user has the ability to create network namespaces.
      
      The bugs can be used to trigger null pointer dereferences. I am not
      sure if they can be exploited further, but I would guess that they
      can. I am not sending them to the mailing list without confirmation
      that doing so would be OK.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>