1. 25 Jan, 2018 14 commits
    • net: Move net:netns_ids destruction out of rtnl_lock() and document locking scheme · fb07a820
      Kirill Tkhai authored
      Currently, we unhash a dying net from netns_ids lists
      under rtnl_lock(). It's a leftover from the time when
      net::netns_ids was introduced. There was no net::nsid_lock,
      and rtnl_lock() was mostly needed to order modifications
      of the nsid idr of alive nets, i.e. for:
      	for_each_net(tmp) {
      		...
      		id = __peernet2id(tmp, net);
      		idr_remove(&tmp->netns_ids, id);
      		...
      	}
      
      Since we have net::nsid_lock, the modifications are
      protected by this local lock, and now we may introduce
      a better scheme of netns_ids destruction.
      
      Let's look at the functions peernet2id_alloc() and
      get_net_ns_by_id(). Previous commits taught these
      functions to work well with a dying net acquired from
      rtnl-unlocked lists. And they are the only functions
      which can hash a net to netns_ids or obtain one from there.
      And, as is easy to check, the other functions operating on
      netns_ids work with ids, not with net pointers. So, we do not
      need rtnl_lock to synchronize cleanup_net() with all of them.
      
      Another property, which is used in the patch,
      is that a net is unhashed from net_namespace_list
      in only one place and by only one process. So,
      we avoid an excess rcu_read_lock() or rtnl_lock()
      when we're iterating over the list in unhash_nsid().
      
      All of the above makes it possible to keep rtnl_lock() locked
      only for net->list deletion, and to completely avoid it
      for netns_ids unhashing and destruction. As these two
      operations may take a long time (e.g., memory allocation
      to send skb), the patch should have a positive effect on
      scalability and significantly decrease the time for which
      rtnl_lock() is held in cleanup_net().
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
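The locking scheme described in the commit above can be illustrated with a small userspace analogue. This is only a sketch under stated assumptions, not the kernel code: `struct ns`, `global_lock`, `ns_list`, `ns_register()`, `ns_assign_id()`, `ns_peer_id()` and `ns_cleanup()` are hypothetical names, a fixed array stands in for the netns_ids idr, and a pthread mutex plays the role of rtnl_lock(). It also assumes that no other thread registers or removes entries while `ns_cleanup()` walks the list, which is a much cruder guarantee than the one the kernel relies on.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_IDS 8

/* Hypothetical stand-in for struct net; not the kernel's layout. */
struct ns {
    struct ns *next;            /* global list linkage, guarded by global_lock */
    pthread_mutex_t nsid_lock;  /* local lock, guards ids[] only */
    struct ns *ids[MAX_IDS];    /* id -> peer, stand-in for the netns_ids idr */
};

/* Plays the role of rtnl_lock() in this sketch. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static struct ns *ns_list;

static void ns_init(struct ns *n)
{
    n->next = NULL;
    pthread_mutex_init(&n->nsid_lock, NULL);
    for (int i = 0; i < MAX_IDS; i++)
        n->ids[i] = NULL;
}

/* Link a namespace at the head of the global list. */
static void ns_register(struct ns *n)
{
    pthread_mutex_lock(&global_lock);
    n->next = ns_list;
    ns_list = n;
    pthread_mutex_unlock(&global_lock);
}

/* Give peer the first free id in owner's table; returns the id or -1. */
static int ns_assign_id(struct ns *owner, struct ns *peer)
{
    int id = -1;

    pthread_mutex_lock(&owner->nsid_lock);
    for (int i = 0; i < MAX_IDS; i++) {
        if (owner->ids[i] == NULL) {
            owner->ids[i] = peer;
            id = i;
            break;
        }
    }
    pthread_mutex_unlock(&owner->nsid_lock);
    return id;
}

/* Look up the id of peer in owner's table; returns the id or -1. */
static int ns_peer_id(struct ns *owner, struct ns *peer)
{
    int id = -1;

    pthread_mutex_lock(&owner->nsid_lock);
    for (int i = 0; i < MAX_IDS; i++) {
        if (owner->ids[i] == peer) {
            id = i;
            break;
        }
    }
    pthread_mutex_unlock(&owner->nsid_lock);
    return id;
}

static void ns_cleanup(struct ns *dying)
{
    /* Step 1: the global lock is held only to unlink from the list. */
    pthread_mutex_lock(&global_lock);
    for (struct ns **pp = &ns_list; *pp; pp = &(*pp)->next) {
        if (*pp == dying) {
            *pp = dying->next;
            break;
        }
    }
    pthread_mutex_unlock(&global_lock);

    /*
     * Step 2: drop the dying entry from every remaining table, taking
     * only each table's local lock. ASSUMPTION of this sketch: nobody
     * else modifies the list during this walk.
     */
    for (struct ns *tmp = ns_list; tmp; tmp = tmp->next) {
        pthread_mutex_lock(&tmp->nsid_lock);
        for (int i = 0; i < MAX_IDS; i++) {
            if (tmp->ids[i] == dying)
                tmp->ids[i] = NULL;
        }
        pthread_mutex_unlock(&tmp->nsid_lock);
    }

    /* Step 3: empty the dying entry's own table, again without global_lock. */
    pthread_mutex_lock(&dying->nsid_lock);
    for (int i = 0; i < MAX_IDS; i++)
        dying->ids[i] = NULL;
    pthread_mutex_unlock(&dying->nsid_lock);
}
```

In this sketch the potentially slow per-table work in steps 2 and 3 never runs under the global lock, which mirrors the intent stated in the commit message: the global lock covers only the list deletion.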
    • David S. Miller's avatar
    • David S. Miller's avatar
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 5b7d2796
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Avoid negative netdev refcount in error flow of xfrm state add, from
          Aviad Yehezkel.
      
       2) Fix tcpdump decoding of IPSEC decap'd frames by filling in the
          ethernet header protocol field in xfrm{4,6}_mode_tunnel_input().
          From Yossi Kuperman.
      
       3) Fix a syzbot triggered skb_under_panic in pppoe having to do with
          failing to allocate an appropriate amount of headroom. From
          Guillaume Nault.
      
       4) Fix memory leak in vmxnet3 driver, from Neil Horman.
      
       5) Cure out-of-bounds packet memory access in em_nbyte EMATCH module,
          from Wolfgang Bumiller.
      
       6) Restrict what kinds of sockets can be bound to the KCM multiplexer
          and also disallow when another layer has attached to the socket and
          made use of sk_user_data. From Tom Herbert.
      
       7) Fix use before init of IOTLB in vhost code, from Jason Wang.
      
       8) Correct STACR register write bit definition in IBM emac driver, from
          Ivan Mikhaylov.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net/ibm/emac: wrong bit is used for STA control register write
        net/ibm/emac: add 8192 rx/tx fifo size
        vhost: do not try to access device IOTLB when not initialized
        vhost: use mutex_lock_nested() in vhost_dev_lock_vqs()
        i40e: flower: check if TC offload is enabled on a netdev
        qed: Free reserved MR tid
        qed: Remove reserveration of dpi for kernel
        kcm: Check if sk_user_data already set in kcm_attach
        kcm: Only allow TCP sockets to be attached to a KCM mux
        net: sched: fix TCF_LAYER_LINK case in tcf_get_base_ptr
        net: sched: em_nbyte: don't add the data offset twice
        mlxsw: spectrum_router: Don't log an error on missing neighbor
        vmxnet3: repair memory leak
        ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
        pppoe: take ->needed_headroom of lower device into account on xmit
        xfrm: fix boolean assignment in xfrm_get_type_offload
        xfrm: Fix eth_hdr(skb)->h_proto to reflect inner IP version
        xfrm: fix error flow in case of add state fails
        xfrm: Add SA to hardware at the end of xfrm_state_construct()
    • kill kernel_sock_ioctl() · 5c59e564
      Al Viro authored
      no users since 2014
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • dev_ioctl(): move copyin/copyout to callers · 44c02a2c
      Al Viro authored
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • ipconfig: use dev_set_mtu() · 6a88fbe7
      Al Viro authored
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • lift handling of SIOCIW... out of dev_ioctl() · b1b0c245
      Al Viro authored
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • kill dev_ifname32() · 4cf808e7
      Al Viro authored
      same story...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • kill bond_ioctl() · f92d4fc9
      Al Viro authored
      Same story as with dev_ifsioc(), except that the last cases with non-trivial
      conversions had been taken out in 2013...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • kill dev_ifsioc() · bf440573
      Al Viro authored
      Once upon a time net/socket.c:dev_ifsioc() used to handle SIOCSHWTSTAMP and
      SIOCSIFMAP.  These have different native and compat layouts, so format
      conversion was needed.  In 2009 these two cases were taken out,
      turning the rest into a convoluted way of calling sock_do_ioctl().  We copy
      the compat structure into a native one, call sock_do_ioctl() on that and copy
      the result back for the in/out ioctls.  No layout transformation anywhere,
      so we might as well just call sock_do_ioctl() and skip all the headache with
      copying.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • ip_rt_ioctl(): take copyin to caller · ca25c300
      Al Viro authored
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Al Viro's avatar
      03aef17b
    • net: separate SIOCGIFCONF handling from dev_ioctl() · 36fd633e
      Al Viro authored
      Only two of the dev_ioctl() callers may pass SIOCGIFCONF to it.
      Separating that codepath from the rest of dev_ioctl() allows us both
      to simplify dev_ioctl() itself (all other cases work with struct ifreq *)
      *and* to seriously simplify the compat side of that beast: all it takes
      is passing to inet_gifconf() an extra argument - the size of individual
      records (sizeof(struct ifreq) or sizeof(struct compat_ifreq)).  With
      dev_ifconf() called directly from sock_do_ioctl()/compat_dev_ifconf()
      that's easy to arrange.

      As a result, the compat side of SIOCGIFCONF doesn't need any
      allocations, copy_in_user() back and forth, etc.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
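The idea of passing the record size as an extra argument, as the SIOCGIFCONF commit above describes, can be sketched in plain userspace C. This is an illustration under assumptions, not the kernel's inet_gifconf(): `struct rec_native`, `struct rec_compat` and `fill_records()` are hypothetical, and the sketch assumes both layouts share a leading 16-byte name field and that `rec_size` is at least 16.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record layouts; not struct ifreq / struct compat_ifreq. */
struct rec_native { char name[16]; long addr; };
struct rec_compat { char name[16]; int addr; };

/*
 * Write one record per name into buf, spacing the records rec_size bytes
 * apart, and stop when the next record would not fit in len bytes.
 * Only the name field, common to both layouts, is filled in; the rest of
 * each record is zeroed. Returns the number of bytes written.
 * ASSUMPTION: rec_size >= 16.
 */
static size_t fill_records(char *buf, size_t len, size_t rec_size,
                           const char *const *names, size_t count)
{
    size_t done = 0;

    for (size_t i = 0; i < count && done + rec_size <= len; i++) {
        memset(buf + done, 0, rec_size);
        /* Copy at most 15 bytes so the zeroed 16th byte terminates the name. */
        strncpy(buf + done, names[i], 15);
        done += rec_size;
    }
    return done;
}
```

A native caller would pass `sizeof(struct rec_native)` and a compat caller `sizeof(struct rec_compat)`; the same routine then serves both without an intermediate buffer or a second copy, which is the simplification the commit message points at.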
  2. 24 Jan, 2018 26 commits