1. 15 Aug, 2016 9 commits
    • Vitaly Kuznetsov's avatar
      hv_netvsc: protect module refcount by checking net_device_ctx->vf_netdev · 0f20d795
      Vitaly Kuznetsov authored
      We're not guaranteed to see NETDEV_REGISTER/NETDEV_UNREGISTER notifications
      only once per VF but we increase/decrease module refcount unconditionally.
      Check vf_netdev to make sure we don't take/release it twice. We presume
      that only one VF per netvsc device may exist.
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Acked-by: default avatarHaiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0f20d795
    • Vitaly Kuznetsov's avatar
      hv_netvsc: reset vf_inject on VF removal · 57c1826b
      Vitaly Kuznetsov authored
      We reset vf_inject on VF going down (netvsc_vf_down()) but we don't on
      VF removal (netvsc_unregister_vf()) so vf_inject stays 'true' while
      vf_netdev is already NULL and we're trying to inject packets into NULL
      net device in netvsc_recv_callback() causing kernel to crash.
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Acked-by: default avatarHaiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      57c1826b
    • Vitaly Kuznetsov's avatar
      hv_netvsc: avoid deadlocks between rtnl lock and vf_use_cnt wait · d072218f
      Vitaly Kuznetsov authored
      Here is a deadlock scenario:
      - netvsc_vf_up() schedules netvsc_notify_peers() work and quits.
      - netvsc_vf_down() runs before netvsc_notify_peers() gets executed. As it
        is being executed from netdev notifier chain we hold rtnl lock when we
        get here.
      - we enter while (atomic_read(&net_device_ctx->vf_use_cnt) != 0) loop and
        wait till netvsc_notify_peers() drops vf_use_cnt.
      - netvsc_notify_peers() starts on some other CPU but netdev_notify_peers()
        will hang on rtnl_lock().
      - deadlock!
      
      Instead of introducing additional synchronization I suggest we drop
      gwrk.dwrk completely and call NETDEV_NOTIFY_PEERS directly. As we're
      acting under rtnl lock this is legitimate.
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Acked-by: default avatarHaiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d072218f
    • Vitaly Kuznetsov's avatar
      hv_netvsc: don't lose VF information · f9a7da91
      Vitaly Kuznetsov authored
      struct netvsc_device is not suitable for storing VF information as this
      structure is being destroyed on MTU change / set channel operation (see
      rndis_filter_device_remove()). Move all VF related stuff to struct
      net_device_context which is persistent.
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Acked-by: default avatarHaiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f9a7da91
    • Simon Horman's avatar
      gre: set inner_protocol on xmit · 3d7b3320
      Simon Horman authored
      Ensure that the inner_protocol is set on transmit so that GSO segmentation,
      which relies on that field, works correctly.
      
      This is achieved by setting the inner_protocol in gre_build_header rather
      than each caller of that function. It ensures that the inner_protocol is
      set when gre_fb_xmit() is used to transmit GRE which was not previously the
      case.
      
      I have observed this is not the case when OvS transmits GRE using
      lwtunnel metadata (which it always does).
      
      Fixes: 38720352 ("gre: Use inner_proto to obtain inner header protocol")
      Cc: Pravin Shelar <pshelar@ovn.org>
      Acked-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: default avatarSimon Horman <simon.horman@netronome.com>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3d7b3320
    • Lorenzo Colitti's avatar
      net: ipv6: Fix ping to link-local addresses. · 5e457896
      Lorenzo Colitti authored
      ping_v6_sendmsg does not set flowi6_oif in response to
      sin6_scope_id or sk_bound_dev_if, so it is not possible to use
      these APIs to ping an IPv6 address on a different interface.
      Instead, it sets flowi6_iif, which is incorrect but harmless.
      
      Stop setting flowi6_iif, and support various ways of setting oif
      in the same priority order used by udpv6_sendmsg.
      
      Tested: https://android-review.googlesource.com/#/c/254470/Signed-off-by: default avatarLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5e457896
    • Vegard Nossum's avatar
      rhashtable: fix shift by 64 when shrinking · 12311959
      Vegard Nossum authored
      I got this:
      
          ================================================================================
          UBSAN: Undefined behaviour in ./include/linux/log2.h:63:13
          shift exponent 64 is too large for 64-bit type 'long unsigned int'
          CPU: 1 PID: 721 Comm: kworker/1:1 Not tainted 4.8.0-rc1+ #87
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
          Workqueue: events rht_deferred_worker
           0000000000000000 ffff88011661f8d8 ffffffff82344f50 0000000041b58ab3
           ffffffff84f98000 ffffffff82344ea4 ffff88011661f900 ffff88011661f8b0
           0000000000000001 ffff88011661f6b8 dffffc0000000000 ffffffff867f7640
          Call Trace:
           [<ffffffff82344f50>] dump_stack+0xac/0xfc
           [<ffffffff82344ea4>] ? _atomic_dec_and_lock+0xc4/0xc4
           [<ffffffff8242f5b8>] ubsan_epilogue+0xd/0x8a
           [<ffffffff82430c41>] __ubsan_handle_shift_out_of_bounds+0x255/0x29a
           [<ffffffff824309ec>] ? __ubsan_handle_out_of_bounds+0x180/0x180
           [<ffffffff84003436>] ? nl80211_req_set_reg+0x256/0x2f0
           [<ffffffff812112ba>] ? print_context_stack+0x8a/0x160
           [<ffffffff81200031>] ? amd_pmu_reset+0x341/0x380
           [<ffffffff823af808>] rht_deferred_worker+0x1618/0x1790
           [<ffffffff823af808>] ? rht_deferred_worker+0x1618/0x1790
           [<ffffffff823ae1f0>] ? rhashtable_jhash2+0x370/0x370
           [<ffffffff8134c12d>] ? process_one_work+0x6fd/0x1970
           [<ffffffff8134c1cf>] process_one_work+0x79f/0x1970
           [<ffffffff8134c12d>] ? process_one_work+0x6fd/0x1970
           [<ffffffff8134ba30>] ? try_to_grab_pending+0x4c0/0x4c0
           [<ffffffff8134d564>] ? worker_thread+0x1c4/0x1340
           [<ffffffff8134d8ff>] worker_thread+0x55f/0x1340
           [<ffffffff845e904f>] ? __schedule+0x4df/0x1d40
           [<ffffffff8134d3a0>] ? process_one_work+0x1970/0x1970
           [<ffffffff8134d3a0>] ? process_one_work+0x1970/0x1970
           [<ffffffff813642f7>] kthread+0x237/0x390
           [<ffffffff813640c0>] ? __kthread_parkme+0x280/0x280
           [<ffffffff845f8c93>] ? _raw_spin_unlock_irq+0x33/0x50
           [<ffffffff845f95df>] ret_from_fork+0x1f/0x40
           [<ffffffff813640c0>] ? __kthread_parkme+0x280/0x280
          ================================================================================
      
      roundup_pow_of_two() is undefined when called with an argument of 0, so
      let's avoid the call and just fall back to ht->p.min_size (which should
      never be smaller than HASH_MIN_SIZE).
      
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      12311959
    • Vincent's avatar
      mlxsw: spectrum_router: Fix use after free · eb8fc323
      Vincent authored
      In mlxsw_sp_router_fib4_add_info_destroy(), the fib_entry pointer is used
      after it has been freed by mlxsw_sp_fib_entry_destroy(). Use a temporary
      variable to fix this.
      
      Fixes: 61c503f9 ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops")
      Signed-off-by: default avatarVincent Stehlé <vincent.stehle@laposte.net>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Acked-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eb8fc323
    • Florian Westphal's avatar
      rhashtable: avoid large lock-array allocations · 4cf0b354
      Florian Westphal authored
      Sander reports following splat after netfilter nat bysrc table got
      converted to rhashtable:
      
      swapper/0: page allocation failure: order:3, mode:0x2084020(GFP_ATOMIC|__GFP_COMP)
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc1 [..]
       [<ffffffff811633ed>] warn_alloc_failed+0xdd/0x140
       [<ffffffff811638b1>] __alloc_pages_nodemask+0x3e1/0xcf0
       [<ffffffff811a72ed>] alloc_pages_current+0x8d/0x110
       [<ffffffff8117cb7f>] kmalloc_order+0x1f/0x70
       [<ffffffff811aec19>] __kmalloc+0x129/0x140
       [<ffffffff8146d561>] bucket_table_alloc+0xc1/0x1d0
       [<ffffffff8146da1d>] rhashtable_insert_rehash+0x5d/0xe0
       [<ffffffff819fcfff>] nf_nat_setup_info+0x2ef/0x400
      
      The failure happens when allocating the spinlock array.
      Even with GFP_KERNEL its unlikely for such a large allocation
      to succeed.
      
      Thomas Graf pointed me at inet_ehash_locks_alloc(), so in addition
      to adding NOWARN for atomic allocations this also makes the bucket-array
      sizing more conservative.
      
      In commit 095dc8e0 ("tcp: fix/cleanup inet_ehash_locks_alloc()"),
      Eric Dumazet says: "Budget 2 cache lines per cpu worth of 'spinlocks'".
      IOW, consider size needed by a single spinlock when determining
      number of locks per cpu.  So with 64 byte per cacheline and 4 byte per
      spinlock this gives 32 locks per cpu.
      
      Resulting size of the lock-array (sizeof(spinlock) == 4):
      
      cpus:    1   2   4   8   16   32   64
      old:    1k  1k  4k  8k  16k  16k  16k
      new:   128 256 512  1k   2k   4k   8k
      
      8k allocation should have decent chance of success even
      with GFP_ATOMIC, and should not fail with GFP_KERNEL.
      
      With 72-byte spinlock (LOCKDEP):
      cpus :   1   2
      old:    9k 18k
      new:   ~2k ~4k
      Reported-by: default avatarSander Eikelenboom <linux@eikelenboom.it>
      Suggested-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4cf0b354
  2. 13 Aug, 2016 12 commits
    • Sabrina Dubroca's avatar
      net: remove type_check from dev_get_nest_level() · 952fcfd0
      Sabrina Dubroca authored
      The idea for type_check in dev_get_nest_level() was to count the number
      of nested devices of the same type (currently, only macvlan or vlan
      devices).
      This prevented the false positive lockdep warning on configurations such
      as:
      
      eth0 <--- macvlan0 <--- vlan0 <--- macvlan1
      
      However, this doesn't prevent a warning on a configuration such as:
      
      eth0 <--- macvlan0 <--- vlan0
      eth1 <--- vlan1 <--- macvlan1
      
      In this case, all the locks end up with a nesting subclass of 1, so
      lockdep thinks that there is still a deadlock:
      
      - in the first case we have (macvlan_netdev_addr_lock_key, 1) and then
        take (vlan_netdev_xmit_lock_key, 1)
      - in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then
        take (macvlan_netdev_addr_lock_key, 1)
      
      By removing the linktype check in dev_get_nest_level() and always
      incrementing the nesting depth, lockdep considers this configuration
      valid.
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      952fcfd0
    • Sabrina Dubroca's avatar
      macsec: fix lockdep splats when nesting devices · e2003872
      Sabrina Dubroca authored
      Currently, trying to setup a vlan over a macsec device, or other
      combinations of devices, triggers a lockdep warning.
      
      Use netdev_lockdep_set_classes and ndo_get_lock_subclass, similar to
      what macvlan does.
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e2003872
    • Mike Manning's avatar
      net: ipv6: Do not keep IPv6 addresses when IPv6 is disabled · bc561632
      Mike Manning authored
      If IPv6 is disabled when the option is set to keep IPv6
      addresses on link down, userspace is unaware of this as
      there is no such indication via netlink. The solution is to
      remove the IPv6 addresses in this case, which results in
      netlink messages indicating removal of addresses in the
      usual manner. This fix also makes the behavior consistent
      with the case of having IPv6 disabled first, which stops
      IPv6 addresses from being added.
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Signed-off-by: default avatarMike Manning <mmanning@brocade.com>
      Acked-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bc561632
    • Vegard Nossum's avatar
      net/sctp: always initialise sctp_ht_iter::start_fail · 54236ab0
      Vegard Nossum authored
      sctp_transport_seq_start() does not currently clear iter->start_fail on
      success, but relies on it being zero when it is allocated (by
      seq_open_net()).
      
      This can be a problem in the following sequence:
      
          open() // allocates iter (and implicitly sets iter->start_fail = 0)
          read()
           - iter->start() // fails and sets iter->start_fail = 1
           - iter->stop() // doesn't call sctp_transport_walk_stop() (correct)
          read() again
           - iter->start() // succeeds, but doesn't change iter->start_fail
           - iter->stop() // doesn't call sctp_transport_walk_stop() (wrong)
      
      We should initialize sctp_ht_iter::start_fail to zero if ->start()
      succeeds, otherwise it's possible that we leave an old value of 1 there,
      which will cause ->stop() to not call sctp_transport_walk_stop(), which
      causes all sorts of problems like not calling rcu_read_unlock() (and
      preempt_enable()), eventually leading to more warnings like this:
      
          BUG: sleeping function called from invalid context at mm/slab.h:388
          in_atomic(): 0, irqs_disabled(): 0, pid: 16551, name: trinity-c2
          Preemption disabled at:[<ffffffff819bceb6>] rhashtable_walk_start+0x46/0x150
      
           [<ffffffff81149abb>] preempt_count_add+0x1fb/0x280
           [<ffffffff83295892>] _raw_spin_lock+0x12/0x40
           [<ffffffff819bceb6>] rhashtable_walk_start+0x46/0x150
           [<ffffffff82ec665f>] sctp_transport_walk_start+0x2f/0x60
           [<ffffffff82edda1d>] sctp_transport_seq_start+0x4d/0x150
           [<ffffffff81439e50>] traverse+0x170/0x850
           [<ffffffff8143aeec>] seq_read+0x7cc/0x1180
           [<ffffffff814f996c>] proc_reg_read+0xbc/0x180
           [<ffffffff813d0384>] do_loop_readv_writev+0x134/0x210
           [<ffffffff813d2a95>] do_readv_writev+0x565/0x660
           [<ffffffff813d6857>] vfs_readv+0x67/0xa0
           [<ffffffff813d6c16>] do_preadv+0x126/0x170
           [<ffffffff813d710c>] SyS_preadv+0xc/0x10
           [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
           [<ffffffff83296225>] return_from_SYSCALL_64+0x0/0x6a
           [<ffffffffffffffff>] 0xffffffffffffffff
      
      Notice that this is a subtly different stacktrace from the one in commit
      5fc382d8 ("net/sctp: terminate rhashtable walk correctly").
      
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Acked-By: default avatarNeil Horman <nhorman@tuxdriver.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      54236ab0
    • Vegard Nossum's avatar
      net/irda: handle iriap_register_lsap() allocation failure · 5ba092ef
      Vegard Nossum authored
      If iriap_register_lsap() fails to allocate memory, self->lsap is
      set to NULL. However, none of the callers handle the failure and
      irlmp_connect_request() will happily dereference it:
      
          iriap_register_lsap: Unable to allocated LSAP!
          ================================================================================
          UBSAN: Undefined behaviour in net/irda/irlmp.c:378:2
          member access within null pointer of type 'struct lsap_cb'
          CPU: 1 PID: 15403 Comm: trinity-c0 Not tainted 4.8.0-rc1+ #81
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org
          04/01/2014
           0000000000000000 ffff88010c7e78a8 ffffffff82344f40 0000000041b58ab3
           ffffffff84f98000 ffffffff82344e94 ffff88010c7e78d0 ffff88010c7e7880
           ffff88010630ad00 ffffffff84a5fae0 ffffffff84d3f5c0 000000000000017a
          Call Trace:
           [<ffffffff82344f40>] dump_stack+0xac/0xfc
           [<ffffffff8242f5a8>] ubsan_epilogue+0xd/0x8a
           [<ffffffff824302bf>] __ubsan_handle_type_mismatch+0x157/0x411
           [<ffffffff83b7bdbc>] irlmp_connect_request+0x7ac/0x970
           [<ffffffff83b77cc0>] iriap_connect_request+0xa0/0x160
           [<ffffffff83b77f48>] state_s_disconnect+0x88/0xd0
           [<ffffffff83b78904>] iriap_do_client_event+0x94/0x120
           [<ffffffff83b77710>] iriap_getvaluebyclass_request+0x3e0/0x6d0
           [<ffffffff83ba6ebb>] irda_find_lsap_sel+0x1eb/0x630
           [<ffffffff83ba90c8>] irda_connect+0x828/0x12d0
           [<ffffffff833c0dfb>] SYSC_connect+0x22b/0x340
           [<ffffffff833c7e09>] SyS_connect+0x9/0x10
           [<ffffffff81007bd3>] do_syscall_64+0x1b3/0x4b0
           [<ffffffff845f946a>] entry_SYSCALL64_slow_path+0x25/0x25
          ================================================================================
      
      The bug seems to have been around since forever.
      
      There's more problems with missing error checks in iriap_init() (and
      indeed all of irda_init()), but that's a bigger problem that needs
      very careful review and testing. This patch will fix the most serious
      bug (as it's easily reached from unprivileged userspace).
      
      I have tested my patch with a reproducer.
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5ba092ef
    • Johannes Berg's avatar
      ipv6: suppress sparse warnings in IP6_ECN_set_ce() · c15c0ab1
      Johannes Berg authored
      Pass the correct type __wsum to csum_sub() and csum_add(). This doesn't
      really change anything since __wsum really *is* __be32, but removes the
      address space warnings from sparse.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Fixes: 34ae6a1a ("ipv6: update skb->csum when CE mark is propagated")
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c15c0ab1
    • Daniel Borkmann's avatar
      bpf: fix write helpers with regards to non-linear parts · 0ed661d5
      Daniel Borkmann authored
      Fix the bpf_try_make_writable() helper and all call sites we have in BPF,
      it's currently defect with regards to skbs when the write_len spans into
      non-linear parts, no matter if cloned or not.
      
      There are multiple issues at once. First, using skb_store_bits() is not
      correct since even if we have a cloned skb, page frags can still be shared.
      To really make them private, we need to pull them in via __pskb_pull_tail()
      first, which also gets us a private head via pskb_expand_head() implicitly.
      
      This is for helpers like bpf_skb_store_bytes(), bpf_l3_csum_replace(),
      bpf_l4_csum_replace(). Really, the only thing reasonable and working here
      is to call skb_ensure_writable() before any write operation. Meaning, via
      pskb_may_pull() it makes sure that parts we want to access are pulled in and
      if not does so plus unclones the skb implicitly. If our write_len still fits
      the headlen and we're cloned and our header of the clone is not writable,
      then we need to make a private copy via pskb_expand_head(). skb_store_bits()
      is a bit misleading and only safe to store into non-linear data in different
      contexts such as 357b40a1 ("[IPV6]: IPV6_CHECKSUM socket option can
      corrupt kernel memory").
      
      For above BPF helper functions, it means after fixed bpf_try_make_writable(),
      we've pulled in enough, so that we operate always based on skb->data. Thus,
      the call to skb_header_pointer() and skb_store_bits() becomes superfluous.
      In bpf_skb_store_bytes(), the len check is unnecessary too since it can
      only pass in maximum of BPF stack size, so adding offset is guaranteed to
      never overflow. Also bpf_l3/4_csum_replace() helpers must test for proper
      offset alignment since they use __sum16 pointer for writing resulting csum.
      
      The remaining helpers that change skb data not discussed here yet are
      bpf_skb_vlan_push(), bpf_skb_vlan_pop() and bpf_skb_change_proto(). The
      vlan helpers internally call either skb_ensure_writable() (pop case) and
      skb_cow_head() (push case, for head expansion), respectively. Similarly,
      bpf_skb_proto_xlat() takes care to not mangle page frags.
      
      Fixes: 608cd71a ("tc: bpf: generalize pedit action")
      Fixes: 91bc4822 ("tc: bpf: add checksum helpers")
      Fixes: 3697649f ("bpf: try harder on clones when writing into skb")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0ed661d5
    • sean.wang@mediatek.com's avatar
      net: ethernet: mediatek: add the missing of_node_put() after node is used done · e8c2993a
      sean.wang@mediatek.com authored
      This patch adds the missing of_node_put() after finishing the usage
      of of_parse_phandle() or of_node_get() used by fixed_phy.
      Signed-off-by: default avatarSean Wang <sean.wang@mediatek.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e8c2993a
    • sean.wang@mediatek.com's avatar
      net: ethernet: mediatek: fixed that initializing u64_stats_sync is missing · d7005652
      sean.wang@mediatek.com authored
      To fix runtime warning with lockdep is enabled due that u64_stats_sync
      is not initialized well, so add it.
      Signed-off-by: default avatarSean Wang <sean.wang@mediatek.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d7005652
    • Colin Ian King's avatar
      calipso: fix resource leak on calipso_genopt failure · b4c0e0c6
      Colin Ian King authored
      Currently, if calipso_genopt fails then the error exit path
      does not free the ipv6_opt_hdr new causing a memory leak. Fix
      this by kfree'ing new on the error exit path.
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b4c0e0c6
    • Daniel Borkmann's avatar
      bpf: fix bpf_skb_in_cgroup helper naming · 747ea55e
      Daniel Borkmann authored
      While hashing out BPF's current_task_under_cgroup helper bits, it came
      to discussion that the skb_in_cgroup helper name was suboptimally chosen.
      
      Tejun says:
      
        So, I think in_cgroup should mean that the object is in that
        particular cgroup while under_cgroup in the subhierarchy of that
        cgroup. Let's rename the other subhierarchy test to under too. I
        think that'd be a lot less confusing going forward.
      
        [...]
      
        It's more intuitive and gives us the room to implement the real
        "in" test if ever necessary in the future.
      
      Since this touches uapi bits, we need to change this as long as v4.8
      is not yet officially released. Thus, change the helper enum and rename
      related bits.
      
      Fixes: 4a482f34 ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
      Reference: http://patchwork.ozlabs.org/patch/658500/Suggested-by: default avatarSargun Dhillon <sargun@sargun.me>
      Suggested-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      747ea55e
    • Arnd Bergmann's avatar
      dsa: mv88e6xxx: hide unused functions · 601bbae0
      Arnd Bergmann authored
      When CONFIG_NET_DSA_HWMON is disabled, we get warnings about two unused
      functions whose only callers are all inside of an #ifdef:
      
      drivers/net/dsa/mv88e6xxx.c:3257:12: 'mv88e6xxx_mdio_page_write' defined but not used [-Werror=unused-function]
      drivers/net/dsa/mv88e6xxx.c:3244:12: 'mv88e6xxx_mdio_page_read' defined but not used [-Werror=unused-function]
      
      This adds another ifdef around the function definitions. The warnings
      appeared after the functions were marked 'static', but the problem
      was already there before that.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Fixes: 57d32310 ("net: dsa: mv88e6xxx: fix style issues")
      Reviewed-by: default avatarVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      601bbae0
  3. 11 Aug, 2016 5 commits
  4. 10 Aug, 2016 3 commits
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 293fddff
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for your net tree,
      they are:
      
      1) Use mod_timer_pending() to avoid reactivating a dead expectation in
         the h323 conntrack helper, from Liping Zhang.
      
      2) Oneliner to fix a type in the register name defined in the nf_tables
         header.
      
      3) Don't try to look further when we find an inactive elements with no
         descendants in the rbtree set implementation, otherwise we crash.
      
      4) Handle valid zero CSeq in the SIP conntrack helper, from
         Christophe Leroy.
      
      5) Don't display a trailing slash in conntrack helper with no classes
         via /proc/net/nf_conntrack_expect, from Liping Zhang.
      
      6) Fix an expectation leak during creation from the nfqueue path, again
         from Liping Zhang.
      
      7) Validate netlink port ID in verdict message from nfqueue, otherwise
         an injection can be possible. Again from Zhang.
      
      8) Reject conntrack tuples with different transport protocol on
         original and reply tuples, also from Zhang.
      
      9) Validate offset and length in nft_exthdr, make sure they are under
         sizeof(u8), from Laura Garcia Liebana.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      293fddff
    • Laura Garcia Liebana's avatar
      netfilter: nft_exthdr: Add size check on u8 nft_exthdr attributes · 4da449ae
      Laura Garcia Liebana authored
      Fix the direct assignment of offset and length attributes included in
      nft_exthdr structure from u32 data to u8.
      Signed-off-by: default avatarLaura Garcia Liebana <nevola@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      4da449ae
    • Toshiaki Makita's avatar
      bridge: Fix problems around fdb entries pointing to the bridge device · 7bb90c37
      Toshiaki Makita authored
      Adding fdb entries pointing to the bridge device uses fdb_insert(),
      which lacks various checks and does not respect added_by_user flag.
      
      As a result, some inconsistent behavior can happen:
      * Adding temporary entries succeeds but results in permanent entries.
      * Same goes for "dynamic" and "use".
      * Changing mac address of the bridge device causes deletion of
        user-added entries.
      * Replacing existing entries looks successful from userspace but actually
        not, regardless of NLM_F_EXCL flag.
      
      Use the same logic as other entries and fix them.
      
      Fixes: 3741873b ("bridge: allow adding of fdb entries pointing to the bridge device")
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7bb90c37
  5. 09 Aug, 2016 11 commits
    • Wenyou Yang's avatar
      net: phy: micrel: Add specific suspend · 836384d2
      Wenyou Yang authored
      Disable all interrupts when suspend, they will be enabled
      when resume. Otherwise, the suspend/resume process will be
      blocked occasionally.
      Signed-off-by: default avatarWenyou Yang <wenyou.yang@atmel.com>
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      836384d2
    • Sylwester Nawrocki's avatar
      dm9000: Fix irq trigger type setup on non-dt platforms · a96d3b75
      Sylwester Nawrocki authored
      Commit b5a099c6 "net: ethernet: davicom: fix devicetree irq
      resource" causes an interrupt storm after the ethernet interface
      is activated on S3C24XX platform (ARM non-dt), due to the interrupt
      trigger type not being set properly.
      
      It seems, after adding parsing of IRQ flags in commit 7085a740
      "drivers: platform: parse IRQ flags from resources", there is no path
      for non-dt platforms where irq_set_type callback could be invoked when
      we don't pass the trigger type flags to the request_irq() call.
      
      In case of a board where the regression is seen the interrupt trigger
      type flags are passed through a platform device's resource and it is
      not currently handled properly without passing the irq trigger type
      flags to the request_irq() call.  In case of OF an of_irq_get() call
      within platform_get_irq() function seems to be ensuring required irq_chip
      setup, but there is no equivalent code for non OF/ACPI platforms.
      
      This patch mostly restores irq trigger type setting code which has been
      removed in commit ("net: ethernet: davicom: fix devicetree irq resource").
      
      Fixes: b5a099c6 ("net: ethernet: davicom: fix devicetree irq resource")
      Signed-off-by: default avatarSylwester Nawrocki <s.nawrocki@samsung.com>
      Acked-by: default avatarRobert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a96d3b75
    • Zhu Yanjun's avatar
      bonding: fix the typo · 0d039f33
      Zhu Yanjun authored
      The message "803.ad" should be "802.3ad".
      Signed-off-by: default avatarZhu Yanjun <zyjzyj2000@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0d039f33
    • Grygorii Strashko's avatar
      drivers: net: cpsw: fix kmemleak false-positive reports for sk buffers · 254a49d5
      Grygorii Strashko authored
      Kmemleak reports following false positive memory leaks for each sk
      buffers allocated by CPSW (__netdev_alloc_skb_ip_align()) in
      cpsw_ndo_open() and cpsw_rx_handler():
      
      unreferenced object 0xea915000 (size 2048):
        comm "systemd-network", pid 713, jiffies 4294938323 (age 102.180s)
        hex dump (first 32 bytes):
          00 58 91 ea ff ff ff ff ff ff ff ff ff ff ff ff  .X..............
          ff ff ff ff ff ff fd 0f 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<c0108680>] __kmalloc_track_caller+0x1a4/0x230
          [<c0529eb4>] __alloc_skb+0x68/0x16c
          [<c052c884>] __netdev_alloc_skb+0x40/0x104
          [<bf1ad29c>] cpsw_ndo_open+0x374/0x670 [ti_cpsw]
          [<c053c3d4>] __dev_open+0xb0/0x114
          [<c053c690>] __dev_change_flags+0x9c/0x14c
          [<c053c760>] dev_change_flags+0x20/0x50
          [<c054bdcc>] do_setlink+0x2cc/0x78c
          [<c054c358>] rtnl_setlink+0xcc/0x100
          [<c054b34c>] rtnetlink_rcv_msg+0x184/0x224
          [<c056467c>] netlink_rcv_skb+0xa8/0xc4
          [<c054b1c0>] rtnetlink_rcv+0x2c/0x34
          [<c0564018>] netlink_unicast+0x16c/0x1f8
          [<c0564498>] netlink_sendmsg+0x334/0x348
          [<c052015c>] sock_sendmsg+0x1c/0x2c
          [<c05213e0>] SyS_sendto+0xc0/0xe8
      
      unreferenced object 0xec861780 (size 192):
        comm "softirq", pid 0, jiffies 4294938759 (age 109.540s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 b0 5a ed 00 00 00 00 00 00 00 00  ......Z.........
        backtrace:
          [<c0107830>] kmem_cache_alloc+0x190/0x208
          [<c052c768>] __build_skb+0x30/0x98
          [<c052c8fc>] __netdev_alloc_skb+0xb8/0x104
          [<bf1abc54>] cpsw_rx_handler+0x68/0x1e4 [ti_cpsw]
          [<bf11aa30>] __cpdma_chan_free+0xa8/0xc4 [davinci_cpdma]
          [<bf11ab98>] __cpdma_chan_process+0x14c/0x16c [davinci_cpdma]
          [<bf11abfc>] cpdma_chan_process+0x44/0x5c [davinci_cpdma]
          [<bf1adc78>] cpsw_rx_poll+0x1c/0x9c [ti_cpsw]
          [<c0539180>] net_rx_action+0x1f0/0x2ec
          [<c003881c>] __do_softirq+0x134/0x258
          [<c0038a00>] do_softirq+0x68/0x70
          [<c0038adc>] __local_bh_enable_ip+0xd4/0xe8
          [<c0640994>] _raw_spin_unlock_bh+0x30/0x34
          [<c05f4e9c>] igmp6_group_added+0x4c/0x1bc
          [<c05f6600>] ipv6_dev_mc_inc+0x398/0x434
          [<c05dba74>] addrconf_dad_work+0x224/0x39c
      
      This happens because CPSW allocates SK buffers and then passes
      pointers on them in CPDMA where they stored in internal CPPI RAM
      (SRAM) which belongs to DEV MMIO space. Kmemleak does not scan IO
      memory and so reports memory leaks.
      
      Hence, mark allocated sk buffers as false positive explicitly.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      254a49d5
    • Lance Richardson's avatar
      vti: flush x-netns xfrm cache when vti interface is removed · a5d0dc81
      Lance Richardson authored
      When executing the script included below, the netns delete operation
      hangs with the following message (repeated at 10 second intervals):
      
        kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      This occurs because a reference to the lo interface in the "secure" netns
      is still held by a dst entry in the xfrm bundle cache in the init netns.
      
      Address this problem by garbage collecting the tunnel netns flow cache
      when a cross-namespace vti interface receives a NETDEV_DOWN notification.
      
      A more detailed description of the problem scenario (referencing commands
      in the script below):
      
      (1) ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1
      
        The vti_test interface is created in the init namespace. vti_tunnel_init()
        attaches a struct ip_tunnel to the vti interface's netdev_priv(dev),
        setting the tunnel net to &init_net.
      
      (2) ip link set vti_test netns secure
      
        The vti_test interface is moved to the "secure" netns. Note that
        the associated struct ip_tunnel still has tunnel->net set to &init_net.
      
      (3) ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1
      
        The first packet sent using the vti device causes xfrm_lookup() to be
        called as follows:
      
            dst = xfrm_lookup(tunnel->net, skb_dst(skb), fl, NULL, 0);
      
        Note that tunnel->net is the init namespace, while skb_dst(skb) references
        the vti_test interface in the "secure" namespace. The returned dst
        references an interface in the init namespace.
      
        Also note that the first parameter to xfrm_lookup() determines which flow
        cache is used to store the computed xfrm bundle, so after xfrm_lookup()
        returns there will be a cached bundle in the init namespace flow cache
        with a dst referencing a device in the "secure" namespace.
      
      (4) ip netns del secure
      
        Kernel begins to delete the "secure" namespace.  At some point the
        vti_test interface is deleted, at which point dst_ifdown() changes
        the dst->dev in the cached xfrm bundle flow from vti_test to lo (still
        in the "secure" namespace however).
        Since nothing has happened to cause the init namespace's flow cache
        to be garbage collected, this dst remains attached to the flow cache,
        so the kernel loops waiting for the last reference to lo to go away.
      
      <Begin script>
      ip link add br1 type bridge
      ip link set dev br1 up
      ip addr add dev br1 1.1.1.1/8
      
      ip netns add secure
      ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1
      ip link set vti_test netns secure
      ip netns exec secure ip link set vti_test up
      ip netns exec secure ip link s lo up
      ip netns exec secure ip addr add dev lo 192.168.100.1/24
      ip netns exec secure ip route add 192.168.200.0/24 dev vti_test
      ip xfrm policy flush
      ip xfrm state flush
      ip xfrm policy add dir out tmpl src 1.1.1.1 dst 1.1.1.2 \
         proto esp mode tunnel mark 1
      ip xfrm policy add dir in tmpl src 1.1.1.2 dst 1.1.1.1 \
         proto esp mode tunnel mark 1
      ip xfrm state add src 1.1.1.1 dst 1.1.1.2 proto esp spi 1 \
         mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788
      ip xfrm state add src 1.1.1.2 dst 1.1.1.1 proto esp spi 1 \
         mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788
      
      ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1
      
      ip netns del secure
      <End script>
      Reported-by: default avatarHangbin Liu <haliu@redhat.com>
      Reported-by: default avatarJan Tluka <jtluka@redhat.com>
      Signed-off-by: default avatarLance Richardson <lrichard@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a5d0dc81
    • David S. Miller's avatar
      Merge tag 'rxrpc-fixes-20160809' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · aee4eda1
      David S. Miller authored
      David Howells says:
      
      ====================
      rxrpc: Miscellaneous fixes
      
      Here are a bunch of miscellaneous fixes to AF_RXRPC:
      
       (*) Fix an uninitialised pointer.
      
       (*) Fix error handling when we fail to connect a call.
      
       (*) Fix a NULL pointer dereference.
      
       (*) Fix two occasions where a packet is accessed again after being queued
           for someone else to deal with.
      
       (*) Fix a missing skb free.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      aee4eda1
    • David Howells's avatar
      rxrpc: Free packets discarded in data_ready · 992c273a
      David Howells authored
      Under certain conditions, the data_ready handler will discard a packet.
      These need to be freed.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      992c273a
    • David Howells's avatar
      rxrpc: Fix a use-after-push in data_ready handler · 50fd85a1
      David Howells authored
      Fix a use of a packet after it has been enqueued onto the packet processing
      queue in the data_ready handler.  Once on a call's Rx queue, we mustn't
      touch it any more as it may be dequeued and freed by the call processor
      running on a work queue.
      
      Save the values we need before enqueuing.
      
      Without this, we can get an oops like the following:
      
      BUG: unable to handle kernel NULL pointer dereference at 000000000000009c
      IP: [<ffffffffa01854e8>] rxrpc_fast_process_packet+0x724/0xa11 [af_rxrpc]
      PGD 0 
      Oops: 0000 [#1] SMP
      Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc]
      CPU: 2 PID: 0 Comm: swapper/2 Tainted: G            E   4.7.0-fsdevel+ #1336
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      task: ffff88040d6863c0 task.stack: ffff88040d68c000
      RIP: 0010:[<ffffffffa01854e8>]  [<ffffffffa01854e8>] rxrpc_fast_process_packet+0x724/0xa11 [af_rxrpc]
      RSP: 0018:ffff88041fb03a78  EFLAGS: 00010246
      RAX: ffffffffffffffff RBX: ffff8803ff195b00 RCX: 0000000000000001
      RDX: ffffffffa01854d1 RSI: 0000000000000008 RDI: ffff8803ff195b00
      RBP: ffff88041fb03ab0 R08: 0000000000000000 R09: 0000000000000001
      R10: ffff88041fb038c8 R11: 0000000000000000 R12: ffff880406874800
      R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff88041fb00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000000009c CR3: 0000000001c14000 CR4: 00000000001406e0
      Stack:
       ffff8803ff195ea0 ffff880408348800 ffff880406874800 ffff8803ff195b00
       ffff880408348800 ffff8803ff195ed8 0000000000000000 ffff88041fb03af0
       ffffffffa0186072 0000000000000000 ffff8804054da000 0000000000000000
      Call Trace:
       <IRQ> 
       [<ffffffffa0186072>] rxrpc_data_ready+0x89d/0xbae [af_rxrpc]
       [<ffffffff814c94d7>] __sock_queue_rcv_skb+0x24c/0x2b2
       [<ffffffff8155c59a>] __udp_queue_rcv_skb+0x4b/0x1bd
       [<ffffffff8155e048>] udp_queue_rcv_skb+0x281/0x4db
       [<ffffffff8155ea8f>] __udp4_lib_rcv+0x7ed/0x963
       [<ffffffff8155ef9a>] udp_rcv+0x15/0x17
       [<ffffffff81531d86>] ip_local_deliver_finish+0x1c3/0x318
       [<ffffffff81532544>] ip_local_deliver+0xbb/0xc4
       [<ffffffff81531bc3>] ? inet_del_offload+0x40/0x40
       [<ffffffff815322a9>] ip_rcv_finish+0x3ce/0x42c
       [<ffffffff81532851>] ip_rcv+0x304/0x33d
       [<ffffffff81531edb>] ? ip_local_deliver_finish+0x318/0x318
       [<ffffffff814dff9d>] __netif_receive_skb_core+0x601/0x6e8
       [<ffffffff814e072e>] __netif_receive_skb+0x13/0x54
       [<ffffffff814e082a>] netif_receive_skb_internal+0xbb/0x17c
       [<ffffffff814e1838>] napi_gro_receive+0xf9/0x1bd
       [<ffffffff8144eb9f>] rtl8169_poll+0x32b/0x4a8
       [<ffffffff814e1c7b>] net_rx_action+0xe8/0x357
       [<ffffffff81051074>] __do_softirq+0x1aa/0x414
       [<ffffffff810514ab>] irq_exit+0x3d/0xb0
       [<ffffffff810184a2>] do_IRQ+0xe4/0xfc
       [<ffffffff81612053>] common_interrupt+0x93/0x93
       <EOI> 
       [<ffffffff814af837>] ? cpuidle_enter_state+0x1ad/0x2be
       [<ffffffff814af832>] ? cpuidle_enter_state+0x1a8/0x2be
       [<ffffffff814af96a>] cpuidle_enter+0x12/0x14
       [<ffffffff8108956f>] call_cpuidle+0x39/0x3b
       [<ffffffff81089855>] cpu_startup_entry+0x230/0x35d
       [<ffffffff810312ea>] start_secondary+0xf4/0xf7
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      50fd85a1
    • David Howells's avatar
      rxrpc: Once packet posted in data_ready, don't retry posting · 2e7e9758
      David Howells authored
      Once a packet has been posted to a connection in the data_ready handler, we
      mustn't try reposting if we then find that the connection is dying as the
      refcount has been given over to the dying connection and the packet might
      no longer exist.
      
      Losing the packet isn't a problem as the peer will retransmit.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      2e7e9758
    • David Howells's avatar
      rxrpc: Don't access connection from call if pointer is NULL · f9dc5757
      David Howells authored
      The call state machine processor sets up the message parameters for a UDP
      message that it might need to transmit in advance on the basis that there's
      a very good chance it's going to have to transmit either an ACK or an
      ABORT.  This requires it to look in the connection struct to retrieve some
      of the parameters.
      
      However, if the call is complete, the call connection pointer may be NULL
      to dissuade the processor from transmitting a message.  However, there are
      some situations where the processor is still going to be called - and it's
      still going to set up message parameters whether it needs them or not.
      
      This results in a NULL pointer dereference at:
      
      	net/rxrpc/call_event.c:837
      
      To fix this, skip the message pre-initialisation if there's no connection
      attached.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      f9dc5757
    • David Howells's avatar
      rxrpc: Need to flag call as being released on connect failure · 17b963e3
      David Howells authored
      If rxrpc_new_client_call() fails to make a connection, the call record that
      it allocated needs to be marked as RXRPC_CALL_RELEASED before it is passed
      to rxrpc_put_call() to indicate that it no longer has any attachment to the
      AF_RXRPC socket.
      
      Without this, an assertion failure may occur at:
      
      	net/rxrpc/call_object:635
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      17b963e3