1. 15 Nov, 2016 9 commits
    • udplite: fix NULL pointer dereference · c915fe13
      Paolo Abeni authored
      The commit 850cbadd ("udp: use it's own memory accounting schema")
      assumes that the socket proto has memory accounting enabled,
      but this is not the case for UDPLITE.
      Fix it by enabling memory accounting for UDPLITE and reclaiming
      forward-allocated memory on socket shutdown.
      UDP and UDPLITE now share the same memory accounting limits.
      Also drop the backlog receive operation, since it is no longer needed.
      
      Fixes: 850cbadd ("udp: use it's own memory accounting schema")
      Reported-by: Andrei Vagin <avagin@gmail.com>
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c915fe13
    • Merge branch 'bpf-lru' · e6ca4f16
      David S. Miller authored
      Martin KaFai Lau says:
      
      ====================
      bpf: LRU map
      
      This patch set adds LRU map implementation to the existing BPF map
      family.
      
      The first few patches introduce the basic BPF LRU list
      implementation.
      
      The later patches introduce BPF_MAP_TYPE_LRU_[PERCPU_]HASH, the LRU
      versions of the existing BPF_MAP_TYPE_[PERCPU_]HASH maps, by
      leveraging the BPF LRU list.
      
      v2:
      - Added a percpu LRU list option which can be specified as
        a map attribute.
      
        [Note: percpu LRU list has nothing to do with the map's value]
      
      - Removed the cpu variable from the struct bpf_lru_locallist
        since it is not needed.
      
      - Changed the __bpf_lru_node_move_out to __bpf_lru_node_move_to_free in
        patch 1 to prepare the percpu LRU list in patch 2.
      
      - Moved the test_lru_map under selftests
      
      - Refactored a few things in the test codes
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e6ca4f16
    • bpf: Add tests for the LRU bpf_htab · 5db58faf
      Martin KaFai Lau authored
      This patch adds some unit tests and a test_lru_dist.
      
      test_lru_dist reads numeric keys from a file.
      The files used here were generated by a modified fio-genzipf tool
      that originated from the fio test suite.  The sample data files can be
      found here: https://github.com/iamkafai/bpf-lru
      
      The zipf.* data files have 100k numeric keys, and the keys
      range from 1 to 100k.
      
      test_lru_dist outputs the number of unique keys (nr_unique).
      For example, nr_unique:61239(/100000) means that 61239 of the 100k
      keys are unique.  nr_misses counts the keys that could not be found
      in the LRU map, so nr_misses must be >= nr_unique.  test_lru_dist
      also simulates a perfect LRU map as a comparison:
      
      [root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \
      /root/zipf.100k.a1_01.out 4000 1
      ...
      test_parallel_lru_dist (map_type:9 map_flags:0x0):
          task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31603(/100000)
          task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)
      ....
      test_parallel_lru_dist (map_type:9 map_flags:0x2):
          task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31710(/100000)
          task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)
      
      [root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \
      /root/zipf.100k.a0_01.out 40000 1
      ...
      test_parallel_lru_dist (map_type:9 map_flags:0x0):
          task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67054(/100000)
          task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)
      ...
      test_parallel_lru_dist (map_type:9 map_flags:0x2):
          task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67068(/100000)
          task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)
      
      LRU map has also been added to map_perf_test:
      /* Global LRU */
      [root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
      ./map_perf_test 16 $i | awk '{r += $3}END{print r " updates"}'; done
       1 cpus: 2934082 updates
       4 cpus: 7391434 updates
       8 cpus: 6500576 updates
      
      /* Percpu LRU */
      [root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
      ./map_perf_test 32 $i | awk '{r += $3}END{print r " updates"}'; done
        1 cpus: 2896553 updates
        4 cpus: 9766395 updates
        8 cpus: 17460553 updates
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5db58faf
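
To make the nr_unique and nr_misses counters from the commit above concrete, here is a minimal Python sketch of a "perfect LRU" replay. It is only an illustrative model and not the code in test_lru_dist; the function name `perfect_lru_stats` and the sample key sequence are hypothetical.

```python
from collections import OrderedDict

def perfect_lru_stats(keys, capacity):
    """Replay `keys` through an exact LRU cache holding `capacity` entries.

    Returns (nr_unique, nr_misses). Hypothetical helper for illustration;
    assumes capacity >= 1.
    """
    cache = OrderedDict()   # least recently used entry comes first
    seen = set()
    nr_misses = 0
    for key in keys:
        seen.add(key)
        if key in cache:
            cache.move_to_end(key)      # hit: mark as most recently used
            continue
        nr_misses += 1                  # miss: key is not in the cache
        if len(cache) >= capacity:
            cache.popitem(last=False)   # evict the least recently used key
        cache[key] = True
    return len(seen), nr_misses

# Made-up key sequence, not taken from the zipf.* data files.
keys = [1, 2, 3, 1, 4, 2, 5, 1]
nr_unique, nr_misses = perfect_lru_stats(keys, capacity=3)
```

Every key misses at least once (on first sight), which is why nr_misses can never be smaller than nr_unique; extra misses come from keys that were evicted and then requested again.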
    • bpf: Add BPF_MAP_TYPE_LRU_PERCPU_HASH · 8f844938
      Martin KaFai Lau authored
      Provide a LRU version of the existing BPF_MAP_TYPE_PERCPU_HASH
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8f844938
    • bpf: Add BPF_MAP_TYPE_LRU_HASH · 29ba732a
      Martin KaFai Lau authored
      Provide a LRU version of the existing BPF_MAP_TYPE_HASH.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      29ba732a
    • bpf: Refactor codes handling percpu map · fd91de7b
      Martin KaFai Lau authored
      Refactor the code that populates the value
      of a htab_elem in a BPF_MAP_TYPE_PERCPU_HASH
      typed bpf_map.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fd91de7b
    • bpf: Add percpu LRU list · 961578b6
      Martin KaFai Lau authored
      Instead of having a common LRU list, this patch allows a
      percpu LRU list which can be selected by specifying a map
      attribute.  The map attribute will be added in a later
      patch.
      
      While the common use case for LRU is #reads >> #updates,
      a percpu LRU list allows a bpf prog to absorb an unusual number of
      updates in pathological cases (e.g. an external-traffic-facing machine
      which could be under attack).
      
      The percpu LRU lists are isolated from each other.  The LRU nodes
      (including free nodes) cannot be moved across different LRU lists.
      
      Here is the update performance comparison between the
      common LRU list and the percpu LRU list (the test code is
      in the last patch):
      
      [root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
      ./map_perf_test 16 $i | awk '{r += $3}END{print r " updates"}'; done
       1 cpus: 2934082 updates
       4 cpus: 7391434 updates
       8 cpus: 6500576 updates
      
      [root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
      ./map_perf_test 32 $i | awk '{r += $3}END{print r " updates"}'; done
        1 cpus: 2896553 updates
        4 cpus: 9766395 updates
        8 cpus: 17460553 updates
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      961578b6
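
The isolation property described in the commit above (nodes, including free nodes, cannot move between percpu lists) implies a memory-utilization trade-off under imbalanced load. The following Python sketch is a deliberately simplified model of that trade-off, not the kernel implementation: it uses exact LRU lists, ignores locking, and the names `LruList` and `count_evictions` are hypothetical.

```python
from collections import OrderedDict

class LruList:
    """A tiny exact-LRU list with a fixed number of pre-allocated nodes."""
    def __init__(self, nr_nodes):
        self.nr_nodes = nr_nodes
        self.nodes = OrderedDict()
        self.evictions = 0

    def update(self, key):
        if key in self.nodes:
            self.nodes.move_to_end(key)
            return
        if len(self.nodes) >= self.nr_nodes:
            self.nodes.popitem(last=False)   # no free node: evict the oldest
            self.evictions += 1
        self.nodes[key] = True

def count_evictions(updates, nr_cpus, total_nodes, percpu):
    """`updates` is a list of (cpu, key) pairs; returns the eviction count."""
    if percpu:
        # Each CPU owns an isolated share of the nodes.
        lists = [LruList(total_nodes // nr_cpus) for _ in range(nr_cpus)]
        pick = lambda cpu: lists[cpu]
    else:
        # One common list shared by every CPU.
        lists = [LruList(total_nodes)]
        pick = lambda cpu: lists[0]
    for cpu, key in updates:
        pick(cpu).update(key)
    return sum(lru.evictions for lru in lists)

# Imbalanced load: all 8 distinct keys are inserted from CPU 0.
updates = [(0, key) for key in range(8)]
global_evictions = count_evictions(updates, nr_cpus=4, total_nodes=8, percpu=False)
percpu_evictions = count_evictions(updates, nr_cpus=4, total_nodes=8, percpu=True)
```

In this model the common list absorbs all eight keys without evicting anything, while the percpu variant evicts on CPU 0 even though the other CPUs still hold unused nodes. The benchmark numbers in the commit message show the other side of the trade-off: update throughput scales better with the percpu list.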
    • bpf: LRU List · 3a08c2fd
      Martin KaFai Lau authored
      Introduce bpf_lru_list which will provide LRU capability to
      the bpf_htab in a later patch.
      
      * General Thoughts:
      1. Target use case.  Reads are more frequent than updates
         (i.e. bpf_lookup_elem() is called more often than bpf_update_elem()).
         If a bpf_prog does a bpf_lookup_elem() first and then an in-place
         update, it still counts as a read operation as far as the LRU list
         is concerned.
      2. It may be useful to think of it as an LRU cache.
      3. Optimize the read case:
         3.1 No lock in the read case
         3.2 The LRU maintenance is only done during bpf_update_elem()
      4. If there is a percpu LRU list, it will lose the system-wide LRU
         property.  A completely isolated percpu LRU list has the best
         performance, but the memory utilization is not ideal considering
         that the workload may be imbalanced.
      5. Hence, this patch starts the LRU implementation with a global LRU
         list, with operations batched before accessing the global LRU list.
         As an LRU cache, where #read >> #update/#insert operations, it will
         work well.
      6. There is a local list (for each cpu) which is named
         'struct bpf_lru_locallist'.  This local list is not used to maintain
         the LRU ordering.  Instead, the local list is used to batch enough
         operations before acquiring the lock of the global LRU list.  More
         details on this later.
      7. A later patch allows a percpu LRU list by specifying a
         map attribute, for scalability reasons and for use cases that need to
         prepare for the worst (and pathological) case, such as a DoS attack.
         The percpu LRU lists are completely isolated from each other, and the
         LRU nodes (including free nodes) cannot be moved across lists.  The
         following description is for the global LRU list but is mostly
         applicable to the percpu LRU list as well.
      
      * Global LRU List:
      1. It has three sub-lists: active-list, inactive-list and free-list.
      2. The two-list idea, active and inactive, is borrowed from the
         page cache.
      3. All nodes are pre-allocated and all sit on the free-list (of the
         global LRU list) at the beginning.  The pre-allocation reasoning
         is similar to the existing BPF_MAP_TYPE_HASH.  However,
         opting out of preallocation (BPF_F_NO_PREALLOC) is not supported in
         the LRU map.
      
      * Active/Inactive List (of the global LRU list):
      1. The active list, as its name says, maintains the active set of
         the nodes.  We can think of it as the working set, or the more
         frequently accessed nodes.  The access frequency is approximated by
         a ref-bit.  The ref-bit is set during bpf_lookup_elem().
      2. The inactive list, as its name also says, maintains a less
         active set of nodes.  They are the candidates to be removed
         from the bpf_htab when we are running out of free nodes.
      3. The ordering of these two lists acts as a rough clock.
         The tail of the inactive list holds the older nodes, which
         should be released first if the bpf_htab needs a free element.
      
      * Rotating the Active/Inactive List (of the global LRU list):
      1. It is the basic operation to maintain the LRU property of
         the global list.
      2. The active list is only rotated when the inactive list is running
         low.  This idea is similar to the current page cache.
         Inactive running low is currently defined as
         "# of inactive < # of active".
      3. The active list rotation always starts from the tail.  It moves
         a node without the ref-bit set to the head of the inactive list.
         It moves a node with the ref-bit set back to the head of the active
         list and then clears its ref-bit.
      4. The inactive rotation is pretty simple.
         It walks the inactive list and moves a node back to the head of the
         active list if its ref-bit is set.  The ref-bit is cleared after
         moving to the active list.
         If a node does not have the ref-bit set, it is left as it is
         because it is already in the inactive list.
      
      * Shrinking the Inactive List (of the global LRU list):
      1. Shrinking is the operation to get free nodes when the bpf_htab is
         full.
      2. It usually only shrinks the inactive list to get free nodes.
      3. During shrinking, it walks the inactive list from the tail and
         deletes the nodes without the ref-bit set from the bpf_htab.
      4. If no free node is found after step (3), it forcefully takes
         one node from the tail of the inactive or active list.  Forcefully
         in the sense that it ignores the ref-bit.
      
      * Local List:
      1. Each CPU has a 'struct bpf_lru_locallist'.  The purpose is to
         batch enough operations before acquiring the lock of the
         global LRU.
      2. A local list has two sub-lists, free-list and pending-list.
      3. During bpf_update_elem(), it will try to get a node from the
         free-list (of the current CPU's local list).
      4. If the local free-list is empty, it will acquire nodes from the
         global LRU list.  The global LRU list can satisfy the request
         either from its global free-list or by shrinking the global
         inactive list.  Since the global LRU list lock has been acquired,
         it will try to move at most LOCAL_FREE_TARGET elements
         to the local free list.
      5. When a new element is added to the bpf_htab, it will
         first sit on the pending-list (of the local list).
         The pending-list will be flushed to the global LRU list
         the next time free nodes need to be acquired from the global
         list.
      
      * Lock Consideration:
      The LRU list has a lock (lru_lock).  Each bucket of htab has a
      lock (buck_lock).  If both locks need to be acquired together,
      the lock order is always lru_lock -> buck_lock and this only
      happens in the bpf_lru_list.c logic.
      
      In hashtab.c, the two locks are never acquired together (i.e. one
      lock is always released before the other is acquired).
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3a08c2fd
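
The rotation and shrinking rules in the commit above can be sketched as a small Python model. This is a hedged illustration under stated assumptions, not the kernel's bpf_lru_list.c: it has no locking and no per-CPU local lists, it assumes new nodes start on the inactive list (the commit message does not say where they land), and it frees every unreferenced inactive node in one pass. The class and method names are hypothetical.

```python
from collections import deque

class Node:
    def __init__(self):
        self.key = None
        self.ref = False          # ref-bit: set on lookup, cleared on rotation

class GlobalLru:
    def __init__(self, nr_nodes):
        # All nodes are pre-allocated and start on the free list.
        self.free = deque(Node() for _ in range(nr_nodes))
        self.active = deque()     # index 0 is the head, index -1 is the tail
        self.inactive = deque()
        self.table = {}           # stands in for the bpf_htab

    def lookup(self, key):
        node = self.table.get(key)
        if node is not None:
            node.ref = True       # the read path only sets the ref-bit
        return node

    def rotate_active(self):
        # Visit each active node once, starting from the tail.
        for _ in range(len(self.active)):
            node = self.active.pop()
            if node.ref:
                node.ref = False
                self.active.appendleft(node)
            else:
                self.inactive.appendleft(node)

    def rotate_inactive(self):
        # Promote referenced nodes; leave the others where they are.
        for node in list(self.inactive):
            if node.ref:
                node.ref = False
                self.inactive.remove(node)
                self.active.appendleft(node)

    def _release(self, node):
        del self.table[node.key]
        node.key, node.ref = None, False
        self.free.append(node)

    def shrink(self):
        # Walk the inactive list from the tail and free unreferenced nodes.
        for node in list(reversed(self.inactive)):
            if not node.ref:
                self.inactive.remove(node)
                self._release(node)
        if not self.free:
            # Forced eviction: ignore the ref-bit and take a tail node.
            victims = self.inactive if self.inactive else self.active
            self._release(victims.pop())

    def update(self, key):
        if key in self.table:
            return
        if not self.free:
            if len(self.inactive) < len(self.active):   # inactive running low
                self.rotate_active()
            self.rotate_inactive()
            self.shrink()
        node = self.free.popleft()
        node.key = key
        self.table[key] = node
        self.inactive.appendleft(node)   # assumption: new nodes start inactive

lru = GlobalLru(3)
for k in ("a", "b", "c"):
    lru.update(k)
lru.lookup("a")      # sets the ref-bit on "a"
lru.update("d")      # no free node left: rotate, then shrink
```

In this run, "a" survives because its ref-bit was set by the lookup, so rotation promotes it to the active list, while the unreferenced "b" and "c" are freed to make room for "d".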
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · bb598c1b
      David S. Miller authored
      Several cases of bug fixes in 'net' overlapping other changes in
      'net-next'.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bb598c1b
  2. 14 Nov, 2016 31 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · e76d21c4
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix off by one wrt. indexing when dumping /proc/net/route entries,
          from Alexander Duyck.
      
       2) Fix lockdep splats in iwlwifi, from Johannes Berg.
      
       3) Cure panic when inserting certain netfilter rules when NFT_SET_HASH
          is disabled, from Liping Zhang.
      
       4) Memory leak when nft_expr_clone() fails, also from Liping Zhang.
      
       5) Disable UFO when path will apply IPSEC transformations, from Jakub
          Sitnicki.
      
       6) Don't bogusly double cwnd in dctcp module, from Florian Westphal.
      
       7) skb_checksum_help() should never actually use the value "0" for the
          resulting checksum, that has a special meaning, use CSUM_MANGLED_0
          instead. From Eric Dumazet.
      
       8) Per-tx/rx queue statistic strings are wrong in qed driver, fix from
          Yuval Mintz.
      
       9) Fix SCTP reference counting of associations and transports in
          sctp_diag. From Xin Long.
      
      10) When we hit ip6tunnel_xmit() we could have come from an ipv4 path in
          a previous layer or similar, so explicitly clear the ipv6 control
          block in the skb. From Eli Cooper.
      
      11) Fix bogus sleeping inside of inet_wait_for_connect(), from WANG
          Cong.
      
      12) Correct device ID of T6 adapter in cxgb4 driver, from Hariprasad
          Shenai.
      
      13) Fix potential access past the end of the skb page frag array in
          tcp_sendmsg(). From Eric Dumazet.
      
      14) 'skb' can legitimately be NULL in inet{,6}_exact_dif_match(). Fix
          from David Ahern.
      
      15) Don't return an error in tcp_sendmsg() if we wrote any bytes
          successfully, from Eric Dumazet.
      
      16) Extraneous unlocks in netlink_diag_dump(), we removed the locking
          but forgot to purge these unlock calls. From Eric Dumazet.
      
      17) Fix memory leak in error path of __genl_register_family(). We leak
          the attrbuf, from WANG Cong.
      
      18) cgroupstats netlink policy table is mis-sized, from WANG Cong.
      
      19) Several XDP bug fixes in mlx5, from Saeed Mahameed.
      
      20) Fix several device refcount leaks in network drivers, from Johan
          Hovold.
      
      21) icmp6_send() should use skb dst device not skb->dev to determine L3
          routing domain. From David Ahern.
      
      22) ip_vs_genl_family sets maxattr incorrectly, from WANG Cong.
      
      23) We leak new macvlan port in some cases of macvlan_common_newlink()
          errors. Fix from Gao Feng.
      
      24) Similar to the icmp6_send() fix, icmp_route_lookup() should
          determine L3 routing domain using skb_dst(skb)->dev not skb->dev.
          Also from David Ahern.
      
      25) Several fixes for route offloading and FIB notification handling in
          mlxsw driver, from Jiri Pirko.
      
      26) Properly cap __skb_flow_dissect()'s return value, from Eric Dumazet.
      
      27) Fix long standing regression in ipv4 redirect handling, wrt.
          validating the new neighbour's reachability. From Stephen Suryaputra
          Lin.
      
      28) If sk_filter() trims the packet excessively, handle it reasonably in
          tcp input instead of exploding. From Eric Dumazet.
      
      29) Fix handling of napi hash state when copying channels in sfc driver,
          from Bert Kenward.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (121 commits)
        mlxsw: spectrum_router: Flush FIB tables during fini
        net: stmmac: Fix lack of link transition for fixed PHYs
        sctp: change sk state only when it has assocs in sctp_shutdown
        bnx2: Wait for in-flight DMA to complete at probe stage
        Revert "bnx2: Reset device during driver initialization"
        ps3_gelic: fix spelling mistake in debug message
        net: ethernet: ixp4xx_eth: fix spelling mistake in debug message
        ibmvnic: Fix size of debugfs name buffer
        ibmvnic: Unmap ibmvnic_statistics structure
        sfc: clear napi_hash state when copying channels
        mlxsw: spectrum_router: Correctly dump neighbour activity
        mlxsw: spectrum: Fix refcount bug on span entries
        bnxt_en: Fix VF virtual link state.
        bnxt_en: Fix ring arithmetic in bnxt_setup_tc().
        Revert "include/uapi/linux/atm_zatm.h: include linux/time.h"
        tcp: take care of truncations done by sk_filter()
        ipv4: use new_gw for redirect neigh lookup
        r8152: Fix error path in open function
        net: bpqether.h: remove if_ether.h guard
        net: __skb_flow_dissect() must cap its return value
        ...
      e76d21c4
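
Item 7 in the list above notes that a computed checksum of "0" has a special meaning and that CSUM_MANGLED_0 should be used instead. As a rough illustration: in UDP over IPv4 a checksum field of zero means "no checksum was computed", and in one's-complement arithmetic 0xFFFF is an equivalent representation of zero. The Python sketch below models that substitution; it is not the kernel's skb_checksum_help(), and the function names `inet_checksum` and `checksum_for_wire` are hypothetical.

```python
CSUM_MANGLED_0 = 0xFFFF   # same value as the kernel constant of that name

def inet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement Internet checksum of `data`."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length input
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:                        # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def checksum_for_wire(data: bytes) -> int:
    """Never emit 0: substitute the equivalent one's-complement value."""
    return inet_checksum(data) or CSUM_MANGLED_0

raw = inet_checksum(b"\xff\xff")          # the raw checksum happens to be 0
on_wire = checksum_for_wire(b"\xff\xff")  # 0 is replaced before transmission
```

A receiver verifying the packet gets the same result for 0xFFFF as for 0, so the substitution keeps the checksum valid while avoiding the "no checksum" interpretation.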
    • Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile · d4b95323
      Linus Torvalds authored
      Pull arch/tile bugfix from Chris Metcalf:
       "This just fixes an incompatibility with tile __ro_after_init"
      
      * 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
        tile: handle __ro_after_init like parisc does
      d4b95323
    • Merge tag 'rtc-4.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux · ac38126b
      Linus Torvalds authored
      Pull RTC fixes from Alexandre Belloni:
       "Here are a few driver fixes for 4.9. It has been calm for a while so I
        don't expect more for this cycle.
      
        Drivers:
         - asm9260: fix module autoload
         - cmos: fix crashes
         - omap: fix clock handling"
      
      * tag 'rtc-4.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
        rtc: omap: prevent disabling of clock/module during suspend
        rtc: omap: Fix selecting external osc
        rtc: cmos: Don't enable interrupts in the middle of the interrupt handler
        rtc: cmos: remove all __exit_p annotations
        rtc: asm9260: fix module autoload
      ac38126b
    • tile: handle __ro_after_init like parisc does · e123386b
      Chris Metcalf authored
      The tile architecture already marks RO_DATA as read-only in
      the kernel, so grouping RO_AFTER_INIT_DATA with RO_DATA, as is
      done by default, means the kernel faults in init when it tries
      to write to RO_AFTER_INIT_DATA.  For now, just arrange that
      __ro_after_init is handled like __write_once, i.e. __read_mostly.
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
      e123386b
    • mlxsw: spectrum_router: Flush FIB tables during fini · ac571de9
      Ido Schimmel authored
      Since commit b45f64d1 ("mlxsw: spectrum_router: Use FIB notifications
      instead of switchdev calls") we reflect to the device the entire FIB
      table and not only FIBs that point to netdevs created by the driver.
      
      During module removal, FIBs of the second type are removed following
      NETDEV_UNREGISTER events sent. The other FIBs are still present in both
      the driver's cache and the device's table.
      
      Fix this by iterating over all the FIB tables in the device and flushing
      them. There's no need to take locks, as we're the only writer.
      
      Fixes: b45f64d1 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ac571de9
    • mdio: Demote print from info to debug in mdio_driver_register · eb2ca35f
      Florian Fainelli authored
      While it is useful to know which MDIO driver is being registered, demote
      the pr_info() to a pr_debug().
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eb2ca35f
    • net: stmmac: Fix lack of link transition for fixed PHYs · c51e424d
      Florian Fainelli authored
      Commit 52f95bbf ("stmmac: fix adjust link call in case of a switch
      is attached") added some logic to avoid polling the fixed PHY and
      therefore invoking the adjust_link callback more than once, since this
      is a fixed PHY and link events won't be generated.
      
      This works fine the first time, because we start with phydev->irq =
      PHY_POLL, so we call adjust_link, then we set phydev->irq =
      PHY_IGNORE_INTERRUPT and we stop polling the PHY.
      
      Now, if we called ndo_close(), which calls both phy_stop() and does an
      explicit netif_carrier_off(), we end up with a link down. Upon calling
      ndo_open() again, despite starting the PHY state machine, we have
      PHY_IGNORE_INTERRUPT set, and we generate no link event at all, so the
      link is permanently down.
      
      Fixes: 52f95bbf ("stmmac: fix adjust link call in case of a switch is attached")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c51e424d
    • driver: macvlan: Replace integer number with bool value · d94d0254
      Gao Feng authored
      The return value of the function macvlan_addr_busy is used as a bool
      value, so use bool values instead of the integers "1" and "0".
      Signed-off-by: Gao Feng <gfree.wind@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d94d0254
    • bpf: Use u64_to_user_ptr() · 535e7b4b
      Mickaël Salaün authored
      Replace the custom u64_to_ptr() function with the u64_to_user_ptr()
      macro.
      Signed-off-by: Mickaël Salaün <mic@digikod.net>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      535e7b4b
    • sctp: change sk state only when it has assocs in sctp_shutdown · 5bf35ddf
      Xin Long authored
      Currently, when a user shuts down a sock with SEND_SHUTDOWN in sctp,
      the sk state is changed to SCTP_SS_CLOSING even if the sock has no
      connection (assoc), which is not what we expect.
      
      Besides, if the user then tries to listen on this sock, the kernel
      could even panic when it dereferences sctp_sk(sk)->bind_hash in
      sctp_inet_listen, as bind_hash is null when the sock has no assoc.
      
      This patch moves the sk state change to after the check that the sk
      assocs list is not empty, merges the two if() conditions, and reduces
      the indent level.
      
      Fixes: d46e416c ("sctp: sctp should change socket state when shutdown is received")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5bf35ddf
    • Merge branch 'bnx2-kdump-fix' · 193f5122
      David S. Miller authored
      Baoquan He says:
      
      ====================
      bnx2: Wait for in-flight DMA to complete at probe stage
      
      This is v2 post.
      
      In commit 3e1be7ad ("bnx2: Reset device during driver initialization"),
      the firmware requesting code was moved from the open stage to the probe
      stage.  The reason is that in a kdump kernel the hardware iommu needs
      the device to be reset in the driver probe stage; otherwise in-flight
      DMA from the 1st kernel will keep going and look up the newly created
      io-page tables.  However, resetting the bnx2 chip involves requesting
      firmware, which needs to be done in the open stage.
      
      Michael Chan suggested that we can just wait for the old in-flight DMA
      to complete at the probe stage.  Then, even without resetting the
      device, we don't need to worry that the old in-flight DMA could continue
      looking up the newly created io-page tables.
      
      v1->v2:
          Michael suggested waiting for the in-flight DMA to complete at the
          probe stage.  So give up the old method of trying to reset the chip
          at the probe stage, and take the new approach accordingly.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      193f5122
    • bnx2: Wait for in-flight DMA to complete at probe stage · 6df77862
      Baoquan He authored
      In-flight DMA from the 1st kernel could keep going in the kdump kernel.
      A new io-page table has been created before bnx2 does its reset at the
      open stage.  We have to wait for the in-flight DMA to complete at the
      probe stage so that it does not look up the newly created io-page table.
      Suggested-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Acked-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6df77862
    • Revert "bnx2: Reset device during driver initialization" · 5d0d4b91
      Baoquan He authored
      This reverts commit 3e1be7ad.
      
      When the bnx2 driver is built into the kernel, it fails to detect
      and load firmware because the firmware is contained in the initramfs,
      and the initramfs has not been uncompressed yet during do_initcalls.
      So revert commit 3e1be7ad and work out a new approach in a later patch.
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5d0d4b91
    • ps3_gelic: fix spelling mistake in debug message · 7020637b
      Colin Ian King authored
      Trivial fix to spelling mistake "unmached" to "unmatched" in
      debug message.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7020637b
    • net: atheros: atl2: use new api ethtool_{get|set}_link_ksettings · a7888596
      Philippe Reynes authored
      The ethtool api {get|set}_settings is deprecated.
      Move this driver to the new api {get|set}_link_ksettings.
      
      The previous implementation of set_settings modified the value
      of advertising, but that is not possible with the new API because
      the ethtool_link_ksettings structure is defined as const.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a7888596
    • net: atheros: atl1: use new api ethtool_{get|set}_link_ksettings · 1dae02b3
      Philippe Reynes authored
      The ethtool api {get|set}_settings is deprecated.
      Move this driver to the new api {get|set}_link_ksettings.
      
      The previous implementation of set_settings modified the value
      of advertising, but that is not possible with the new API because
      the ethtool_link_ksettings structure is defined as const.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1dae02b3
    • net: atheros: atl1c: use new api ethtool_{get|set}_link_ksettings · 58046c70
      Philippe Reynes authored
      The ethtool api {get|set}_settings is deprecated.
      Move this driver to the new api {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      58046c70
    • net: alx: use new api ethtool_{get|set}_link_ksettings · 36a4e690
      Philippe Reynes authored
      The ethtool api {get|set}_settings is deprecated.
      Move this driver to the new api {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      36a4e690
    • net: fix sleeping for sk_wait_event() · d9dc8b0f
      WANG Cong authored
      Similar to commit 14135f30 ("inet: fix sleeping inside inet_wait_for_connect()"),
      sk_wait_event() needs to be fixed too: because release_sock() is blocking,
      it changes the process state back to running after sleeping, which breaks
      the previous prepare_to_wait().
      
      Switch to the new wait API.
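
The problem can be illustrated with a toy userspace model. Everything below
(the types, toy_schedule(), toy_blocking_call()) is hypothetical and only
approximates the kernel mechanism. In the old pattern the sleep intent lives
in the task state, which a blocking call in between resets. In the new
pattern the wakeup is recorded in the wait entry and the task state is set
only immediately before sleeping, which is roughly what the kernel's
wait_woken() does.

```c
#include <assert.h>
#include <stdbool.h>

enum task_state { RUNNING, INTERRUPTIBLE };

struct task {
	enum task_state state;
	int sleeps;		/* how many times the task really slept */
};

struct wait_entry {
	bool woken;		/* set by the waker in the new pattern */
};

/* Toy scheduler: the task only sleeps if it is not marked RUNNING. */
static void toy_schedule(struct task *t)
{
	if (t->state != RUNNING)
		t->sleeps++;
	t->state = RUNNING;
}

/* Models a call that may block internally and returns in state RUNNING. */
static void toy_blocking_call(struct task *t)
{
	t->state = RUNNING;
}

/* Old pattern: the state set up front is clobbered before scheduling. */
static void wait_old(struct task *t)
{
	t->state = INTERRUPTIBLE;	/* like prepare_to_wait() */
	toy_blocking_call(t);		/* silently resets the state */
	toy_schedule(t);		/* returns at once, no sleep */
}

/* New pattern: check the woken flag, set the state just before sleeping. */
static void wait_new(struct task *t, struct wait_entry *w)
{
	toy_blocking_call(t);		/* nothing to clobber yet */
	t->state = INTERRUPTIBLE;
	if (!w->woken)
		toy_schedule(t);
	t->state = RUNNING;
	w->woken = false;		/* consume the wakeup */
}
```

In this model wait_old() never sleeps, while wait_new() sleeps exactly when
no wakeup has been recorded.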
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d9dc8b0f
    • Linus Torvalds's avatar
      ASoC: lpass-platform: fix uninitialized variable · ee2bd216
      Linus Torvalds authored
      In commit 022d00ee ("ASoC: lpass-platform: Fix broken pcm data
      usage") the stream specific information initialization was broken, with
      the dma channel information not being initialized if there was no
      alloc_dma_channel() helper function.
      
      Before that, the DMA channel number was implicitly initialized to zero
      because the backing store was allocated with devm_kzalloc().  When the
      init code was rewritten, that implicit initialization was lost, and gcc
      rightfully complains about an uninitialized variable being used.
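
A userspace sketch of this class of bug, not the driver code itself: the
struct, the callback type, and both functions are hypothetical. When the
value is optional and its storage is no longer zeroed by the allocator, the
fallback has to be assigned explicitly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-stream data. */
struct stream_data {
	int dma_ch;
};

/* Hypothetical optional helper, analogous to an alloc_dma_channel() hook. */
typedef int (*alloc_dma_fn)(void);

/* Example helper used for illustration only. */
static int example_alloc_channel(void)
{
	return 3;
}

static struct stream_data *stream_open(alloc_dma_fn alloc_dma_channel)
{
	/* malloc() does not zero memory, unlike calloc() or devm_kzalloc(). */
	struct stream_data *data = malloc(sizeof(*data));

	if (!data)
		return NULL;

	if (alloc_dma_channel)
		data->dma_ch = alloc_dma_channel();
	else
		data->dma_ch = 0;	/* explicit default, no implicit zeroing */

	return data;
}
```
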
      
      Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Cc: Mark Brown <broonie@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ee2bd216
    • Linus Torvalds's avatar
      Revert "printk: make reading the kernel log flush pending lines" · f5c9f9c7
      Linus Torvalds authored
      This reverts commit bfd8d3f2.
      
      It turns out that this flushes things much too aggressively, and causes
      lines to break up when the system logger races with new continuation
      lines being printed.
      
      There's a pending patch to make printk() flushing much more
      straightforward, but it's too invasive for 4.9, so in the meantime let's
      just not make the system message logging flush continuation lines.
      They'll be flushed by the final newline anyway.
      Suggested-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f5c9f9c7
    • Mauro Carvalho Chehab's avatar
      gp8psk-fe: add missing MODULE_foo() macros · b15efc38
      Mauro Carvalho Chehab authored
      This file was converted to a separate module at commit 7a0786c1
      ("gp8psk: Fix DVB frontend attach"), because the DVB attach routines
      require it to work.  However, I forgot to copy the MODULE_foo() macros
      from the original module, causing this warning:
      
          WARNING: modpost: missing MODULE_LICENSE() in drivers/media/dvb-frontends/gp8psk-fe.o
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Fixes: 7a0786c1 ("gp8psk: Fix DVB frontend attach")
      Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b15efc38
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8528d662
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Misc fixes:
      
         - fix an Intel/MID boot crash/hang bug
      
         - fix a cache topology mis-parsing bug on certain AMD CPUs
      
         - fix a virtualization firmware bug by adding a check+quirk
           workaround on the kernel side"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu: Deal with broken firmware (VMWare/XEN)
        x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems
        x86/platform/intel-mid: Retrofit pci_platform_pm_ops ->get_state hook
      8528d662
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5d69561b
      Linus Torvalds authored
      Pull irq fix from Ingo Molnar:
       "This fixes a genirq regression that resulted in the Intel/Broxton
        pinctrl/GPIO driver (and possibly others) spewing warnings"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Use irq type from irqdata instead of irqdesc
      5d69561b
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5ad62a9e
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "An uncore PMU driver hardware enablement change for Intel SkyLake
        uncore PMUs (Skylake Y, U, H and S platforms), plus a number of
        tooling fixes for the histogram handling/displaying code"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel/uncore: Add more Intel uncore IMC PCI IDs for SkyLake
        perf hists: Fix column length on --hierarchy
        perf hists browser: Fix column indentation on --hierarchy
        perf hists browser: Show folded sign properly on --hierarchy
        perf hists browser: Fix indentation of folded sign on --hierarchy
        perf hist browser: Fix hierarchy column counts
      5ad62a9e
    • Linus Torvalds's avatar
      Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 53381e2e
      Linus Torvalds authored
      Pull EFI fixes from Ingo Molnar:
       "A boot crash fix and a build warning fix"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK=y
        x86/efi: Fix EFI memmap pointer size warning
      53381e2e
    • Linus Torvalds's avatar
      Merge tag 'ntb-4.9' of git://github.com/jonmason/ntb · 28ddafa5
      Linus Torvalds authored
      Pull NTB fixes from Jon Mason:
       "NTB bug fixes for ntb_hw_intel, ntb_perf, and ntb_pingpong.
      
        Also, a fixup to use jiffies in schedule_timeout_* calls instead of a
        constant"
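
Why a bare constant is HZ dependent can be shown with a little arithmetic.
The sketch below is a userspace model: ms_to_ticks() is a simplified,
hypothetical stand-in for the kernel's msecs_to_jiffies(), and the value 50
used in the usage note is an assumption for illustration.

```c
#include <assert.h>

/*
 * Convert milliseconds to scheduler ticks for a given tick rate,
 * rounding up so the timeout is never shorter than requested.
 */
static unsigned long ms_to_ticks(unsigned int ms, unsigned int hz)
{
	return ((unsigned long)ms * hz + 999) / 1000;
}

/* The wall-clock duration, in milliseconds, of a raw tick count. */
static unsigned long ticks_to_ms(unsigned long ticks, unsigned int hz)
{
	return ticks * 1000 / hz;
}
```

A raw timeout of 50 ticks lasts 500 ms at HZ=100 but only 50 ms at HZ=1000,
whereas ms_to_ticks(50, hz) yields 5 and 50 ticks respectively, which is
50 ms in both configurations.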
      
      * tag 'ntb-4.9' of git://github.com/jonmason/ntb:
        ntb_perf: potential info leak in debugfs
        ntb: ntb_hw_intel: init peer_addr in struct intel_ntb_dev
        ntb: make DMA_OUT_RESOURCE_TO HZ independent
        ntb_transport: make DMA_OUT_RESOURCE_TO HZ independent
        NTB: ntb_hw_intel: Fix typo in module parameter descriptions
        ntb_pingpong: Fix db_init parameter description
      28ddafa5
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 7d384846
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains a second batch of Netfilter updates for
      your net-next tree. This includes a rework of the core hook
      infrastructure that improves Netfilter performance by ~15% according to
      synthetic benchmarks. Then, a large batch with ipset updates, including
      a new hash:ipmac set type, via Jozsef Kadlecsik. This also includes a
      couple of assorted updates.
      
      Regarding the core hook infrastructure rework to improve performance,
      using this simple drop-all packets ruleset from ingress:
      
              nft add table netdev x
              nft add chain netdev x y { type filter hook ingress device eth0 priority 0\; }
              nft add rule netdev x y drop
      
      And generating traffic through Jesper Brouer's
      samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh script with the
      -i option, perf report shows nf_tables calls in its top 10:
      
          17.30%  kpktgend_0   [nf_tables]            [k] nft_do_chain
          15.75%  kpktgend_0   [kernel.vmlinux]       [k] __netif_receive_skb_core
          10.39%  kpktgend_0   [nf_tables_netdev]     [k] nft_do_chain_netdev
      
      I measure an improvement of ~15% in performance with this patchset,
      a gain of about 2.5 Mpps. I used my old laptop, an Intel(R) Core(TM)
      i5-3320M CPU @ 2.60GHz with 4 cores.
      
      More specifically, this rework contains these patches, in strict order:
      
      1) Remove compile-time debugging from core.
      
      2) Remove obsolete comments that predate the rcu era. These days it is
         well known that a Netfilter hook always runs under rcu_read_lock().
      
      3) Remove threshold handling, as this is only used by br_netfilter.
         We already have specific code to handle this from br_netfilter,
         so remove this code from the core path.
      
      4) Deprecate NF_STOP, as this is only used by br_netfilter.
      
      5) Place nf_state_hook pointer into xt_action_param structure, so
         this structure fits into one single cacheline according to pahole.
         This also implicitly affects nftables since it also relies on the
         xt_action_param structure.
      
      6) Move state->hook_entries into nf_queue entry. The hook_entries
         pointer is only required by nf_queue(), so we can store this in the
         queue entry instead.
      
      7) Use a switch() statement to handle verdict cases.
      
      8) Remove hook_entries field from nf_hook_state structure, this is only
         required by nf_queue, so store it in nf_queue_entry structure.
      
      9) Merge nf_iterate() into nf_hook_slow(), which results in a much
         simpler and more readable function.
      
      10) Handle NF_REPEAT away from the core, so far the only client is
          nf_conntrack_in() and we can restart the packet processing using a
          simple goto to jump back there when the TCP requires it.
          This update required a second pass to fix fallout, fix from
          Arnd Bergmann.
      
      11) Set random seed from nft_hash when no seed is specified from
          userspace.
      
      12) Simplify nf_tables expression registration, in a much smarter way
          to save lots of boilerplate code, by Liping Zhang.
      
      13) Simplify layer 4 protocol conntrack tracker registration, from
          Davide Caratti.
      
      14) Missing CONFIG_NF_SOCKET_IPV4 dependency for udp4_lib_lookup, due
          to recent generalization of the socket infrastructure, from Arnd
          Bergmann.
      
      15) Then, the ipset batch from Jozsef, which he describes as follows:
      
      * Cleanup: Remove extra whitespaces in ip_set.h
      * Cleanup: Mark some of the helpers arguments as const in ip_set.h
      * Cleanup: Group counter helper functions together in ip_set.h
      * struct ip_set_skbinfo is introduced instead of open coded fields
        in skbinfo get/init helper functions.
      * Use kmalloc() in comment extension helper instead of kzalloc()
        because it is unnecessary to zero out the area just before
        explicit initialization.
      * Cleanup: Split extensions into separate files.
      * Cleanup: Separate memsize calculation code into dedicated function.
      * Cleanup: group ip_set_put_extensions() and ip_set_get_extensions()
        together.
      * Add element count to hash headers by Eric B Munson.
      * Add element count to all set types header for uniform output
        across all set types.
      * Count non-static extension memory into memsize calculation for
        userspace.
      * Cleanup: Remove redundant mtype_expire() arguments, because
        they can be derived from other parameters.
      * Cleanup: Simplify mtype_expire() for hash types by removing
        one level of indentation.
      * Make NLEN compile time constant for hash types.
      * Make sure element data size is a multiple of u32 for the hash set
        types.
      * Optimize hash creation routine, exit as early as possible.
      * Make struct htype per ipset family so nets array becomes fixed size
        and thus simplifies the struct htype allocation.
      * Collapse same condition body into a single one.
      * Fix reported memory size for hash:* types, base hash bucket structure
        was not taken into account.
      * hash:ipmac type support added to ipset by Tomasz Chilinski.
      * Use setup_timer() and mod_timer() instead of init_timer()
        by Muhammad Falak R Wani, individually for the set type families.
      
      16) Remove useless connlabel field in struct netns_ct, patch from
          Florian Westphal.
      
      17) xt_find_table_lock() doesn't return ERR_PTR() anymore, so simplify
          {ip,ip6,arp}tables code that uses this.
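
Items 7 and 9 above describe a single loop over the hooks with a switch() on
each verdict. A userspace sketch of that shape follows. The enum values,
struct packet, the example hooks, and run_hooks() are all hypothetical names
chosen for illustration, and the return convention is an assumption, not the
kernel's nf_hook_slow() contract.

```c
#include <assert.h>
#include <stddef.h>

enum verdict { V_DROP, V_ACCEPT, V_QUEUE };

struct packet {
	int mark;
};

typedef enum verdict (*hook_fn)(struct packet *pkt);

/* Example hooks, for illustration only. */
static enum verdict hook_accept(struct packet *pkt)
{
	(void)pkt;
	return V_ACCEPT;
}

static enum verdict hook_mark(struct packet *pkt)
{
	pkt->mark = 1;
	return V_ACCEPT;
}

static enum verdict hook_drop_marked(struct packet *pkt)
{
	return pkt->mark ? V_DROP : V_ACCEPT;
}

/*
 * Walk the hook chain in order and dispatch on each verdict.
 * Returns 1 if every hook accepted, 0 if the packet was handed
 * to a queue, and -1 if a hook dropped it.
 */
static int run_hooks(const hook_fn *hooks, size_t n, struct packet *pkt)
{
	for (size_t i = 0; i < n; i++) {
		switch (hooks[i](pkt)) {
		case V_ACCEPT:
			break;		/* leave the switch, try the next hook */
		case V_DROP:
			return -1;
		case V_QUEUE:
			return 0;
		}
	}
	return 1;
}
```
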
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7d384846
    • Jiri Pirko's avatar
      mlxsw: spectrum_router: Add FIB abort warning · 8d419324
      Jiri Pirko authored
      Add a warning that the abort mechanism was triggered for the device.
      Also avoid going through the procedure if the abort was already done.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8d419324
    • David S. Miller's avatar
      Merge branch 'dsa-mv88e6xxx-post-refactor-fixes' · 5aad5b42
      David S. Miller authored
      Andrew Lunn says:
      
      ====================
      dsa: mv88e6xxx: Fixes for port refactoring
      
      The patches which refactored setting up the switch MACs introduced a
      couple of regressions. The RGMII delays for a port can be set using
      mechanisms other than just phy-mode. Don't overwrite the delays unless
      explicitly asked to. This broke my Armada 370 RD. Also, the mv88e6351
      family supports setting RGMII delays, but is missing the necessary
      entries in the ops structures to allow this.
      
      These fixes are to patches currently in net-next. No need for stable
      etc.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5aad5b42
    • Andrew Lunn's avatar
      net: dsa: mv88e6xxx: 6351 family also has RGMII delays · 94d66ae6
      Andrew Lunn authored
      The recent refactoring of setting the MAC configuration broke setting
      of RGMII delays, via the phy-mode, on the 6351 family. Add the missing
      ops to the structure.
      
      Fixes: 7340e5ecdbb1 ("net: dsa: mv88e6xxx: setup port's MAC")
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      94d66ae6