1. 05 Sep, 2017 27 commits
    • drivers: net: xgene: Read tx/rx delay for ACPI · 9d7e72c0
      Iyappan Subramanian authored
      This patch fixes reading tx/rx delay values for ACPI.
      Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
      Signed-off-by: Quan Nguyen <qnguyen@apm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rocker: fix kcalloc parameter order · b1357cfb
      Zahari Doychev authored
      The function calls to kcalloc use wrong parameter order and incorrect flags
      values. GFP_KERNEL is used instead of flags now and the order is corrected.
      
      The change was done using the following coccinelle script:
      
      @@
      expression E1,E2;
      type T;
      @@
      
      -kcalloc(E1, E2, sizeof(T))
      +kcalloc(E2, sizeof(T), GFP_KERNEL)
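Why the compiler did not catch the swapped arguments can be sketched in userspace: all three parameters are integer-like, so any ordering type-checks. The sketch below is only an illustration under stated assumptions. `demo_kcalloc`, `demo_alloc_table` and `DEMO_GFP_KERNEL` are hypothetical stand-ins built on `calloc()`; they are not the kernel API and not the rocker driver code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a gfp_t flag; not a real kernel value. */
#define DEMO_GFP_KERNEL 0x1u

/*
 * Userspace stand-in that mirrors the argument order of kcalloc():
 * element count first, element size second, allocation flags last.
 */
void *demo_kcalloc(size_t n, size_t size, unsigned int flags)
{
	(void)flags;                    /* flags have no meaning in userspace */
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;            /* n * size would overflow */
	return calloc(n, size);
}

/* Allocate a zeroed table of `count` ints with the arguments in order. */
int *demo_alloc_table(size_t count)
{
	return demo_kcalloc(count, sizeof(int), DEMO_GFP_KERNEL);
}
```

Because every argument converts silently to an integer type, a semantic patch such as the coccinelle script above is a practical way to find misordered calls.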
      Signed-off-by: Zahari Doychev <zahari.doychev@linux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rds: Fix non-atomic operation on shared flag variable · f530f39f
      Håkon Bugge authored
      The bits in m_flags in struct rds_message are used for a plurality of
      reasons, and from different contexts. To avoid any missing updates to
      m_flags, use the atomic set_bit() instead of the non-atomic equivalent.
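The difference between an atomic and a non-atomic bit update can be sketched with C11 atomics in userspace. This is a minimal illustration, not the RDS code: `struct demo_message`, `demo_set_bit` and `demo_test_bit` are hypothetical names, and the relaxed memory ordering is an assumption chosen for the sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_message {
	atomic_ulong flags;     /* shared bit flags, updated from several contexts */
};

/*
 * Atomic read-modify-write: a concurrent update to another bit in the
 * same word cannot be lost, unlike a plain "flags |= mask".
 */
void demo_set_bit(struct demo_message *m, unsigned int nr)
{
	atomic_fetch_or_explicit(&m->flags, 1UL << nr, memory_order_relaxed);
}

/* Read one bit from the shared word. */
bool demo_test_bit(struct demo_message *m, unsigned int nr)
{
	return (atomic_load_explicit(&m->flags, memory_order_relaxed) >> nr) & 1UL;
}
```

A plain `flags |= mask` is a separate load, OR and store, so two contexts updating different bits can overwrite each other's result.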
      Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
      Reviewed-by: Knut Omang <knut.omang@oracle.com>
      Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: don't use GFP_KERNEL under spin lock · 2c8468dc
      Jakub Kicinski authored
      The new TC IDR code uses GFP_KERNEL under spin lock, which leads
      to:
      
      [  582.621091] BUG: sleeping function called from invalid context at ../mm/slab.h:416
      [  582.629721] in_atomic(): 1, irqs_disabled(): 0, pid: 3379, name: tc
      [  582.636939] 2 locks held by tc/3379:
      [  582.641049]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff910354ce>] rtnetlink_rcv_msg+0x92e/0x1400
      [  582.650958]  #1:  (&(&tn->idrinfo->lock)->rlock){+.-.+.}, at: [<ffffffff9110a5e0>] tcf_idr_create+0x2f0/0x8e0
      [  582.662217] Preemption disabled at:
      [  582.662222] [<ffffffff9110a5e0>] tcf_idr_create+0x2f0/0x8e0
      [  582.672592] CPU: 9 PID: 3379 Comm: tc Tainted: G        W       4.13.0-rc7-debug-00648-g43503a79b9f0 #287
      [  582.683432] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
      [  582.691937] Call Trace:
      ...
      [  582.742460]  kmem_cache_alloc+0x286/0x540
      [  582.747055]  radix_tree_node_alloc.constprop.6+0x4a/0x450
      [  582.753209]  idr_get_free_cmn+0x627/0xf80
      ...
      [  582.815525]  idr_alloc_cmn+0x1a8/0x270
      ...
      [  582.833804]  tcf_idr_create+0x31b/0x8e0
      ...
      
      Try to preallocate the memory with idr_preload(GFP_KERNEL)
      (as suggested by Eric Dumazet), and change the allocation
      flags under spin lock.
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vhost_net: correctly check tx avail during rx busy polling · 8b949bef
      Jason Wang authored
      We used to check tx avail through vhost_enable_notify(), which is
      wrong: it only checks whether the guest has filled more available
      buffers since the last avail idx synchronization, which was just
      done by vhost_vq_avail_empty(). What we really want is to check for
      pending buffers in the avail ring. Fix this by calling
      vhost_vq_avail_empty() instead.
      
      This issue could be noticed by doing netperf TCP_RR benchmark as
      client from guest (but not host). With this fix, TCP_RR from guest to
      localhost restores from 1375.91 trans per sec to 55235.28 trans per
      sec on my laptop (Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz).
      
      Fixes: 03088137 ("vhost_net: basic polling support")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mdio-mux: add mdio_mux parameter to mdio_mux_init() · 5482a978
      Corentin Labbe authored
      mdio_mux_init() uses the parameter dev for two distinct things:
      1) Have a device for all devm_ functions
      2) Get device_node from it

      Since these are two distinct purposes, this patch adds a parameter
      mdio_mux that is linked to task 2.

      This will also permit registering an of_node mdio-mux that lacks a direct
      owning device, for example an mdio-mux which is a subnode of a real device.
      Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Make service connection lookup always check for retry · fdade4f6
      David Howells authored
      When an RxRPC service packet comes in, the target connection is looked up
      by an rb-tree search under RCU and a read-locked seqlock; the seqlock retry
      check is, however, currently skipped if we got a match, but probably
      shouldn't be in case the connection we found gets replaced whilst we're
      doing a search.
      
      Make the lookup procedure always go through need_seqretry(), even if the
      lookup was successful.  This makes sure we always pick up on a write-lock
      event.
      
      On the other hand, since we don't take a ref on the object, but rely on RCU
      to prevent its destruction after dropping the seqlock, I'm not sure this is
      necessary.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Delete dead code for MDIO registration · 5e369aef
      Romain Perier authored
      This code is no longer used, the logging function was changed by commit
      fbca1647 ("net: stmmac: Use the right logging function in stmmac_mdio_register").
      It was previously showing information about the type of the IRQ, if it's
      polled, ignored or a normal interrupt. As we don't want information loss,
      I have moved this code to phy_attached_print().
      
      Fixes: fbca1647 ("net: stmmac: Use the right logging function in stmmac_mdio_register")
      Signed-off-by: Romain Perier <romain.perier@collabora.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: Fix Tx flow control deactivation · 5d621672
      Claudiu Manoil authored
      The wrong register is checked for the Tx flow control bit,
      it should have been maccfg1 not maccfg2.
      This went unnoticed for so long probably because the impact is
      hardly visible, not to mention the tangled code from adjust_link().
      First, link flow control (i.e. handling of Rx/Tx link level pause frames)
      is disabled by default (needs to be enabled via 'ethtool -A').
      Secondly, maccfg2 always returns 0 for tx_flow_oldval (except for a few
      old boards), which results in Tx flow control remaining always on
      once activated.
      
      Fixes: 45b679c9 ("gianfar: Implement PAUSE frame generation support")
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6 · ef18e3b9
      Ganesh Goudar authored
      MPS_TX_INT_CAUSE[Bubble] is a normal condition for T6, hence
      ignore this interrupt for T6.
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: Casey Leedom <leedom@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: Fix pause frame count in t4_get_port_stats · 2de489f4
      Ganesh Goudar authored
      MPS_STAT_CTL[CountPauseStatTx] and MPS_STAT_CTL[CountPauseStatRx]
      only control whether or not Pause Frames will be counted as part
      of the 64-Byte Tx/Rx Frame counters.  These bits do not control
      whether Pause Frames are counted in the Total Tx/Rx Frames/Bytes
      counters.
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: Casey Leedom <leedom@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: fix memory leak · 128416ac
      Ganesh Goudar authored
      Do not reuse the loop counter that is used to iterate over
      the ports, so that sched_tbl will be freed for all the ports.
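The class of bug described here can be illustrated generically. The commit message does not show the driver code, so everything below is a hypothetical sketch: `struct demo_sched_table`, the port and class counts, and both functions are assumptions made for illustration. The point is that the inner loop gets its own counter, so the outer per-port index is never clobbered.

```c
#include <stdlib.h>

#define DEMO_NUM_PORTS   4      /* hypothetical port count */
#define DEMO_NUM_CLASSES 3      /* hypothetical classes per port */

struct demo_sched_table {
	int *classes[DEMO_NUM_CLASSES];
};

/* Allocate one table per port; returns 0 on success, -1 on failure. */
int demo_alloc_tables(struct demo_sched_table **tables)
{
	for (int i = 0; i < DEMO_NUM_PORTS; i++) {
		tables[i] = calloc(1, sizeof(*tables[i]));
		if (!tables[i])
			return -1;
		for (int j = 0; j < DEMO_NUM_CLASSES; j++)
			tables[i]->classes[j] = calloc(1, sizeof(int));
	}
	return 0;
}

/* Free every port's table; returns the number of tables released. */
int demo_free_tables(struct demo_sched_table **tables)
{
	int freed = 0;

	for (int i = 0; i < DEMO_NUM_PORTS; i++) {
		if (!tables[i])
			continue;
		/*
		 * The inner loop uses j. Reusing i here would clobber the
		 * port index, so later ports would never be visited.
		 */
		for (int j = 0; j < DEMO_NUM_CLASSES; j++)
			free(tables[i]->classes[j]);
		free(tables[i]);
		tables[i] = NULL;
		freed++;
	}
	return freed;
}
```

The caller is expected to pass a zero-initialized pointer array, so that a partially failed allocation can still be cleaned up with `demo_free_tables()`.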
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: rename generic_xdp to skb_xdp · 1cfe6e93
      Jason Wang authored
      Rename "generic_xdp" to "skb_xdp" to avoid confusing it with the
      generic XDP which will be done at netif_receive_skb().
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: reserve extra headroom only when XDP is set · 7df13219
      Jason Wang authored
      We reserve headroom unconditionally, which could cause unnecessary
      stress on socket memory accounting because of increased truesize. Fix
      this by reserving extra headroom only when XDP is set.
      
      Cc: Jakub Kicinski <kubakici@wp.pl>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'dsa-tx-queues' · 9e776f22
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: dsa: Allow switch drivers to indicate number of TX queues
      
      This patch series extracts the parts of the patch set that are likely not to be
      controversial and that actually bring multi-queue support to DSA-created network
      devices.

      With these patches, we can now use sch_multiq as documented under
      Documentation/networking/multiqueue.txt and let applications decide the switch
      port output queue they want to use. Currently only Broadcom tags utilize that
      information.
      
      Resending based on David's feedback regarding the patches not in patchwork.
      
      Changes in v2:
      - use a proper define for the number of TX queues in bcm_sf2.c (Andrew)
      
      Changes from RFC:
      
      - dropped the ability to configure RX queues since we don't do anything with
        those just yet
      - dropped the patches that dealt with binding the DSA slave network devices'
        queues with their master network devices' queues; this will be worked on
        separately.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping · c837fc81
      Florian Fainelli authored
      Even though TC2QOS mapping is for switch egress queues, we need to
      configure it correctly in order for the Broadcom tag ingress (CPU ->
      switch) queue selection to work correctly since there is a 1:1 mapping
      between switch egress queues and ingress queues.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: Advertise number of egress queues · 18118377
      Florian Fainelli authored
      The switch supports 8 egress queues per port, so indicate that such that
      net/dsa/slave.c::dsa_slave_create can allocate the right number of TX queues.
      While at it use SF2_NUM_EGRESS_QUEUE as a define for the number of queues we
      support.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_brcm: Set output queue from skb queue mapping · 0f15b098
      Florian Fainelli authored
      We originally used skb->priority but that was not quite correct as this
      bitfield needs to contain the egress switch queue we intend to send this
      SKB to.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Allow switch drivers to indicate number of TX queues · 55199df6
      Florian Fainelli authored
      Let switch drivers indicate how many TX queues they support. Some
      switches, such as Broadcom Starfighter 2 are designed with 8 egress
      queues. Future changes will allow us to leverage the queue mapping and
      direct the transmission towards a particular queue.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Ido Schimmel's avatar
      f1c2eddf
    • net/mlx4_core: Use ARRAY_SIZE macro · 691223ec
      Thomas Meyer authored
      Use ARRAY_SIZE macro, rather than explicitly coding some variant of it
      yourself.
      Found with: find -type f -name "*.c" -o -name "*.h" | xargs perl -p -i -e
      's/\bsizeof\s*\(\s*(\w+)\s*\)\s*\ /\s*sizeof\s*\(\s*\1\s*\[\s*0\s*\]\s*\)
      /ARRAY_SIZE(\1)/g' and manual check/verification.
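For readers unfamiliar with the idiom, the open-coded form that the perl expression above searches for is `sizeof(x) / sizeof(x[0])`. A minimal userspace sketch follows. `DEMO_ARRAY_SIZE`, `demo_speeds` and `demo_sum_speeds` are hypothetical names for illustration; the kernel's own ARRAY_SIZE additionally rejects pointer arguments at compile time, which this simplified macro does not.

```c
#include <stddef.h>

/* Simplified userspace equivalent of the idiom; valid for true arrays only. */
#define DEMO_ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical table used only to exercise the macro. */
static const int demo_speeds[] = { 10, 100, 1000, 10000 };

/* Sum all entries; the loop bound follows the array definition automatically. */
long demo_sum_speeds(void)
{
	long sum = 0;

	for (size_t i = 0; i < DEMO_ARRAY_SIZE(demo_speeds); i++)
		sum += demo_speeds[i];
	return sum;
}
```

Using a single named macro keeps the loop bound correct when entries are added to or removed from the table.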
      Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'flow_dissector-fixes' · c4492d8a
      David S. Miller authored
      Tom Herbert says:
      
      ====================
      flow_dissector: Flow dissector fixes
      
      This patch set fixes some basic issues with __skb_flow_dissect function.
      
      Items addressed:
        - Cleanup control flow in the function; in particular eliminate a
          bunch of goto's and implement a simplified control flow model
        - Add limits for number of encapsulations and headers that can be
          dissected
      
      v2:
        - Simplify the logic for limits on flow dissection. Just set the
          limit based on the number of headers the flow dissector can
          process. The accounted headers include encapsulation headers,
          extension headers, or other shim headers.
      
      Tested:
      
      Ran normal traffic, GUE, and VXLAN traffic.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_dissector: Add limit for number of headers to dissect · 1eed4dfb
      Tom Herbert authored
      In flow dissector there are no limits to the number of nested
      encapsulations or headers that might be dissected, which makes for a
      nice DoS attack. This patch sets a limit on the number of headers
      that flow dissector will parse.

      Headers include network layer headers, transport layer headers, shim
      headers for encapsulation, IPv6 extension headers, etc. The limit for
      the maximum number of headers to parse has been set to fifteen to account
      for a reasonable number of encapsulations, extension headers, and VLAN
      headers in a packet. Note that this limit does not supersede the STOP_AT_*
      flags which may stop processing before the headers limit is reached.
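The idea of bounding the parse loop can be sketched on a toy packet format. This is not the __skb_flow_dissect implementation. The one-byte-per-header format, `demo_dissect`, the result enum and the `DEMO_MAX_HDRS` name are all assumptions made for illustration; only the limit of fifteen comes from the commit message.

```c
#include <stddef.h>

#define DEMO_MAX_HDRS 15        /* limit taken from the commit message */

enum demo_result {
	DEMO_DONE,              /* reached the innermost header */
	DEMO_TRUNCATED,         /* ran out of packet data */
	DEMO_LIMIT              /* header budget exhausted, stop parsing */
};

/*
 * Toy format: each header is a single byte. A nonzero byte means
 * "another header follows", a zero byte marks the innermost header.
 * *num_hdrs reports how many headers were accounted.
 */
enum demo_result demo_dissect(const unsigned char *pkt, size_t len,
			      int *num_hdrs)
{
	size_t off = 0;

	*num_hdrs = 0;
	for (;;) {
		if (off >= len)
			return DEMO_TRUNCATED;
		if (*num_hdrs == DEMO_MAX_HDRS)
			return DEMO_LIMIT;      /* refuse to go deeper */
		(*num_hdrs)++;
		if (pkt[off] == 0)
			return DEMO_DONE;
		off++;
	}
}
```

Without the budget check, the work done per packet would be controlled entirely by the sender, which is the denial-of-service concern the commit describes.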
      Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Tom Herbert <tom@quantonium.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_dissector: Cleanup control flow · 3a1214e8
      Tom Herbert authored
      __skb_flow_dissect is riddled with gotos that make discerning the flow,
      debugging, and extending the capability difficult. This patch
      reorganizes things so that we only perform goto's after the two main
      switch statements (no gotos within the cases now). It also eliminates
      several goto labels so that there are only two labels that can be targets
      for goto.
      Reported-by: Alexander Popov <alex.popov@linux.com>
      Signed-off-by: Tom Herbert <tom@quantonium.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soc: ti/knav_dma: include dmaengine header · 2c08ab3f
      Arnd Bergmann authored
      A header file cleanup apparently caused a build regression
      with one driver using the knav infrastructure:
      
      In file included from drivers/net/ethernet/ti/netcp_core.c:30:0:
      include/linux/soc/ti/knav_dma.h:129:30: error: field 'direction' has incomplete type
        enum dma_transfer_direction direction;
                                    ^~~~~~~~~
      drivers/net/ethernet/ti/netcp_core.c: In function 'netcp_txpipe_open':
      drivers/net/ethernet/ti/netcp_core.c:1349:21: error: 'DMA_MEM_TO_DEV' undeclared (first use in this function); did you mean 'DMA_MEMORY_MAP'?
        config.direction = DMA_MEM_TO_DEV;
                           ^~~~~~~~~~~~~~
                           DMA_MEMORY_MAP
      drivers/net/ethernet/ti/netcp_core.c:1349:21: note: each undeclared identifier is reported only once for each function it appears in
      drivers/net/ethernet/ti/netcp_core.c: In function 'netcp_setup_navigator_resources':
      drivers/net/ethernet/ti/netcp_core.c:1659:22: error: 'DMA_DEV_TO_MEM' undeclared (first use in this function); did you mean 'DMA_DESC_HOST'?
        config.direction  = DMA_DEV_TO_MEM;
      
      As the header is no longer included implicitly through netdevice.h,
      we should include it in the header that references the enum.
      
      Fixes: 0dd5759d ("net: remove dmaengine.h inclusion from netdevice.h")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: fix ncsi_vlan_rx_{add,kill}_vid references · fd0c88b7
      Arnd Bergmann authored
      We get a new link error in allmodconfig kernels after ftgmac100
      started using the ncsi helpers:
      
      ERROR: "ncsi_vlan_rx_kill_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
      ERROR: "ncsi_vlan_rx_add_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
      
      Related to that, we get another error when CONFIG_NET_NCSI is disabled:
      
      drivers/net/ethernet/faraday/ftgmac100.c:1626:25: error: 'ncsi_vlan_rx_add_vid' undeclared here (not in a function); did you mean 'ncsi_start_dev'?
      drivers/net/ethernet/faraday/ftgmac100.c:1627:26: error: 'ncsi_vlan_rx_kill_vid' undeclared here (not in a function); did you mean 'ncsi_vlan_rx_add_vid'?
      
      This fixes both problems at once, using a 'static inline' stub helper
      for the disabled case, and exporting the functions when they are present.
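The stub pattern mentioned here can be sketched as follows. This is a generic illustration, not the contents of the NCSI header: `CONFIG_DEMO_FEATURE`, `demo_vlan_add_vid`, `demo_driver_add_vlan` and the `-ENODEV` return value are all assumptions chosen for the sketch.

```c
#include <errno.h>

#ifdef CONFIG_DEMO_FEATURE
/* Feature built in: the real implementation lives in another object file. */
int demo_vlan_add_vid(unsigned short vid);
#else
/*
 * Feature disabled: a static inline stub keeps callers compiling and
 * linking without wrapping every call site in #ifdef.
 */
static inline int demo_vlan_add_vid(unsigned short vid)
{
	(void)vid;
	return -ENODEV;         /* hypothetical error code for "not available" */
}
#endif

/* A caller that builds the same way with or without the feature. */
int demo_driver_add_vlan(unsigned short vid)
{
	return demo_vlan_add_vid(vid);
}
```

When the feature is built as part of another module, the real function also has to be exported so that modular callers can link against it, which is the second half of the fix described above.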
      
      Fixes: 51564585 ("ftgmac100: Support NCSI VLAN filtering when available")
      Fixes: 21acf630 ("net/ncsi: Configure VLAN tag filter")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: fix numa_node validation · 96e5ae4e
      Eric Dumazet authored
      syzkaller reported crashes in bpf map creation or map update [1]
      
      Problem is that nr_node_ids is a signed integer,
      NUMA_NO_NODE is also an integer, so it is very tempting
      to declare numa_node as a signed integer.
      
      This means the typical test to validate a user provided value :
      
              if (numa_node != NUMA_NO_NODE &&
                  (numa_node >= nr_node_ids ||
                   !node_online(numa_node)))
      
      must be written :
      
              if (numa_node != NUMA_NO_NODE &&
                  ((unsigned int)numa_node >= nr_node_ids ||
                   !node_online(numa_node)))
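The effect of the cast can be checked in isolation with a small userspace sketch. The names below are hypothetical stand-ins: `demo_nr_node_ids` is fixed at 4 for illustration, and only the range comparison is modelled, not node_online().

```c
#include <stdbool.h>

#define DEMO_NUMA_NO_NODE (-1)

/* Hypothetical node count for the sketch. */
static const int demo_nr_node_ids = 4;

/*
 * Signed comparison: any negative value other than DEMO_NUMA_NO_NODE
 * is smaller than demo_nr_node_ids, so it is wrongly accepted.
 */
bool demo_in_range_signed(int numa_node)
{
	return numa_node == DEMO_NUMA_NO_NODE ||
	       numa_node < demo_nr_node_ids;
}

/*
 * Unsigned comparison: a negative value converts to a very large
 * unsigned number and is rejected by the same bound.
 */
bool demo_in_range_unsigned(int numa_node)
{
	return numa_node == DEMO_NUMA_NO_NODE ||
	       (unsigned int)numa_node < (unsigned int)demo_nr_node_ids;
}
```

With a user-supplied value of -2, the signed check passes and the value would go on to be used as a node index, while the unsigned check rejects it.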
      
      [1]
      kernel BUG at mm/slab.c:3256!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 2946 Comm: syzkaller916108 Not tainted 4.13.0-rc7+ #35
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801d2bc60c0 task.stack: ffff8801c0c90000
      RIP: 0010:____cache_alloc_node+0x1d4/0x1e0 mm/slab.c:3292
      RSP: 0018:ffff8801c0c97638 EFLAGS: 00010096
      RAX: ffffffffffff8b7b RBX: 0000000001080220 RCX: 0000000000000000
      RDX: 00000000ffff8b7b RSI: 0000000001080220 RDI: ffff8801dac00040
      RBP: ffff8801c0c976c0 R08: 0000000000000000 R09: 0000000000000000
      R10: ffff8801c0c97620 R11: 0000000000000001 R12: ffff8801dac00040
      R13: ffff8801dac00040 R14: 0000000000000000 R15: 00000000ffff8b7b
      FS:  0000000002119940(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020001fec CR3: 00000001d2980000 CR4: 00000000001406f0
      Call Trace:
       __do_kmalloc_node mm/slab.c:3688 [inline]
       __kmalloc_node+0x33/0x70 mm/slab.c:3696
       kmalloc_node include/linux/slab.h:535 [inline]
       alloc_htab_elem+0x2a8/0x480 kernel/bpf/hashtab.c:740
       htab_map_update_elem+0x740/0xb80 kernel/bpf/hashtab.c:820
       map_update_elem kernel/bpf/syscall.c:587 [inline]
       SYSC_bpf kernel/bpf/syscall.c:1468 [inline]
       SyS_bpf+0x20c5/0x4c40 kernel/bpf/syscall.c:1443
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x440409
      RSP: 002b:00007ffd1f1792b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440409
      RDX: 0000000000000020 RSI: 0000000020006000 RDI: 0000000000000002
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401d70
      R13: 0000000000401e00 R14: 0000000000000000 R15: 0000000000000000
      Code: 83 c2 01 89 50 18 4c 03 70 08 e8 38 f4 ff ff 4d 85 f6 0f 85 3e ff ff ff 44 89 fe 4c 89 ef e8 94 fb ff ff 49 89 c6 e9 2b ff ff ff <0f> 0b 0f 0b 0f 0b 66 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41
      RIP: ____cache_alloc_node+0x1d4/0x1e0 mm/slab.c:3292 RSP: ffff8801c0c97638
      ---[ end trace d745f355da2e33ce ]---
      Kernel panic - not syncing: Fatal exception
      
      Fixes: 96eabe7a ("bpf: Allow selecting numa node during map creation")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 04 Sep, 2017 13 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 2ff81cd3
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for next-net (part 2)
      
      The following patchset contains Netfilter updates for net-next. This
      patchset includes updates for nf_tables, removal of
      CONFIG_NETFILTER_DEBUG and a new mode for xt_hashlimit. More
      specifically, they:
      
      1) Add new rate match mode for hashlimit, this introduces a new revision
         for this match. The idea is to stop matching packets until ratelimit
         criteria stands true. Patch from Vishwanath Pai.
      
      2) Add ->select_ops indirection to nf_tables named objects, so we can
         choose between different flavours of the same object type, patch from
         Pablo M. Bermudo.
      
      3) Shorter function names in nft_limit, basically:
         s/nft_limit_pkt_bytes/nft_limit_bytes, also from Pablo M. Bermudo.
      
      4) Add new stateful limit named object type, this allows us to create
         limit policies that you can identify via name, also from Pablo.
      
      5) Remove unused hooknum parameter in conntrack ->packet indirection.
         From Florian Westphal.
      
      6) Patches to remove CONFIG_NETFILTER_DEBUG and macros such as
         IP_NF_ASSERT and NF_CT_ASSERT. From Varsha Rao.
      
      7) Add nf_tables_updchain() helper function and use it from
         nf_tables_newchain() to make it more maintainable. Similarly,
         add nf_tables_addchain() and use it too.
      
      8) Add new netlink NLM_F_NONREC flag, this flag should only be used for
         deletion requests, specifically, to support non-recursive deletion.
         Based on what we discussed during NFWS'17 in Faro.
      
      9) Use NLM_F_NONREC from table and sets in nf_tables.
      
      10) Support for recursive chain deletion. Table and set deletion
          commands come with an implicit content flush on deletion, while
          chains do not. This patch addresses this inconsistency by adding
          the code to perform recursive chain deletions. This also comes with
          the bits to deal with the new NLM_F_NONREC netlink flag.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nf_tables: support for recursive chain deletion · 9dee1474
      Pablo Neira Ayuso authored
      This patch sorts out an asymmetry in deletions. Currently, table and set
      deletion commands come with an implicit content flush on deletion.
      However, chain deletion results in -EBUSY if there is content in this
      chain, so no implicit flush happens. So you have to send a flush command
      in first place to delete chains, this is inconsistent and it can be
      annoying in terms of user experience.
      
      This patch uses the new NLM_F_NONREC flag to request non-recursive chain
      deletion, ie. if the chain to be removed contains rules, then this
      returns EBUSY. This problem was discussed during the NFWS'17 in Faro,
      Portugal. In iptables, you hit -EBUSY if you try to delete a chain that
      contains rules, so you have to flush first before you can remove
      anything. Since iptables-compat uses the nf_tables netlink interface, it
      has to use the NLM_F_NONREC flag from userspace to retain the original
      iptables semantics, ie.  bail out on removing chains that contain rules.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: use NLM_F_NONREC for deletion requests · a8278400
      Pablo Neira Ayuso authored
      Bail out if user requests non-recursive deletion for tables and sets.
      This new flag tells nf_tables netlink interface to reject deletions if
      tables and sets have content.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netlink: add NLM_F_NONREC flag for deletion requests · 2335ba70
      Pablo Neira Ayuso authored
      In the last NFWS in Faro, Portugal, we discussed that netlink is lacking
      the semantics to request non recursive deletions, ie. do not delete an
      object iff it has child objects that hang from this parent object that
      the user requests to be deleted.
      
      We need this new flag to solve a problem for the iptables-compat
      backward compatibility utility, that runs iptables commands using the
      existing nf_tables netlink interface. Specifically, custom chains in
      iptables cannot be deleted if there are rules in it, however, nf_tables
      allows to remove any chain that is populated with content. To sort out
      this asymmetry, iptables-compat userspace sets this new NLM_F_NONREC
      flag to obtain the same semantics that iptables provides.
      
      This new flag should only be used for deletion requests. Note this new
      flag value overlaps with the existing:
      
      * NLM_F_ROOT for get requests.
      * NLM_F_REPLACE for new requests.
      
      However, those flags should not ever be used in deletion requests.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add nf_tables_addchain() · 4035285f
      Pablo Neira Ayuso authored
      Wrap the chain addition path in a function to make it more maintainable.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add nf_tables_updchain() · 2c4a488a
      Pablo Neira Ayuso authored
      nf_tables_newchain() is too large, wrap the chain update path in a
      function to make it more maintainable.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: Remove CONFIG_NETFILTER_DEBUG and _ASSERT() macros. · 9efdb14f
      Varsha Rao authored
      This patch removes CONFIG_NETFILTER_DEBUG and _ASSERT() macros as they
      are no longer required. Replace _ASSERT() macros with WARN_ON().
      Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: Replace NF_CT_ASSERT() with WARN_ON(). · 44d6e2f2
      Varsha Rao authored
      This patch removes NF_CT_ASSERT() and instead uses WARN_ON().
      Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
    • netfilter: remove unused hooknum arg from packet functions · d1c1e39d
      Florian Westphal authored
      tested with allmodconfig build.
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • netfilter: nft_limit: add stateful object type · a6912055
      Pablo M. Bermudo Garay authored
      Register a new limit stateful object type into the stateful object
      infrastructure.
      Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_limit: replace pkt_bytes with bytes · 6e323887
      Pablo M. Bermudo Garay authored
      Just a small refactor patch in order to improve the code readability.
      Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add select_ops for stateful objects · dfc46034
      Pablo M. Bermudo Garay authored
      This patch adds support for overloading stateful objects operations
      through the select_ops() callback, just as it is implemented for
      expressions.
      
      This change is needed for upcoming additions to the stateful objects
      infrastructure.
      Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: xt_hashlimit: add rate match mode · bea74641
      Vishwanath Pai authored
      This patch adds a new feature to hashlimit that allows matching on the
      current packet/byte rate without rate limiting. This can be enabled
      with a new flag --hashlimit-rate-match. The match returns true if the
      current rate of packets is above/below the user specified value.
      
      The main difference between the existing algorithm and the new one is
      that the existing algorithm rate-limits the flow whereas the new
      algorithm does not. Instead it *classifies* the flow based on whether
      it is above or below a certain rate. I will demonstrate this with an
      example below. Let us assume this rule:
      
      iptables -A INPUT -m hashlimit --hashlimit-above 10/s -j new_chain
      
      If the packet rate is 15/s, the existing algorithm would ACCEPT 10
      packets every second and send 5 packets to "new_chain".
      
      But with the new algorithm, as long as the rate of 15/s is sustained,
      all packets will continue to match and every packet is sent to new_chain.
      
      This new functionality will let us classify different flows based on
      their current rate, so that further decisions can be made on them based on
      what the current rate is.
      
      This is how the new algorithm works:
      We divide time into intervals of 1 (sec/min/hour) as specified by
      the user. We keep track of the number of packets/bytes processed in the
      current interval. After each interval we reset the counter to 0.
      
      When we receive a packet for match, we look at the packet rate
      during the current interval and the previous interval to make a
      decision:
      
      if [ prev_rate < user and cur_rate < user ]
              return Below
      else
              return Above
      
      Where cur_rate is the number of packets/bytes seen in the current
      interval, prev_rate is the number of packets/bytes seen in the previous
      interval and 'user' is the rate specified by the user.
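The decision rule above can be written out as a small C sketch. This is not the xt_hashlimit code. `struct demo_window`, `demo_account` and `demo_classify` are hypothetical names, time is an abstract tick counter, and the way an idle gap of two or more intervals clears the previous count is an assumption of the sketch rather than something the commit message states.

```c
enum demo_class { DEMO_BELOW, DEMO_ABOVE };

struct demo_window {
	unsigned long start;    /* tick at which the current interval began */
	unsigned long interval; /* interval length in ticks */
	unsigned long prev;     /* units counted in the previous interval */
	unsigned long cur;      /* units counted in the current interval */
};

/* Account `units` packets/bytes seen at tick `now`, rolling the window. */
void demo_account(struct demo_window *w, unsigned long now,
		  unsigned long units)
{
	unsigned long elapsed = now - w->start;

	if (elapsed >= w->interval) {
		/* Assumption: after a gap of 2+ intervals the previous one was empty. */
		w->prev = (elapsed < 2 * w->interval) ? w->cur : 0;
		w->cur = 0;     /* counter is reset at each interval boundary */
		w->start = now - (elapsed % w->interval);
	}
	w->cur += units;
}

/* Below only if both the previous and the current interval are under the rate. */
enum demo_class demo_classify(const struct demo_window *w,
			      unsigned long user_rate)
{
	if (w->prev < user_rate && w->cur < user_rate)
		return DEMO_BELOW;
	return DEMO_ABOVE;
}
```

Looking at both intervals means a flow that exceeded the rate in the previous interval keeps matching as "above" at the start of the next one, instead of flipping to "below" the moment the counter resets.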
      
      We also provide flexibility to the user for choosing the time
      interval using the option --hashlimit-rate-interval. For example the user can
      keep a low rate like x/hour but still keep the interval as small as 1
      second.
      
      To preserve backwards compatibility we have to add this feature in a new
      revision, so I've created revision 3 for hashlimit. The two new options
      we add are:
      
      --hashlimit-rate-match
      --hashlimit-rate-interval
      
      I have updated the help text to add these new options. Also added a few
      tests for the new options.
      Suggested-by: Igor Lubashev <ilubashe@akamai.com>
      Reviewed-by: Josh Hunt <johunt@akamai.com>
      Signed-off-by: Vishwanath Pai <vpai@akamai.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>