1. 05 Sep, 2017 2 commits
    • net/ncsi: fix ncsi_vlan_rx_{add,kill}_vid references · fd0c88b7
      Arnd Bergmann authored
      We get a new link error in allmodconfig kernels after ftgmac100
      started using the ncsi helpers:
      
      ERROR: "ncsi_vlan_rx_kill_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
      ERROR: "ncsi_vlan_rx_add_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
      
      Related to that, we get another error when CONFIG_NET_NCSI is disabled:
      
      drivers/net/ethernet/faraday/ftgmac100.c:1626:25: error: 'ncsi_vlan_rx_add_vid' undeclared here (not in a function); did you mean 'ncsi_start_dev'?
      drivers/net/ethernet/faraday/ftgmac100.c:1627:26: error: 'ncsi_vlan_rx_kill_vid' undeclared here (not in a function); did you mean 'ncsi_vlan_rx_add_vid'?
      
      This fixes both problems at once, using a 'static inline' stub helper
      for the disabled case, and exporting the functions when they are present.
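
      The stub-plus-export pattern described above can be sketched in plain C.
      This is a minimal illustration, not the kernel header: DEMO_NCSI_ENABLED
      and the demo_* helpers are hypothetical stand-ins, and the -EINVAL return
      value of the stubs is an assumption.

```c
#include <errno.h>

/*
 * DEMO_NCSI_ENABLED is a hypothetical stand-in for a Kconfig symbol such as
 * CONFIG_NET_NCSI. Leave it undefined to build the "feature disabled" case.
 */
#ifdef DEMO_NCSI_ENABLED
/* Feature built in: only declare the helpers here. The definitions live in
 * the feature's own object file, which must also export them to modules. */
int demo_vlan_rx_add_vid(unsigned short vid);
int demo_vlan_rx_kill_vid(unsigned short vid);
#else
/* Feature disabled: 'static inline' stubs keep callers compiling and linking
 * without any #ifdef at the call site. */
static inline int demo_vlan_rx_add_vid(unsigned short vid)
{
	(void)vid;
	return -EINVAL;	/* assumed error code for "not available" */
}

static inline int demo_vlan_rx_kill_vid(unsigned short vid)
{
	(void)vid;
	return -EINVAL;	/* assumed error code for "not available" */
}
#endif

/* A caller (for example a driver) references the helpers unconditionally. */
static int demo_driver_add_vlan(unsigned short vid)
{
	return demo_vlan_rx_add_vid(vid);
}
```

      With the symbol undefined, the caller still builds and simply gets the
      stub's error code; with it defined, the linker needs the real, exported
      definitions.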
      
      Fixes: 51564585 ("ftgmac100: Support NCSI VLAN filtering when available")
      Fixes: 21acf630 ("net/ncsi: Configure VLAN tag filter")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: fix numa_node validation · 96e5ae4e
      Eric Dumazet authored
      syzkaller reported crashes in bpf map creation or map update [1].
      
      The problem is that nr_node_ids is a signed integer and
      NUMA_NO_NODE is also an integer, so it is very tempting
      to declare numa_node as a signed integer.
      
      This means the typical test to validate a user-provided value:
      
              if (numa_node != NUMA_NO_NODE &&
                  (numa_node >= nr_node_ids ||
                   !node_online(numa_node)))
      
      must be written as follows, so that negative values other than
      NUMA_NO_NODE fail the range check:
      
              if (numa_node != NUMA_NO_NODE &&
                  ((unsigned int)numa_node >= nr_node_ids ||
                   !node_online(numa_node)))
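
      The difference between the two checks can be reproduced in user space.
      This is a sketch under stated assumptions: demo_node_online() is a
      hypothetical stand-in for node_online() that treats every node as
      online, so only the range comparison decides the result.

```c
#include <stdbool.h>

#define DEMO_NUMA_NO_NODE (-1)	/* same value as the kernel's NUMA_NO_NODE */

/* Hypothetical stand-in for node_online(): every node counts as online. */
static bool demo_node_online(int node)
{
	(void)node;
	return true;
}

/* Naive check: a negative numa_node such as -2 is never >= nr_node_ids,
 * so it is wrongly accepted. */
static bool demo_numa_node_ok_naive(int numa_node, int nr_node_ids)
{
	if (numa_node != DEMO_NUMA_NO_NODE &&
	    (numa_node >= nr_node_ids || !demo_node_online(numa_node)))
		return false;
	return true;
}

/* Fixed check: the cast turns negative values into very large unsigned
 * values, so they fail the range test. Both sides are cast here only to
 * keep sign-compare warnings quiet. */
static bool demo_numa_node_ok(int numa_node, int nr_node_ids)
{
	if (numa_node != DEMO_NUMA_NO_NODE &&
	    ((unsigned int)numa_node >= (unsigned int)nr_node_ids ||
	     !demo_node_online(numa_node)))
		return false;
	return true;
}
```

      With four node ids, the naive variant accepts numa_node = -2 while the
      fixed variant rejects it; both accept NUMA_NO_NODE and nodes 0 to 3.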
      
      [1]
      kernel BUG at mm/slab.c:3256!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 2946 Comm: syzkaller916108 Not tainted 4.13.0-rc7+ #35
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801d2bc60c0 task.stack: ffff8801c0c90000
      RIP: 0010:____cache_alloc_node+0x1d4/0x1e0 mm/slab.c:3292
      RSP: 0018:ffff8801c0c97638 EFLAGS: 00010096
      RAX: ffffffffffff8b7b RBX: 0000000001080220 RCX: 0000000000000000
      RDX: 00000000ffff8b7b RSI: 0000000001080220 RDI: ffff8801dac00040
      RBP: ffff8801c0c976c0 R08: 0000000000000000 R09: 0000000000000000
      R10: ffff8801c0c97620 R11: 0000000000000001 R12: ffff8801dac00040
      R13: ffff8801dac00040 R14: 0000000000000000 R15: 00000000ffff8b7b
      FS:  0000000002119940(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020001fec CR3: 00000001d2980000 CR4: 00000000001406f0
      Call Trace:
       __do_kmalloc_node mm/slab.c:3688 [inline]
       __kmalloc_node+0x33/0x70 mm/slab.c:3696
       kmalloc_node include/linux/slab.h:535 [inline]
       alloc_htab_elem+0x2a8/0x480 kernel/bpf/hashtab.c:740
       htab_map_update_elem+0x740/0xb80 kernel/bpf/hashtab.c:820
       map_update_elem kernel/bpf/syscall.c:587 [inline]
       SYSC_bpf kernel/bpf/syscall.c:1468 [inline]
       SyS_bpf+0x20c5/0x4c40 kernel/bpf/syscall.c:1443
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x440409
      RSP: 002b:00007ffd1f1792b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440409
      RDX: 0000000000000020 RSI: 0000000020006000 RDI: 0000000000000002
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401d70
      R13: 0000000000401e00 R14: 0000000000000000 R15: 0000000000000000
      Code: 83 c2 01 89 50 18 4c 03 70 08 e8 38 f4 ff ff 4d 85 f6 0f 85 3e ff ff ff 44 89 fe 4c 89 ef e8 94 fb ff ff 49 89 c6 e9 2b ff ff ff <0f> 0b 0f 0b 0f 0b 66 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41
      RIP: ____cache_alloc_node+0x1d4/0x1e0 mm/slab.c:3292 RSP: ffff8801c0c97638
      ---[ end trace d745f355da2e33ce ]---
      Kernel panic - not syncing: Fatal exception
      
      Fixes: 96eabe7a ("bpf: Allow selecting numa node during map creation")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 04 Sep, 2017 38 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 2ff81cd3
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for next-net (part 2)
      
      The following patchset contains Netfilter updates for net-next. This
      patchset includes updates for nf_tables, removal of
      CONFIG_NETFILTER_DEBUG and a new mode for xt_hashlimit. More
      specifically, they:
      
      1) Add new rate match mode for hashlimit, this introduces a new revision
         for this match. The idea is to stop matching packets until ratelimit
         criteria stands true. Patch from Vishwanath Pai.
      
      2) Add ->select_ops indirection to nf_tables named objects, so we can
         choose between different flavours of the same object type, patch from
         Pablo M. Bermudo.
      
      3) Shorter function names in nft_limit, basically:
         s/nft_limit_pkt_bytes/nft_limit_bytes, also from Pablo M. Bermudo.
      
      4) Add new stateful limit named object type, this allows us to create
         limit policies that you can identify via name, also from Pablo.
      
      5) Remove unused hooknum parameter in conntrack ->packet indirection.
         From Florian Westphal.
      
      6) Patches to remove CONFIG_NETFILTER_DEBUG and macros such as
         IP_NF_ASSERT and NF_CT_ASSERT. From Varsha Rao.
      
      7) Add nf_tables_updchain() helper function and use it from
         nf_tables_newchain() to make it more maintainable. Similarly,
         add nf_tables_addchain() and use it too.
      
      8) Add new netlink NLM_F_NONREC flag, this flag should only be used for
         deletion requests, specifically, to support non-recursive deletion.
         Based on what we discussed during NFWS'17 in Faro.
      
      9) Use NLM_F_NONREC from table and sets in nf_tables.
      
      10) Support for recursive chain deletion. Table and set deletion
          commands come with an implicit content flush on deletion, while
          chains do not. This patch addresses this inconsistency by adding
          the code to perform recursive chain deletions. This also comes with
          the bits to deal with the new NLM_F_NONREC netlink flag.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nf_tables: support for recursive chain deletion · 9dee1474
      Pablo Neira Ayuso authored
      This patch sorts out an asymmetry in deletions. Currently, table and set
      deletion commands come with an implicit content flush on deletion.
      However, chain deletion results in -EBUSY if there is content in this
      chain, so no implicit flush happens. You have to send a flush command
      first in order to delete a chain, which is inconsistent and can be
      annoying in terms of user experience.
      
      This patch uses the new NLM_F_NONREC flag to request non-recursive chain
      deletion, i.e. if the chain to be removed contains rules, then this
      returns -EBUSY. This problem was discussed during the NFWS'17 in Faro,
      Portugal. In iptables, you hit -EBUSY if you try to delete a chain that
      contains rules, so you have to flush first before you can remove
      anything. Since iptables-compat uses the nf_tables netlink interface, it
      has to use the NLM_F_NONREC flag from userspace to retain the original
      iptables semantics, i.e. bail out on removing chains that contain rules.
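
      The resulting deletion semantics can be sketched as follows. This is an
      illustration only: struct demo_chain and demo_del_chain() are
      hypothetical and do not reflect the nf_tables data structures.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_NLM_F_NONREC 0x100	/* same bit as NLM_F_NONREC */

/* Hypothetical, much simplified chain: only the rule count matters here. */
struct demo_chain {
	unsigned int nrules;
	bool deleted;
};

static int demo_del_chain(struct demo_chain *chain, unsigned int nlmsg_flags)
{
	if (chain->nrules > 0) {
		/* Non-recursive request: refuse to touch a populated chain. */
		if (nlmsg_flags & DEMO_NLM_F_NONREC)
			return -EBUSY;
		/* Recursive (default) request: implicit flush first. */
		chain->nrules = 0;
	}
	chain->deleted = true;
	return 0;
}
```

      A populated chain is left untouched when the flag is set and is flushed
      and removed when it is not.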
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: use NLM_F_NONREC for deletion requests · a8278400
      Pablo Neira Ayuso authored
      Bail out if the user requests non-recursive deletion for tables and sets.
      This new flag tells the nf_tables netlink interface to reject deletions if
      tables and sets have content.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netlink: add NLM_F_NONREC flag for deletion requests · 2335ba70
      Pablo Neira Ayuso authored
      In the last NFWS in Faro, Portugal, we discussed that netlink is lacking
      the semantics to request non-recursive deletions, i.e. do not delete an
      object if it has child objects that hang from this parent object that
      the user requests to be deleted.
      
      We need this new flag to solve a problem for the iptables-compat
      backward compatibility utility, that runs iptables commands using the
      existing nf_tables netlink interface. Specifically, custom chains in
      iptables cannot be deleted if there are rules in them; nf_tables, however,
      allows removing any chain that is populated with content. To sort out
      this asymmetry, iptables-compat userspace sets this new NLM_F_NONREC
      flag to obtain the same semantics that iptables provides.
      
      This new flag should only be used for deletion requests. Note this new
      flag value overlaps with the existing:
      
      * NLM_F_ROOT for get requests.
      * NLM_F_REPLACE for new requests.
      
      However, those flags should not ever be used in deletion requests.
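
      Because the three flags share one bit, the bit has to be interpreted
      according to the request type. A minimal sketch, assuming a hypothetical
      enum demo_req in place of the real message-type decoding; the flag
      values mirror <linux/netlink.h> but are redefined under DEMO_ names so
      the snippet builds without kernel headers.

```c
#include <stdbool.h>

#define DEMO_NLM_F_ROOT    0x100	/* modifier for GET requests */
#define DEMO_NLM_F_REPLACE 0x100	/* modifier for NEW requests */
#define DEMO_NLM_F_NONREC  0x100	/* modifier for DELETE requests */

/* Hypothetical request kinds; real families derive this from nlmsg_type. */
enum demo_req { DEMO_REQ_GET, DEMO_REQ_NEW, DEMO_REQ_DEL };

/* The shared bit only means "non-recursive" when the request is a deletion. */
static bool demo_is_nonrec(enum demo_req req, unsigned int nlmsg_flags)
{
	return req == DEMO_REQ_DEL && (nlmsg_flags & DEMO_NLM_F_NONREC) != 0;
}
```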
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add nf_tables_addchain() · 4035285f
      Pablo Neira Ayuso authored
      Wrap the chain addition path in a function to make it more maintainable.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add nf_tables_updchain() · 2c4a488a
      Pablo Neira Ayuso authored
      nf_tables_newchain() is too large, wrap the chain update path in a
      function to make it more maintainable.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: Remove CONFIG_NETFILTER_DEBUG and _ASSERT() macros. · 9efdb14f
      Varsha Rao authored
      This patch removes CONFIG_NETFILTER_DEBUG and _ASSERT() macros as they
      are no longer required. Replace _ASSERT() macros with WARN_ON().
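
      A user-space sketch of such a conversion is shown below. It assumes the
      assert macros took the condition that must hold, so the condition is
      negated when moving to a warning. DEMO_WARN_ON() is a hypothetical
      stand-in that only imitates the "report and keep running" behaviour of
      WARN_ON().

```c
#include <stdbool.h>
#include <stdio.h>

static unsigned int demo_warn_count;	/* lets callers observe warnings */

/* Report the condition, keep running, and hand the condition back so the
 * macro can be used inside an if (). */
static bool demo_warn_on(bool cond, const char *expr, const char *file,
			 int line)
{
	if (cond) {
		demo_warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}

#define DEMO_WARN_ON(cond) demo_warn_on(!!(cond), #cond, __FILE__, __LINE__)

/* An assertion states what must hold, a warning states what must not:
 * an assert on (len > 0) becomes DEMO_WARN_ON(!(len > 0)). */
static int demo_process(int len)
{
	if (DEMO_WARN_ON(!(len > 0)))
		return -1;
	return len;
}
```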
      Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: Replace NF_CT_ASSERT() with WARN_ON(). · 44d6e2f2
      Varsha Rao authored
      This patch removes NF_CT_ASSERT() and instead uses WARN_ON().
      Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
    • netfilter: remove unused hooknum arg from packet functions · d1c1e39d
      Florian Westphal authored
      Tested with an allmodconfig build.
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • netfilter: nft_limit: add stateful object type · a6912055
      Pablo M. Bermudo Garay authored
      Register a new limit stateful object type into the stateful object
      infrastructure.
      Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_limit: replace pkt_bytes with bytes · 6e323887
      Pablo M. Bermudo Garay authored
      Just a small refactor patch in order to improve the code readability.
      Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add select_ops for stateful objects · dfc46034
      Pablo M. Bermudo Garay authored
      This patch adds support for overloading stateful objects operations
      through the select_ops() callback, just as it is implemented for
      expressions.
      
      This change is needed for upcoming additions to the stateful objects
      infrastructure.
      Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: xt_hashlimit: add rate match mode · bea74641
      Vishwanath Pai authored
      This patch adds a new feature to hashlimit that allows matching on the
      current packet/byte rate without rate limiting. This can be enabled
      with a new flag --hashlimit-rate-match. The match returns true if the
      current rate of packets is above/below the user specified value.
      
      The main difference between the existing algorithm and the new one is
      that the existing algorithm rate-limits the flow whereas the new
      algorithm does not. Instead it *classifies* the flow based on whether
      it is above or below a certain rate. I will demonstrate this with an
      example below. Let us assume this rule:
      
      iptables -A INPUT -m hashlimit --hashlimit-above 10/s -j new_chain
      
      If the packet rate is 15/s, the existing algorithm would ACCEPT 10
      packets every second and send 5 packets to "new_chain".
      
      But with the new algorithm, as long as the rate of 15/s is sustained,
      all packets will continue to match and every packet is sent to new_chain.
      
      This new functionality will let us classify different flows based on
      their current rate, so that further decisions can be made on them based on
      what the current rate is.
      
      This is how the new algorithm works:
      We divide time into intervals of 1 (sec/min/hour) as specified by
      the user. We keep track of the number of packets/bytes processed in the
      current interval. After each interval we reset the counter to 0.
      
      When we receive a packet for match, we look at the packet rate
      during the current interval and the previous interval to make a
      decision:
      
      if [ prev_rate < user and cur_rate < user ]
              return Below
      else
              return Above
      
      Where cur_rate is the number of packets/bytes seen in the current
      interval, prev_rate is the number of packets/bytes seen in the previous
      interval and 'user' is the rate specified by the user.
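
      The interval bookkeeping and the decision rule above can be sketched in
      C. This is not the xt_hashlimit code: struct demo_rate_state and both
      functions are hypothetical, time is a plain counter, and the rollover
      details (counting the packet before classifying, treating a skipped
      interval as zero) are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-flow state; the field names are illustrative only. */
struct demo_rate_state {
	uint64_t interval_start;	/* start of the current interval */
	uint64_t interval_len;		/* interval length, must be > 0 */
	uint64_t prev_rate;		/* packets seen in the previous interval */
	uint64_t cur_rate;		/* packets seen so far in this interval */
};

/* Roll the counters over when 'now' has left the current interval.
 * Assumes 'now' never goes backwards. */
static void demo_rate_advance(struct demo_rate_state *st, uint64_t now)
{
	uint64_t elapsed = now - st->interval_start;

	if (elapsed < st->interval_len)
		return;
	/* If a whole interval was skipped, the previous one saw nothing. */
	st->prev_rate = elapsed < 2 * st->interval_len ? st->cur_rate : 0;
	st->cur_rate = 0;
	st->interval_start += (elapsed / st->interval_len) * st->interval_len;
}

/* Account one packet and return true if the flow is "Above" the user rate. */
static bool demo_rate_above(struct demo_rate_state *st, uint64_t now,
			    uint64_t user)
{
	demo_rate_advance(st, now);
	st->cur_rate++;
	return !(st->prev_rate < user && st->cur_rate < user);
}
```

      With a user rate of 10 and 15 packets in the first interval, packets 1
      to 9 classify as Below and the rest as Above; the first packet of the
      next interval is still Above because prev_rate is 15.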
      
      We also give the user the flexibility to choose the time
      interval using the option --hashlimit-rate-interval. For example, the user
      can keep a low rate like x/hour but still keep the interval as small as 1
      second.
      
      To preserve backwards compatibility we have to add this feature in a new
      revision, so I've created revision 3 for hashlimit. The two new options
      we add are:
      
      --hashlimit-rate-match
      --hashlimit-rate-interval
      
      I have updated the help text to add these new options. Also added a few
      tests for the new options.
      Suggested-by: Igor Lubashev <ilubashe@akamai.com>
      Reviewed-by: Josh Hunt <johunt@akamai.com>
      Signed-off-by: Vishwanath Pai <vpai@akamai.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next · 45865dab
      David S. Miller authored
      Johan Hedberg says:
      
      ====================
      pull request: bluetooth-next 2017-09-03
      
      Here's one last bluetooth-next pull request for the 4.14 kernel:
      
       - NULL pointer fix in ca8210 802.15.4 driver
       - A few "const" fixes
       - New Kconfig option for disabling legacy interfaces
      
      Please let me know if there are any issues pulling. Thanks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'qualcomm-rmnet-Fix-comments-on-initial-patchset' · f98ce389
      David S. Miller authored
      Subash Abhinov Kasiviswanathan says:
      
      ====================
      net: qualcomm: rmnet: Fix comments on initial patchset
      
      This series fixes the comments from Dan on the first patch series.
      
      Fixes a memory corruption which could occur if mux_id was higher than 32.
      Remove the RMNET_LOCAL_LOGICAL_ENDPOINT which is no longer used.
      Make a log message more useful.
      Combine __rmnet_set_endpoint_config() with rmnet_set_endpoint_config().
      Set the mux_id in rmnet_vnd_newlink().
      Set the ingress and egress data format directly in newlink.
      Implement ndo_get_iflink to find the real_dev.
      Rename the real_dev_info to port to make it similar to other drivers.
      
      The conversion of rmnet_devices to a list and hash lookup will be sent
      as part of a separate patch.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qualcomm: rmnet: Rename real_dev_info to port · b665f4f8
      Subash Abhinov Kasiviswanathan authored
      Make it similar to drivers like ipvlan / macvlan so it is easier to read.
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Cc: Dan Williams <dcbw@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qualcomm: rmnet: Implement ndo_get_iflink · b752eff5
      Subash Abhinov Kasiviswanathan authored
      This makes it easier to find out the parent dev.
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Cc: Dan Williams <dcbw@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qualcomm: rmnet: Refactor the new rmnet dev creation · 032ee468
      Subash Abhinov Kasiviswanathan authored
      Data format can be directly set from rmnet_newlink() since the
      rmnet real dev info is already available.
      
      Since __rmnet_get_real_dev_info() is no longer used in rmnet_config.c
      after removal of those functions, move content to
      rmnet_get_real_dev_info().
      
      __rmnet_set_endpoint_config() is collapsed into
      rmnet_set_endpoint_config() since only mux_id was being set additionally
      within it. Remove an unnecessary mux_id check.
      
      Set the mux_id for the rmnet_dev within rmnet_vnd_newlink() itself.
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Cc: Dan Williams <dcbw@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qualcomm: rmnet: Move the device creation log · 2d516c0d
      Subash Abhinov Kasiviswanathan authored
      The current log is not very useful as it does not log the device
      name, since it is emitted prior to registration -
      
      (unnamed net_device) (uninitialized): Setting up device
      
      Modify to log after the device registration -
      
      rmnet1: rmnet dev created
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qualcomm: rmnet: Remove the unused endpoint -1 · 61bf5490
      Subash Abhinov Kasiviswanathan authored
      This was used only in the original patch series where the IOCTLs were
      present and is no longer in use.
      
      Fixes: ceed73a2 ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Cc: Dan Williams <dcbw@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qualcomm: rmnet: Fix memory corruption if mux_id is greater than 32 · 009e1b2b
      Subash Abhinov Kasiviswanathan authored
      rmnet_rtnl_validate() was checking for mux_id values up to 254, however
      rmnet_devices could hold only 32 entries. Fix this by
      increasing the size of rmnet_devices.
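
      The invariant behind this fix is that the table must cover every index
      the validation step lets through. A minimal sketch, with hypothetical
      demo_* names and constants that do not come from the rmnet driver:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical constants: validation accepts mux_id 0..254, so the table
 * needs 255 slots. A 32-slot table would be overrun by ids 32..254. */
#define DEMO_MAX_MUX_ID     254
#define DEMO_MAX_LOGICAL_EP (DEMO_MAX_MUX_ID + 1)

struct demo_port {
	const void *devices[DEMO_MAX_LOGICAL_EP];
};

/* Plays the role of a validate hook: reject out-of-range mux ids. */
static int demo_validate_mux_id(unsigned int mux_id)
{
	return mux_id > DEMO_MAX_MUX_ID ? -ERANGE : 0;
}

/* Store a device; safe only because the table covers every validated id. */
static int demo_set_device(struct demo_port *port, unsigned int mux_id,
			   const void *dev)
{
	int err = demo_validate_mux_id(mux_id);

	if (err)
		return err;
	port->devices[mux_id] = dev;
	return 0;
}
```

      Deriving the table size from the validated maximum keeps the two limits
      from drifting apart again.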
      
      Fixes: ceed73a2 ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Cc: Dan Williams <dcbw@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'nfp-refactor-app-init-and-minor-flower-fixes' · 3cf2e08f
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      nfp: refactor app init, and minor flower fixes
      
      This series is a part 2 to what went into net as a simpler fix.
      In net we simply moved when existing callbacks are invoked to
      ensure flower app does not still use representors when lower
      netdev has already been destroyed.  In this series we add a
      callback to notify apps when vNIC netdevs are fully initialized
      and they are about to be destroyed.  This allows flower to spawn
      representors at the right time, while keeping the start/stop
      callbacks for what they are intended to be used - FW initialization
      over control channel.
      
      Patch 4 improves drop monitor interaction and patch 5 changes
      the default Kconfig selection of flower offload.  Patch 6 fixes
      locking around representor updates which got lost in net-next.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: flower: restore RTNL locking around representor updates · 9ce4fa54
      Jakub Kicinski authored
      When we moved to updating representors from a workqueue grabbing
      the RTNL somehow got lost in the process.  Restore it, and make
      sure RCU lock is not held while we are grabbing the RTNL.  RCU
      protects the representor table, so since we will be under RTNL
      we can drop RCU lock as soon as we find the netdev pointer.
      RTNL is needed for the dev_set_mtu() call.
      
      Fixes: 2dff1962 ("nfp: process MTU updates from firmware flower app")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: build the flower offload by default · 7c8a2d8b
      Jakub Kicinski authored
      It's reasonable to assume that if user selects to build the NFP
      driver all offload capabilities will be enabled by default.
      Change the CONFIG_NFP_APP_FLOWER to default to enabled.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: be drop monitor friendly · 023a9284
      Jakub Kicinski authored
      Use dev_consume_skb_any() in place of dev_kfree_skb_any()
      when control frame has been successfully processed in flower
      and on the driver's main TX completion path.
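
      The distinction matters because a drop monitor only counts buffers that
      are released through the "dropped" path. A user-space analogy, with
      hypothetical demo_* helpers standing in for the two release functions:

```c
#include <stdlib.h>

/* Hypothetical counter standing in for what a drop monitor would observe. */
static unsigned long demo_drops_reported;

/* Buffer released because of an error: reported as a drop. */
static void demo_free_dropped(void *buf)
{
	demo_drops_reported++;
	free(buf);
}

/* Buffer released after successful processing: not a drop. */
static void demo_free_consumed(void *buf)
{
	free(buf);
}

/* Completion path: pick the release helper that matches the outcome. */
static void demo_tx_complete(void *buf, int ok)
{
	if (ok)
		demo_free_consumed(buf);
	else
		demo_free_dropped(buf);
}
```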
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: move the start/stop app callbacks back · 9d8b17be
      Jakub Kicinski authored
      Since representors are now created with a separate callback,
      the start/stop app callbacks can be moved back to their original
      location.  They are intended for app-specific init/clean up
      over the control channel.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: flower: base lifetime of representors on existence of lower vNIC · 192e6851
      Jakub Kicinski authored
      Create representors after lower vNIC is registered and destroy
      them before it is destroyed.  Move the code out of start/stop
      callbacks directly into vnic_init/clean callbacks.  Make sure
      SR-IOV callbacks don't try to create representors when lower
      device does not exist.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: separate app vNIC init/clean from alloc/free · c496291c
      Jakub Kicinski authored
      We currently only have one app callback for vNIC creation
      and destruction.  This is insufficient, because some actions
      have to be taken before netdev is registered, after it's
      registered and after it's unregistered.  Old callbacks
      were really corresponding to alloc/free actions.  Rename
      them and add proper init/clean.  Apps using representors
      will be able to use new callbacks to manage lifetime of
      upper devices.
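
      The split into four callbacks can be sketched as an ops table plus a
      driver-side lifetime function. Everything here is hypothetical
      (struct demo_vnic, struct demo_app_ops and the demo_* callbacks), and
      placing the clean step before unregistration is an assumption of this
      sketch.

```c
/* Hypothetical vNIC model: just enough state to show the ordering. */
struct demo_vnic {
	int registered;	/* 1 while the simulated netdev is registered */
	int reprs;	/* live upper devices (representors) */
	int max_reprs;	/* highest number of representors ever seen */
};

/* alloc/free bracket the whole lifetime, init/clean bracket the time the
 * lower device is registered. */
struct demo_app_ops {
	int  (*vnic_alloc)(struct demo_vnic *vnic);
	int  (*vnic_init)(struct demo_vnic *vnic);
	void (*vnic_clean)(struct demo_vnic *vnic);
	void (*vnic_free)(struct demo_vnic *vnic);
};

static int demo_alloc(struct demo_vnic *vnic)
{
	vnic->reprs = 0;
	vnic->max_reprs = 0;
	return 0;
}

static int demo_init(struct demo_vnic *vnic)
{
	if (!vnic->registered)
		return -1;	/* representors need a registered lower device */
	vnic->reprs = 2;
	vnic->max_reprs = vnic->reprs;
	return 0;
}

static void demo_clean(struct demo_vnic *vnic)
{
	vnic->reprs = 0;
}

static void demo_free(struct demo_vnic *vnic)
{
	(void)vnic;
}

static const struct demo_app_ops demo_ops = {
	.vnic_alloc = demo_alloc,
	.vnic_init  = demo_init,
	.vnic_clean = demo_clean,
	.vnic_free  = demo_free,
};

/* Driver side: call the hooks in lifetime order. */
static int demo_vnic_lifetime(const struct demo_app_ops *ops,
			      struct demo_vnic *vnic)
{
	int err = ops->vnic_alloc(vnic);

	if (err)
		return err;
	vnic->registered = 1;	/* stand-in for netdev registration */
	err = ops->vnic_init(vnic);
	if (!err)
		ops->vnic_clean(vnic);
	vnic->registered = 0;	/* stand-in for unregistration */
	ops->vnic_free(vnic);
	return err;
}
```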
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-updates-2017-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 18a4ded9
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2017-09-03
      
      This series from Tariq includes micro data path optimization for mlx5e
      netdevice driver.
      
      Mainly Tariq introduces the following changes to NAPI and RX handling
      path of the driver:
       - RX ring structure reorganizing
       - Trivial code refactoring and optimization
       - NAPI busy-poll for when fast UMR is in progress
       - Non-atomic state operations in NAPI context
       - Remove unnecessary fields from fast path structures
       - page-cache micro optimization
       - Rely on NAPI to avoid missing an IRQ for RX/TX shared NAPI contexts
       - Stop NAPI when irq changes affinity
       - Distribute RSS table among all RX rings
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-Offloading-GRE-tunnels' · ccfdf21b
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: Offloading GRE tunnels
      
      Petr says:
      
      This patch series introduces to mlxsw driver support for offloading
      IP-in-IP tunnels in general, and for (subset of) GRE in particular.
      
      This patchset supports two ways of configuring GRE:
      
      - So called "hierarchical configuration", where the GRE device has a bound
        dummy device, which is in a different VRF. The VRF with host traffic is
        called "overlay", the one with encapsulated traffic is called "underlay".
      
      - So called "flat configuration", where the GRE device doesn't have a bound
        device, and overlay and underlay are both in the same VRF (possibly the
        default one).
      
      Two routes are then interesting: a route that directs traffic to a GRE
      device (which would typically be in overlay VRF, but could be in another
      one), and a local route for the tunnel's local address (in underlay).
      Handling of these two route types is then introduced as patches to support,
      respectively, IPv4 and IPv6 encapsulation and IPv4 decapsulation.
      
      The encap and decap routes then reference a loopback device, a new type of
      RIF introduced by this patchset for the specific use of offloading tunnels.
      
      The encap and decap code is abstract with respect to the particulars of
      individual L3 tunnel types. This patchset introduces support for GRE
      tunnels in particular.
      
      Limitations:
      
      - Each tunnel needs to have a different local address (within a given VRF).
        When two tunnels are used that are in conflict, FIB abort is triggered
        and the driver ceases offloading FIBs. Full handling of such
        configurations needs special setup in the hardware, such that the tunnels
        that share an address are dispatched correctly according to their key (or
        lack thereof). That's currently not implemented, and to keep things
        deterministic, the driver triggers FIB abort.
      
      - A next hop that uses an incompletely-specified tunnel (e.g. such that are
        used for LWT) is not offloaded, but doesn't trigger FIB abort like the
        above. If such routes end up being in a de facto conflict with other
        tunnels, then if there already is an offload for that address, the
        traffic for the conflicting tunnel will end up mismatching the
        configuration of the offloaded tunnel, and thus gets to slow path through
        an error trap.
      
      - GRE checksumming and sequence numbers are not supported and TTL and TOS
        need to be set to inherit. Tunnels with a different configuration are not
        offloaded and their traffic is trapped to slow path.
      
        Note in particular that TOS of inherit is not the default configuration
        and needs to be explicitly specified when the tunnel is created.
      
      - The only feature that is not gracefully handled is that if a change is
        made to the tunnel, e.g. through "ip tunnel change", such changes are not
        reflected in the driver. There is currently no notification mechanism for
        these changes. Introduction of this mechanism and its leverage in the
        driver will be subject of follow-up work. For now this limitation can be
        worked around by removing and re-adding the encap route.
      
      ---
      v1->v2:
      -fix order of patch 5
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ccfdf21b
    • Petr Machata's avatar
      mlxsw: spectrum_router: Support GRE tunnels · ee954d1a
      Petr Machata authored
      This patch introduces callbacks and tunnel type to offload GRE tunnels.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ee954d1a
    • Petr Machata's avatar
      mlxsw: spectrum_router: Add loopback accessors · 92107cfb
      Petr Machata authored
      struct mlxsw_sp_rif is a router-private structure, and therefore
      everything related to it is as well: parameters, and derived RIF types
      including loopbacks. IPIP module needs access to some details of
      loopback interfaces, but exporting all the RIF shebang would create too
      large an interface.
      
      So instead export just the bare minimum necessary: accessors for RIF
      index and underlay VRF ID.
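
      The accessor approach can be sketched as follows. This is a minimal
      illustration under assumptions: struct lb_rif, its fields, and the two
      accessor names are hypothetical and do not reproduce the driver's real
      declarations. In a real split, the forward declaration and prototypes
      would live in a shared header and the structure definition would stay
      private to the router code.

```c
#include <stdint.h>

/* "Public" part: an opaque type plus the two accessors (hypothetical names). */
struct lb_rif;
uint16_t lb_rif_index(const struct lb_rif *lb);
uint16_t lb_rif_ul_vr_id(const struct lb_rif *lb);

/* "Private" part: the layout is known only to the router code. */
struct lb_rif {
	uint16_t rif_index; /* index of the loopback RIF */
	uint16_t ul_vr_id;  /* ID of the underlay virtual router (VRF) */
	/* ... further private fields would go here ... */
};

uint16_t lb_rif_index(const struct lb_rif *lb)
{
	return lb->rif_index;
}

uint16_t lb_rif_ul_vr_id(const struct lb_rif *lb)
{
	return lb->ul_vr_id;
}
```
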
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      92107cfb
    • Petr Machata's avatar
      mlxsw: spectrum: Register for IPIP_DECAP_ERROR trap · 86484de2
      Petr Machata authored
      These traps are generated for packets that fail checks for source IP,
      encapsulation type, or GRE key. Trap these packets to CPU for follow-up
      handling by the kernel, which will send ICMP destination unreachable
      responses.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      86484de2
    • Petr Machata's avatar
      mlxsw: spectrum_router: Use existing decap route · 1cc38fb1
      Petr Machata authored
      The local route that points at IPIP's underlay device (decap route) can
      be present long before the GRE device. Thus when an encap route is
      added, it's necessary to check in the underlay FIB whether the decap
      route is already present. If so, the current trap offload needs to be
      withdrawn and replaced with a decap offload.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1cc38fb1
    • Petr Machata's avatar
      mlxsw: spectrum_router: Support IPv4 underlay decap · 4607f6d2
      Petr Machata authored
      Unlike encapsulation, which is represented by a next hop forwarding to
      an IPIP tunnel, decapsulation is a type of local route. It is created
      for local routes whose prefix corresponds to the local address of one of
      the offloaded IPIP tunnels. When the tunnel is removed (i.e. all the
      encap next hops are removed), the decap offload is migrated back to a
      trap for resolution in the slow path.
      
      This patch assumes that the decap route is already present when the
      encap route is added. A follow-up patch will fix this issue.
      
      Note that this patch only supports IPv4 underlay. Support for IPv6
      underlay will be subject to follow-up work apart from this patchset.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4607f6d2
    • Petr Machata's avatar
      mlxsw: spectrum_router: Support IPv6 overlay encap · 8f28a309
      Petr Machata authored
      Add the missing bits to recognize IPv6 next hops as IPIP ones to enable
      offloading of IPv6 overlay encapsulation.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8f28a309
    • Petr Machata's avatar
      mlxsw: spectrum_router: Support IPv4 overlay encap · 1012b9ac
      Petr Machata authored
      This introduces some common code for tracking of offloaded IP-in-IP
      tunnels, and support for offloading IPv4 overlay encapsulating routes in
      particular. A follow-up patch will introduce IPv6 overlay as well.
      
      Offloaded tunnels are kept in a linked list of mlxsw_sp_ipip_entry
      objects hooked up in mlxsw_sp_router. A network device that represents
      the tunnel is used as a key to look up the corresponding IPIP entry.
      Note that in the future, a more general keying mechanism will be needed,
      because parts of the tunnel information can be provided by the route.
      
      IPIP entries are reference counted, because several next hops may end up
      using the same tunnel, and we only want to offload it once.
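
      The bookkeeping described above can be sketched roughly as follows.
      This is an illustration under assumptions, not the driver's code: the
      structure and function names (struct router, struct ipip_entry,
      ipip_entry_get(), ipip_entry_put()) are hypothetical, a hand-rolled
      singly linked list stands in for the kernel's list helpers, and
      struct net_device is left opaque because only the pointer is used as
      the lookup key.

```c
#include <stdlib.h>

struct net_device; /* opaque here; only the pointer identity is the key */

/* One offloaded tunnel, shared by all next hops that use it. */
struct ipip_entry {
	const struct net_device *ol_dev; /* netdevice representing the tunnel */
	unsigned int ref_count;          /* number of next hops using it */
	struct ipip_entry *next;
};

struct router {
	struct ipip_entry *ipip_list; /* list of offloaded tunnels */
};

/* Look up the entry for ol_dev, creating it on first use. */
static struct ipip_entry *ipip_entry_get(struct router *r,
					 const struct net_device *ol_dev)
{
	struct ipip_entry *e;

	for (e = r->ipip_list; e; e = e->next) {
		if (e->ol_dev == ol_dev) {
			e->ref_count++; /* already offloaded, just share it */
			return e;
		}
	}

	e = calloc(1, sizeof(*e));
	if (!e)
		return NULL;
	e->ol_dev = ol_dev;
	e->ref_count = 1;
	e->next = r->ipip_list;
	r->ipip_list = e;
	return e;
}

/* Drop one reference; unlink and free the entry when the last user goes. */
static void ipip_entry_put(struct router *r, struct ipip_entry *entry)
{
	struct ipip_entry **pp;

	if (--entry->ref_count)
		return;

	for (pp = &r->ipip_list; *pp; pp = &(*pp)->next) {
		if (*pp == entry) {
			*pp = entry->next;
			break;
		}
	}
	free(entry);
}
```

      With this shape, two next hops that forward to the same tunnel device
      share one entry, and the entry disappears only when both are gone.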
      
      Encapsulation path hooks into next hop handling. Routes that forward to
      a tunnel are now considered gateway routes, thus giving them the same
      treatment that other remote routes get. An IPIP next hop type is
      introduced.
      
      Details of individual tunnel types are kept in an array of
      mlxsw_sp_ipip_ops objects. If a tunnel type doesn't match any of the
      known tunnel types, the next-hop is not considered an IPIP next hop.
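
      A rough sketch of such a type lookup is shown below. All names here
      (struct tunnel_dev, struct ipip_ops, the matches callback,
      ipip_type_lookup()) are hypothetical and chosen for illustration; the
      array holds a single made-up GRE entry so that the example compiles
      and has something to match against.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for what the driver can learn about a netdevice. */
enum tunnel_kind { TUNNEL_KIND_GRE, TUNNEL_KIND_OTHER };

struct tunnel_dev {
	enum tunnel_kind kind;
};

/* Per-tunnel-type operations; only a matching callback in this sketch. */
struct ipip_ops {
	bool (*matches)(const struct tunnel_dev *dev);
};

static bool gre_matches(const struct tunnel_dev *dev)
{
	return dev->kind == TUNNEL_KIND_GRE;
}

/* Array of known tunnel types, indexed by an internal type number. */
static const struct ipip_ops ipip_ops_arr[] = {
	{ .matches = gre_matches },
};

/* Return the index of the matching tunnel type, or -1 if none matches,
 * in which case the next hop is not treated as an IPIP next hop.
 */
static int ipip_type_lookup(const struct tunnel_dev *dev)
{
	size_t i;

	for (i = 0; i < sizeof(ipip_ops_arr) / sizeof(ipip_ops_arr[0]); i++) {
		if (ipip_ops_arr[i].matches(dev))
			return (int)i;
	}
	return -1;
}
```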
      
      The list of IPIP tunnel types is currently empty; follow-up patches will
      add support for GRE. Traffic to IPIP tunnel types that are not
      explicitly recognized by the driver is trapped and handled in the slow
      path.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1012b9ac
    • Petr Machata's avatar
      mlxsw: spectrum_router: Make nexthops typed · 35225e47
      Petr Machata authored
      In the router, some next hops may reference an encapsulating netdevice,
      such as GRE or IPIP. To properly offload these next hops, mlxsw needs to
      keep track of whether a given next hop is a regular Ethernet entry, or
      an IP-in-IP tunneling entry.
      
      To facilitate this book-keeping, add a type field to struct
      mlxsw_sp_nexthop. There is, as of this patch, only one next hop type:
      MLXSW_SP_NEXTHOP_TYPE_ETH. Follow-up patches will introduce the IP-in-IP
      variant.
      
      There are several places where next hops are initialized in the IPv4
      path. Instead of replicating the logic at every one of them, factor it
      out to a function mlxsw_sp_nexthop4_type_init(). The corresponding fini
      is actually protocol-neutral, so put it in mlxsw_sp_nexthop_type_fini(),
      but create a corresponding protocol-specific _fini function that dispatches to
      the protocol-neutral one.
      
      The IPv6 path is simpler, but for symmetry with IPv4, create the same
      suite of functions with corresponding logic.
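
      The init/fini split can be sketched as below. The names are shortened,
      hypothetical counterparts of the functions mentioned above, and the
      boolean fields merely stand in for the per-type state a real next hop
      would hold. The IP-in-IP type is included only to show the dispatch;
      as of this patch the only type is the Ethernet one.

```c
#include <stdbool.h>

enum nexthop_type {
	NEXTHOP_TYPE_ETH,  /* regular Ethernet next hop */
	NEXTHOP_TYPE_IPIP, /* IP-in-IP next hop (follow-up patches) */
};

/* Simplified next hop; the flags stand in for real per-type state. */
struct nexthop {
	enum nexthop_type type;
	bool neigh_bound; /* Ethernet: bound to a neighbour entry */
	bool ipip_bound;  /* IP-in-IP: bound to a tunnel entry */
};

/* Protocol-neutral teardown, dispatching on the next hop type. */
static void nexthop_type_fini(struct nexthop *nh)
{
	switch (nh->type) {
	case NEXTHOP_TYPE_ETH:
		nh->neigh_bound = false;
		break;
	case NEXTHOP_TYPE_IPIP:
		nh->ipip_bound = false;
		break;
	}
}

/* IPv4-specific init: decide the type once, in a single place. */
static int nexthop4_type_init(struct nexthop *nh, bool via_tunnel)
{
	if (via_tunnel) {
		nh->type = NEXTHOP_TYPE_IPIP;
		nh->ipip_bound = true;
	} else {
		nh->type = NEXTHOP_TYPE_ETH;
		nh->neigh_bound = true;
	}
	return 0;
}

/* IPv4-specific fini simply forwards to the protocol-neutral one. */
static void nexthop4_type_fini(struct nexthop *nh)
{
	nexthop_type_fini(nh);
}
```
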
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      35225e47