1. 24 Mar, 2017 38 commits
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · ba82427d
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2017-03-23
      
      This series contains updates to i40e and i40e.txt documentation.
      
      Jake provides all the changes in the series which are centered around
      ntuple filter fixes and additional support.  Fixed the current
      implementation of .set_rxnfc, where we were not reading the mask field
      for filter entries which was resulting in filters not behaving as
      expected and not working correctly.  When cleaning up after disabling
      flow director support, ensure that the default input set is correctly
      reprogrammed.  Since the hardware only supports a single input set for
      all flows of that type, the driver shall only allow the input set to
      change if there are no other configured filters for that flow type, so
      add support to detect when we can update the input set for each flow
      type.  Align the driver to other drivers to partition the ring_cookie
      value into 8bits of VF index, along with 32bits of queue number instead
      of using the user-def field.  Added support to parse the user-def field
      into a data structure format to allow future extensions of the user-def
      field by keeping all the code that reads/writes the field in a single
      location.  Added support for flexible payloads passed via ethtool
      user-def field.  We support a single flexible word (2byte) value per
      protocol type, and we handle the FLX_PIT register using a list of
      flexible entries so that each flow type may be configured separately.
      Enabled flow director filters for SCTPv4 packets using the ethtool
      ntuple interface to enable filters.  Updated the documentation on the
      i40e driver to include the newly added support to ntuple filters.
      Reduced complexity of an if-continue-else-break section of code by
      taking advantage of using hlist_for_each_entry_continue() instead.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mpls: Fix setting ttl_propagate for rt2 · 6a18c312
      David Ahern authored
      Fix copy and paste error setting rt_ttl_propagate.
      
      Fixes: 5b441ac8 ("mpls: allow TTL propagation to IP packets to be configured")
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Acked-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: sysctl: Fix a race to avoid unexpected 0 window from space · c4836742
      Gao Feng authored
      Because sysctl_tcp_adv_win_scale can be changed at any time, there
      is a race in tcp_win_from_space.
      For example,
      1. sysctl_tcp_adv_win_scale <= 0 is evaluated (sysctl_tcp_adv_win_scale is negative now)
      2. space >> (-sysctl_tcp_adv_win_scale) is evaluated (sysctl_tcp_adv_win_scale is positive now)
      
      As a result, tcp_win_from_space unexpectedly returns 0.
      
      Certainly, if the compiler loaded sysctl_tcp_adv_win_scale into a
      register first and then used the register directly, it would be fine.
      But we cannot depend on compiler behavior.
      Signed-off-by: Gao Feng <fgao@ikuai8.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
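The race described in this commit message can be avoided by reading the tunable once and using only the local copy. The following is a minimal userspace sketch of that idea, not the kernel patch itself; `adv_win_scale` and `win_from_space()` are hypothetical stand-ins for `sysctl_tcp_adv_win_scale` and `tcp_win_from_space()`.

```c
#include <assert.h>

/* Hypothetical stand-in for the sysctl_tcp_adv_win_scale tunable. */
static volatile int adv_win_scale = 1;

/*
 * Snapshot the tunable once, then use only the local copy, so the sign
 * test and the shift always see the same value even if another thread
 * changes the tunable in between.
 */
static int win_from_space(int space)
{
	int scale = adv_win_scale;	/* single read of the shared value */

	assert(space >= 0);
	return scale <= 0 ? space >> (-scale)
			  : space - (space >> scale);
}
```

With the snapshot in place, the shift count can never be the negation of a value that has since turned positive.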
    • Alexey Dobriyan's avatar
      net: make in_aton() 32-bit internally · e013fb7c
      Alexey Dobriyan authored
      Converting IPv4 address doesn't need 64-bit arithmetic.
      
      Space savings: 10 bytes!
      
      	add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-10 (-10)
      	function                          old     new   delta
      	in_aton                            96      86     -10
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
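To illustrate why 32-bit arithmetic is enough here, the sketch below parses a dotted-quad string with a `uint32_t` accumulator. It is an illustrative userspace function, not the kernel's `in_aton()`: the name `parse_ipv4()` is hypothetical, it returns host byte order, and it does no input validation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Parse a dotted-quad string such as "192.168.0.1" using only 32-bit
 * arithmetic. Four octets of 8 bits each fit exactly in a uint32_t.
 * Returns the address in host byte order.
 */
static uint32_t parse_ipv4(const char *str)
{
	uint32_t addr = 0;
	int i;

	assert(str);
	for (i = 0; i < 4; i++) {
		uint32_t octet = 0;

		/* Accumulate the decimal digits of one octet. */
		while (*str >= '0' && *str <= '9') {
			octet = octet * 10 + (uint32_t)(*str - '0');
			str++;
		}
		addr = (addr << 8) | (octet & 0xff);
		if (*str == '.')
			str++;	/* skip the separator */
	}
	return addr;
}
```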
    • Felix Manlunas's avatar
      liquidio: do not reset Octeon if NIC firmware was preloaded · 7cc61db9
      Felix Manlunas authored
      The PF driver is incorrectly resetting Octeon when the module parameter
      "fw_type=none" is set.  "fw_type=none" means the PF should not load any
      firmware to the NIC because Octeon is already running preloaded firmware.
      
      Fix it by putting an if (fw_type != none) around the reset code.
      
      Because the Octeon reset is now conditional, when unloading the
      driver, conditionally send the RESET_PF command to the firmware, which will
      then free up PF-related data structures.
      Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
      Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add sysctl to toggle early demux for tcp and udp · dddb64bc
      subashab@codeaurora.org authored
      Certain systems process a significant unconnected UDP workload.
      It would be preferable to disable UDP early demux for those systems
      and enable it for TCP only.
      
      By disabling UDP demux, we see these slight gains on an ARM64 system:
      782 -> 788Mbps unconnected single stream UDPv4
      633 -> 654Mbps unconnected UDPv4 different sources
      
      The performance impact can change based on CPU architecture and cache
      sizes. There will not be much difference seen if the entire UDP hash table
      is in cache.
      
      Both sysctls are enabled by default to preserve existing behavior.
      
      v1->v2: Change function pointer instead of adding conditional as
      suggested by Stephen.
      
      v2->v3: Read once in callers to avoid issues due to compiler
      optimizations. Also update commit message with the tests.
      
      v3->v4: Store and use read once result instead of querying pointer
      again incorrectly.
      
      v4->v5: Refactor to avoid errors due to compilation with IPV6={m,n}
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Tom Herbert <tom@herbertland.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
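The v1->v4 notes in this commit message describe toggling a function pointer and reading it once in the caller. A rough userspace sketch of that pattern follows; every name in it (`udp_demux_handler`, `set_udp_early_demux()`, `receive_packet()`) is hypothetical, and a `volatile` pointer stands in for whatever read-once primitive the kernel code uses.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*early_demux_fn)(int pkt);

/* Hypothetical demux routine: returns a made-up socket id for the packet. */
static int udp_early_demux(int pkt)
{
	return pkt + 1;
}

/* Shared handler slot; NULL means early demux is disabled. */
static early_demux_fn volatile udp_demux_handler = udp_early_demux;

/* Stand-in for the sysctl write path: swap the pointer, no extra branch later. */
static void set_udp_early_demux(int enabled)
{
	assert(enabled == 0 || enabled == 1);
	udp_demux_handler = enabled ? udp_early_demux : NULL;
}

/* Receive path: load the pointer once, then test and call the same copy. */
static int receive_packet(int pkt)
{
	early_demux_fn fn = udp_demux_handler;	/* read once */

	if (fn)
		return fn(pkt);	/* never re-read the shared slot here */
	return 0;
}
```

Testing and calling the local copy avoids the case where the slot becomes NULL between the check and the call.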
    • Merge branch 'systemport-tx-napi-improvements' · 8fa96e3b
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: systemport: TX/NAPI improvements
      
      This patch series builds on Doug's latest changes to BCMGENET to reduce
      the number of spurious interrupts in NAPI, simplify pointer arithmetic, and
      finally track per TX ring statistics in an SMP-friendly way.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: systemport: Simplify circular pointer arithmetic · e9d7af78
      Florian Fainelli authored
      Similar to c298ede2 ("net: bcmgenet: simplify circular pointer
      arithmetic"), we don't need complex arithmetic since we always have a
      ring size that is a power of 2.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
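When the ring size is a power of 2, wrap-around reduces to a bit mask. The sketch below shows the general technique with a hypothetical `RING_SIZE` and helper names; it is not taken from the systemport driver.

```c
#include <assert.h>

#define RING_SIZE 256u	/* hypothetical ring size; must be a power of 2 */

/* Advance an index by one slot, wrapping with a mask instead of a branch. */
static unsigned int ring_next(unsigned int index)
{
	assert(index < RING_SIZE);
	return (index + 1) & (RING_SIZE - 1);
}

/*
 * Number of slots from cons up to prod. Unsigned subtraction plus the
 * mask gives the right answer even when prod has wrapped past cons.
 */
static unsigned int ring_distance(unsigned int prod, unsigned int cons)
{
	return (prod - cons) & (RING_SIZE - 1);
}
```

For example, with a producer at slot 3 and a consumer at slot 250, `ring_distance(3, 250)` counts the 9 slots across the wrap without any conditional.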
    • net: systemport: Clear status to reduce spurious interrupts · 6baa785a
      Florian Fainelli authored
      Do something similar to commit d5810ca3 ("net: bcmgenet: clear
      status to reduce spurious interrupts") and clear interrupts right before
      servicing them. This reduces the number of interrupts by 10K
      interrupts/sec for a 1 Gbit/sec TX TCP session.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: systemport: Track per TX ring statistics · 30defeb2
      Florian Fainelli authored
      bcm_sysport_tx_reclaim_one() is currently summing TX bytes/packets in a
      way that is not SMP friendly: multiple CPUs could run
      bcm_sysport_tx_reclaim_one() independently and still update
      stats->tx_bytes and stats->tx_packets, clobbering the other CPUs'
      statistics.
      
      Fix this by tracking the bytes, packets, dropped and errors statistics
      per TX ring, and provide a bcm_sysport_get_nstats()
      function which aggregates everything and returns a consistent output.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
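The per-ring approach can be pictured as one counter set per TX ring plus an aggregation step. This is a simplified sketch with hypothetical names (`NUM_TX_RINGS`, `tx_reclaim_one()`, `get_total_stats()`); it tracks only packets and bytes and leaves out the synchronization a real driver needs.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TX_RINGS 4u	/* hypothetical ring count */

struct tx_ring_stats {
	uint64_t packets;
	uint64_t bytes;
};

/* One counter set per ring, so each reclaim path only touches its own slot. */
static struct tx_ring_stats ring_stats[NUM_TX_RINGS];

/* Called from the reclaim path of a single ring. */
static void tx_reclaim_one(unsigned int ring, uint64_t pkts, uint64_t bytes)
{
	assert(ring < NUM_TX_RINGS);
	ring_stats[ring].packets += pkts;
	ring_stats[ring].bytes += bytes;
}

/* Aggregate all rings into the device-wide totals reported to the user. */
static struct tx_ring_stats get_total_stats(void)
{
	struct tx_ring_stats total = { 0, 0 };
	unsigned int i;

	for (i = 0; i < NUM_TX_RINGS; i++) {
		total.packets += ring_stats[i].packets;
		total.bytes += ring_stats[i].bytes;
	}
	return total;
}
```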
    • Merge branch 'phy-mdio-split' · 12459cbd
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: phy: Allow splitting MDIO bus/device support
      
      This patch series allows building support for MDIO bus controllers which
      are sometimes usable and necessary in cases where there are no Ethernet PHYs.
      
      Changes in v3:
      - corrected of_mdio compile guards for prototypes vs. stubs
      - added a missing OF_MDIO dependency for MDIO_BCM_UNIMAC
      - fixed Kbuild bot reported errors against mdio-bitbang
      
      Changes in v2:
      - implement Russell's feedback
      - solve the circular dependency in the CONFIG_MDIO_DEVICE + CONFIG_PHYLIB case
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: Allow splitting MDIO bus/device support from PHYs · 90eff909
      Florian Fainelli authored
      Introduce a new configuration symbol: MDIO_DEVICE which allows building
      the MDIO devices and bus code, without pulling in the entire Ethernet
      PHY library and devices code.
      
      PHYLIB now selects MDIO_DEVICE and the relevant Makefile files are
      updated to reflect that.
      
      When MDIO_DEVICE (MDIO bus/device only) is selected, but not PHYLIB, we
      have mdio-bus.ko as a loadable module, and it does not have a
      module_exit() function because the safety of removing a bus class is
      unclear.
      
      When both MDIO_DEVICE and PHYLIB are enabled, we need to assemble
      everything into a common loadable module: libphy.ko because of nasty
      circular dependencies between phy.c, phy_device.c and mdio_bus.c which
      are really tough to untangle.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: MDIO_BCM_UNIMAC should depend on OF_MDIO · 17487eeb
      Florian Fainelli authored
      The Broadcom MDIO UniMAC driver uses routines provided by of_mdio.c which is
      guarded by CONFIG_OF_MDIO.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • of_mdio: Correct check against CONFIG_OF · e6e14f63
      Florian Fainelli authored
      CONFIG_OF_MDIO is actually what triggers the build of drivers/of/of_mdio.c, so
      the choice between the real prototypes and the inline stubs should be based on
      that symbol as well.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: choke: remove dead filter classify code · 5952fde1
      Jiri Pirko authored
      sch_choke is a classless qdisc, so it does not define cl_ops. Therefore
      filter_list can never be changed and stays NULL all the time.
      The reason is this check in tc_ctl_tfilter:
      
      	/* Is it classful? */
      	cops = q->ops->cl_ops;
      	if (!cops)
      		return -EINVAL;
      
      So remove this dead code.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: add set_mac to the stmmac_ops · 270c7759
      LABBE Corentin authored
      Two different set_mac functions exist, but stmmac_dwmac4_set_mac() is
      only used for enabling and never for disabling.
      So on dwmac4, the MAC RX/TX is never disabled.
      
      This patch adds a generic function pointer set_mac() to stmmac_ops and
      replaces all calls to stmmac_set_mac/stmmac_dwmac4_set_mac with a call to
      this pointer.
      
      Since dwmac4_ops is const, set_mac cannot be modified afterwards, and so dwmac4_ops
      is duplicated like dwmac4_dma_ops.
      Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • isdn: use setup_timer · aff55a36
      Geliang Tang authored
      Use setup_timer() instead of init_timer() to simplify the code.
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bridge-ext-learned-entries' · 90966438
      David S. Miller authored
      Nikolay Aleksandrov says:
      
      ====================
      net: bridge: allow user-space to add ext learned entries
      
      This set adds the ability to add externally learned entries from
      user-space. For symmetry and proper function we need to allow SW entries
      to take over HW learned ones (similar to how HW can take over SW entries
      currently) which is needed for our use case (evpn) where we have pure SW
      ports and HW ports mixed in a single bridge. This does not play well with
      switchdev devices currently because there's no feedback when the entry is
      taken over, but this case has never worked anyway and feedback can be
      easily added when needed.
      Patch 02 simply allows NTF_EXT_LEARNED to be used from user-space; we already
      have Quagga patches that make use of this functionality.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: allow to add externally learned entries from user-space · eb100e0e
      Nikolay Aleksandrov authored
      The NTF_EXT_LEARNED flag was added for switchdev and externally learned
      entries, but it can also be used for entries learned by software
      in user-space which requires dynamic entries that do not expire.
      One such case that we have is with quagga and evpn, which need dynamic
      entries but also need to age them themselves.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: allow SW learn to take over HW fdb entries · 7e26bf45
      Nikolay Aleksandrov authored
      Allow taking over an entry which was previously learned via HW when it
      shows up from a SW port. This is analogous to how HW takes over SW learned
      entries already.
      Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Remove debugfs interface · 9a32562b
      Ido Schimmel authored
      We don't use it during development and we can't extend it either, so
      remove it.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • i40e: make use of hlist_for_each_entry_continue · 584a8870
      Jacob Keller authored
      Replace a complex if->continue->else->break construction in
      i40e_next_filter. We can simply use hlist_for_each_entry_continue
      instead. This drops a lot of confusing code. The resulting code makes the
      intention much easier to understand, and follows the more normal pattern
      for using hlist loops. We could have also used a break with a "return
      next" at the end of the function, instead of return NULL, but the
      current implementation is explicitly clear that when you reach the end
      of the loop you get a NULL value. The alternative construction is less
      clear since the reader would have to know that next is NULL at the end
      of the loop.
      
      Change-Id: Ife74ca451dd79d7f0d93c672bd42092d324d4a03
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: document drivers use of ntuple filters · 55877012
      Jacob Keller authored
      Add documentation describing the driver's use of ethtool ntuple filters,
      including the limitations that it has due to hardware, as well as how it
      reads and parses the user-def data block.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: add support for SCTPv4 FDir filters · f223c875
      Jacob Keller authored
      Enable FDir filters for SCTPv4 packets using the ethtool ntuple
      interface to enable filters. The ethtool API does not allow masking on
      the verification tag.
      
      Change-Id: I093e88a8143994c7e6f4b7b17a0bd5cf861d18e4
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: implement support for flexible word payload · 0e588de1
      Jacob Keller authored
      Add support for flexible payloads passed via ethtool user-def field.
      This support is somewhat limited due to hardware design. The input set
      can only be programmed once per filter type, and the flexible offset is
      part of this filter input set. This means that the user cannot program
      both a regular and a flexible filter at the same time for a given flow
      type. Additionally, the user may not program two flexible filters of the
      same flow type with different offsets, although they are allowed to
      configure different values at that offset location.
      
      We support a single flexible word (2byte) value per protocol type, and
      we handle the FLX_PIT register using a list of flexible entries so that
      each flow type may be configured separately.
      
      Due to hardware implementation, the flexible data is offset from the
      start of the packet payload, and thus may not be part of the header
      data. For this reason, the offset provided by the user defined data is
      interpreted as a byte offset from the start of the matching payload.
      Previous implementations have tried to represent the offset as from the
      start of the frame, but this is not feasible because header sizes may
      change due to options.
      
      Change-Id: 36ed27995e97de63f9aea5ade5778ff038d6f811
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: add parsing of flexible filter fields from userdef · e793095e
      Jacob Keller authored
      Add code to parse the user-def field into a data structure format. This
      code is intended to allow future extensions of the user-def field by
      keeping all code that actually reads and writes the field into a single
      location. This ensures that we do not litter the driver with references
      to the user-def field and minimizes the amount of bitwise operations we
      need to do on the data.
      
      Add code which parses the lower 32bits into a flexible word and its
      offset. This will be used in a future patch to enable flexible filters
      which can match on some arbitrary data in the packet payload. For now,
      we just return -EOPNOTSUPP when this is used.
      
      Add code to fill in the user-def field when reporting the filter back,
      even though we don't actually implement any user-def fields yet.
      
      Additionally, ensure that we mask the extended FLOW_EXT bit from the
      flow_type now that we will be accepting filters which have the FLOW_EXT
      bit set (and thus make use of the user-def field).
      
      Change-Id: I238845035c179380a347baa8db8223304f5f6dd7
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
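Keeping all reads and writes of the user-def field in one pair of functions might look like the sketch below. The commit message only says the lower 32 bits hold a flexible word and its offset; the exact layout used here (word in bits 0-15, offset in bits 16-31) and the names `struct userdef`, `parse_userdef()` and `fill_userdef()` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Parsed form of the 64-bit user-def value (hypothetical structure). */
struct userdef {
	uint16_t flex_word;	/* 2-byte value to match */
	uint16_t flex_offset;	/* where in the payload to match it */
};

/*
 * Split the raw value into fields. Assumed layout: word in bits 0-15,
 * offset in bits 16-31. Anything in the upper 32 bits is rejected
 * because this sketch defines no meaning for it.
 */
static int parse_userdef(uint64_t value, struct userdef *out)
{
	assert(out);
	if (value >> 32)
		return -1;
	out->flex_word = (uint16_t)(value & 0xffff);
	out->flex_offset = (uint16_t)((value >> 16) & 0xffff);
	return 0;
}

/* Inverse operation, used when reporting a filter back to the user. */
static uint64_t fill_userdef(const struct userdef *in)
{
	assert(in);
	return ((uint64_t)in->flex_offset << 16) | in->flex_word;
}
```

Because only these two functions know the bit layout, adding a new field later means changing one place rather than every user of the value.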
    • i40e: partition the ring_cookie to get VF index · 43b15697
      Jacob Keller authored
      Do not use the user-def field for determining the VF target. Instead,
      similar to ixgbe, partition the ring_cookie value into 8bits of VF
      index, along with 32bits of queue number. This is better than using the
      user-def field, because it leaves the field open for extension in
      a future patch which will enable flexible data. Also, this matches the
      convention used by ixgbe and other drivers.
      
      Change-Id: Ie36745186d817216b12f0313b99ec95cb8a9130c
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
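Partitioning a 64-bit cookie into a queue number and a VF index is plain masking and shifting. In the sketch below, the placement of the 8 VF bits directly above the 32 queue bits is an assumption, and the macro and function names are hypothetical rather than the kernel's own helpers.

```c
#include <stdint.h>

/* Assumed layout: bits 0-31 queue number, bits 32-39 VF index. */
#define COOKIE_RING_MASK 0x00000000FFFFFFFFULL
#define COOKIE_VF_MASK   0x000000FF00000000ULL
#define COOKIE_VF_SHIFT  32

/* Extract the 32-bit queue number. */
static uint32_t cookie_ring(uint64_t cookie)
{
	return (uint32_t)(cookie & COOKIE_RING_MASK);
}

/* Extract the 8-bit VF index. */
static uint8_t cookie_vf(uint64_t cookie)
{
	return (uint8_t)((cookie & COOKIE_VF_MASK) >> COOKIE_VF_SHIFT);
}

/* Build a cookie from its two parts. */
static uint64_t make_cookie(uint8_t vf, uint32_t ring)
{
	return ((uint64_t)vf << COOKIE_VF_SHIFT) | ring;
}
```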
    • i40e: allow changing input set for ntuple filters · 9229e993
      Jacob Keller authored
      Add support to detect when we can update the input set for each flow
      type.
      
      Because the hardware only supports a single input set for all flows of
      that matching type, the driver shall only allow the input set to change
      if there are no other configured filters for that flow type.
      
      Thus, the first filter added for each flow type is allowed to change the
      input set, and all future filters must match the same input set. Display
      a diagnostic message whenever the filter input set changes, and
      a warning whenever a filter cannot be accepted because it does not match
      the configured input set.
      
      Change-Id: Ic22e1c267ae37518bb036aca4a5694681449f283
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
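The rule that only the first filter of a flow type may change the input set can be modelled with a per-type filter count. This is a simplified model, not the driver code: the enum, `struct flow_state`, `add_filter()` and `del_filter()` are hypothetical, and the input set is represented as an opaque bitmask.

```c
#include <assert.h>
#include <stdint.h>

enum flow_type { FLOW_TCP4, FLOW_UDP4, FLOW_SCTP4, FLOW_TYPE_COUNT };

struct flow_state {
	uint64_t input_set;		/* fields the hardware currently matches on */
	unsigned int filter_count;	/* filters installed for this flow type */
};

static struct flow_state flows[FLOW_TYPE_COUNT];

/*
 * Returns 0 on success, -1 if other filters of this type exist and the
 * new filter asks for a different input set.
 */
static int add_filter(enum flow_type type, uint64_t input_set)
{
	struct flow_state *fs;

	assert((unsigned int)type < FLOW_TYPE_COUNT);
	fs = &flows[type];
	if (fs->filter_count > 0 && fs->input_set != input_set)
		return -1;		/* would break the existing filters */
	fs->input_set = input_set;	/* first filter decides the input set */
	fs->filter_count++;
	return 0;
}

/* Removing the last filter makes the input set changeable again. */
static void del_filter(enum flow_type type)
{
	assert((unsigned int)type < FLOW_TYPE_COUNT);
	if (flows[type].filter_count > 0)
		flows[type].filter_count--;
}
```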
    • i40e: restore default input set for each flow type · 3bcee1e6
      Jacob Keller authored
      Ensure that the default input set is correctly reprogrammed when
      cleaning up after disabling flow director support. This ensures that the
      programmed value will be in a clean state.
      
      Although we do not yet have support for SCTPv4 filters, a future patch
      will add support for this protocol, so we will correctly restore the
      SCTPv4 input set here as well. Note that strictly speaking the default
      hardware value for SCTP includes matching the verification tag. However,
      the ethtool API does not have support for specifying this value, so
      there is no reason to keep the verification field enabled.
      
      This patch is the next step on the way to enabling partial tuple filters
      which will be implemented in a following patch.
      
      Change-Id: Ic22e1c267ae37518bb036aca4a5694681449f283
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: check current configured input set when adding ntuple filters · 36777d9f
      Jacob Keller authored
      Do not assume that hardware has been programmed with the default mask,
      but instead read the input set registers to determine what is currently
      programmed. This ensures that all programmed filters match exactly how
      the hardware will interpret them, avoiding confusion regarding filter
      behavior.
      
      This sets the initial ground-work for allowing custom input sets where
      some fields are disabled. A future patch will fully implement this
      feature.
      
      Instead of using bitwise negation, we'll just explicitly check for the
      correct value. htonl and htons are used to silence sparse
      warnings. The compiler should be able to handle the constant value and
      avoid actually performing a byteswap.
      
      Change-Id: I3d8db46cb28ea0afdaac8c5b31a2bfb90e3a4102
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: correctly honor the mask fields for ETHTOOL_SRXCLSRLINS · faa16e0f
      Jacob Keller authored
      The current implementation of .set_rxnfc does not properly read the mask
      field for filter entries. This results in incorrect driver behavior, as
      we do not reject filters which have masks set to ignore some fields. The
      current implementation simply assumes that every part of the tuple or
      "input set" is specified. This results in filters not behaving as
      expected, and not working correctly.
      
      As a first step in supporting some partial filters, add code which
      checks the mask fields and rejects any filters which do not have an
      acceptable mask. For now, we just assume that all fields must be set.
      
      This will get the driver one step towards allowing some partial filters.
      At a minimum, the ethtool commands which previously installed filters
      that would not function will now return a non-zero exit code indicating
      failure instead.
      
      We should now be meeting the minimum requirements of the .set_rxnfc API,
      by ensuring that all filters we program have a valid mask value for each
      field.
      
      Finally, add code to report the mask correctly so that the ethtool
      command properly reports the mask to the user.
      
      Note that the typecast to (__be16) when checking source and destination
      port masks is required because the ~ bitwise negation operator does not
      correctly handle variables other than integer size.
      
      Change-Id: Ia020149e07c87aa3fcec7b2283621b887ef0546f
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
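The remark about the 16-bit typecast comes down to C integer promotion: a 16-bit operand is widened to `int` before `~` is applied. The sketch below demonstrates the effect with `uint16_t` in place of `__be16`; the function names are hypothetical, and it assumes `int` is wider than 16 bits.

```c
#include <stdint.h>

/*
 * Naive check for an all-ones 16-bit mask. The operand is promoted to
 * int before ~ runs, so the upper bits of the result are set and the
 * comparison with zero fails even for 0xFFFF.
 */
static int is_full_mask_naive(uint16_t mask)
{
	int inverted = ~mask;	/* promoted: ~0x0000FFFF, not ~0xFFFF */

	return inverted == 0;
}

/* Casting back to 16 bits discards the promoted upper bits. */
static int is_full_mask(uint16_t mask)
{
	return (uint16_t)~mask == 0;
}
```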
    • sched: act_csum: don't mangle TCP and UDP GSO packets · add641e7
      Davide Caratti authored
      After act_csum computes the checksum on skbs carrying GSO TCP/UDP packets,
      subsequent segmentation fails because skb_needs_check(skb, true) returns
      true. Because of that, skb_warn_bad_offload() is invoked and the following
      message is displayed:
      
      WARNING: CPU: 3 PID: 28 at net/core/dev.c:2553 skb_warn_bad_offload+0xf0/0xfd
      <...>
      
        [<ffffffff8171f486>] skb_warn_bad_offload+0xf0/0xfd
        [<ffffffff8161304c>] __skb_gso_segment+0xec/0x110
        [<ffffffff8161340d>] validate_xmit_skb+0x12d/0x2b0
        [<ffffffff816135d2>] validate_xmit_skb_list+0x42/0x70
        [<ffffffff8163c560>] sch_direct_xmit+0xd0/0x1b0
        [<ffffffff8163c760>] __qdisc_run+0x120/0x270
        [<ffffffff81613b3d>] __dev_queue_xmit+0x23d/0x690
        [<ffffffff81613fa0>] dev_queue_xmit+0x10/0x20
      
      Since GSO is able to compute checksum on individual segments of such skbs,
      we can simply skip mangling the packet.
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dwc-xlgmac: use dual license · 67ff2c71
      Jie Deng authored
      The driver "dwc-xlgmac" is dual-licensed.
      Declare the dual license with MODULE_LICENSE().
      Signed-off-by: Jie Deng <jiedeng@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dwc-xlgmac: declaration of dual license in headers · ea8c1c64
      Jie Deng authored
      The driver "dwc-xlgmac" is dual-licensed. This patch adds
      declaration of dual license in file headers.
      Signed-off-by: Jie Deng <jiedeng@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bpf-socket-cookie-uid' · 101a6e83
      David S. Miller authored
      Chenbo Feng says:
      
      ====================
      net: core: Two Helper function about socket information
      
      Introduce two eBPF helper functions to get the socket cookie and
      socket uid for each packet. The helper functions are useful when
      the *sk field inside sk_buff is not empty. These helper functions
      can be used in socket and uid based traffic monitoring programs.
      
      Change since V7:
      * change the user namespace of uid helper function to sock_net(sk)->user_ns
      
      Change since V6:
      * change the user namespace of uid helper function back to init_user_ns
        since in some situation, for example, pinned bpf object, the current
        user namespace is not always applicable.
      
      Change since V5:
      * Delete unnecessary blank lines in sample program.
      * Refine the variable orders in get_uid helper function.
      
      Change since V4:
      * Using current user namespace to get uid instead of using init_ns.
      * Add compiling setup of example program to the Makefile.
      * Change the name style of the example program binaries.
      
      Change since V3:
      * Fixed some typos and incorrect comments in sample program
      * replaced raw insns with BPF_STX_XADD and add it to libbpf.h
      * Use a temp dir as mount point instead and added a check for
        the user input string.
      * Make the get uid helper function return the user namespace uid
        instead of kuid.
      * Return an overflowuid instead of 0 when no uid information is found.
      
      Change since V2:
      * Add a sample program to demonstrate the usage of the helper function.
      * Moved the helper function proto invoking place.
      * Add function header into tools/include
      * Apply sk_to_full_sk() before getting uid.
      
      Change since V1:
      * Removed the unnecessary declarations and export command
      * resolved conflict with master branch.
      * Examine if the socket is a full socket before getting the uid.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A Sample of using socket cookie and uid for traffic monitoring · 51570a5a
      Chenbo Feng authored
      Add a sample program to demonstrate the possible usage of the
      get_socket_cookie and get_socket_uid helper functions. The program will
      store byte and packet counts of in/out traffic monitored by iptables
      in a bpf map on a per-socket basis. The owner uid of
      the socket will be stored as part of the data entry. A shell script for
      running the program is also included.
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Chenbo Feng <fengc@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Add a eBPF helper function to retrieve socket uid · 6acc5c29
      Chenbo Feng authored
      Returns the owner uid of the socket inside a sk_buff. This is useful to
      perform per-UID accounting of network traffic or per-UID packet
      filtering. The socket needs to be a fullsock; otherwise overflowuid is
      returned.
      Signed-off-by: Chenbo Feng <fengc@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Add a helper function to get socket cookie in eBPF · 91b8270f
      Chenbo Feng authored
      Retrieve the socket cookie generated by sock_gen_cookie() from a sk_buff
      with a known socket. Generates a new cookie if one was not yet set. If
      the socket pointer inside sk_buff is NULL, 0 is returned. The helper
      function could be useful in monitoring per socket networking traffic
      statistics and provide a unique socket identifier per namespace.
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Chenbo Feng <fengc@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 23 Mar, 2017 2 commits