1. 26 Aug, 2017 17 commits
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 49107fcb
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2017-08-25
      
      This series contains updates to i40e and i40evf only.
      
      Mitch adjusts the max packet size to account for two VLAN tags.
      
      Sudheer provides a fix to ensure that the watchdog timer is scheduled
      immediately after admin queue operations are scheduled in i40evf_down().
      Fixes an issue by adding locking around the admin queue command and
      update of state variables so that adminq_subtask will have the accurate
      information whenever it gets scheduled.
      
      Anjali fixes a bug where the PF flag setup should happen before the VMDq
      RSS queue count is initialized for VMDq VSI to get the right number of
      queues for RSS in the case of x722 devices.  Fixed a problem with the
      hardware ATR eviction feature where the NVM setting was incorrect.
      
      Jake separates the flags into two types, hw_features and flags.  The
      hw_features flags contain a set of features which are enabled at init
      time and will not contain feature flags that can be toggled.  Everything
      else will remain in the flags variable, and can be modified anytime
      during run time.  We should not be directly copying a cpumask_t, since
      it is a bitmap and might not be copied correctly, so use cpumask_copy()
      instead.
      
      Stefan Assmann makes vf_offload_flags more "generic" by renaming it to
      vf_cap_flags, which allows other capabilities besides offloading to be
      added.
      
      Alan makes it such that if adaptive-rx/tx is enabled, the user cannot
      make any manual adjustments to interrupt moderation.  Also makes it so
      that if ITR is disabled but adaptive-rx/tx is then enabled, ITR will be
      re-enabled.
      
      v2: Dropped patches #1 & #8 from the original patch series submission,
          while Jesse and Jake re-work their patches based on feedback from
          David Miller.  Also removed the duplicate patch 3 that was
          accidentally sent out twice in the previous submission.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'nfp-SR-IOV-ndos-support' · fac0cef9
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      nfp: SR-IOV ndos support
      
      This set adds basic SR-IOV including setting/getting VF MAC addresses,
      VLANs, link state and spoofcheck settings.  It is wired up for both
      vNICs and representors (note: ip link will not report VF settings on
      VF/PF representors because they are not linked to the PF PCI device).
      
      Pablo and team add the basic implementation, Simon and Dirk follow
      up with the representor plumbing.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: add basic SR-IOV ndo functions to representors · 6abd224b
      Simon Horman authored
      Add basic ndo_set/get_vf to support SR-IOV on all types
      of port representors.
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: add basic SR-IOV ndo functions · 25528d90
      Pablo Cascón authored
      Add basic ndo_set/get_vf to support SR-IOV.
      
      The VF-to-egress-phy mapping is static for now.
      
      Use vfcfg ABI version 2 to write the info to the FW and collect
      the return value from the mailbox.
      Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com>
      Signed-off-by: Jimmy Kizito <jimmy.kizito@netronome.com>
      Signed-off-by: Rami Tomer <rami.tomer@netronome.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix hang in tcp_sendpage_locked() · bd9dfc54
      Eric Dumazet authored
      syzkaller got a hang in the TCP stack, related to a bug in
      tcp_sendpage_locked()
      
      root@syzkaller:~# cat /proc/3059/stack
      [<ffffffff83de926c>] __lock_sock+0x1dc/0x2f0
      [<ffffffff83de9473>] lock_sock_nested+0xf3/0x110
      [<ffffffff8408ce01>] tcp_sendmsg+0x21/0x50
      [<ffffffff84163b6f>] inet_sendmsg+0x11f/0x5e0
      [<ffffffff83dd8eea>] sock_sendmsg+0xca/0x110
      [<ffffffff83dd9547>] kernel_sendmsg+0x47/0x60
      [<ffffffff83de35dc>] sock_no_sendpage+0x1cc/0x280
      [<ffffffff8408916b>] tcp_sendpage_locked+0x10b/0x160
      [<ffffffff84089203>] tcp_sendpage+0x43/0x60
      [<ffffffff841641da>] inet_sendpage+0x1aa/0x660
      [<ffffffff83dd4fcd>] kernel_sendpage+0x8d/0xe0
      [<ffffffff83dd50ac>] sock_sendpage+0x8c/0xc0
      [<ffffffff81b63300>] pipe_to_sendpage+0x290/0x3b0
      [<ffffffff81b67243>] __splice_from_pipe+0x343/0x750
      [<ffffffff81b6a459>] splice_from_pipe+0x1e9/0x330
      [<ffffffff81b6a5e0>] generic_splice_sendpage+0x40/0x50
      [<ffffffff81b6b1d7>] SyS_splice+0x7b7/0x1610
      [<ffffffff84d77a01>] entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Fixes: 306b13eb ("proto_ops: Add locked held versions of sendmsg and sendpage")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Tom Herbert <tom@quantonium.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net_sched-clean-up-tc-classes-and-u32-filter' · 86df4d2e
      David S. Miller authored
      Cong Wang says:
      
      ====================
      net_sched: clean up tc classes and u32 filter
      
      Patch 1 and patch 2 prepare for patch 3. Major changes
      are in patch 3 and patch 4, details are there too.
      
      v2: Add patch 1 and 2, group all into a patchset
          Fix a coding style issue in patch 4
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: kill u32_node pointer in Qdisc · 3cd904ec
      WANG Cong authored
      It is ugly to hide a u32-filter-specific pointer inside Qdisc,
      this breaks the TC layers:
      
      1. Qdisc is a generic representation, should not have any specific
         data of any type
      
      2. Qdisc layer is above filter layer, should only save filters in
         the list of struct tcf_proto.
      
      This pointer is used as the head of the chain of u32 hash tables,
      that is struct tc_u_hnode, because u32 filter is very special,
      it allows creating multiple hash tables within one qdisc and
      across multiple u32 filters.
      
      Instead of using this ugly pointer, we can just save it in a global
      hash table key'ed by (dev ifindex, qdisc handle), therefore we can
      still treat it as a per qdisc basis data structure conceptually.
      
      Of course, because of network namespaces, this key is not unique
      at all, but it is fine as we already have a pointer to Qdisc in
      struct tc_u_common, we can just compare the pointers on collision.
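
The lookup scheme described above can be sketched in user-space C. This is only an illustration under stated assumptions: `struct qdisc`, `struct common`, `TABLE_SIZE`, `key_hash()`, `common_insert()` and `common_lookup()` are hypothetical stand-ins, not the kernel's names, and the hash function is arbitrary. Entries are bucketed by (ifindex, handle); because that key is not unique across network namespaces, the lookup compares the stored qdisc pointer to pick the right entry.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64u               /* hypothetical bucket count */

struct qdisc { int unused; };        /* stand-in for the kernel's struct Qdisc */

struct common {                      /* stand-in for struct tc_u_common */
    const struct qdisc *q;           /* owning qdisc, used to break ties */
    struct common *next;             /* chain within one bucket */
};

static struct common *table[TABLE_SIZE];

/* Mix both halves of the key; any reasonable hash would do for a sketch. */
static unsigned int key_hash(int ifindex, uint32_t handle)
{
    uint32_t h = ((uint32_t)ifindex * 2654435761u) ^ handle;

    return h % TABLE_SIZE;
}

static void common_insert(struct common *c, int ifindex, uint32_t handle)
{
    unsigned int b = key_hash(ifindex, handle);

    c->next = table[b];
    table[b] = c;
}

/* Same key can map to several entries; the qdisc pointer disambiguates. */
static struct common *common_lookup(const struct qdisc *q, int ifindex,
                                    uint32_t handle)
{
    struct common *c;

    for (c = table[key_hash(ifindex, handle)]; c; c = c->next)
        if (c->q == q)
            return c;
    return NULL;
}
```

Two qdiscs with an identical (ifindex, handle) pair land in the same bucket, and each lookup still returns its own entry because the pointer comparison settles the collision.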
      
      And this only affects slow paths and has no impact on the fast path,
      thanks to the pointer ->tp_c.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: remove tc class reference counting · 143976ce
      WANG Cong authored
      For TC classes, their ->get() and ->put() are always paired, and the
      reference counting is completely useless, because:
      
      1) For class modification and dumping paths, we already hold RTNL lock,
         so all of these ->get(),->change(),->put() are atomic.
      
      2) For filter binding/unbinding, we use a different reference counter than
         this one, and they should have RTNL lock too.
      
      3) For ->qlen_notify(), it is special because it is called on ->enqueue()
         path, but we already hold qdisc tree lock there, and we hold this
         tree lock when grafting or deleting the class too, so it should not be gone
         or changed until we release the tree lock.
      
      Therefore, this patch removes ->get() and ->put(), but:
      
      1) Adds a new ->find() to find the pointer to a class by classid, no
         refcnt.
      
      2) Move the original class destroy upon the last refcnt into ->delete(),
         right after releasing tree lock. This is fine because the class is
         already removed from hash when holding the lock.
      
      For those who also use ->put() as ->unbind(), just rename them to reflect
      this change.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: introduce tclass_del_notify() · 14546ba1
      WANG Cong authored
      Like for TC actions, ->delete() is a special case,
      we have to prepare and fill the notification before delete
      otherwise would get use-after-free after we remove the
      reference count.
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: get rid of more forward declarations · 27d7f07c
      WANG Cong authored
      This is not needed if we move them up properly.
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hinic: skb_pad() frees on error · 7d8697af
      Dan Carpenter authored
      The skb_pad() function frees the skb on error, so this code has a double
      free.
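
The ownership rule behind this fix can be sketched in user-space C. Everything here is a hypothetical stand-in (`struct buf`, `buf_new()`, `buf_pad()`, `xmit()`, `PAD_LIMIT`), not the hinic driver code. The only point is that a helper which frees its argument on failure, as skb_pad() does, must not be followed by a second free in the caller's error path.

```c
#include <stdlib.h>
#include <string.h>

#define PAD_LIMIT 2048u   /* hypothetical cap so the failure path can be exercised */

struct buf {
    unsigned char *data;
    size_t len;
};

static struct buf *buf_new(size_t len)
{
    struct buf *b = malloc(sizeof(*b));

    if (!b)
        return NULL;
    b->data = calloc(len ? len : 1, 1);
    if (!b->data) {
        free(b);
        return NULL;
    }
    b->len = len;
    return b;
}

static void buf_free(struct buf *b)
{
    free(b->data);
    free(b);
}

/* Like skb_pad(): on failure the buffer is freed before returning. */
static int buf_pad(struct buf *b, size_t min_len)
{
    unsigned char *p;

    if (b->len >= min_len)
        return 0;
    p = (min_len <= PAD_LIMIT) ? realloc(b->data, min_len) : NULL;
    if (!p) {
        buf_free(b);
        return -1;
    }
    memset(p + b->len, 0, min_len - b->len);
    b->data = p;
    b->len = min_len;
    return 0;
}

static int xmit(struct buf *b, size_t min_len)
{
    if (buf_pad(b, min_len) != 0)
        return -1;    /* b is already gone; freeing it here would be a double free */
    buf_free(b);      /* stand-in for handing the frame to hardware */
    return 0;
}
```

The caller of `xmit()` follows the same rule: once the function has been called, the buffer belongs to it on both the success and the failure path.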
      
      Fixes: 00e57a6d ("net-next/hinic: Add Tx operation")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ipv6-sr-updates' · cf4828d1
      David S. Miller authored
      David Lebrun says:
      
      ====================
      net: updates for IPv6 Segment Routing
      
      v2: seg6_lwt_headroom() is not relevant for lwtunnel_input_redirect()
          use cases, and L2ENCAP only uses this redirection. Fix incoherence
          between arbitrary MAC header size support and fixed headroom
          computation by setting only LWTUNNEL_STATE_INPUT_REDIRECT for L2ENCAP
          mode.
      
      This patch series provides several updates for the SRv6 implementation. The
      first patch leverages the existing infrastructure to support encapsulation
      of IPv4 packets. The second patch implements the T.Encaps.L2 SR function,
      enabling encapsulation of an L2 Ethernet frame within an IPv6+SRH packet.
      The last three patches update the seg6local lightweight tunnel, and mainly
      implement four new actions: End.T, End.DX2, End.DX4 and End.DT6.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: sr: implement additional seg6local actions · 891ef8dd
      David Lebrun authored
      This patch implements the following seg6local actions.
      
      - SEG6_LOCAL_ACTION_END_T: regular SRH processing and forward to the
        next-hop looked up in the specified routing table.
      
      - SEG6_LOCAL_ACTION_END_DX2: decapsulate an L2 frame and forward it to
        the specified network interface.
      
      - SEG6_LOCAL_ACTION_END_DX4: decapsulate an IPv4 packet and forward it,
        possibly to the specified next-hop.
      
      - SEG6_LOCAL_ACTION_END_DT6: decapsulate an IPv6 packet and forward it
        to the next-hop looked up in the specified routing table.
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: sr: add helper functions for seg6local · d7a669dd
      David Lebrun authored
      This patch adds three helper functions to be used with the seg6local packet
      processing actions.
      
      The decap_and_validate() function will be used by the End.D* actions, that
      decapsulate an SR-enabled packet.
      
      The advance_nextseg() function applies the fundamental operations to update
      an SRH for the next segment.
      
      The lookup_nexthop() function helps select the next-hop for the processed
      SR packets. It supports an optional next-hop address to route the packet
      specifically through it, and an optional routing table to use.
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: sr: enforce IPv6 packets for seg6local lwt · 6285217f
      David Lebrun authored
      This patch ensures that the seg6local lightweight tunnel is used solely
      with IPv6 routes and processes only IPv6 packets.
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: sr: add support for encapsulation of L2 frames · 38ee7f2d
      David Lebrun authored
      This patch implements the L2 frame encapsulation mechanism, referred to
      as T.Encaps.L2 in the SRv6 specifications [1].
      
      A new type of SRv6 tunnel mode is added (SEG6_IPTUN_MODE_L2ENCAP). It only
      accepts packets with an existing MAC header (i.e., it will not work for
      locally generated packets). The resulting packet looks like IPv6 -> SRH ->
      Ethernet -> original L3 payload. The next header field of the SRH is set to
      NEXTHDR_NONE.
      
      [1] https://tools.ietf.org/html/draft-filsfils-spring-srv6-network-programming-01
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: sr: add support for ip4ip6 encapsulation · 32d99d0b
      David Lebrun authored
      This patch enables the SRv6 encapsulation mode to carry an IPv4 payload.
      All the infrastructure was already present, I just had to add a parameter
      to seg6_do_srh_encap() to specify the inner packet protocol, and perform
      some additional checks.
      
      Usage example:
      ip route add 1.2.3.4 encap seg6 mode encap segs fc00::1,fc00::2 dev eth0
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 25 Aug, 2017 23 commits
    • i40e: synchronize nvmupdate command and adminq subtask · 2bf01935
      Sudheer Mogilappagari authored
      During NVM update, state machine gets into unrecoverable state because
      i40e_clean_adminq_subtask can get scheduled after the admin queue
      command but before other state variables are updated. This causes
      incorrect input to i40e_nvmupd_check_wait_event and state transitions
      don't happen.
      
      This issue existed before but surfaced after commit 373149fc
      ("i40e: Decrease the scope of rtnl lock")
      
      This fix adds locking around admin queue command and update of
      state variables so that adminq_subtask will have accurate information
      whenever it gets scheduled.
      Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: prevent changing ITR if adaptive-rx/tx enabled · 06b2decd
      Alan Brady authored
      Currently the driver allows the user to change (or even disable)
      interrupt moderation if adaptive-rx/tx is enabled when this should
      not be the case.
      
      Adaptive RX/TX will not respect the user's ITR settings so
      allowing the user to change it is weird.  This bug would also
      allow the user to disable interrupt moderation with adaptive-rx/tx
      enabled which doesn't make much sense either.
      
      This patch makes it such that if adaptive-rx/tx is enabled, the user
      cannot make any manual adjustments to interrupt moderation.  It also
      makes it so that if ITR is disabled but adaptive-rx/tx is then
      enabled, ITR will be re-enabled.
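
The two rules in the last paragraph can be sketched as follows. This is a simplified model, not the driver's ethtool handler: `struct ring_cfg`, `set_itr()`, `set_adaptive()`, `DEFAULT_ITR_USECS` and the choice of `-EINVAL` as the error code are all assumptions made for the illustration.

```c
#include <errno.h>
#include <stdbool.h>

#define DEFAULT_ITR_USECS 50u   /* hypothetical default moderation interval */

struct ring_cfg {
    bool adaptive;              /* adaptive-rx/tx enabled */
    unsigned int itr_usecs;     /* 0 means interrupt moderation is disabled */
};

/* Manual ITR changes are refused while adaptive moderation is active. */
static int set_itr(struct ring_cfg *cfg, unsigned int usecs)
{
    if (cfg->adaptive)
        return -EINVAL;
    cfg->itr_usecs = usecs;
    return 0;
}

/* Turning adaptive mode on re-enables ITR if it had been disabled. */
static void set_adaptive(struct ring_cfg *cfg, bool on)
{
    cfg->adaptive = on;
    if (on && cfg->itr_usecs == 0)
        cfg->itr_usecs = DEFAULT_ITR_USECS;
}
```
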
      Signed-off-by: Alan Brady <alan.brady@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: use cpumask_copy instead of direct assignment · 7e4d01e7
      Jacob Keller authored
      According to the header file cpumask.h, we shouldn't be directly copying
      a cpumask_t, since it's a bitmap and might not be copied correctly. Let's
      use the provided cpumask_copy() function instead.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40evf: use netdev variable in reset task · f0db7892
      Alan Brady authored
      If we're going to bother initializing a variable to reference it we might
      as well use it.
      Signed-off-by: Alan Brady <alan.brady@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e/i40evf: rename vf_offload_flags to vf_cap_flags in struct virtchnl_vf_resource · fbb113f7
      Stefan Assmann authored
      The current name of vf_offload_flags indicates that the bitmap is
      limited to offload related features. Make this more generic by renaming
      it to vf_cap_flags, which allows for other capabilities besides
      offloading to be added.
      Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: move check for avoiding VID=0 filters into i40e_vsi_add_vlan · fcf6cfc8
      Jacob Keller authored
      In i40e_vsi_add_vlan we treat attempting to add VID=0 as an error,
      because it does not do what the caller might expect. We already special
      case VID=0 in i40e_vlan_rx_add_vid so that we avoid this error when
      adding the VLAN.
      
      This special casing is necessary so that we do not add the VLAN=0 filter
      since we don't want to stop receiving untagged traffic. Unfortunately,
      not all callers of i40e_vsi_add_vlan are aware of this, including when
      we add VLANs from a VF device.
      
      Rather than special casing every single caller of i40e_vsi_add_vlan,
      let's just move this check internally. This makes the code simpler
      because the caller does not need to be aware of how VLAN=0 is special,
      and we don't forget to add this check in new places.
      
      This fixes a harmless error message displaying when adding a VLAN from
      within a VF. The message was meaningless but there is no reason to
      confuse end users and system administrators, and this is now avoided.
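
A minimal sketch of moving the check into the callee, assuming a simple bitmap as the filter store. `struct vsi`, `vsi_add_vlan()` and `vsi_has_vlan()` are hypothetical stand-ins for illustration and do not mirror the driver's filter hash.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define VLAN_N_VID 4096u   /* number of possible 12-bit VLAN IDs */

struct vsi {
    uint8_t vlan_filter[VLAN_N_VID / 8];   /* one bit per VID */
};

/*
 * VID 0 is handled here, once, rather than in every caller: it is
 * accepted but no filter is installed, so untagged traffic keeps flowing.
 */
static int vsi_add_vlan(struct vsi *vsi, unsigned int vid)
{
    if (vid >= VLAN_N_VID)
        return -EINVAL;
    if (vid == 0)
        return 0;
    vsi->vlan_filter[vid / 8] |= (uint8_t)(1u << (vid % 8));
    return 0;
}

static bool vsi_has_vlan(const struct vsi *vsi, unsigned int vid)
{
    return vid < VLAN_N_VID &&
           (vsi->vlan_filter[vid / 8] & (1u << (vid % 8))) != 0;
}
```

With the special case inside `vsi_add_vlan()`, a caller on the VF path gets a success return for VID 0 instead of an error it would otherwise have to know to suppress.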
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e/i40evf: use cmpxchg64 when updating private flags in ethtool · 841c950d
      Jacob Keller authored
      When a user gives an invalid command to change a private flag which is
      not supported, either because it is read-only, or the device is not
      capable of the feature, we simply ignore the request.
      
      A naive solution would simply be to report error codes when one of the
      flags was not supported. However, this causes problems because it makes
      the operation not atomic. If a user requests multiple private flags
      together at once we could end up changing one before failing at the
      second flag.
      
      We can do a bit better if we instead update a temporary copy of the
      flags variable in the loop, and then copy it into place after. If we
      aren't careful this has the pitfall of potentially silently overwriting
      any changes caused by other threads.
      
      Avoid this by using cmpxchg64 which will compare and swap the flags
      variable only if it currently matched the old value. We'll report
      -EAGAIN in the (hopefully rare!) case where the cmpxchg64 fails.
      
      This ensures that we can properly report when flags are not supported in
      an atomic fashion without the risk of overwriting other threads' changes.
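
The approach can be sketched in user-space C, using C11 `atomic_compare_exchange_strong()` as a stand-in for the kernel's cmpxchg64. `struct flag_req`, `update_flags()`, the `writable` mask and the `-EINVAL` code for unsupported flags are assumptions for the illustration; only the `-EAGAIN` result on a failed swap comes from the description above.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct flag_req {
    uint64_t bit;     /* flag to change */
    int enable;       /* non-zero to set, zero to clear */
};

/*
 * Apply all requests to a temporary copy, then publish the copy only if
 * the live value is still the one the copy was derived from.
 */
static int update_flags(_Atomic uint64_t *flags, uint64_t writable,
                        const struct flag_req *reqs, size_t n)
{
    uint64_t old = atomic_load(flags);
    uint64_t updated = old;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!(reqs[i].bit & writable))
            return -EINVAL;          /* nothing has been published yet */
        if (reqs[i].enable)
            updated |= reqs[i].bit;
        else
            updated &= ~reqs[i].bit;
    }

    /* Fails if another thread changed *flags since the load above. */
    if (!atomic_compare_exchange_strong(flags, &old, updated))
        return -EAGAIN;
    return 0;
}
```

A request that mixes a supported and an unsupported flag leaves the live value untouched, which is the all-or-nothing behaviour the commit message is after.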
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Detect ATR HW Evict NVM issue and disable the feature · 10a955ff
      Anjali Singhai Jain authored
      This patch fixes a problem with the HW ATR eviction feature where the
      NVM setting was incorrect.  This patch detects the issue on X720
      adapters and disables the feature if the NVM setting is incorrect.
      
      Without this patch, HW ATR Evict feature does not work on broken NVMs
      and is not detected either.  If the HW ATR Evict feature is disabled
      the SW Eviction feature will take effect.
      Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
      Signed-off-by: Alice Michael <alice.michael@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: remove workaround for Open Firmware MAC address · 28921a0c
      Jacob Keller authored
      Since commit b499ffb0 ("i40e: Look up MAC address in Open Firmware
      or IDPROM"), we've had support for obtaining the MAC address
      from Open Firmware or IDPROM.
      
      This code relied on sending the Open Firmware address directly to the
      device firmware instead of relying on our MAC/VLAN filter list. Thus,
      a work around was introduced in commit b1b15df5 ("i40e: Explicitly
      write platform-specific mac address after PF reset")
      
      We refactored the Open Firmware address enablement code in the ill-named
      commit 41c4c2b5 ("i40e: allow look-up of MAC address from Open
      Firmware or IDPROM")
      
      Since this refactor, we no longer even set I40E_FLAG_PF_MAC. Further, we
      don't need this work around, because we actually store the MAC address
      as part of the MAC/VLAN filter hash. Thus, we will restore the address
      correctly upon reset.
      
      The refactor above failed to revert the workaround, so do that now.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: separate hw_features from runtime changing flags · d36e41dc
      Jacob Keller authored
      The number of flags found in pf->flags has grown quite large, and there
      are a lot of different types of flags. Most of the flags are simply
      hardware features which are enabled on some firmware or some MAC types.
      Other flags are dynamic run-time flags which enable or disable certain
      features of the driver.
      
      Separate these two types of flags into pf->hw_features and pf->flags.
      The hw_features list will contain a set of features which are enabled at
      init time. This will not contain toggles or otherwise dynamically
      changing features. These flags should not need atomic protections, as
      they will be set once during init and then be essentially read only.
      
      Everything else will remain in the flags variable. These flags may be
      modified at any time during run time. A future patch may wish to convert
      these flags into set_bit/clear_bit/test_bit or similar approach to
      ensure atomic correctness.
      
      The I40E_FLAG_MFP_ENABLED flag may be a good fit for hw_features but
      currently is used by ethtool in the private flags settings, and thus has
      been left as part of flags.
      
      Additionally, I40E_FLAG_DCB_CAPABLE may be a good fit for the
      hw_features but this patch has not tried to untangle it yet.
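
The split can be pictured with a small sketch. The bit names and helpers below (`SKETCH_HW_FEATURE_*`, `SKETCH_FLAG_*`, `pf_init()`, `pf_has_hw_feature()`, `pf_set_flag()`) are hypothetical and chosen only for the illustration; the driver's real flag names and field widths differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits, fixed once the hardware is probed. */
#define SKETCH_HW_FEATURE_A   (1u << 0)
#define SKETCH_HW_FEATURE_B   (1u << 1)

/* Hypothetical run-time toggles. */
#define SKETCH_FLAG_X         (1u << 0)

struct pf {
    uint32_t hw_features;   /* written once during init, read-only afterwards */
    uint32_t flags;         /* may change at any time during run time */
};

static void pf_init(struct pf *pf, uint32_t detected)
{
    pf->hw_features = detected;
    pf->flags = 0;
}

static bool pf_has_hw_feature(const struct pf *pf, uint32_t feature)
{
    return (pf->hw_features & feature) != 0;
}

/* Only the run-time word is ever modified after init. */
static void pf_set_flag(struct pf *pf, uint32_t flag, bool on)
{
    if (on)
        pf->flags |= flag;
    else
        pf->flags &= ~flag;
}
```

Keeping the init-time word free of writers is what lets it be read without atomic protection, while the run-time word remains the candidate for a later set_bit/clear_bit conversion.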
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Fix a bug with VMDq RSS queue allocation · 5a433199
      Anjali Singhai Jain authored
      The X722 pf flag setup should happen before the VMDq RSS queue count is
      initialized for VMDq VSI to get the right number of queues for RSS in
      case of X722 devices.
      Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
      Signed-off-by: Alice Michael <alice.michael@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40evf: prevent VF close returning before state transitions to DOWN · fe2647ab
      Sudheer Mogilappagari authored
      Currently i40evf_close() can return before state transitions to
      __I40EVF_DOWN because of the latency involved in processing and
      receiving response from PF driver and scheduling of VF watchdog_task.
      Due to this inconsistency an immediate call to i40evf_open() fails
      because state is still DOWN_PENDING.
      
      When a VF interface is in the up state and we try to add it as a slave,
      the bonding driver calls dev_close() and dev_open() in quick succession,
      resulting in dev_open() returning an error. The ifenslave command needs
      to be run again for dev_open() to succeed.
      
      This fix ensures that watchdog timer is scheduled immediately after
      admin queue operations are scheduled in i40evf_down(). In addition a
      wait condition is added at the end of i40evf_close so that the function
      won't return while the state is still DOWN_PENDING. The timeout value is
      chosen after some profiling and includes some buffer.
      Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e/i40evf: adjust packet size to account for double VLANs · 1e3a5fd5
      Mitch Williams authored
      Now that the kernel supports double VLAN tags, we should at least play
      nice. Adjust the max packet size to account for two VLAN tags, not just
      one.
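
The arithmetic behind the adjustment, as a small sketch. The three constants carry the same values as the kernel's ETH_HLEN, ETH_FCS_LEN and VLAN_HLEN but are defined locally here; `max_frame_len()` is a hypothetical helper, not a driver function.

```c
#define ETH_HLEN     14u   /* destination MAC + source MAC + EtherType */
#define ETH_FCS_LEN   4u   /* frame check sequence */
#define VLAN_HLEN     4u   /* one 802.1Q tag */

/* Largest on-wire frame for a given MTU and number of stacked VLAN tags. */
static unsigned int max_frame_len(unsigned int mtu, unsigned int vlan_tags)
{
    return mtu + ETH_HLEN + ETH_FCS_LEN + vlan_tags * VLAN_HLEN;
}
```

For a 1500-byte MTU this gives 1522 bytes with one tag and 1526 bytes with two, so a limit sized for a single tag is 4 bytes short for double-tagged frames.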
      Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • strparser: initialize all callbacks · 3fd87127
      Eric Biggers authored
      commit bbb03029 ("strparser: Generalize strparser") added more
      function pointers to 'struct strp_callbacks'; however, kcm_attach() was
      not updated to initialize them.  This could cause the ->lock() and/or
      ->unlock() function pointers to be set to garbage values, causing a
      crash in strp_work().
      
      Fix the bug by moving the callback structs into static memory, so
      unspecified members are zeroed.  Also constify them while we're at it.
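
Why static storage helps can be shown with a short sketch. `struct callbacks`, `parse_len()`, `static_cb` and `call_lock()` are hypothetical stand-ins, not the strparser definitions. In C, members left out of an initializer are zero-initialized, whereas a stack struct that is declared without an initializer and then filled field by field keeps indeterminate values in the fields nobody assigned.

```c
#include <stddef.h>

struct callbacks {
    int  (*parse)(int len);
    void (*lock)(void);     /* optional */
    void (*unlock)(void);   /* optional */
};

static int parse_len(int len)
{
    return len;
}

/* Static and const: members not named in the initializer are NULL. */
static const struct callbacks static_cb = {
    .parse = parse_len,
};

/* Returns 1 if the optional lock callback was invoked, 0 if it was absent. */
static int call_lock(const struct callbacks *cb)
{
    if (cb->lock) {
        cb->lock();
        return 1;
    }
    return 0;
}
```

Code that tests an optional pointer before calling it, as `call_lock()` does, is only safe when the pointer is reliably NULL in the unset case, which the static definition guarantees.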
      
      This bug was found by syzkaller, which encountered the following splat:
      
          IP: 0x55
          PGD 3b1ca067
          P4D 3b1ca067
          PUD 3b12f067
          PMD 0
      
          Oops: 0010 [#1] SMP KASAN
          Dumping ftrace buffer:
             (ftrace buffer empty)
          Modules linked in:
          CPU: 2 PID: 1194 Comm: kworker/u8:1 Not tainted 4.13.0-rc4-next-20170811 #2
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Workqueue: kstrp strp_work
          task: ffff88006bb0e480 task.stack: ffff88006bb10000
          RIP: 0010:0x55
          RSP: 0018:ffff88006bb17540 EFLAGS: 00010246
          RAX: dffffc0000000000 RBX: ffff88006ce4bd60 RCX: 0000000000000000
          RDX: 1ffff1000d9c97bd RSI: 0000000000000000 RDI: ffff88006ce4bc48
          RBP: ffff88006bb17558 R08: ffffffff81467ab2 R09: 0000000000000000
          R10: ffff88006bb17438 R11: ffff88006bb17940 R12: ffff88006ce4bc48
          R13: ffff88003c683018 R14: ffff88006bb17980 R15: ffff88003c683000
          FS:  0000000000000000(0000) GS:ffff88006de00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 0000000000000055 CR3: 000000003c145000 CR4: 00000000000006e0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
          Call Trace:
           process_one_work+0xbf3/0x1bc0 kernel/workqueue.c:2098
           worker_thread+0x223/0x1860 kernel/workqueue.c:2233
           kthread+0x35e/0x430 kernel/kthread.c:231
           ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
          Code:  Bad RIP value.
          RIP: 0x55 RSP: ffff88006bb17540
          CR2: 0000000000000055
          ---[ end trace f0e4920047069cee ]---
      
      Here is a C reproducer (requires CONFIG_BPF_SYSCALL=y and
      CONFIG_AF_KCM=y):
      
          #include <linux/bpf.h>
          #include <linux/kcm.h>
          #include <linux/types.h>
          #include <stdint.h>
          #include <sys/ioctl.h>
          #include <sys/socket.h>
          #include <sys/syscall.h>
          #include <unistd.h>
      
          static const struct bpf_insn bpf_insns[3] = {
              { .code = 0xb7 }, /* BPF_MOV64_IMM(0, 0) */
              { .code = 0x95 }, /* BPF_EXIT_INSN() */
          };
      
          static const union bpf_attr bpf_attr = {
              .prog_type = 1,
              .insn_cnt = 2,
              .insns = (uintptr_t)&bpf_insns,
              .license = (uintptr_t)"",
          };
      
          int main(void)
          {
              int bpf_fd = syscall(__NR_bpf, BPF_PROG_LOAD,
                                   &bpf_attr, sizeof(bpf_attr));
              int inet_fd = socket(AF_INET, SOCK_STREAM, 0);
              int kcm_fd = socket(AF_KCM, SOCK_DGRAM, 0);
      
              ioctl(kcm_fd, SIOCKCMATTACH,
                    &(struct kcm_attach) { .fd = inet_fd, .bpf_fd = bpf_fd });
          }
      
      Fixes: bbb03029 ("strparser: Generalize strparser")
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Tom Herbert <tom@quantonium.net>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3fd87127
    • Haiyang Zhang's avatar
      hv_netvsc: Fix rndis_filter_close error during netvsc_remove · c6f71c41
      Haiyang Zhang authored
      We now remove the rndis filter before unregister_netdev(), which calls
      device close. The close path then tries to close the rndis filter that
      has already been removed.
      
      This patch fixes this error.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c6f71c41
    • David S. Miller's avatar
      Merge tag 'mlx5-updates-2017-08-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 0cf3f4c3
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2017-08-24
      
      This series includes updates to mlx5 core driver.
      
      From Gal and Saeed, three cleanup patches.
      From Matan, low-level flow steering improvements and optimizations:
       - Use more efficient data structures for flow steering objects handling.
       - Add tracepoints to flow steering operations.
       - Overall these patches improve flow steering rule insertion rate by a
         factor of seven in large scales (~50K rules or more).
      
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0cf3f4c3
    • Dan Carpenter's avatar
      hinic: uninitialized variable in hinic_api_cmd_init() · 256fbe11
      Dan Carpenter authored
      We never set the error code in this function.
      
      Fixes: eabf0fad ("net-next/hinic: Initialize api cmd resources")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      256fbe11
    • Florian Fainelli's avatar
      net: mv643xx_eth: Be drop monitor friendly · 43cee2d2
      Florian Fainelli authored
      txq_reclaim() does the normal transmit queue reclamation and
      rxq_deinit() does the RX ring cleanup. Neither of these is a packet
      drop, so use dev_consume_skb() for both locations.
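
      The distinction can be pictured with a small userspace analogy.  It
      assumes the usual kernel convention that kfree_skb()-style frees are
      reported to the drop monitor while consume_skb()-style frees are not;
      every demo_* name below is hypothetical and none of this is driver
      code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a packet buffer. */
struct demo_buf {
    int in_use;
};

static unsigned long demo_dropped;   /* what a drop monitor would count */
static unsigned long demo_consumed;  /* normal end-of-life, not a drop  */

/* Release a buffer whose packet was lost (analogous to a "kfree" path). */
static void demo_release_dropped(struct demo_buf *b)
{
    assert(b != NULL && b->in_use);
    b->in_use = 0;
    demo_dropped++;
}

/* Release a buffer whose packet was handled normally ("consume" path). */
static void demo_release_consumed(struct demo_buf *b)
{
    assert(b != NULL && b->in_use);
    b->in_use = 0;
    demo_consumed++;
}

/* Reclaim completed transmit slots: these packets were sent, not dropped. */
static void demo_reclaim(struct demo_buf **ring, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ring[i] != NULL) {
            demo_release_consumed(ring[i]);
            ring[i] = NULL;
        }
    }
}

/* Three buffers complete normally, one is genuinely dropped. */
static unsigned long demo_run(void)
{
    struct demo_buf pool[4] = { {1}, {1}, {1}, {1} };
    struct demo_buf *ring[3] = { &pool[0], &pool[1], &pool[2] };

    demo_reclaim(ring, 3);
    demo_release_dropped(&pool[3]);
    return demo_dropped;
}
```

      With the reclaim path on the "consumed" side, only the one real loss
      shows up in the drop counter.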
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      43cee2d2
    • Florian Fainelli's avatar
      tg3: Be drop monitor friendly · 1e9d8e7a
      Florian Fainelli authored
      tg3_tx() does the normal packet TX completion,
      tigon3_dma_hwbug_workaround() and tg3_tso_bug() both need to allocate a
      new SKB that is suitable to work around HW bugs, and finally
      tg3_free_rings() is doing ring cleanup. Use dev_consume_skb_any() for
      these 3 locations to be SKB drop monitor friendly.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Acked-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1e9d8e7a
    • David S. Miller's avatar
      Merge branch 'ipv6-Route-ICMPv6-errors-with-the-flow-when-ECMP-in-use' · 45c7ec9d
      David S. Miller authored
      Jakub Sitnicki says:
      
      ====================
      ipv6: Route ICMPv6 errors with the flow when ECMP in use
      
      This patch set is another take at making Path MTU Discovery work when
      server nodes are behind a router employing multipath routing in a
      load-balance or anycast setup (that is, when not every end-node can be
      reached by every path). The problem has been well described in RFC 7690
      [1], but in short - in such setups ICMPv6 PTB errors are not guaranteed
      to be routed back to the server node that sent a reply that exceeds path
      MTU.
      
      The proposed solution is two-fold:
      
       (1) on the server side - reflect the Flow Label [2]. This can be done
           without modifying the application using a new per-netns sysctl knob
           that has been proposed independently of this patchset in the patch
           entitled "ipv6: Add sysctl for per namespace flow label
           reflection" [3].
      
       (2) on the ECMP router - make the ipv6 routing subsystem look into the
           ICMPv6 error packets and compute the flow-hash from its payload,
           i.e. the offending packet that triggered the error. This is the
           same behavior the ipv4 stack already has.
      
      With both parts in place Path MTU Discovery can work past the ECMP
      router when using IPv6.
      
      [1] https://tools.ietf.org/html/rfc7690
      [2] https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
      [3] http://patchwork.ozlabs.org/patch/804870/
      
      v1 -> v2:
       - don't use "extern" in external function declaration in header file
       - style change, put as many arguments as possible on the first line of
         a function call, and align consecutive lines to the first argument
       - expand the cover letter based on the feedback
      
      v2 -> v3:
       - switch to computing flow-hash using flow dissector to align with
         recent changes to multipath routing in ipv4 stack
       - add a sysctl knob for enabling flow label reflection per netns
      
      ---
      
      Testing has covered multipath routing of ICMPv6 PTB errors in forward
      and local output path in a simple use-case of an HTTP server sending a
      reply which is over the path MTU size [4]. I have also checked if the
      flows get evenly spread over multiple paths (i.e. if there are no
      regressions) [5].
      
      [4] https://github.com/jsitnicki/tools/tree/master/net/tests/ecmp/pmtud
      [5] https://github.com/jsitnicki/tools/tree/master/net/tests/ecmp/load-balance
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      45c7ec9d
    • Jakub Sitnicki's avatar
      ipv6: Use multipath hash from flow info if available · b673d6cc
      Jakub Sitnicki authored
      Allow our callers to influence the choice of ECMP link by honoring the
      hash passed together with the flow info. This allows for special
      treatment of ICMP errors which we would like to route over the same path
      as the IPv6 datagram that triggered the error.
      
      Also go through rt6_multipath_hash(), in the usual case when we aren't
      dealing with an ICMP error, so that there is one central place where
      multipath hash is computed.
      Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b673d6cc
    • Jakub Sitnicki's avatar
      ipv6: Fold rt6_info_hash_nhsfn() into its only caller · 956b4531
      Jakub Sitnicki authored
      Commit 644d0e65 ("ipv6 Use get_hash_from_flowi6 for rt6 hash") has
      turned rt6_info_hash_nhsfn() into a one-liner, so it no longer makes
      sense to keep it around. Also remove the accompanying comment that has
      become outdated.
      Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      956b4531
    • Jakub Sitnicki's avatar
      ipv6: Compute multipath hash for ICMP errors from offending packet · 23aebdac
      Jakub Sitnicki authored
      When forwarding or sending out an ICMPv6 error, look at the embedded
      packet that triggered the error and compute a flow hash over its
      headers.
      
      This lets us route the ICMP error together with the flow it belongs to
      when multipath (ECMP) routing is in use, which in turn makes Path MTU
      Discovery work in ECMP load-balanced or anycast setups (RFC 7690).
      
      Granted, end-hosts behind the ECMP router (aka servers) need to reflect
      the IPv6 Flow Label for PMTUD to work.
      
      The code is organized to be in parallel with ipv4 stack:
      
        ip_multipath_l3_keys -> ip6_multipath_l3_keys
        fib_multipath_hash   -> rt6_multipath_hash
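
      The idea can be sketched in userspace as follows.  This is not the
      kernel implementation: all demo_* names are hypothetical, extension
      headers are assumed to be absent, and FNV-1a over address-ordered keys
      stands in for the kernel's flow dissector hash.  For an ICMPv6 error
      the L3 keys (addresses, flow label, next header) are taken from the
      embedded packet instead of the outer header:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_IPV6_HDR_LEN   40
#define DEMO_ICMPV6_HDR_LEN 8
#define DEMO_NEXTHDR_ICMPV6 58

struct demo_l3_keys {
    uint8_t  src[16];
    uint8_t  dst[16];
    uint32_t flow_label;   /* 20-bit IPv6 Flow Label */
    uint8_t  next_hdr;
};

/* Hypothetical helper: write a fixed IPv6 header (payload length left 0). */
static void demo_build_hdr(uint8_t *hdr, const uint8_t src[16],
                           const uint8_t dst[16], uint32_t label,
                           uint8_t next_hdr)
{
    memset(hdr, 0, DEMO_IPV6_HDR_LEN);
    hdr[0] = 0x60;                        /* version 6, traffic class 0 */
    hdr[1] = (label >> 16) & 0x0f;
    hdr[2] = (label >> 8) & 0xff;
    hdr[3] = label & 0xff;
    hdr[6] = next_hdr;
    hdr[7] = 64;                          /* hop limit */
    memcpy(hdr + 8, src, 16);
    memcpy(hdr + 24, dst, 16);
}

/* ICMPv6 types 0..127 are error messages, 128..255 informational. */
static int demo_icmpv6_is_error(uint8_t type)
{
    return (type & 0x80) == 0;
}

/* Fill keys from the embedded packet for ICMPv6 errors, else the outer one. */
static int demo_l3_keys(const uint8_t *pkt, size_t len,
                        struct demo_l3_keys *keys)
{
    const uint8_t *hdr = pkt;

    assert(pkt != NULL && keys != NULL);
    if (len < DEMO_IPV6_HDR_LEN)
        return -1;
    if (pkt[6] == DEMO_NEXTHDR_ICMPV6 &&
        len >= DEMO_IPV6_HDR_LEN + DEMO_ICMPV6_HDR_LEN + DEMO_IPV6_HDR_LEN &&
        demo_icmpv6_is_error(pkt[DEMO_IPV6_HDR_LEN]))
        hdr = pkt + DEMO_IPV6_HDR_LEN + DEMO_ICMPV6_HDR_LEN;

    memset(keys, 0, sizeof(*keys));
    keys->flow_label = ((uint32_t)(hdr[1] & 0x0f) << 16) |
                       ((uint32_t)hdr[2] << 8) | hdr[3];
    keys->next_hdr = hdr[6];
    memcpy(keys->src, hdr + 8, 16);
    memcpy(keys->dst, hdr + 24, 16);
    return 0;
}

static uint32_t demo_fnv1a(uint32_t h, const uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Direction-independent hash: addresses are ordered before hashing. */
static uint32_t demo_multipath_hash(const uint8_t *pkt, size_t len)
{
    struct demo_l3_keys k;
    const uint8_t *lo, *hi;
    uint8_t tail[4];
    uint32_t h = 2166136261u;

    if (demo_l3_keys(pkt, len, &k) < 0)
        return 0;
    if (memcmp(k.src, k.dst, 16) <= 0) {
        lo = k.src;
        hi = k.dst;
    } else {
        lo = k.dst;
        hi = k.src;
    }
    tail[0] = (k.flow_label >> 16) & 0xff;
    tail[1] = (k.flow_label >> 8) & 0xff;
    tail[2] = k.flow_label & 0xff;
    tail[3] = k.next_hdr;
    h = demo_fnv1a(h, lo, 16);
    h = demo_fnv1a(h, hi, 16);
    return demo_fnv1a(h, tail, sizeof(tail));
}
```

      In this sketch a client-to-server packet and a Packet Too Big error
      that embeds the server's reply hash to the same value only when the
      reply carries the same Flow Label, which is why the servers need to
      reflect it.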
      Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      23aebdac