1. 25 May, 2018 40 commits
    • net/mlx5e: Avoid reset netdev stats on configuration changes · 05909bab
      Eran Ben Elisha authored
      Move all RQ, SQ and channel counters from the channel objects into the
      priv structure.  With this change, counters will not be reset upon
      channel configuration changes.
      
      Statistics for SQs associated with TCs higher than zero are presented
      in ethtool -S only for SQs that were opened at least once since the
      module was loaded (regardless of their current open/closed status).
      This reduces the number of statistics presented and calculated for the
      common out-of-box use case (no QoS).
      
      mlx5e_channel_stats combines the CH, RQ and SQ stats in order to
      create locality for the NAPI when handling TX and RX of the same
      channel.
      
      Align the new per-ring statistics struct so that several channels do
      not update the same cache line at the same time.
      
      Packet rate was tested; no degradation was observed.
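A minimal Python sketch of the idea, not the driver code: the counters live in a long-lived object standing in for priv, and channels only hold references to them, so rebuilding the channels leaves the totals intact. The class and field names (`Priv`, `ChannelStats`, `open_channels`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChannelStats:
    """Per-ring software counters (hypothetical subset)."""
    rx_packets: int = 0
    tx_packets: int = 0


@dataclass
class Channel:
    """A channel only borrows a reference to its counters."""
    ix: int
    stats: ChannelStats


@dataclass
class Priv:
    """Stand-in for the driver's priv structure."""
    max_channels: int
    channel_stats: List[ChannelStats] = field(default_factory=list)
    channels: List[Channel] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Counters are allocated once, for the maximum number of channels.
        self.channel_stats = [ChannelStats() for _ in range(self.max_channels)]

    def open_channels(self, num: int) -> None:
        # Reconfiguration rebuilds the channel objects but reuses the counters.
        self.channels = [Channel(i, self.channel_stats[i]) for i in range(num)]

    def total_rx_packets(self) -> int:
        # Sum over every slot, including channels that are currently closed.
        return sum(s.rx_packets for s in self.channel_stats)
```

Shrinking from four channels to two in this model keeps the packets already counted on the closed channels in the total.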
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      CC: Qing Huang <qing.huang@oracle.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Introducing new statistics rwlock · 868a01a2
      Shalom Lagziel authored
      Introduce a new read/write lock that protects statistics gathering from
      netdev channel configuration changes.  For example, when channels are
      being replaced (the number of rings is increased or decreased), prevent
      statistics gathering (ndo_get_stats64) from reading the statistics of
      inactive channels (channels that are being closed).
      
      In addition, update the channels' software statistics on the fly when
      ndo_get_stats64 is called, and remove that update from the periodic
      stats work.
      
      Fixes: 9218b44d ("net/mlx5e: Statistics handling refactoring")
      Signed-off-by: Shalom Lagziel <shaloml@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Move phy link down events counter out of SW stats · 6ab75516
      Saeed Mahameed authored
      The PHY link down events counter belongs to the phy_counters group.
      Although it has special handling, that does not mean it cannot be there.
      
      Move it to phy_counters_grp handler.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Use order-0 allocations for all WQ types · 3a2f7033
      Tariq Toukan authored
      Complete the transition of all WQ types to use fragmented
      order-0 coherent memory instead of high-order allocations.
      
      CQ-WQ already uses order-0.
      Here we do the same for cyclic and linked-list WQs.
      
      This allows the driver to load cleanly on systems with highly
      fragmented coherent memory.
      
      Performance tests:
      ConnectX-5 100Gbps, CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
      Packet rate of 64B packets, single transmit ring, size 8K.
      
      No degradation is sensed.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5i: Use compilation flag in IPOIB header · 549322f2
      Tariq Toukan authored
      If CONFIG_MLX5_CORE_IPOIB is not set, compile-out the
      IPOIB related headers.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: TX, Use actual WQE size for SQ edge fill · 043dc78e
      Tariq Toukan authored
      We fill SQ edge with NOPs to avoid WQEs wrap.
      Here, instead of doing that in advance for the maximum possible
      WQE size, we do it on-demand using the actual WQE size.
      We re-order some parts in mlx5e_sq_xmit to finish the calculation
      of WQE size (ds_cnt) before doing any writes to the WQE buffer.
      
      When the SQ work queue is fragmented (introduced in a downstream patch),
      dealing with WQE wraps becomes more frequent. This change would drastically
      reduce the overhead in this case.
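The difference between the two fill policies can be sketched in Python as follows. This is a simplified model under stated assumptions: the ring size is a power of two, sizes are counted in WQE basic blocks (WQEBBs), and `MAX_WQEBBS` is an illustrative value rather than the driver's constant.

```python
MAX_WQEBBS = 4  # hypothetical maximum WQE size, in WQEBBs


def nops_before_wqe(pi: int, wq_size: int, num_wqebbs: int) -> int:
    """Return how many NOP WQEBBs to post so the next WQE does not wrap.

    pi          -- free-running producer counter
    wq_size     -- ring size in WQEBBs (power of two)
    num_wqebbs  -- WQE size used for the check, in WQEBBs
    """
    ix = pi & (wq_size - 1)   # position inside the ring
    contig = wq_size - ix     # WQEBBs left before the ring edge
    return contig if contig < num_wqebbs else 0


def nops_for_max_size(pi: int, wq_size: int) -> int:
    """Old policy: always reserve room for the largest possible WQE."""
    return nops_before_wqe(pi, wq_size, MAX_WQEBBS)
```

With two WQEBBs left before the edge, a two-WQEBB WQE needs no NOPs under the on-demand check, while the maximum-size check would pad the edge regardless of the actual WQE size.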
      
      Performance tests:
      ConnectX-5 100Gbps, CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
      Packet rate of 64B packets, single transmit ring, size 8K.
      
      Before: 14.9 Mpps
      After:  15.8 Mpps
      
      Improvement of 6%.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Use WQ API functions instead of direct fields access · ddf385e3
      Tariq Toukan authored
      Use the WQ API to get the WQ size, and to map a counter
      into a WQ entry index.
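As a rough illustration of what such accessors hide, the toy Python class below keeps the size mask private and exposes only a size getter and a counter-to-index mapping. The names `CyclicWQ`, `get_size` and `ctr2ix` are hypothetical stand-ins, and a power-of-two ring size is assumed.

```python
class CyclicWQ:
    """Toy cyclic work queue; the size is 2**log_size entries."""

    def __init__(self, log_size: int) -> None:
        # Callers never touch the mask directly.
        self._sz_m1 = (1 << log_size) - 1

    def get_size(self) -> int:
        """Number of entries in the ring."""
        return self._sz_m1 + 1

    def ctr2ix(self, ctr: int) -> int:
        """Map a free-running counter to an entry index in the ring."""
        return ctr & self._sz_m1
```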
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Split offloaded eswitch TC rules for port mirroring · e4ad91f2
      Chris Mi authored
      If a TC rule needs to be split for mirroring, create two HW rules,
      in the first level and the second level flow tables accordingly.
      
      In the first level flow table, forward the packet to the mirror
      port and forward the packet to the second level flow table for
      further processing, e.g. encap, vlan push or header re-write.
      
      Currently the matching is repeated in both stages.
      
      While here, simplify the setup of the vhca id valid indicator also
      in the existing code.
      Signed-off-by: Chris Mi <chrism@mellanox.com>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Parse mirroring action for offloaded TC eswitch flows · 592d3651
      Chris Mi authored
      Currently, we only support the mirred redirect TC sub-action. In order
      to support flow based vport mirroring, add support to parse the mirred
      mirror sub-action.
      
      For mirroring, user-space will typically set the action order such that
      the mirror port (mirror VF) sees packets as the original port (VF under
      mirroring) sent them or as it will receive them.
      
      In the general case, it means that packets are potentially sent to the
      mirror port before or after some actions were applied on them. To
      properly do that, we should follow on the exact action order as set for
      the flow and make sure this will also be the case when we program the HW
      offload.
      
      We introduce a counter for the output ports (attr->out_count), which we
      increase when parsing each mirred redirect/mirror sub-action and when
      dealing with encap.
      
      We introduce a counter (attr->mirror_count) telling us if a split is
      needed. If no split is needed and mirroring is just multicasting to
      vports, the mirror count is zero and all the actions of the TC flow
      apply to that single HW flow.
      
      If a split is needed, the mirror count tells where to do the split; all
      non-mirred TC actions apply only after the split.
      
      The mirror count is set while parsing the following actions: encap/decap,
      header re-write, vlan push/pop.
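The two counters can be modelled with a small Python sketch. It is an interpretation of the description above, not the driver's parser: the action names are hypothetical, the extra out_count increment for encap is left out, and the sketch assumes the mirror count simply records how many outputs were seen when a modifying action is parsed.

```python
from typing import List, Tuple

# Hypothetical action names used only for this illustration.
OUTPUT_ACTIONS = {"mirred_redirect", "mirred_mirror"}
MODIFY_ACTIONS = {"encap", "decap", "header_rewrite", "vlan_push", "vlan_pop"}


def parse_actions(actions: List[str]) -> Tuple[int, int]:
    """Return (out_count, mirror_count) for an ordered TC action list."""
    out_count = 0
    mirror_count = 0
    for act in actions:
        if act in OUTPUT_ACTIONS:
            out_count += 1
        elif act in MODIFY_ACTIONS:
            # Assumption: remember how many outputs preceded the modification.
            mirror_count = out_count
    return out_count, mirror_count


def needs_split(actions: List[str]) -> bool:
    """A non-zero mirror count means the rule is split into two HW rules."""
    return parse_actions(actions)[1] > 0
```

In this model a plain mirror followed by a redirect needs no split, while a mirror followed by a vlan push and a redirect is split after the first output.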
      Signed-off-by: Chris Mi <chrism@mellanox.com>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: E-switch, Create a second level FDB flow table · a842dd04
      Chris Mi authored
      If firmware supports the forward action with a destination list
      that includes a flow table, create a second level FDB flow table.
      
      This is going to be used for flow based mirroring under the switchdev
      offloads mode.
      Signed-off-by: Chris Mi <chrism@mellanox.com>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Add cap bits for flow table destination in FDB table · b4563002
      Chris Mi authored
      If set, the FDB table supports the forward action with a
      destination list that includes a flow table.
      Signed-off-by: Chris Mi <chrism@mellanox.com>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: E-Switch, Reorganize and rename fdb flow tables · 52fff327
      Chris Mi authored
      We have several fdb flow tables for each of the legacy and switchdev
      modes. In the switchdev mode, there are fast path and slow path flow
      tables. Towards adding more flow tables in upcoming patches, reorganize
      and rename the various existing ones to reflect their functionality.
      Signed-off-by: Chris Mi <chrism@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net: dsa: dsa_loop: Make dynamic debugging helpful · e52cde71
      Florian Fainelli authored
      Remove redundant debug prints from phy_read/write since we can trace those
      calls through trace events. Enhance dynamic debug prints to print arguments,
      which helps figure out what is going on at the driver level with higher level
      configuration interfaces.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ovs-ct-zone' · 910714f1
      David S. Miller authored
      Yi-Hung Wei says:
      
      ====================
      openvswitch: Support conntrack zone limit
      
      Currently, nf_conntrack_max is used to limit the maximum number of
      conntrack entries in the conntrack table for every network namespace.
      VMs and containers that reside in the same namespace share the same
      conntrack table, and the total number of conntrack entries for all of
      them is limited by nf_conntrack_max.  In this case, if one VM or
      container abuses its usage of conntrack entries, it blocks the others
      from committing valid conntrack entries into the conntrack table.  Even
      if we put each VM in a different network namespace, the current
      nf_conntrack_max configuration is too rigid to give different
      VMs/containers different numbers of conntrack entries.
      
      To address this issue, this patch proposes a fine-grained mechanism
      that further limits the number of conntrack entries per zone.  For
      example, we can assign a different zone to each VM and set a conntrack
      limit on each zone.  With this isolation, a misbehaving VM only
      consumes the conntrack entries in its own zone and does not affect
      other well-behaved VMs.  Moreover, users can set different conntrack
      limits on different zones based on their preference.
      
      The proposed implementation utilizes Netfilter's nf_conncount backend
      to count the number of connections in a particular zone.  If the number
      of connections is above the configured limit, OVS returns ENOMEM to
      userspace.  If userspace does not configure the zone limit, the limit
      defaults to zero, which means no limit and is backward compatible with
      the behavior without this patch.
      
      The first patch defines the conntrack limit netlink definition, and the
      second patch provides the implementation.
      
      v4->v5:
        - Addresses comments from Pravin: log an error message in
          ovs_ct_limit_init(), handle deletion of the default limit, and
          add a common helper to get the zone limit.
        - Rebases to master.
      
      v3->v4:
        - Addresses comments from Pravin: simplify the netlink API and
          remove unnecessary RCU locking.
        - Rebases to master.
      
      v2->v3:
        - Addresses comments from Pravin: use static keys to check whether
          the ovs_ct_limit feature is used, only check ct_limit when a ct
          entry is unconfirmed, and report rate-limited warning messages
          when the ct limit is reached.
        - Rebases to master.
      
      v1->v2:
        - Fixes commit log typos suggested by Greg.
        - Fixes memory free issue that Julia found.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Support conntrack zone limit · 11efd5cb
      Yi-Hung Wei authored
      Currently, nf_conntrack_max is used to limit the maximum number of
      conntrack entries in the conntrack table for every network namespace.
      VMs and containers that reside in the same namespace share the same
      conntrack table, and the total number of conntrack entries for all of
      them is limited by nf_conntrack_max.  In this case, if one VM or
      container abuses its usage of conntrack entries, it blocks the others
      from committing valid conntrack entries into the conntrack table.  Even
      if we put each VM in a different network namespace, the current
      nf_conntrack_max configuration is too rigid to give different
      VMs/containers different numbers of conntrack entries.
      
      To address this issue, this patch proposes a fine-grained mechanism
      that further limits the number of conntrack entries per zone.  For
      example, we can assign a different zone to each VM and set a conntrack
      limit on each zone.  With this isolation, a misbehaving VM only
      consumes the conntrack entries in its own zone and does not affect
      other well-behaved VMs.  Moreover, users can set different conntrack
      limits on different zones based on their preference.
      
      The proposed implementation utilizes Netfilter's nf_conncount backend
      to count the number of connections in a particular zone.  If the number
      of connections is above the configured limit, OVS returns ENOMEM to
      userspace.  If userspace does not configure the zone limit, the limit
      defaults to zero, which means no limit and is backward compatible with
      the behavior without this patch.
      
      The following high-level APIs are provided to userspace:
        - OVS_CT_LIMIT_CMD_SET:
          * set default connection limit for all zones
          * set the connection limit for a particular zone
        - OVS_CT_LIMIT_CMD_DEL:
          * remove the connection limit for a particular zone
        - OVS_CT_LIMIT_CMD_GET:
          * get the default connection limit for all zones
          * get the connection limit for a particular zone
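The semantics described above can be modelled in a few lines of Python. This is a sketch of the behaviour only (limit zero means unlimited, a zone without its own limit falls back to the default, exceeding the limit yields ENOMEM); the class and method names are hypothetical and do not correspond to the kernel or netlink API.

```python
import errno
from typing import Dict


class ZoneLimits:
    """Toy model of per-zone conntrack limits."""

    def __init__(self) -> None:
        self.default_limit = 0            # 0 means "no limit"
        self.limits: Dict[int, int] = {}  # per-zone overrides
        self.counts: Dict[int, int] = {}  # committed connections per zone

    def set_default(self, limit: int) -> None:
        self.default_limit = limit

    def set_zone(self, zone: int, limit: int) -> None:
        self.limits[zone] = limit

    def del_zone(self, zone: int) -> None:
        # After deletion the zone falls back to the default limit.
        self.limits.pop(zone, None)

    def get(self, zone: int) -> int:
        return self.limits.get(zone, self.default_limit)

    def commit(self, zone: int) -> int:
        """Try to commit one more connection; return 0 or -ENOMEM."""
        limit = self.get(zone)
        count = self.counts.get(zone, 0) + 1
        if limit and count > limit:
            return -errno.ENOMEM
        self.counts[zone] = count
        return 0
```

A zone limited to two connections accepts two commits and rejects the third, while other zones are unaffected.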
      Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Add conntrack limit netlink definition · 5972be6b
      Yi-Hung Wei authored
      Define netlink messages and attributes to support user-kernel
      communication for the conntrack limit feature.
      Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5e-updates-2018-05-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · d7c52fc8
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5e-updates-2018-05-19
      
      This series contains updates for the mlx5e netdevice driver with one
      subject, DSCP to priority mapping.  In the first patch Huy adds the
      needed API in dcbnl, the second patch adds the needed mlx5 core
      capability bits for the feature, and all other patches are mlx5e
      (netdev) only changes to add support for the feature.
      
      From: Huy Nguyen
      
      DSCP to priority mapping for Ethernet packets:
      
      These patches enable differentiated services code point (dscp) to
      priority mapping for Ethernet packets. Once this feature is
      enabled, a packet is routed to the corresponding priority based on its
      dscp. Users can combine this feature with the priority flow control
      (pfc) feature to have priority flow control based on the dscp.
      
      Firmware interface:
      Mellanox firmware provides two control knobs for this feature:
        The QPTS register allows changing the trust state between dscp and
        pcp mode. The default is pcp mode. Once in dscp mode, firmware will
        route the packet based on its dscp value if the dscp field exists.
      
        The QPDPM register allows mapping a specific dscp (0 to 63) to a
        specific priority (0 to 7). By default, all the dscps are mapped to
        priority zero.
      
      Software interface:
      This feature is controlled via the application priority TLV. IEEE
      specification P802.1Qcd/D2.1 defines priority selector id 5 for the
      application priority TLV. This APP TLV selector defines the DSCP to
      priority map. This APP TLV can be sent by the switch or can be set
      locally using software such as lldptool. In the mlx5 driver, we add
      support for the net dcb getapp and setapp callbacks. The mlx5 driver
      only handles the selector id 5 application entry (the dscp application
      priority entry). If the user sends multiple dscp to priority APP TLV
      entries for the same dscp, the last one sent takes effect and all
      previously sent entries are deleted.
      
      This attribute combined with the pfc attribute allows advanced users to
      fine tune the qos settings for a specific priority queue. For example,
      the user can give a dedicated buffer to one or more priorities or
      give a large buffer to certain priorities.
      
      The dcb buffer configuration will be controlled by lldptool.
      >> lldptool -T -i eth2 -V BUFFER prio 0,2,5,7,1,2,3,6
            maps priorities 0,1,2,3,4,5,6,7 to receive buffer 0,2,5,7,1,2,3,6
      >> lldptool -T -i eth2 -V BUFFER size 87296,87296,0,87296,0,0,0,0
            sets receive buffer size for buffer 0,1,2,3,4,5,6,7 respectively
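To make the positional meaning of the two argument lists explicit, here is a small Python sketch that turns them into mappings. The helper names are hypothetical, and the only assumption is the one stated above: position i in the list refers to priority i (or buffer i).

```python
from typing import Dict


def _parse_eight(arg: str) -> Dict[int, int]:
    """Parse a comma-separated list of exactly eight integers."""
    values = [int(x) for x in arg.split(",")]
    if len(values) != 8:
        raise ValueError("expected 8 comma-separated values")
    return dict(enumerate(values))


def parse_prio2buffer(arg: str) -> Dict[int, int]:
    """'0,2,5,7,1,2,3,6' -> {priority: receive buffer}."""
    return _parse_eight(arg)


def parse_buffer_sizes(arg: str) -> Dict[int, int]:
    """'87296,87296,0,...' -> {receive buffer: size}."""
    return _parse_eight(arg)
```

For the two commands above, priority 4 lands in receive buffer 1, and buffers 0, 1 and 3 are the only ones with a non-zero size.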
      
      After discussion on mailing list with Jakub, Jiri, Ido and John, we agreed to
      choose dcbnl over devlink interface since this feature is intended to set
      port attributes which are governed by the netdev instance of that port, where
      devlink API is more suitable for global ASIC configurations.
      
      The firmware trust state (in QPTS register) is changed based on the
      number of dscp to priority application entries. When the first dscp to
      priority application entry is added by the user, the trust state is
      changed to dscp. When the last dscp to priority application entry is
      deleted by the user, the trust state is changed to pcp.
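The trust-state rule can be sketched as a small Python state model. It is illustrative only: the class and method names are hypothetical, and resetting a dscp to priority zero when its entry is deleted is an assumption based on the stated default mapping.

```python
from typing import Dict, List


class DscpTrust:
    """Toy model of the dscp-to-priority table and the port trust state."""

    NUM_DSCP = 64
    NUM_PRIO = 8

    def __init__(self) -> None:
        self.dscp2prio: List[int] = [0] * self.NUM_DSCP  # default: priority 0
        self.entries: Dict[int, int] = {}                # user-added app entries

    @property
    def trust_state(self) -> str:
        # dscp while at least one entry exists, pcp otherwise.
        return "dscp" if self.entries else "pcp"

    def setapp(self, dscp: int, prio: int) -> None:
        if not (0 <= dscp < self.NUM_DSCP and 0 <= prio < self.NUM_PRIO):
            raise ValueError("dscp must be 0..63 and priority 0..7")
        # A later entry for the same dscp replaces the earlier one.
        self.entries[dscp] = prio
        self.dscp2prio[dscp] = prio

    def delapp(self, dscp: int) -> None:
        if self.entries.pop(dscp, None) is not None:
            # Assumption: a deleted dscp returns to the default priority.
            self.dscp2prio[dscp] = 0
```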
      
      When the port is in DSCP trust state, the transmit queue is selected
      based on the dscp of the skb.
      
      When the port is in DSCP trust state and vport inline mode is not NONE,
      firmware requires mlx5 driver to copy the IP header to the
      wqe ethernet segment inline header if the skb has it.
      This is done by changing the transmit queue sq's min inline mode to L3.
      Note that the min inline mode of sqs that belong to other features
      such as xdpsq, icosq are not modified.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 8139too: Remove unnecessary netif_napi_del() · a4567579
      Bo Chen authored
      The call to free_netdev() in __rtl8139_cleanup_dev() clears the network device
      napi list, and explicit calls to netif_napi_del() are unnecessary.
      Signed-off-by: Bo Chen <chenbo@pdx.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'qed-ethtool-rx-flow-classification-enhancements' · a527af9c
      David S. Miller authored
      Manish Chopra says:
      
      ====================
      qed*: ethtool rx flow classification enhancements.
      
      This series restructures the driver's ethtool rx flow
      classification flow. It then adds other flow profiles
      and rx flow classification enhancements via
      "ethtool -N/-U".
      
      Please consider applying this to "net-next"
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed*: Support drop action classification · 608e00d0
      Manish Chopra authored
      With this patch, the user can configure supported
      flows to be dropped. A "gft_filter_drop" stat is added
      as well and reported in ethtool for the dropped flows.
      
      For example -
      
      ethtool -N p5p1 flow-type udp4 dst-port 8000 action -1
      ethtool -N p5p1 flow-type tcp4 src-ip 192.168.8.1 action -1
      Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
      Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
      Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qede: Support flow classification to the VFs. · 39385ab0
      Manish Chopra authored
      With the supported classification modes [4 tuples based,
      udp port based, src-ip based], flows can be classified
      to the VFs as well. With this patch, flows can be redirected
      to the requested VF provided in the "action" field of the command.
      
      Please note that the driver doesn't really care about the queue bits
      in the "action" field for the VFs, since the queue will still be chosen
      by FW using the RSS hash. [I.e., the classification would be done
      according to vport only]
      
      For example -
      
      ethtool -N p5p1 flow-type udp4 dst-port 8000 action 0x100000000
      ethtool -N p5p1 flow-type tcp4 src-ip 192.16.6.10 action 0x200000000
      ethtool -U p5p1 flow-type tcp4 src-ip 192.168.40.100 dst-ip \
      	192.168.40.200 src-port 6660 dst-port 5550 \
      	action 0x100000000
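The "action" values in these commands pack two fields into one 64-bit cookie. The Python sketch below decodes them using the mask values from the ethtool UAPI header as I recall them (queue index in the low 32 bits, VF field in bits 32-39, all-ones meaning discard); treat the constants as an assumption to verify against the header. How qede interprets the VF field beyond that is not stated here, so the sketch only extracts the raw values.

```python
from typing import Optional, Tuple

# Values recalled from include/uapi/linux/ethtool.h (verify before relying on them).
ETHTOOL_RX_FLOW_SPEC_RING = 0x00000000FFFFFFFF
ETHTOOL_RX_FLOW_SPEC_RING_VF = 0x000000FF00000000
ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF = 32
RX_CLS_FLOW_DISC = 0xFFFFFFFFFFFFFFFF


def decode_action(action: int) -> Tuple[str, Optional[int], Optional[int]]:
    """Split an ethtool 'action' value into (kind, vf_field, queue)."""
    cookie = action & 0xFFFFFFFFFFFFFFFF  # "-1" wraps to all ones
    if cookie == RX_CLS_FLOW_DISC:
        return ("drop", None, None)
    vf_field = (cookie & ETHTOOL_RX_FLOW_SPEC_RING_VF) >> ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF
    queue = cookie & ETHTOOL_RX_FLOW_SPEC_RING
    return ("steer", vf_field, queue)
```

Under this decoding, `action 0x100000000` carries VF field 1 with queue bits 0, and `action -1` is the discard cookie.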
      Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
      Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
      Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed*: Support other classification modes. · 3893fc62
      Manish Chopra authored
      Currently, the driver supports flow classification to PF
      receive queues based on TCP/UDP 4 tuples [src_ip, dst_ip,
      src_port, dst_port] only.
      
      This patch makes it possible to configure different flow profiles
      [for example - only UDP dest port or src_ip based] on the
      adapter so that classification can be done according to
      just those fields as well. However, only one type of flow
      configuration is supported at a time due to the limited
      number of flow profiles available on the device.
      
      For example -
      
      ethtool -N enp7s0f0 flow-type udp4 dst-port 45762 action 2
      ethtool -N enp7s0f0 flow-type tcp4 src-ip 192.16.4.10 action 1
      ethtool -N enp7s0f0 flow-type udp6 dst-port 45762 action 3
      Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
      Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
      Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qede: Validate unsupported configurations · 89ffd14e
      Manish Chopra authored
      Validate and prevent some of the configurations for
      unsupported [by firmware] inputs [for example - mac ext,
      vlans, masks/prefix, tos/tclass] via ethtool -N/-U.
      Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
      Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
      Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qede: Refactor ethtool rx classification flow. · 87885310
      Manish Chopra authored
      This patch simplifies the ethtool rx flow configuration
      [via ethtool -U/-N] code base by dividing it logically
      into various APIs based on the given protocols. It also separates
      the various validations and calculations done along the flow
      into their own APIs.
      Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
      Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
      Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4/cxgb4vf: Notify link changes to OS-dependent code · e2f4f4e9
      Arjun Vynipadath authored
      We have a confusion of two different abstractions in the Common
      Code:  Physical Link (Port) and Logical Network Interface (Virtual
      Interface), and we haven't been properly managing the state of the
      intersection of those two abstractions.
      On the one hand we have the Physical state of the Link -- up or down --
      and on the other we have the logical state of the VI, enabled or not.
      {ethN} refers to both the Physical and Logical State. In this case,
      ifconfig only affects/interrogates the Logical State of a VI,
      and ethtool only deals with the Physical State. And these are different.
      
      So, just because we disable the VI, we don't really want to change the
      Physical Link Up/Down state.  Thus, the previous hack to set
      "lc->link_ok = 0" when we disable a VI is completely incorrect.
      
      Where we get into trouble is where the Physical Link State and the
      Logical VI State cross swords.  And that happens in
      t4_handle_get_port_info() where we need to manage/save the Physical
      Link State, but we also need to know when the Logical VI State has
      changed and pass that back up to the OS-dependent Driver routine
      t4_os_link_changed() which is concerned about the Logical Interface.
      
      So we enable a VI and that causes Firmware to send us a new Port
      Information message, but if none of the Physical Link State
      particulars have changed, we don't call t4_os_link_changed().
      
      This fix uses the existing OS Contract APIs for the Common Code to
      inform the OS-dependent portion of the Host Driver when the "Link" (really
      Logical Network Interface) is "up" or "down". A new API
      t4_enable_pi_params() is added which calls t4_enable_vi_params() and,
      if that is successful, then calls back to the OS Contract API
      t4_os_link_changed() notifying the OS-dependent layer of the
      potential Link State change.
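The call order described here can be sketched in Python with the two existing routines passed in as callbacks. This is a hedged illustration rather than the driver's code: the parameter lists are simplified, and the value handed to the link-changed callback (both directions enabled and the physical link up) is an assumption.

```python
from typing import Callable


def enable_pi_params(enable_vi: Callable[[bool, bool], int],
                     os_link_changed: Callable[[bool], None],
                     rx_en: bool, tx_en: bool, link_ok: bool) -> int:
    """Enable the VI, then notify the OS-dependent layer on success.

    enable_vi        -- stand-in for t4_enable_vi_params(); returns 0 or an error
    os_link_changed  -- stand-in for t4_os_link_changed()
    """
    ret = enable_vi(rx_en, tx_en)
    if ret:
        # On failure the OS-dependent layer is not notified.
        return ret
    # Assumption: the logical interface counts as "up" only when both
    # directions are enabled and the physical link is up.
    os_link_changed(rx_en and tx_en and link_ok)
    return 0
```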
      
      Original work by: Casey Leedom <leedom@chelsio.com>
      Signed-off-by: Santosh Rastapur <santosh@chelsio.com>
      Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: clean up init_one · e8d45292
      Ganesh Goudar authored
      Clean up init_one() and use chip_ver consistently throughout
      it for the chip version.
      Signed-off-by: Casey Leedom <leedom@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4/cxgb4vf: link management changes for new SFP · 57ccaedb
      Ganesh Goudar authored
      Newer SFPs like SFP28 and QSFP28 Transceiver Modules present
      several new possibilities which we haven't faced before. Fix the
      assumptions in the code that reflect the more limited capabilities
      of previous Transceiver Module systems.
      
      Original work by Casey Leedom <leedom@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fec: remove stale comment · b526e56b
      YueHaibing authored
      This comment is outdated as fec_ptp_ioctl has been replaced by fec_ptp_set/fec_ptp_get
      since commit 1d5244d0 ("fec: Implement the SIOCGHWTSTAMP ioctl")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Acked-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: stop the TX queue before pushing new buffers · 0c235113
      Martin Habets authored
      efx_enqueue_skb() can push new buffers for the xmit_more functionality.
      We must stop the TX queue before this or else the TX queue does not get
      restarted and we get a netdev watchdog.
      
      In the error handling we may now need to unwind more than 1 packet, and
      we may need to push the new buffers onto the partner queue.
      
      v2: In the error leg also push this queue if xmit_more is set
      
      Fixes: e9117e50 ("sfc: Firmware-Assisted TSO version 2")
      Reported-by: Jarod Wilson <jarod@redhat.com>
      Tested-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: Martin Habets <mhabets@solarflare.com>
      Acked-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: add support for port isolation · 7d850abd
      Nikolay Aleksandrov authored
      This patch adds support for a new port flag - BR_ISOLATED. If it is set
      then isolated ports cannot communicate with each other, but they can
      still communicate with non-isolated ports. The same can be achieved via
      ACLs but they don't scale with a large number of ports and the
      complexity of the rules grows. This feature can be used to achieve
      isolated vlan functionality (similar to pvlan) as well, though currently
      it will be port-wide (for all vlans on the port). The new test in
      should_deliver uses data that is already cache hot and the new boolean
      is used to avoid an additional source port test in should_deliver.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7d850abd
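      The delivery rule described in the port isolation commit above can be
      sketched in plain C. This is a minimal user-space illustration, not the
      kernel code: struct port, struct pkt_cb, the PORT_ISOLATED bit value and
      both helper names are hypothetical stand-ins for the bridge port flags
      and the per-packet boolean that the message mentions.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for the bridge's BR_ISOLATED port flag. */
#define PORT_ISOLATED 0x1u

/* Hypothetical, simplified port: only the flags word matters here. */
struct port {
	unsigned int flags;
};

/*
 * Per-packet state recorded once at ingress, so that the egress check does
 * not need to look at the source port again.
 */
struct pkt_cb {
	bool src_port_isolated;
};

/* Record at ingress whether the packet arrived on an isolated port. */
static inline void pkt_mark_ingress(struct pkt_cb *cb, const struct port *src)
{
	cb->src_port_isolated = (src->flags & PORT_ISOLATED) != 0;
}

/* Refuse delivery only when both source and destination are isolated. */
static inline bool pkt_may_deliver(const struct pkt_cb *cb,
				   const struct port *dst)
{
	return !(cb->src_port_isolated && (dst->flags & PORT_ISOLATED));
}
```

      Caching the source state in the packet keeps the per-destination test
      to a single flag check, which is the point the message makes about
      avoiding an additional source port test.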
    • David S. Miller's avatar
      Merge branch 'nfp-offload-LAG-for-tc-flower-egress' · 9c590490
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      nfp: offload LAG for tc flower egress
      
      This series from John adds bond offload to the nfp driver.  Patch 5
      exposes the hash type for NETDEV_LAG_TX_TYPE_HASH to make sure nfp
      hashing matches that of the software LAG.  This may be unnecessarily
      conservative, let's see what LAG maintainers think :)
      
      John says:
      
      This patchset sets up the infrastructure and offloads output actions for
      when a TC flower rule attempts to egress a packet to a LAG port.
      
      Firstly it adds some of the infrastructure required to the flower app and
      to the nfp core. This includes the ability to change the MAC address of a
      repr, a function for combining lookup and write to a FW symbol, and the
      addition of private data to a repr on a per app basis.
      
      Patch 6 continues by implementing notifiers that track Linux bonds and
      communicate to the FW those which enslave reprs, along with the current
      state of reprs within the bond.
      
      Patch 7 ensures bonds are synchronised with FW by receiving and acting
      upon cmsgs sent to the kernel. These may request that a bond message is
      retransmitted when FW can process it, or may request a full sync of the
      bonds defined in the kernel.
      
      Patch 8 offloads a flower action when that action requires egressing to a
      pre-defined Linux bond.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9c590490
    • John Hurley's avatar
      nfp: flower: compute link aggregation action · 7e24a593
      John Hurley authored
      If the egress device of an offloaded rule is a LAG port, then encode the
      output port to the NFP with a LAG identifier and the offloaded group ID.
      
      A prelag action is also offloaded which must be the first action of the
      series (although may appear after other pre-actions - e.g. tunnels). This
      causes the FW to check that it has the necessary information to output to
      the requested LAG port. If it does not, the packet is sent to the kernel
      before any other actions are applied to it.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7e24a593
    • John Hurley's avatar
      nfp: flower: implement host cmsg handler for LAG · 2e1cc522
      John Hurley authored
      Add the control message handler to synchronize offloaded group config
      with that of the kernel. Such messages are sent from FW to the driver
      and carry the following 3 flags:
      
      - Data: an attached cmsg could not be processed - store for retransmission
      - Xon: FW can accept new messages - retransmit any stored cmsgs
      - Sync: full sync requested so retransmit all kernel LAG group info
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2e1cc522
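      The three flags listed in the cmsg handler commit above could be
      decoded along these lines. This is only a sketch under stated
      assumptions: the bit values, struct lag_cmsg_actions and
      lag_cmsg_decode() are hypothetical, and treating the flags as
      independent bits that may be combined is an assumption the message
      does not confirm.

```c
#include <stdbool.h>

/* Hypothetical bit values; the real firmware ABI values are not given. */
#define LAG_CMSG_FLAG_DATA 0x1u
#define LAG_CMSG_FLAG_XON  0x2u
#define LAG_CMSG_FLAG_SYNC 0x4u

/* What the driver should do in response to one firmware message. */
struct lag_cmsg_actions {
	bool store_for_retrans; /* Data: keep the attached cmsg for later */
	bool flush_stored;      /* Xon: retransmit any stored cmsgs */
	bool resend_all_groups; /* Sync: retransmit all kernel LAG group info */
};

/* Translate the flags word of a message into the actions it requests. */
static inline struct lag_cmsg_actions lag_cmsg_decode(unsigned int flags)
{
	struct lag_cmsg_actions act = {
		.store_for_retrans = (flags & LAG_CMSG_FLAG_DATA) != 0,
		.flush_stored      = (flags & LAG_CMSG_FLAG_XON) != 0,
		.resend_all_groups = (flags & LAG_CMSG_FLAG_SYNC) != 0,
	};

	return act;
}
```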
    • John Hurley's avatar
      nfp: flower: monitor and offload LAG groups · bb9a8d03
      John Hurley authored
      Monitor LAG events via the NETDEV_CHANGEUPPER/NETDEV_CHANGELOWERSTATE
      notifiers to maintain a list of offloadable groups. Sync these groups with
      HW via a delayed workqueue to prevent excessive re-configuration. When the
      workqueue is triggered it may generate multiple control messages for
      different groups. These messages are linked via a batch ID and flags to
      indicate a new batch and the end of a batch.
      
      Update private data in each repr to track their LAG lower state flags. The
      state of a repr is used to determine the active netdevs that can be
      offloaded. For example, in active-backup mode, we only offload the netdev
      currently active.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bb9a8d03
    • John Hurley's avatar
      net: include hash policy in LAG changeupper info · f44aa9ef
      John Hurley authored
      LAG upper event notifiers contain the tx type used by the LAG device.
      Extend this to also include the hash policy used for tx types that
      utilize hashing.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f44aa9ef
    • John Hurley's avatar
      nfp: flower: add per repr private data for LAG offload · b9452452
      John Hurley authored
      Add a bitmap to each flower repr to track its state if it is enslaved by a
      bond. This LAG state may be different to the port state - for example, the
      port may be up but LAG state may be down due to the selection in an
      active/backup bond.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b9452452
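      To illustrate why the commit above keeps LAG state separate from port
      state, here is a small sketch. The bit names, struct repr_priv and
      repr_lag_usable() are hypothetical and do not reflect the driver's
      actual layout; they only show a port that is up while its LAG state
      is down, as in an active/backup bond.

```c
#include <stdbool.h>

/* Hypothetical bits within the per-repr LAG state bitmap. */
#define REPR_LAG_ENSLAVED  (1ul << 0) /* repr is a member of a bond */
#define REPR_LAG_TX_ACTIVE (1ul << 1) /* bond currently transmits on it */

/* Hypothetical per-repr private data. */
struct repr_priv {
	bool port_up;            /* port state */
	unsigned long lag_state; /* LAG state, tracked separately */
};

/* A repr is usable for LAG output only if the port and LAG state agree. */
static inline bool repr_lag_usable(const struct repr_priv *r)
{
	unsigned long need = REPR_LAG_ENSLAVED | REPR_LAG_TX_ACTIVE;

	return r->port_up && (r->lag_state & need) == need;
}
```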
    • John Hurley's avatar
      nfp: flower: check for/turn on LAG support in firmware · 898bc7d6
      John Hurley authored
      Check if the fw contains the _abi_flower_balance_sync_enable symbol. If it
      does then write a 1 to this indicating that the driver is willing to
      receive NIC to kernel LAG related control messages.
      
      If the write is successful, update the list of extra features supported by
      the fw and add a stub to accept LAG cmsgs.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      898bc7d6
    • John Hurley's avatar
      nfp: nfpcore: add rtsym writing function · 1945ca7a
      John Hurley authored
      Add an rtsym API function that combines the lookup of a symbol and the
      writing of a value to it. Values can be written as unsigned 32 or 64 bits.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1945ca7a
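      The idea of combining symbol lookup and write in one call, as the
      rtsym commit above describes, might look like the following sketch.
      It works on an in-memory table rather than device memory, and
      struct demo_sym, demo_sym_lookup() and demo_sym_write() are
      hypothetical names, not the nfpcore API.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical symbol entry: a name, its width in bytes, and its value. */
struct demo_sym {
	const char *name;
	uint64_t value;
	unsigned int size; /* 4 or 8 bytes */
};

/* Find a symbol by name, or return NULL if it does not exist. */
static inline struct demo_sym *demo_sym_lookup(struct demo_sym *tbl, size_t n,
					       const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tbl[i].name, name) == 0)
			return &tbl[i];
	return NULL;
}

/* Look the symbol up and write the value as unsigned 32 or 64 bits. */
static inline int demo_sym_write(struct demo_sym *tbl, size_t n,
				 const char *name, uint64_t value)
{
	struct demo_sym *sym = demo_sym_lookup(tbl, n, name);

	if (!sym)
		return -ENOENT;

	switch (sym->size) {
	case 4:
		if (value > UINT32_MAX)
			return -EINVAL; /* does not fit a 32-bit symbol */
		sym->value = (uint32_t)value;
		return 0;
	case 8:
		sym->value = value;
		return 0;
	default:
		return -EINVAL; /* unsupported symbol width */
	}
}
```

      A caller probing for an optional firmware feature can then treat
      -ENOENT as "feature not present" and any other error as a failure.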
    • John Hurley's avatar
      nfp: add ndo_set_mac_address for representors · 24f132e2
      John Hurley authored
      Adding a netdev to a bond requires that its MAC address can be modified.
      The default eth_mac_addr is sufficient to satisfy this requirement.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      24f132e2
    • Stephen Hemminger's avatar
      hv_netvsc: fix bogus ifalias on network device · d97cde6a
      Stephen Hemminger authored
      If the guest network adapter is not configured with DeviceNaming
      enabled on the host, then the query for friendly name will return
      success but with a zero length name, which then leads to a garbage
      value (stack contents) for ifalias.
      
      The fix is simple: don't set the name if the host doesn't return it.
      
      Fixes: 0fe554a4 ("hv_netvsc: propogate Hyper-V friendly name into interface alias")
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d97cde6a
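      The shape of the ifalias fix above can be shown with a short sketch:
      skip the assignment when the host reports a zero-length name. The
      struct demo_netdev and demo_set_alias() names are hypothetical and the
      code is a user-space illustration, not the hv_netvsc implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical device with an alias buffer. */
struct demo_netdev {
	char ifalias[64];
};

/*
 * Copy the friendly name into the alias only if the host returned one.
 * Returns 1 if the alias was set, 0 if it was left untouched.
 */
static inline int demo_set_alias(struct demo_netdev *dev, const char *name,
				 size_t len)
{
	if (len == 0)
		return 0; /* query succeeded but no name: keep the old alias */

	if (len >= sizeof(dev->ifalias))
		len = sizeof(dev->ifalias) - 1; /* truncate to fit */

	memcpy(dev->ifalias, name, len);
	dev->ifalias[len] = '\0';
	return 1;
}
```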