1. 11 Nov, 2019 24 commits
    • Merge branch 'Accomodate-DSA-front-end-into-Ocelot' · fe2b8a88
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Accommodate DSA front-end into Ocelot
      
      After the nice "change-my-mind" discussion about Ocelot, Felix and
      LS1028A (which can be read here: https://lkml.org/lkml/2019/6/21/630),
      we have decided to take the route of reworking the Ocelot implementation
      in a way that is DSA-compatible.
      
      This is a large series, but hopefully is easy enough to digest, since it
      contains mostly code refactoring. What needs to be changed:
      - struct net_device and struct phy_device need to be isolated from Ocelot
        private structures (struct ocelot, struct ocelot_port). These will
        live as 1-to-1 equivalents to struct dsa_switch and struct dsa_port.
      - The function prototypes need to be compatible with DSA (of course,
        struct dsa_switch will become struct ocelot).
      - The CPU port needs to be assigned via a higher-level API, not
        hardcoded in the driver.
      
      What is going to be interesting is that the new DSA front-end of Ocelot
      will need to have features in lockstep with the DSA core itself. At the
      moment, some more advanced tc offloading features of Ocelot (tc-flower,
      etc) are not available in the DSA front-end due to lack of API in the
      DSA core. It also means that Ocelot practically re-implements large
      parts of DSA (although it is not a DSA switch per se) - see the FDB API
      for example.
      
      The code has been only compile-tested on Ocelot, since I don't have
      access to any VSC7514 hardware. It was proven to work on NXP LS1028A,
      which instantiates a DSA derivative of Ocelot. So I would like to ask
      Alex Belloni if you could confirm this series causes no regression on
      the Ocelot MIPS SoC.
      
      The goal is to get this rework upstream as quickly as possible,
      precisely because it is a large volume of code that risks gaining merge
      conflicts if we keep it for too long.
      
      This is but the first chunk of the LS1028A Felix DSA driver upstreaming.
      For those who are interested, the concept can be seen on my private
      Github repo, the user of this reworked Ocelot driver living under
      drivers/net/dsa/vitesse/:
      https://github.com/vladimiroltean/ls1028ardb-linux
      ====================
      Acked-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: don't hardcode the number of the CPU port · c9d2203b
      Vladimir Oltean authored
      VSC7514 is a 10-port switch with 2 extra "CPU ports" (targets in the
      queuing subsystem for terminating traffic locally).
      
      There are 2 issues with hardcoding the CPU port as #10:
      - It is not clear which snippets of the code are configuring something
        for one of the CPU ports, and which snippets are just doing something
        related to the number of physical ports.
      - Actually any physical port can act as a CPU port connected to an
        external CPU (in addition to the local CPU). This is called NPI mode
        (Node Processor Interface) and is the way that the 6-port VSC9959
        (Felix) switch is integrated inside NXP LS1028A (the "local management
        CPU" functionality is not used there).
      
      This patch makes it clear that the ocelot_bridge_stp_state_set function
      operates on the CPU port (by making it an implicit member of the
      bridging domain), and at the same time adds logic for the NPI port (aka
      a physical port) to play the role of a CPU port (it shouldn't be part of
      bridge_fwd_mask, as it's not explicitly enslaved to a bridge).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: split assignment of the cpu port into a separate function · 21468199
      Vladimir Oltean authored
      Now that the places that configure routing destinations for the CPU port
      have been marked as such, allow callers to specify their own CPU port
      that is different than ocelot->num_phys_ports. A user will be the Felix
      DSA driver, where the CPU port is one of the physical ports (NPI mode).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: refactor adjust_link into a netdev-independent function · 26f4dbab
      Vladimir Oltean authored
      This will be called from the Felix DSA frontend, which will work in
      PHYLIB compatibility mode initially.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: initialize list of multicast addresses in common code · 2b120dde
      Claudiu Manoil authored
      This is just common path code that belongs to ocelot_init,
      it has nothing to do with a specific SoC/board instance.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: separate the common implementation of ndo_open and ndo_stop · 889b8950
      Vladimir Oltean authored
      Allow these functions to be called from the .port_enable and
      .port_disable callbacks of DSA.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: move port initialization into separate function · 31350d7f
      Vladimir Oltean authored
      We need a function for the DSA front-end that does none of the
      net_device registration, but initializes the hardware ports.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: limit vlan ingress filtering to actual number of ports · 714d0ffa
      Vladimir Oltean authored
      The VSC7514 switch (Ocelot) is a 10-port device, while VSC9959 (Felix)
      is 6-port. Therefore the VLAN filtering mask would be out of bounds when
      calling for this new switch. Fix that.
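
A minimal sketch of the idea, assuming the fix amounts to deriving the filtering mask from the number of physical ports instead of a fixed 10-bit constant. The helper name `vlan_port_mask` is hypothetical; in the kernel the same value would typically be written with `GENMASK(num_phys_ports - 1, 0)`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a mask with one bit per physical port.
 * Userspace equivalent of the kernel's GENMASK(num_phys_ports - 1, 0).
 */
uint32_t vlan_port_mask(unsigned int num_phys_ports)
{
	/* Keep the shift well defined for a 32-bit mask. */
	assert(num_phys_ports > 0 && num_phys_ports < 32);
	return (UINT32_C(1) << num_phys_ports) - 1;
}
```

With this, a 6-port switch gets a 6-bit mask and a 10-port switch a 10-bit mask, so no bit refers to a port that does not exist.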
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: refactor ethtool callbacks · c7282d38
      Vladimir Oltean authored
      Convert them into an implementation that can be called from DSA as well.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: separate net_device related items out of ocelot_port · 004d44f6
      Vladimir Oltean authored
      The ocelot and ocelot_port structures will be used by a new DSA driver,
      so the ocelot_board.c file will have to allocate and work with a private
      structure (ocelot_port_private), which embeds the generic struct
      ocelot_port. This is because in DSA, at least one interface does not
      have a net_device, and the DSA driver API does not interact with that
      anyway.
      
      The ocelot_port structure is equivalent to dsa_port, and ocelot to
      dsa_switch. The members of ocelot_port which have an equivalent in
      dsa_port (such as dp->vlan_filtering) have been moved to
      ocelot_port_private.
      
      We want to enforce the coding convention that "ocelot_port" refers to
      the structure, and "port" refers to the integer index. One can retrieve
      the structure at any time from ocelot->ports[port].
      
      The patch is large but only contains variable renaming and mechanical
      movement of fields from one structure to another.
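
The embedding described above can be sketched in plain C as follows. This is only an illustration under stated assumptions: the structure members shown here (`pvid`, `dev`, the fixed-size `ports` array) and the helpers `to_private` and `port_to_netdev` are hypothetical, and `container_of` is re-implemented for userspace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the real structures. */
struct net_device { char name[16]; };

struct ocelot_port {
	int pvid;			/* generic per-port state (assumed field) */
};

struct ocelot_port_private {
	struct ocelot_port port;	/* embedded generic part */
	struct net_device *dev;		/* net_device-only state */
};

struct ocelot {
	struct ocelot_port *ports[12];
	int num_phys_ports;
};

/* Userspace version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the private wrapper from the generic structure it embeds. */
struct ocelot_port_private *to_private(struct ocelot_port *ocelot_port)
{
	return container_of(ocelot_port, struct ocelot_port_private, port);
}

/* "port" is the integer index, "ocelot_port" is the structure. */
struct net_device *port_to_netdev(struct ocelot *ocelot, int port)
{
	assert(port >= 0 && port < ocelot->num_phys_ports);
	struct ocelot_port *ocelot_port = ocelot->ports[port];

	return to_private(ocelot_port)->dev;
}
```

Code shared with a DSA front-end would only ever touch `struct ocelot_port`; only the switchdev side would call something like `to_private()`.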
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: refactor struct ocelot_port out of function prototypes · f270dbfa
      Vladimir Oltean authored
      The ocelot_port structure has a net_device embedded in it, which makes
      it unsuitable for leaving it in the driver implementation functions.
      
      Leave ocelot_flower.c untouched. In that file, ocelot_port is used as an
      interface to the tc shared blocks. That will be addressed in the next
      patch.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: change prototypes of switchdev port attribute handlers · 4bda1415
      Vladimir Oltean authored
      This is needed so that the Felix DSA front-end can call the Ocelot
      implementations.
      
      The implementation of the "mc_disabled" switchdev attribute has also
      been simplified by using the read-modify-write macro instead of
      open-coding that operation.
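
For readers unfamiliar with the pattern, here is a rough userspace sketch of what a read-modify-write helper replaces. The register file `struct regs` and the helpers `reg_read`, `reg_write` and `reg_rmw` are hypothetical stand-ins, not the driver's actual macros.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 4

/* Hypothetical register file standing in for memory-mapped registers. */
struct regs { uint32_t word[NUM_REGS]; };

uint32_t reg_read(const struct regs *r, unsigned int reg)
{
	assert(reg < NUM_REGS);
	return r->word[reg];
}

void reg_write(struct regs *r, uint32_t val, unsigned int reg)
{
	assert(reg < NUM_REGS);
	r->word[reg] = val;
}

/*
 * Read-modify-write: bits selected by mask take their value from val,
 * all other bits keep whatever the register already held.
 */
void reg_rmw(struct regs *r, uint32_t val, uint32_t mask, unsigned int reg)
{
	uint32_t cur = reg_read(r, reg);

	reg_write(r, (cur & ~mask) | (val & mask), reg);
}
```

The open-coded form is the read, the mask arithmetic and the write spelled out at every call site; a single helper keeps that logic in one place.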
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: change prototypes of hwtstamping ioctls · 306fd44b
      Vladimir Oltean authored
      This is needed in order to present a simpler prototype to the DSA
      front-end of ocelot.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: break out fdb operations into abstract implementations · 531ee1a6
      Vladimir Oltean authored
      To be able to implement a DSA front-end over ocelot_fdb_add,
      ocelot_fdb_del, ocelot_fdb_dump, these need to have a simple function
      prototype that is independent of struct net_device, netlink skb, etc.
      
      So rename the ndo ops of the ocelot driver into
      ocelot_port_fdb_{add,del,dump}, and have them all call the abstract
      implementations. At the same time, refactor ocelot_port_fdb_do_dump into
      a function whose prototype is compatible with dsa_fdb_dump_cb_t, so that
      the do_dump implementations can live together and be called by the
      ocelot_fdb_dump through a function pointer.
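
The callback-based dump described above might look roughly like the sketch below. The callback type mirrors the shape of `dsa_fdb_dump_cb_t` as an assumption; the table layout (`struct fdb_entry`, `struct fdb_table`) and the functions `fdb_dump` and `count_cb` are hypothetical and stand in for the hardware MAC table walk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to have the same shape as dsa_fdb_dump_cb_t. */
typedef int fdb_dump_cb_t(const unsigned char *addr, uint16_t vid,
			  bool is_static, void *data);

/* Hypothetical in-memory table standing in for the hardware MAC table. */
struct fdb_entry {
	unsigned char mac[6];
	uint16_t vid;
	int port;
};

struct fdb_table {
	const struct fdb_entry *entries;
	size_t count;
};

/* Abstract dump: no net_device or netlink skb, just a port and a callback. */
int fdb_dump(const struct fdb_table *t, int port, fdb_dump_cb_t *cb, void *data)
{
	for (size_t i = 0; i < t->count; i++) {
		const struct fdb_entry *e = &t->entries[i];
		int ret;

		if (e->port != port)
			continue;

		/* The front-end decides what to do with each entry. */
		ret = cb(e->mac, e->vid, false, data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example front-end callback: just count the entries it is handed. */
int count_cb(const unsigned char *addr, uint16_t vid, bool is_static, void *data)
{
	(void)addr;
	(void)vid;
	(void)is_static;
	(*(int *)data)++;
	return 0;
}
```

A switchdev front-end and a DSA front-end could each pass their own callback while sharing the table walk.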
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: break apart vlan operations into ocelot_vlan_{add, del} · 9855934c
      Vladimir Oltean authored
      We need an implementation of these functions that is agnostic to the
      higher layer (switchdev or dsa).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: break apart ocelot_vlan_port_apply · 97bb69e1
      Vladimir Oltean authored
      This patch transforms the ocelot_vlan_port_apply function ("apply
      what?") into 3 standalone functions:
      
      - ocelot_port_vlan_filtering
      - ocelot_port_set_native_vlan
      - ocelot_port_set_pvid
      
      These functions have a prototype that is better aligned to the DSA API.
      
      The function also had some static initialization (TPID, drop frames with
      multicast source MAC) which was not being changed from any place, so
      that was just moved to ocelot_probe_port (one of the 6 callers of
      ocelot_vlan_port_apply).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-dsa-mv88e6xxx-Add-support-for-port-mirroring' · c82488df
      David S. Miller authored
      Iwan R Timmer says:
      
      ====================
      net: dsa: mv88e6xxx: Add support for port mirroring
      
      This patch series adds support for port mirroring in the mv88e6xxx switch driver.
      The first patch changes the set_egress_port function to allow different egress
      ports for egress and ingress traffic. The second patch adds the actual code for
      port mirroring support.
      
      Tested on a 88E6176 with:
      
      tc qdisc add dev wan0 clsact
      tc filter add dev wan0 ingress matchall skip_sw \
              action mirred egress mirror dev lan2
      tc filter add dev wan0 egress matchall skip_sw \
              action mirred egress mirror dev lan3
      
      Changes in v3
      
      - Use enum for egress traffic direction
      - Keep track of egress ports on mv88e6390
      - Move booleans in struct for better structure packing
      
      Changes in v2
      
      - Support mirroring egress and ingress traffic to different ports
      - Check for invalid configurations when multiple ports are mirrored
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Add support for port mirroring · f0942e00
      Iwan R Timmer authored
      Add support for configuring port mirroring through the cls_matchall
      classifier. We do a full ingress and/or egress capture towards a
      capture port. It allows setting a different capture port for ingress
      and egress traffic.
      
      It keeps track of the mirrored ports and the destination ports to
      prevent changes to the capture port while other ports are being
      mirrored.
      Signed-off-by: Iwan R Timmer <irtimmer@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Split monitor port configuration · 5c74c54c
      Iwan R Timmer authored
      Separate the configuration of the egress and ingress monitor port.
      This allows the port mirror functionality to do ingress and egress
      port mirroring to separate ports.
      Signed-off-by: Iwan R Timmer <irtimmer@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Support LAN743x PTP periodic output on any GPIO · 22820017
      John Efstathiades authored
      The LAN743x Ethernet controller provides two independent PTP event
      channels. Each one can be used to generate a periodic output from
      the PTP clock. The output can be routed to any one of the available
      GPIO pins on the device.
      
      The PTP clock API can now be used to:
      - select any LAN743x GPIO pin to function as a periodic output
      - select either LAN743x PTP event channel to generate the output
      
      The LAN7430 has 4 GPIO pins that are multiplexed with its internal
      PHY LED control signals. A pin assigned to the LED control function
      will be assigned to the GPIO function if selected for PTP periodic
      output.
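
From user space, the two selections listed above (which pin, which event channel) would presumably be expressed through the generic PTP clock character device. The sketch below only fills in the request structures from `<linux/ptp_clock.h>`; the helper names `make_perout_pin` and `make_perout_request` are hypothetical, and nothing here is taken from the LAN743x driver itself.

```c
#include <assert.h>
#include <string.h>
#include <linux/ptp_clock.h>

/* Hypothetical helper: describe "GPIO pin <pin> outputs channel <chan>". */
struct ptp_pin_desc make_perout_pin(unsigned int pin, unsigned int chan)
{
	struct ptp_pin_desc desc;

	memset(&desc, 0, sizeof(desc));
	desc.index = pin;		/* which GPIO pin */
	desc.func = PTP_PF_PEROUT;	/* periodic output function */
	desc.chan = chan;		/* which PTP event channel */
	return desc;
}

/* Hypothetical helper: periodic output on <chan> with the given period. */
struct ptp_perout_request make_perout_request(unsigned int chan,
					      long long start_sec,
					      long long period_ns)
{
	struct ptp_perout_request req;

	assert(period_ns > 0);
	memset(&req, 0, sizeof(req));
	req.index = chan;
	req.start.sec = start_sec;
	req.period.sec = period_ns / 1000000000LL;
	req.period.nsec = (unsigned int)(period_ns % 1000000000LL);
	return req;
}
```

The resulting structures would then be passed to the `PTP_PIN_SETFUNC` and `PTP_PEROUT_REQUEST` ioctls on the clock's `/dev/ptpN` node; which pin and channel numbers are valid depends on the device.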
      Signed-off-by: John Efstathiades <john.efstathiades@pebblebay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Unlock-new-potential-in-SJA1105-with-PTP-system-timestamping' · 26285f13
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Unlock new potential in SJA1105 with PTP system timestamping
      
      The SJA1105 being an automotive switch means it is designed to live in a
      set-and-forget environment, far from the configure-at-runtime nature of
      Linux. Frequently resetting the switch to change its static config means
      it loses track of its PTP time, which is not good.
      
      This patch series implements PTP system timestamping for this switch
      (using the API introduced for SPI here:
      https://www.mail-archive.com/netdev@vger.kernel.org/msg316725.html),
      adding the following benefits to the driver:
      - When under control of a user space PTP servo loop (ptp4l, phc2sys),
        the loss of sync during a switch reset is much more manageable, and
        the switch still remains in the s2 (locked servo) state.
      - When synchronizing the switch using the software technique (based on
        reading clock A and writing the value to clock B, as opposed to
        relying on hardware timestamping), e.g. by using phc2sys, the sync
        accuracy is vastly improved due to the fact that the actual switch PTP
        time can now be more precisely correlated with something of better
        precision (CLOCK_REALTIME). The issue is that SPI transfers are
        inherently bad for measuring time with low jitter, but the newly
        introduced API aims to alleviate that issue somewhat.
      
      This series is also a requirement for a future patch set that adds full
      time-aware scheduling offload support for the switch.
      ====================
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Disallow management xmit during switch reset · af580ae2
      Vladimir Oltean authored
      The purpose here is to keep ptp4l from failing due to this condition:
      
        timed out while polling for tx timestamp
        increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
        port 1: send peer delay request failed
      
      So either reset the switch before the management frame was sent, or
      after it was timestamped as well, but not in the middle.
      
      The condition may arise either due to a true timeout (i.e. because
      re-uploading the static config takes time), or due to the TX timestamp
      actually getting lost due to reset. For the former we can increase
      tx_timestamp_timeout in userspace, for the latter we need this patch.
      
      Locking all traffic during switch reset does not make sense at all,
      though. Forcing all CPU-originated traffic to potentially block waiting
      for a sleepable context to send > 800 bytes over SPI is not a good idea.
      Flows that are autonomously forwarded by the switch will get dropped
      anyway during switch reset no matter what. So just let all other
      CPU-originated traffic be dropped as well.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Restore PTP time after switch reset · 6cf99c13
      Vladimir Oltean authored
      The PTP time of the switch is not preserved when uploading a new static
      configuration. Work around this hardware oddity by reading its PTP time
      before a static config upload, and restoring it afterwards.
      
      Static config changes are expected to occur at runtime even in scenarios
      directly related to PTP, i.e. the Time-Aware Scheduler of the switch is
      programmed in this way.
      
      Perhaps the larger implication of this patch is that the PTP .gettimex64
      and .settime functions need to be exposed to sja1105_main.c, where the
      PTP lock needs to be held during this entire process. So their core
      implementation needs to move to some common functions which get exposed
      in sja1105_ptp.h.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Implement the .gettimex64 system call for PTP · 34d76e9f
      Vladimir Oltean authored
      Through the PTP_SYS_OFFSET_EXTENDED ioctl, it is possible for userspace
      applications (e.g. phc2sys) to compensate for the delays incurred while
      reading the PHC's time.
      
      The task itself of taking the software timestamp is delegated to the SPI
      subsystem, through the newly introduced API in struct spi_transfer. The
      goal is to cross-timestamp I/O operations on the switch's PTP clock with
      values in the local system clock (CLOCK_REALTIME). For that we need to
      understand a bit of the hardware internals.
      
      The 'read PTP time' message is a 12 byte structure, first 4 bytes of
      which represent the SPI header, and the last 8 bytes represent the
      64-bit PTP time. The switch itself starts processing the command
      immediately after receiving the last bit of the address, i.e. at the
      middle of byte 3 (last byte of header). The PTP time is shadowed to a
      buffer register in the switch, and retrieved atomically during the
      subsequent SPI frames.
      
      A similar thing goes on for the 'write PTP time' message, although in
      that case the switch waits until the 64-bit PTP time becomes fully
      available before taking any action. So the byte that needs to be
      software-timestamped is byte 11 (last) of the transfer.
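
The byte arithmetic in the two paragraphs above can be written down as a small sketch. The constant and function names (`SPI_HDR_LEN`, `ptp_sts_byte`, and so on) are hypothetical; only the sizes and byte positions come from the description.

```c
#include <assert.h>

#define SPI_HDR_LEN	4	/* SPI header, bytes 0..3 */
#define PTP_TIME_LEN	8	/* 64-bit PTP time, bytes 4..11 */
#define PTP_MSG_LEN	(SPI_HDR_LEN + PTP_TIME_LEN)

enum ptp_access { PTP_READ, PTP_WRITE };

/*
 * Zero-based index of the byte whose transfer should be software
 * timestamped:
 * - read:  the switch latches the time once the address is complete,
 *          i.e. during the last header byte;
 * - write: the switch acts only once the full 64-bit time has arrived,
 *          i.e. at the last byte of the message.
 */
unsigned int ptp_sts_byte(enum ptp_access access)
{
	assert(access == PTP_READ || access == PTP_WRITE);
	return access == PTP_READ ? SPI_HDR_LEN - 1 : PTP_MSG_LEN - 1;
}
```

In the driver, an index like this would presumably be handed to the SPI core through the timestamping fields of struct spi_transfer so that the system clock is sampled as close as possible to that byte.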
      
      The patch creates a common (and local) sja1105_xfer implementation for
      the SPI I/O, and offers 3 front-ends:
      
      - sja1105_xfer_u32 and sja1105_xfer_u64: these are capable of optionally
        requesting a PTP timestamp
      
      - sja1105_xfer_buf: this is for large transfers (e.g. the static config
        buffer) and other misc data, and there is no point in giving
        timestamping capabilities to this.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 10 Nov, 2019 7 commits
  3. 09 Nov, 2019 9 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 14684b93
      David S. Miller authored
      One conflict in the BPF samples Makefile, some fixes in 'net' whilst
      we were converting over to Makefile.target rules in 'net-next'.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 0058b0a5
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) BPF sample build fixes from Björn Töpel
      
       2) Fix powerpc bpf tail call implementation, from Eric Dumazet.
      
       3) DCCP leaks jiffies on the wire, fix also from Eric Dumazet.
      
       4) Fix crash in ebtables when using dnat target, from Florian Westphal.
      
       5) Fix port disable handling when removing bcm_sf2 driver, from Florian
          Fainelli.
      
       6) Fix kTLS sk_msg trim on fallback to copy mode, from Jakub Kicinski.
      
       7) Various KCSAN fixes all over the networking, from Eric Dumazet.
      
       8) Memory leaks in mlx5 driver, from Alex Vesker.
      
       9) SMC interface refcounting fix, from Ursula Braun.
      
      10) TSO descriptor handling fixes in stmmac driver, from Jose Abreu.
      
      11) Add a TX lock to synchronize the kTLS TX path properly with crypto
          operations. From Jakub Kicinski.
      
      12) Sock refcount during shutdown fix in vsock/virtio code, from Stefano
          Garzarella.
      
      13) Infinite loop in Intel ice driver, from Colin Ian King.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (108 commits)
        ixgbe: need_wakeup flag might not be set for Tx
        i40e: need_wakeup flag might not be set for Tx
        igb/igc: use ktime accessors for skb->tstamp
        i40e: Fix for ethtool -m issue on X722 NIC
        iavf: initialize ITRN registers with correct values
        ice: fix potential infinite loop because loop counter being too small
        qede: fix NULL pointer deref in __qede_remove()
        net: fix data-race in neigh_event_send()
        vsock/virtio: fix sock refcnt holding during the shutdown
        net: ethernet: octeon_mgmt: Account for second possible VLAN header
        mac80211: fix station inactive_time shortly after boot
        net/fq_impl: Switch to kvmalloc() for memory allocation
        mac80211: fix ieee80211_txq_setup_flows() failure path
        ipv4: Fix table id reference in fib_sync_down_addr
        ipv6: fixes rt6_probe() and fib6_nh->last_probe init
        net: hns: Fix the stray netpoll locks causing deadlock in NAPI path
        net: usb: qmi_wwan: add support for DW5821e with eSIM support
        CDC-NCM: handle incomplete transfer of MTU
        nfc: netlink: fix double device reference drop
        NFC: st21nfca: fix double free
        ...
    • Merge tag 'for-linus-2019-11-08' of git://git.kernel.dk/linux-block · 5cb8418c
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - Two NVMe device removal crash fixes, and a compat fixup for an
         ioctl that was introduced in this release (Anton, Charles, Max - via
         Keith)
      
       - Missing error path mutex unlock for drbd (Dan)
      
       - cgroup writeback fixup on dead memcg (Tejun)
      
       - blkcg online stats print fix (Tejun)
      
      * tag 'for-linus-2019-11-08' of git://git.kernel.dk/linux-block:
        cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
        block: drbd: remove a stray unlock in __drbd_send_protocol()
        blkcg: make blkcg_print_stat() print stats only for online blkgs
        nvme: change nvme_passthru_cmd64 to explicitly mark rsvd
        nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths
        nvme-rdma: fix a segmentation fault during module unload
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · a2582cdc
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Fixes 2019-11-08
      
      This series contains fixes to igb, igc, ixgbe, i40e, iavf and ice
      drivers.
      
      Colin Ian King fixes a for-loop counter that could potentially wrap around.
      
      Nick fixes the default ITR values for the iavf driver to 50 usecs
      interval.
      
      Arkadiusz fixes 'ethtool -m' for X722 devices where the correct value
      cannot be obtained from the firmware, so add X722 to the check to ensure
      the wrong value is not returned.
      
      Jake fixes igb and igc drivers in their implementation of launch time
      support by using the ktime accessors for skb->tstamp instead of assuming
      that it is an s64.
      
      Magnus fixes ixgbe and i40e where the need_wakeup flag for transmit may
      not be set for AF_XDP sockets that are only used to send packets.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ixgbe: need_wakeup flag might not be set for Tx · 0843aa8f
      Magnus Karlsson authored
      The need_wakeup flag for Tx might not be set for AF_XDP sockets that
      are only used to send packets. This happens if there is at least one
      outstanding packet that has not been completed by the hardware and we
      get that corresponding completion (which will not generate an
      interrupt since interrupts are disabled in the napi poll loop) between
      the time we stopped processing the Tx completions and interrupts are
      enabled again. In this case, the need_wakeup flag will have been
      cleared at the end of the Tx completion processing as we believe we
      will get an interrupt from the outstanding completion at a later point
      in time. But if this completion interrupt occurs before interrupts
      are enabled, we lose it and should at that point really have set the
      need_wakeup flag since there are no more outstanding completions that
      can generate an interrupt to continue the processing. When this
      happens, user space will see a Tx queue need_wakeup of 0 and skip
      issuing a syscall, which means we will never get into the Tx processing
      again and we have a deadlock.
      
      This patch introduces a quick fix for this issue by just setting the
      need_wakeup flag for Tx to 1 all the time. I am working on a proper
      fix for this that will toggle the flag appropriately, but it is more
      challenging than I anticipated and I am afraid that this patch will
      not be completed before the merge window closes, therefore this easier
      fix for now. This fix has a negative performance impact in the range
      of 0% to 4%. Towards the higher end of the scale if you have driver
      and application on the same core and issue a lot of packets, and
      towards no negative impact if you use two cores, lower transmission
      speeds and/or a workload that also receives packets.
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: need_wakeup flag might not be set for Tx · 70563957
      Magnus Karlsson authored
      The need_wakeup flag for Tx might not be set for AF_XDP sockets that
      are only used to send packets. This happens if there is at least one
      outstanding packet that has not been completed by the hardware and we
      get that corresponding completion (which will not generate an
      interrupt since interrupts are disabled in the napi poll loop) between
      the time we stopped processing the Tx completions and interrupts are
      enabled again. In this case, the need_wakeup flag will have been
      cleared at the end of the Tx completion processing as we believe we
      will get an interrupt from the outstanding completion at a later point
      in time. But if this completion interrupt occurs before interrupts
      are enabled, we lose it and should at that point really have set the
      need_wakeup flag since there are no more outstanding completions that
      can generate an interrupt to continue the processing. When this
      happens, user space will see a Tx queue need_wakeup of 0 and skip
      issuing a syscall, which means we will never get into the Tx processing
      again and we have a deadlock.
      
      This patch introduces a quick fix for this issue by just setting the
      need_wakeup flag for Tx to 1 all the time. I am working on a proper
      fix for this that will toggle the flag appropriately, but it is more
      challenging than I anticipated and I am afraid that this patch will
      not be completed before the merge window closes, therefore this easier
      fix for now. This fix has a negative performance impact in the range
      of 0% to 4%. Towards the higher end of the scale if you have driver
      and application on the same core and issue a lot of packets, and
      towards no negative impact if you use two cores, lower transmission
      speeds and/or a workload that also receives packets.
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • igb/igc: use ktime accessors for skb->tstamp · 6acab13b
      Jacob Keller authored
      When implementing launch time support in the igb and igc drivers, the
      skb->tstamp value is assumed to be a s64, but it's declared as a ktime_t
      value.
      
      Although ktime_t is typedef'd to s64 it wasn't always, and the kernel
      provides accessors for ktime_t values.
      
      Use the ktime_to_timespec64 and ktime_set accessors instead of directly
      assuming that the variable is always an s64.
      
      This improves portability if the code is ever moved to another kernel
      version, or if the definition of ktime_t ever changes again in the
      future.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Fix for ethtool -m issue on X722 NIC · 4c9da6f2
      Arkadiusz Kubalewski authored
      This patch contains a fix for a problem with the command:
      'ethtool -m <dev>'
      which breaks the functionality of:
      'ethtool <dev>'
      when called on an X722 NIC.
      
      Disallow updating the link phy_types on X722 NICs, since the correct
      value currently cannot be obtained from the FW. Previously, the wrong
      value returned by the FW was used, which was the root cause of the
      incorrect output of the 'ethtool <dev>' command.
      Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • iavf: initialize ITRN registers with correct values · 4eda4e00
      Nicholas Nunley authored
      Since commit 92418fb1 ("i40e/i40evf: Use usec value instead of reg
      value for ITR defines") the driver tracks the interrupt throttling
      intervals in single usec units, although the actual ITRN registers are
      programmed in 2 usec units. Most register programming flows in the driver
      correctly handle the conversion, although it is currently not applied when
      the registers are initialized to their default values. Most of the time
      this doesn't present a problem since the default values are usually
      immediately overwritten through the standard adaptive throttling mechanism,
      or updated manually by the user, but if adaptive throttling is disabled and
      the interval values are left alone then the incorrect value will persist.
      
      Since the intended default interval of 50 usecs (vs. 100 usecs as
      programmed) performs better for most traffic workloads, this can lead to
      performance regressions.
      
      This patch adds the correct conversion when writing the initial values to
      the ITRN registers.
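
To make the unit mismatch concrete, here is a small sketch of the conversion, assuming only what the message states: the driver stores intervals in 1 usec units and the registers count in 2 usec units. The names `ITR_REG_USECS_PER_UNIT`, `itr_usecs_to_reg` and `itr_reg_to_usecs` are hypothetical.

```c
#include <stdint.h>

/* One register unit corresponds to 2 usecs (per the commit message). */
#define ITR_REG_USECS_PER_UNIT	2

/* Convert a driver-side interval in usecs into the register encoding. */
uint32_t itr_usecs_to_reg(uint32_t usecs)
{
	return usecs / ITR_REG_USECS_PER_UNIT;
}

/* Convert a raw register value back into the interval it represents. */
uint32_t itr_reg_to_usecs(uint32_t reg)
{
	return reg * ITR_REG_USECS_PER_UNIT;
}
```

Writing the intended default of 50 without conversion makes the hardware use `itr_reg_to_usecs(50)`, i.e. 100 usecs, whereas writing `itr_usecs_to_reg(50)` yields the intended 50 usecs.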
      Signed-off-by: Nicholas Nunley <nicholas.d.nunley@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>