1. 01 Jul, 2019 1 commit
  2. 30 Jun, 2019 1 commit
    • Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 11697cfc
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2019-06-28
      
      This series contains a smorgasbord of updates to many of the Intel
      drivers.
      
      Gustavo A. R. Silva updates the ice and iavf drivers to use the
      struct_size() helper where possible.
      
      Miguel increases the pause and refresh time for flow control in the
      e1000e driver during reset for certain devices.
      
      Dann Frazier fixes a potential NULL pointer dereference in ixgbe driver
      when using non-IPSec enabled devices.
      
      Colin Ian King fixes a potential overflow during a shift in the ixgbe
      driver.  Also fixes a potential NULL pointer dereference in the iavf
      driver by adding a check.
      
      Venkatesh Srinivas converts the e1000 driver to use dma_wmb() instead of
      wmb() for doorbell writes to avoid SFENCEs in the transmit and receive
      paths.
      
      Arjan updates the e1000e driver to improve boot time by over 100 msec by
      reducing the usleep ranges during system startup.
      
      Artem updates the igb driver register dump in ethtool: the first patch
      prepares the register dump for future additions of registers, and the
      second adds the RR2DCDELAY register to the dump.  When dealing with
      time-sensitive networks, this register is helpful in determining your
      latency from the device to the ring.
      
      Alex fixes the ixgbevf driver to use the current cached link state,
      rather than trying to re-check the value from the PF.
      
      Harshitha adds support for MACVLAN offloads in i40e by using channels as
      MACVLAN interfaces.
      
      Detlev Casanova updates the e1000e driver to use delayed work instead of
      timers to run the watchdog.
      
      Vitaly fixes an issue in e1000e, where when disconnecting and
      reconnecting the physical cable connection, the NIC enters a DMoff
      state.  This state causes a mismatch in link and duplexing, so check the
      PCIm function state and perform a PHY reset when in this state to
      resolve the issue.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 29 Jun, 2019 10 commits
  4. 28 Jun, 2019 28 commits
    • net/mlx5e: Disallow tc redirect offload cases we don't support · f6dc1264
      Paul Blakey authored
      After changing the parent_id to be the same for both NICs of the same
      hardware device, netdev_port_same_parent_id() now returns true for
      more cases (all the lower devices in the hierarchy are on the same
      hardware device).
      
      If merged eswitch isn't enabled, these cases aren't supported, so disallow
      them.
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Expose same physical switch_id for all representors · 7ff40a46
      Paul Blakey authored
      Report system_image_guid as the E-Switch switch_id. This ensures
      that when a NIC contains multiple PCI functions and has the merged
      eswitch capability, all representors from multiple PFs publish the
      same switch_id.
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Don't refresh TIRs when updating representor SQs · a90f88fe
      Gavi Teitz authored
      Refreshing TIRs is done in order to update the TIRs with the current
      state of SQs in the transport domain, so that the TIRs can filter out
      undesired self-loopback packets based on the source SQ of the packet.
      
      Representor TIRs will only receive packets that originate from their
      associated vport, due to dedicated steering, and therefore will never
      receive self-loopback packets, whose source vport will be the vport of
      the E-Switch manager, and therefore not the vport associated with the
      representor. As such, it is not necessary to refresh the representors'
      TIRs, since self-loopback packets can't reach them.
      
      Since representors only exist in switchdev mode, and there is no
      scenario in which a representor will exist in the transport domain
      alongside a non-representor, it is not necessary to refresh the
      transport domain's TIRs upon changing the state of a representor's
      queues. Therefore, do not refresh TIRs upon such a change. Achieve
      this by adding an update_rx callback to the mlx5e_profile, which
      refreshes TIRs for non-representors and does nothing for representors,
      and replace instances of mlx5e_refresh_tirs() upon changing the state
      of the queues with update_rx().
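
      The callback dispatch described above can be sketched in plain C. This is
      a simplified userspace model, not the mlx5e code: struct priv,
      struct profile, the tir_refreshes counter and all function names are
      hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct priv;

/* Per-profile callbacks; only update_rx is modelled here. */
struct profile {
    int (*update_rx)(struct priv *priv);
};

struct priv {
    const struct profile *profile;
    unsigned int tir_refreshes;   /* counts how often TIRs were refreshed */
};

/* NIC profile: refresh TIRs so they reflect the current SQ state. */
int nic_update_rx(struct priv *priv)
{
    priv->tir_refreshes++;
    return 0;
}

/* Representor profile: self-loopback packets cannot arrive, so do nothing. */
int rep_update_rx(struct priv *priv)
{
    (void)priv;
    return 0;
}

const struct profile nic_profile = { .update_rx = nic_update_rx };
const struct profile rep_profile = { .update_rx = rep_update_rx };

/* Called after the queue state changes; dispatches through the profile. */
int queues_changed(struct priv *priv)
{
    assert(priv != NULL && priv->profile != NULL);
    return priv->profile->update_rx(priv);
}
```

      With this shape the caller never needs to know whether it is handling a
      representor; the profile decides whether a refresh happens.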
      Signed-off-by: Gavi Teitz <gavi@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: reduce stack usage in mlx5_eswitch_termtbl_create · 5233794b
      Arnd Bergmann authored
      Putting an empty 'mlx5_flow_spec' structure on the stack is a bit
      wasteful and causes a warning on 32-bit architectures when building
      with clang -fsanitize-coverage:
      
      drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c: In function 'mlx5_eswitch_termtbl_create':
      drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c:90:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
      
      Since the structure is never written to, we can statically allocate
      it to avoid the stack usage. To be on the safe side, mark all
      subsequent function arguments that we pass it into as 'const'
      as well.
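
      A minimal sketch of the same idea outside the kernel, assuming a
      hypothetical 1024-byte struct big_spec in place of the real flow spec:
      the never-written structure gets static storage and is only passed
      around through const pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical large structure standing in for a flow spec. */
struct big_spec {
    unsigned char match_criteria[512];
    unsigned char match_value[512];
};

/* Takes the spec by const pointer, so a shared read-only instance is safe. */
size_t count_set_bytes(const struct big_spec *spec)
{
    size_t n = 0;

    assert(spec != NULL);
    for (size_t i = 0; i < sizeof(spec->match_criteria); i++)
        n += spec->match_criteria[i] != 0;
    return n;
}

size_t create_with_empty_spec(void)
{
    /* Static storage: zero-initialised, and not part of the stack frame. */
    static const struct big_spec spec = { {0}, {0} };

    return count_set_bytes(&spec);
}
```

      Marking the callee parameter const is what lets the compiler reject any
      later attempt to write through the shared instance.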
      
      Fixes: 10caabda ("net/mlx5e: Use termination table for VLAN push actions")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Set drvinfo in generic manner · f72e6c3e
      Parav Pandit authored
      Consider PCI and non PCI device types while setting device name
      in get_drvinfo() callback using existing generic device.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Vu Pham <vuhuong@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Correct phys_port_name for PF port · 08706736
      Parav Pandit authored
      Currently the PF phys_port_name is reported as pfNvf-1, because the vport
      number for the PF vport is 65535.
      Correct the PF's phys_port_name to the agreed-upon name, pfN.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Vu Pham <vuhuong@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Report netdevice MPLS features · 5dc9520b
      Ariel Levkovich authored
      Set supported device features in the netdevice MPLS features mask.
      This will enable HW checksumming and TSO for MPLS tagged traffic.
      Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Move to HW checksumming advertising · e4683f35
      Ariel Levkovich authored
      This patch changes the way the driver advertises its checksum offload
      capabilities within the net device features bit mask.
      
      Instead of advertising protocol specific checksumming capabilities
      which are limited today to IPv4 and IPv6, we move to reporting
      generic HW checksumming capabilities.
      
      This will allow the network stack to let mlx5 device offload checksum
      for cases where the IP header is encapsulated within another protocol
      and the skb->protocol doesn't indicate one of the IP versions protocol,
      specifically in the case of MPLS label encapsulating the IP header and
      the skb->protocol indicates MPLS ethertype rather than IP.
      
      Moving to HW_CSUM reporting is required in the basic net device hw
      features mask and also in the extensions (vlan and encapsulation
      features) since the extensions are always intersected (bitwise AND) with
      the basic features set during the packet's traversal through the stack's
      tx flow.
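
      The reasoning can be illustrated with a small model. This is a sketch
      under stated assumptions, not the kernel's feature negotiation: the F_*
      flag values, enum l3_proto and both helpers are hypothetical, and only
      mirror the distinction between protocol-specific and generic checksum
      offload and the masking of extension features by the basic set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; not the kernel's NETIF_F_* bits. */
#define F_IP_CSUM   (1u << 0)   /* checksum offload for IPv4 only */
#define F_IPV6_CSUM (1u << 1)   /* checksum offload for IPv6 only */
#define F_HW_CSUM   (1u << 2)   /* protocol-agnostic checksum offload */

enum l3_proto { PROTO_IPV4, PROTO_IPV6, PROTO_MPLS };

/* An extension mask (VLAN, encapsulation) only counts where the base agrees. */
uint32_t effective_features(uint32_t base, uint32_t extension)
{
    return base & extension;
}

/* Can the device checksum a packet whose skb protocol is `proto`? */
bool can_offload_csum(uint32_t features, enum l3_proto proto)
{
    assert(proto == PROTO_IPV4 || proto == PROTO_IPV6 || proto == PROTO_MPLS);

    if (features & F_HW_CSUM)
        return true;
    if (proto == PROTO_IPV4)
        return (features & F_IP_CSUM) != 0;
    if (proto == PROTO_IPV6)
        return (features & F_IPV6_CSUM) != 0;
    return false;
}
```

      In this model an MPLS packet is only offloaded when the generic flag is
      present in both the base mask and the extension mask, which is why the
      commit changes both.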
      Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: MPFS, Allow adding the same MAC more than once · e7e0bee8
      Gavi Teitz authored
      Remove the limitation preventing adding a vport's MAC address to the
      Multi-Physical Function Switch (MPFS) more than once per E-switch, as
      there is no difference in the MPFS if an address is being used by an
      E-switch more than once.
      
      This allows the E-switch to have multiple vports with the same MAC
      address, allowing vports to be classified by VLAN id instead of by MAC
      if desired.
      Signed-off-by: Gavi Teitz <gavi@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: MPFS, Cleanup add MAC flow · 6311f308
      Gavi Teitz authored
      Unify and isolate the error handling flow in mlx5_mpfs_add_mac(),
      removing code duplication.
      Signed-off-by: Gavi Teitz <gavi@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux · 4f5d1bea
      Saeed Mahameed authored
      Misc updates from mlx5-next branch:
      
      1) E-Switch vport metadata support for source vport matching
      2) Convert mkey_table to XArray
      3) Shared IRQs, and use of a single IRQ for all async EQs
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • e1000e: PCIm function state support · def4ec6d
      Vitaly Lifshits authored
      Due to commit 5d8682588605 ("[misc] mei: me: allow runtime
      pm for platform with D0i3"), when disconnecting the cable and
      reconnecting it, the NIC enters the DMoff state. This caused a wrong
      link indication and a duplex mismatch. This bug is described in:
      https://bugzilla.redhat.com/show_bug.cgi?id=1689436
      
      Checking PCIm function state and performing PHY reset after a
      timeout in watchdog task solves this issue.
      Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
      Acked-by: Sasha Neftin <sasha.neftin@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • e1000e: Make watchdog use delayed work · 59653e64
      Detlev Casanova authored
      Use delayed work instead of timers to run the watchdog of the e1000e
      driver.
      
      Simplify the code with one less middle function.
      Signed-off-by: Detlev Casanova <detlev.casanova@gmail.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Add macvlan support on i40e · 1d8d80b4
      Harshitha Ramamurthy authored
      This patch enables macvlan offloads for i40e. The idea is to use
      channels as macvlan interfaces. The channels are VSIs of
      type VMDQ. When the first macvlan is created, the maximum number of
      channels possible are created. From then on, as a macvlan interface
      is created, a macvlan filter is added to these already created
      channels (VSIs).
      
      This patch utilizes subordinate device traffic classes to make queue
      groups(channels) available for an upper device like a macvlan.
      
      Steps to configure macvlan offloads:
      1. ethtool -K ethx l2-fwd-offload on
      2. ip link add link ethx name macvlan1 type macvlan
      3. ip addr add <address> dev macvlan1
      4. ip link set macvlan1 up
      Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbevf: Use cached link state instead of re-reading the value for ethtool · 1e1b0c65
      Alexander Duyck authored
      Change the ethtool link settings call to just read the cached state out of
      the adapter structure instead of trying to recheck the value from the PF.
      Doing this should prevent excessive reading of the mailbox.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • iavf: fix dereference of null rx_buffer pointer · 9fe06a51
      Colin Ian King authored
      A recent commit efa14c39 ("iavf: allow null RX descriptors") added
      a null pointer sanity check on rx_buffer, however, rx_buffer is being
      dereferenced before that check, which implies a null pointer dereference
      bug can potentially occur.  Fix this by dereferencing rx_buffer only
      after the null pointer check.
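
      The ordering can be sketched as follows. struct rx_buffer here is a
      hypothetical two-field stand-in, not the iavf structure, and
      rx_buffer_data() is an illustrative name; the point is only that the
      address computation moves below the NULL test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a receive buffer descriptor. */
struct rx_buffer {
    unsigned char *page;
    size_t page_offset;
};

/* Returns the start of the packet data, or NULL when no buffer is attached. */
unsigned char *rx_buffer_data(const struct rx_buffer *buf)
{
    unsigned char *va;   /* declared here, but not yet computed from buf */

    if (!buf)
        return NULL;

    /* Only now is it safe to read through the pointer. */
    assert(buf->page != NULL);
    va = buf->page + buf->page_offset;
    return va;
}
```

      Initialising va at its declaration would read buf->page before the
      check, which is the pattern the Coverity report flagged.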
      
      Addresses-Coverity: ("Dereference before null check")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • igb: add RR2DCDELAY to ethtool registers dump · cd502a7f
      Artem Bityutskiy authored
      This patch adds the RR2DCDELAY register to the ethtool registers dump.
      RR2DCDELAY exists on I210 and I211 Intel Gigabit Ethernet chips and it stands
      for "Read Request To Data Completion Delay". Here is how this register is
      described in the I210 datasheet:
      
      "This field captures the maximum PCIe split time in 16 ns units, which is the
      maximum delay between the read request to the first data completion. This is
      giving an estimation of the PCIe round trip time."
      
      In other words, whenever I210 reads from the host memory (e.g., fetches a
      descriptor from the ring), the chip measures every PCI DMA read transaction and
      captures the maximum value. So it ends up containing the longest DMA
      transaction time.
      
      This register is very useful for troubleshooting and research purposes. If you
      are dealing with time-sensitive networks, this register can help you get
      an idea of your "I210-to-ring" latency. This helps answering questions like
      "should I have PCIe ASPM enabled?" or "should I enable deep C-states?" on
      my system.
      
      It is safe to read this register at any point, reading it has no effect on
      the I210 chip functionality.
      Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • igb: minor ethool regdump amendment · 9379b399
      Artem Bityutskiy authored
      This patch has no functional impact and it is just a preparation
      for the following patch. It removes an early return from the
      'igb_get_regs()' function by moving the 82576-only registers
      dump into an "if" block. With this preparation, we can dump more
      non-82576 registers at the end of this function.
      Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • iavf: Fix up debug print macro · 75051ce4
      Jeff Kirsher authored
      This aligns the iavf_debug() macro with the other Intel drivers.
      
      Add the bus number, bus_id field to i40e_bus_info so output shows
      each physical port(i.e func) in following format:
        [[[[<domain>]:]<bus>]:][<slot>][.[<func>]]
      Domains are numbered from 0 to ffff, buses from 0 to ff, slots from
      0 to 1f and functions from 0 to 7.
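
      A short sketch of producing the fully qualified form of that address
      with snprintf(). The helper name format_bus_id() and its parameters are
      assumptions made for this example, not part of the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Writes "<domain>:<bus>:<slot>.<func>" in hex into buf and returns the
 * number of characters snprintf() would have written.
 */
int format_bus_id(char *buf, size_t len, uint16_t domain, uint8_t bus,
                  uint8_t slot, uint8_t func)
{
    assert(buf != NULL && len > 0);
    assert(slot <= 0x1f && func <= 0x7);   /* ranges from the commit text */

    return snprintf(buf, len, "%04x:%02x:%02x.%x",
                    (unsigned int)domain, (unsigned int)bus,
                    (unsigned int)slot, (unsigned int)func);
}
```

      For domain 0, bus 0x3b, slot 0 and function 1 this yields
      "0000:3b:00.1".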
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    • e1000e: Reduce boot time by tightening sleep ranges · ab6973ae
      Arjan van de Ven authored
      The e1000e driver is a great user of the usleep_range() API,
      and has nice ranges that in principle help power management.
      
      However the ranges that are used only during system startup are
      very long (and can easily add 100 msec to the boot time) while
      the power savings of such long ranges are irrelevant due to the
      one-off, boot-only nature of these functions.

      This patch shrinks some of the longest ranges to be shorter
      (while still using a power friendly 1 msec range); this saves
      100msec+ of boot time on my BDW NUCs.
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • iavf: use struct_size() helper · af07adbb
      Gustavo A. R. Silva authored
      Make use of the struct_size() helper instead of an open-coded version
      in order to avoid any potential type mistakes, in particular in the
      context in which this code is being used.
      
      So, replace code of the following form:
      
      sizeof(struct virtchnl_ether_addr_list) + (count * sizeof(struct virtchnl_ether_addr))
      
      with:
      
      struct_size(veal, list, count)
      
      and so on...
      
      This code was detected with the help of Coccinelle.
      Signed-off-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • e1000: Use dma_wmb() instead of wmb() before doorbell writes · 583cf7be
      Venkatesh Srinivas authored
      e1000 writes to doorbells to post transmit descriptors and fill the
      receive ring. After writing descriptors to memory but before
      writing to doorbells, use dma_wmb() rather than wmb(). wmb() is more
      heavyweight than necessary for a device to see descriptor writes.
      
      On x86, this avoids SFENCEs before doorbell writes in both the
      Tx and Rx paths. On ARM, this converts DSB ST -> DMB OSHST.
      
      Tested: 82576EB / x86; QEMU (qemu emulates an 8257x)
      Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: fix potential u32 overflow on shift · b97c0b52
      Colin Ian King authored
      The u32 variable rem is being shifted using u32 arithmetic; however,
      it is being passed to div_u64, which expects the expression to be a u64.
      The 32-bit shift may potentially overflow, so cast rem to a u64 before
      shifting to avoid this.  Also remove the comment about overflow.
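
      The difference between the two forms can be shown in a few lines of
      standard C. The function names are illustrative only; the first mirrors
      the original expression, the second the fixed one.

```c
#include <assert.h>
#include <stdint.h>

/* Shift performed in 32-bit arithmetic: high bits are lost before widening. */
uint64_t shift_in_u32(uint32_t rem, unsigned int shift)
{
    assert(shift < 32);   /* larger shifts of a 32-bit value are undefined */
    return rem << shift;
}

/* Widen first, then shift: the full 64-bit result is preserved. */
uint64_t shift_in_u64(uint32_t rem, unsigned int shift)
{
    assert(shift < 64);
    return (uint64_t)rem << shift;
}
```

      With rem = 0x80000000 and a shift of 1, the 32-bit form returns 0 while
      the widened form returns 0x100000000.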
      
      Addresses-Coverity: ("Unintentional integer overflow")
      Fixes: cd458320 ("ixgbe: implement support for SDP/PPS output on X550 hardware")
      Fixes: 68d9676f ("ixgbe: fix PTP SDP pin setup on X540 hardware")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Acked-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: Avoid NULL pointer dereference with VF on non-IPsec hw · 92924064
      Dann Frazier authored
      An ipsec structure will not be allocated if the hardware does not support
      offload. Fixes the following Oops:
      
      [  191.045452] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      [  191.054232] Mem abort info:
      [  191.057014]   ESR = 0x96000004
      [  191.060057]   Exception class = DABT (current EL), IL = 32 bits
      [  191.065963]   SET = 0, FnV = 0
      [  191.069004]   EA = 0, S1PTW = 0
      [  191.072132] Data abort info:
      [  191.074999]   ISV = 0, ISS = 0x00000004
      [  191.078822]   CM = 0, WnR = 0
      [  191.081780] user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000043d9e467
      [  191.088382] [0000000000000000] pgd=0000000000000000
      [  191.093252] Internal error: Oops: 96000004 [#1] SMP
      [  191.098119] Modules linked in: vhost_net vhost tap vfio_pci vfio_virqfd vfio_iommu_type1 vfio xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter devlink ebtables ip6table_filter ip6_tables iptable_filter bpfilter ipmi_ssif nls_iso8859_1 input_leds joydev ipmi_si hns_roce_hw_v2 ipmi_devintf hns_roce ipmi_msghandler cppc_cpufreq sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 ses enclosure btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c raid1 raid0 multipath linear ixgbevf hibmc_drm ttm
      [  191.168607]  drm_kms_helper aes_ce_blk aes_ce_cipher syscopyarea crct10dif_ce sysfillrect ghash_ce qla2xxx sysimgblt sha2_ce sha256_arm64 hisi_sas_v3_hw fb_sys_fops sha1_ce uas nvme_fc mpt3sas ixgbe drm hisi_sas_main nvme_fabrics usb_storage hclge scsi_transport_fc ahci libsas hnae3 raid_class libahci xfrm_algo scsi_transport_sas mdio aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
      [  191.202952] CPU: 94 PID: 0 Comm: swapper/94 Not tainted 4.19.0-rc1+ #11
      [  191.209553] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.20.01 04/26/2019
      [  191.218064] pstate: 20400089 (nzCv daIf +PAN -UAO)
      [  191.222873] pc : ixgbe_ipsec_vf_clear+0x60/0xd0 [ixgbe]
      [  191.228093] lr : ixgbe_msg_task+0x2d0/0x1088 [ixgbe]
      [  191.233044] sp : ffff000009b3bcd0
      [  191.236346] x29: ffff000009b3bcd0 x28: 0000000000000000
      [  191.241647] x27: ffff000009628000 x26: 0000000000000000
      [  191.246946] x25: ffff803f652d7600 x24: 0000000000000004
      [  191.252246] x23: ffff803f6a718900 x22: 0000000000000000
      [  191.257546] x21: 0000000000000000 x20: 0000000000000000
      [  191.262845] x19: 0000000000000000 x18: 0000000000000000
      [  191.268144] x17: 0000000000000000 x16: 0000000000000000
      [  191.273443] x15: 0000000000000000 x14: 0000000100000026
      [  191.278742] x13: 0000000100000025 x12: ffff8a5f7fbe0df0
      [  191.284042] x11: 000000010000000b x10: 0000000000000040
      [  191.289341] x9 : 0000000000001100 x8 : ffff803f6a824fd8
      [  191.294640] x7 : ffff803f6a825098 x6 : 0000000000000001
      [  191.299939] x5 : ffff000000f0ffc0 x4 : 0000000000000000
      [  191.305238] x3 : ffff000028c00000 x2 : ffff803f652d7600
      [  191.310538] x1 : 0000000000000000 x0 : ffff000000f205f0
      [  191.315838] Process swapper/94 (pid: 0, stack limit = 0x00000000addfed5a)
      [  191.322613] Call trace:
      [  191.325055]  ixgbe_ipsec_vf_clear+0x60/0xd0 [ixgbe]
      [  191.329927]  ixgbe_msg_task+0x2d0/0x1088 [ixgbe]
      [  191.334536]  ixgbe_msix_other+0x274/0x330 [ixgbe]
      [  191.339233]  __handle_irq_event_percpu+0x78/0x270
      [  191.343924]  handle_irq_event_percpu+0x40/0x98
      [  191.348355]  handle_irq_event+0x50/0xa8
      [  191.352180]  handle_fasteoi_irq+0xbc/0x148
      [  191.356263]  generic_handle_irq+0x34/0x50
      [  191.360259]  __handle_domain_irq+0x68/0xc0
      [  191.364343]  gic_handle_irq+0x84/0x180
      [  191.368079]  el1_irq+0xe8/0x180
      [  191.371208]  arch_cpu_idle+0x30/0x1a8
      [  191.374860]  do_idle+0x1dc/0x2a0
      [  191.378077]  cpu_startup_entry+0x2c/0x30
      [  191.381988]  secondary_start_kernel+0x150/0x1e0
      [  191.386506] Code: 6b15003f 54000320 f1404a9f 54000060 (79400260)
      
      Fixes: eda0333a ("ixgbe: add VF IPsec management")
      Signed-off-by: Dann Frazier <dann.frazier@canonical.com>
      Acked-by: Shannon Nelson <snelson@pensando.io>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • Miguel Bernal Marin's avatar
    • ice: Use struct_size() helper · 89f6a305
      Gustavo A. R. Silva authored
      One of the more common cases of allocation size calculations is finding
      the size of a structure that has a zero-sized array at the end, along
      with memory for some number of elements for that array. For example:
      
      struct foo {
          int stuff;
          struct boo entry[];
      };
      
      size = sizeof(struct foo) + count * sizeof(struct boo);
      instance = alloc(size, GFP_KERNEL);
      
      Instead of leaving these open-coded and prone to type mistakes, we can
      now use the new struct_size() helper:
      
      size = struct_size(instance, entry, count);
      
      This code was detected with the help of Coccinelle.
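
      For readers outside the kernel, the same calculation can be approximated
      in userspace. This is a sketch, not the kernel macro: foo_size() and
      foo_alloc() are hypothetical helpers that mimic the overflow behaviour
      of struct_size() by saturating to SIZE_MAX, so that the allocation
      fails instead of returning an undersized buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Types mirroring the example in the commit message. */
struct boo {
    int value;
};

struct foo {
    int stuff;
    struct boo entry[];   /* flexible array member */
};

/* Size of a struct foo followed by `count` entries, SIZE_MAX on overflow. */
size_t foo_size(size_t count)
{
    if (count > (SIZE_MAX - sizeof(struct foo)) / sizeof(struct boo))
        return SIZE_MAX;
    return sizeof(struct foo) + count * sizeof(struct boo);
}

/* Allocate a zeroed struct foo with room for `count` entries, or NULL. */
struct foo *foo_alloc(size_t count)
{
    size_t size = foo_size(count);

    if (size == SIZE_MAX)
        return NULL;
    assert(size >= sizeof(struct foo));
    return calloc(1, size);
}
```

      The open-coded form has no such check: a large count silently wraps and
      the allocation succeeds with too little memory.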
      Signed-off-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • Merge branch 'net-sched-Add-txtime-assist-support-for-taprio' · 0a7960c7
      David S. Miller authored
      Vedang Patel says:
      
      ====================
      net/sched: Add txtime-assist support for taprio.
      
      Changes in v6:
      - Use _BITUL() instead of BIT() in UAPI for etf. (patch #1)
      - Fix a bug reported by kbuild test bot in length_to_duration(). (patch #6)
      - Remove an unused function (get_cycle_start()). (Patch #6)
      
      Changes in v5:
      - Commit message improved for the igb patch (patch #1).
      - Fixed typo in commit message for etf patch (patch #2).
      
      Changes in v4:
      - Remove inline directive from functions in foo.c.
      - Fix spacing in pkt_sched.h (for etf patch).
      
      Changes in v3:
      - Simplify implementation for taprio flags.
      - txtime_delay can only be set if txtime-assist mode is enabled.
      - txtime_delay and flags will only be visible in tc output if set by user.
      - Minor changes in error reporting.
      
      Changes in v2:
      - Txtime-offload has now been renamed to txtime-assist mode.
      - Renamed the offload parameter to flags.
      - Removed the code which introduced the hardware offloading functionality.
      
      Original Cover letter (with above changes included)
      --------------------------------------------------
      
      Currently, we are seeing packets being transmitted outside their
      timeslices. We can confirm that the packets are being dequeued at the right
      time. So, the delay is induced after the packet is dequeued, because
      taprio, without any offloading, has no control of when a packet is actually
      transmitted.
      
      In order to solve this, we are making use of the txtime feature provided by
      ETF qdisc. Hardware offloading needs to be supported by the ETF qdisc in
      order to take advantage of this feature. The taprio qdisc will assign
      txtime (in skb->tstamp) for all the packets which do not have the txtime
      allocated via the SO_TXTIME socket option. For the packets which already
      have SO_TXTIME set, taprio will validate whether the packet will be
      transmitted in the correct interval.
      
      In order to support this, the following parameters have been added:
      - flags (taprio): This is added in order to support different offloading
        modes which will be added in the future.
      - txtime-delay (taprio): This indicates the minimum time it will take for
        the packet to hit the wire after it reaches taprio_enqueue(). This is
        useful in determining whether we can transmit the packet in the remaining
        time if the gate corresponding to the packet is currently open.
      - skip_skb_check (ETF): ETF currently drops any packet which does not have
        the SO_TXTIME socket option set. This check can be skipped by specifying
        this option.
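
      The role of txtime-delay can be sketched as a simple timing check. This
      is an illustrative model of the decision described above, not the
      taprio implementation; the function and parameter names are
      assumptions, and all values are nanoseconds on one clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * now          - time the packet reaches the enqueue path
 * txtime_delay - minimum time from enqueue until the packet hits the wire
 * tx_duration  - time the packet occupies the wire
 * interval_end - time at which the currently open gate closes
 */
bool fits_in_open_interval(int64_t now, int64_t txtime_delay,
                           int64_t tx_duration, int64_t interval_end)
{
    assert(txtime_delay >= 0 && tx_duration >= 0);

    /* The packet cannot start transmitting before this point. */
    int64_t earliest_start = now + txtime_delay;

    return earliest_start + tx_duration <= interval_end;
}
```

      If the check fails, the packet has to wait for a later interval in
      which its gate is open.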
      
      Following is an example configuration:
      
      tc qdisc replace dev $IFACE parent root handle 100 taprio \
          num_tc 3 \
          map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
          queues 1@0 1@0 1@0 \
          base-time $BASE_TIME \
          sched-entry S 01 300000 \
          sched-entry S 02 300000 \
          sched-entry S 04 400000 \
          flags 0x1 \
          txtime-delay 200000 \
          clockid CLOCK_TAI

      tc qdisc replace dev $IFACE parent 100:1 etf \
          offload delta 200000 clockid CLOCK_TAI skip_skb_check
      
      Here, the "flags" parameter is indicating that the txtime-assist mode is
      enabled. Also, all the traffic classes have been assigned the same queue.
      This is to prevent the traffic classes in the lower priority queues from
      getting starved. Note that this configuration is specific to the i210
      ethernet card. Other network cards where the hardware queues are given the
      same priority, might be able to utilize more than one queue.
      
      Following are some of the other highlights of the series:
      - Fix a bug where hardware timestamping and SO_TXTIME options cannot be
        used together. (Patch 1)
      - Introduces the skip_skb_check option.  (Patch 2)
      - Make TxTime assist mode work with TCP packets (Patch 7).
      
      The following changes are recommended to be done in order to get the best
      performance from taprio in this mode:
      ip link set dev enp1s0 mtu 1514
      ethtool -K eth0 gso off
      ethtool -K eth0 tso off
      ethtool --set-eee eth0 eee off
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • taprio: Adjust timestamps for TCP packets · 54002066
      Vedang Patel authored
      When the taprio qdisc is running in "txtime offload" mode, it will
      set the launchtime value (in skb->tstamp) for all the packets which do
      not have the SO_TXTIME socket option. But, the TCP packets already have
      this value set and it indicates the earliest departure time represented
      in CLOCK_MONOTONIC clock.
      
      We need to respect the timestamp set by the TCP subsystem. So, convert
      this time to the clock which taprio is using and ensure that the packet
      is not transmitted before the deadline set by TCP.
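
      A minimal sketch of that adjustment, assuming the conversion between
      the two clocks can be modelled as a fixed offset (a simplification) and
      using hypothetical names throughout; it is not the taprio code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * tcp_edt_mono  - earliest departure time set by TCP, CLOCK_MONOTONIC ns
 * mono_to_qdisc - offset converting CLOCK_MONOTONIC to the qdisc's clock
 * sched_txtime  - transmit time the schedule would assign, qdisc clock ns
 *
 * Returns the transmit time to use: never earlier than TCP's deadline.
 */
int64_t adjust_txtime(int64_t tcp_edt_mono, int64_t mono_to_qdisc,
                      int64_t sched_txtime)
{
    assert(tcp_edt_mono >= 0);

    int64_t edt = tcp_edt_mono + mono_to_qdisc;

    return sched_txtime > edt ? sched_txtime : edt;
}
```

      Taking the later of the two times keeps the schedule's slot when it is
      already late enough and defers the packet otherwise.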
      Signed-off-by: Vedang Patel <vedang.patel@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>