1. 24 Apr, 2019 18 commits
  2. 23 Apr, 2019 22 commits
    • Maxim Mikityanskiy's avatar
      net/mlx5e: Use #define for the WQE wait timeout constant · f8ebecf2
      Maxim Mikityanskiy authored
      Create a #define for the timeout of mlx5e_wait_for_min_rx_wqes to
      clarify the meaning of a magic number.
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      f8ebecf2
    • Maxim Mikityanskiy's avatar
      net/mlx5e: Remove unused rx_page_reuse stat · 03ceda6f
      Maxim Mikityanskiy authored
      Remove the no longer used page_reuse stat of RQs.
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      03ceda6f
    • Maxim Mikityanskiy's avatar
      net/mlx5e: Take HW interrupt trigger into a function · 63d26b49
      Maxim Mikityanskiy authored
      mlx5e_trigger_irq posts a NOP to the ICO SQ just to trigger an IRQ and
      enter the NAPI poll on the right CPU according to the affinity. Use it
      in mlx5e_activate_rq.
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      63d26b49
    • Maxim Mikityanskiy's avatar
      net/mlx5e: Remove unused parameter · 10961c56
      Maxim Mikityanskiy authored
      mdev is unused in mlx5e_rx_is_linear_skb.
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      10961c56
    • Maxim Mikityanskiy's avatar
      net/mlx5e: Add an underflow warning comment · b1b187e1
      Maxim Mikityanskiy authored
      mlx5e_mpwqe_get_log_rq_size calculates the number of WQEs (N) based on
      the requested number of frames in the RQ (F) and the number of packets
      per WQE (P). It ensures that N is not less than the minimum number of
      WQEs in an RQ (N_min). Arithmetically, it means that F / P >= N_min
      should be true. This function deals with logarithms, so it should check
      that log(F) - log(P) >= log(N_min). However, if F < P, this expression
      will cause an unsigned underflow. Check log(F) >= log(P) + log(N_min)
      instead.
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      b1b187e1
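The underflow described in this commit message can be reproduced in a few lines of userspace C. This is only a sketch under stated assumptions: the function names `enough_wqes_naive`, `enough_wqes_safe`, and `log_num_wqes` are hypothetical and are not taken from the driver; only the two inequalities come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * All arguments are log2 values: log_f = log(F) frames requested,
 * log_p = log(P) packets per WQE, log_n_min = log(N_min) minimum WQEs.
 */

/* Naive check: log(F) - log(P) >= log(N_min).
 * With unsigned operands the subtraction wraps around when log_f < log_p,
 * producing a huge value, so the check passes when it should fail. */
static bool enough_wqes_naive(unsigned int log_f, unsigned int log_p,
                              unsigned int log_n_min)
{
    return log_f - log_p >= log_n_min;
}

/* Rearranged check: log(F) >= log(P) + log(N_min).
 * Mathematically equivalent, but no subtraction is performed,
 * so nothing can wrap for realistic (small) log values. */
static bool enough_wqes_safe(unsigned int log_f, unsigned int log_p,
                             unsigned int log_n_min)
{
    return log_f >= log_p + log_n_min;
}

/* log2 of the WQE count; only meaningful once the safe check has passed. */
static unsigned int log_num_wqes(unsigned int log_f, unsigned int log_p)
{
    assert(log_f >= log_p);
    return log_f - log_p;
}
```

With F = 2^3, P = 2^6 and N_min = 2^2, the naive form accepts the configuration because 3 - 6 wraps to a large unsigned value, while the rearranged form rejects it.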
    • Maxim Mikityanskiy's avatar
      net/mlx5e: Move parameter calculation functions to en/params.c · 9a22d5d8
      Maxim Mikityanskiy authored
      This commit moves the parameter calculation functions to a separate file
      for better modularity and code sharing with future features.
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      9a22d5d8
    • Maxim Mikityanskiy's avatar
      net/mlx5e: Report mlx5e_xdp_set errors · 74bbaebf
      Maxim Mikityanskiy authored
      If the channels fail to reopen after setting an XDP program, return the
      error code instead of 0. A proper fix is still needed, as now any error
      while reopening the channels brings the interface down. This patch only
      adds error reporting.
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      74bbaebf
    • Maxim Mikityanskiy's avatar
      net/mlx5e: Remove unused parameter · 83b2fd64
      Maxim Mikityanskiy authored
      params is unused in mlx5e_init_di_list.
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      83b2fd64
    • Shay Agroskin's avatar
      net/mlx5e: XDP, Inline small packets into the TX MPWQE in XDP xmit flow · c2273219
      Shay Agroskin authored
      At high packet rates with multi-CPU TX workloads, much of the HCA's
      resources are spent on prefetching TX descriptors, which limits
      transmission rates.
      This patch mitigates the problem by moving some of the workload to the
      CPU and reducing the HW data prefetch overhead for small packets (<= 256B).
      
      When forwarding packets with XDP, a packet that is smaller
      than a certain size (set to ~256 bytes) would be sent inline within
      its WQE TX descriptor (mem-copied), when the hardware TX queue is congested
      beyond a pre-defined water-mark.
      
      This is added to better utilize the HW resources (which now make
      one less packet data prefetch) and allow better scalability, at the
      expense of CPU usage (the CPU now 'memcpy's the packet into the WQE).
      
      To balance the load between HW and CPU and get the maximum packet rate,
      we use watermarks to detect how congested the HW is and move the
      workload back and forth between HW and CPU.
      
      Performance:
      Tested packet rate for UDP 64Byte multi-stream
      over two dual port ConnectX-5 100Gbps NICs.
      CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
      
      * Tested with hyper-threading disabled
      
      XDP_TX:
      
      |          | before | after   | change |
      | 24 rings | 51Mpps | 116Mpps | +126%  |
      | 1 ring   | 12Mpps | 12Mpps  | same   |
      
      XDP_REDIRECT:
      
      ** Below is the transmit rate, not the redirection rate
      which might be larger, and is not affected by this patch.
      
      |          | before  | after   | change |
      | 32 rings | 64Mpps  | 92Mpps  | +43%   |
      | 1 ring   | 6.4Mpps | 6.4Mpps | same   |
      
      As we can see, the feature significantly improves scaling, without
      hurting single ring performance.
      Signed-off-by: Shay Agroskin <shayag@mellanox.com>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      c2273219
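The watermark scheme described in this commit message can be sketched as a small hysteresis state machine. This is a minimal illustration, not the driver's code: `struct inline_state`, `inline_state_init`, `should_inline`, and the watermark values used in the example are all hypothetical; only the 256-byte size limit and the idea of switching on congestion watermarks come from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INLINE_MAX_LEN 256u /* size limit quoted in the commit message */

/* Hypothetical per-queue state for the inline decision. */
struct inline_state {
    unsigned int low_wm;  /* at or below this many outstanding descriptors: stop inlining */
    unsigned int high_wm; /* at or above this many outstanding descriptors: start inlining */
    bool inline_on;       /* current mode, kept between calls */
};

static void inline_state_init(struct inline_state *s,
                              unsigned int low_wm, unsigned int high_wm)
{
    assert(low_wm < high_wm); /* the gap is what provides hysteresis */
    s->low_wm = low_wm;
    s->high_wm = high_wm;
    s->inline_on = false;
}

/* Decide whether to copy this packet into the descriptor.
 * The mode only changes when congestion crosses a watermark, so the
 * queue does not flip between modes on every small fluctuation. */
static bool should_inline(struct inline_state *s,
                          unsigned int outstanding, size_t pkt_len)
{
    if (outstanding >= s->high_wm)
        s->inline_on = true;
    else if (outstanding <= s->low_wm)
        s->inline_on = false;

    return s->inline_on && pkt_len <= INLINE_MAX_LEN;
}
```

Using two watermarks instead of one threshold is a design choice in this sketch: between the two levels the previous decision is kept, which matches the "back and forth" behaviour the message describes.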
    • Shay Agroskin's avatar
      net/mlx5e: XDP, Add TX MPWQE session counter · 73cab880
      Shay Agroskin authored
      This counter tracks how many TX MPWQE sessions are started in XDP SQ
      in XDP TX/REDIRECT flow. It counts per-channel and global stats.
      Signed-off-by: Shay Agroskin <shayag@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      73cab880
    • Tariq Toukan's avatar
      net/mlx5e: XDP, Enhance RQ indication for XDP redirect flush · 15143bf5
      Tariq Toukan authored
      The XDP redirect flush indication belongs to the receive queue,
      not to its XDP send queue.
      
      For this, use a new bit on rq->flags.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Shay Agroskin <shayag@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      15143bf5
    • Tariq Toukan's avatar
      net/mlx5e: XDP, Fix shifted flag index in RQ bitmap · f03590f7
      Tariq Toukan authored
      Values in enum mlx5e_rq_flag are used as bit indices.
      The intention was to use them with no BIT(i) wrapping.
      
      No functional bug fix here, as the same (shifted) flag bit
      is used for all set, test, and clear operations.
      
      Fixes: 121e8927 ("net/mlx5e: Refactor RQ XDP_TX indication")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Shay Agroskin <shayag@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      f03590f7
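The difference between a bit index and a BIT(i) mask can be shown with userspace stand-ins for the kernel's bit helpers. This is a sketch under assumptions: `set_flag`, `test_flag`, and the enum names below are hypothetical and only mimic helpers that take a bit index; they are not the driver's definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BIT(n) (1UL << (n))

/* Hypothetical flag enum: the values are bit indices, not masks. */
enum rq_flag {
    RQ_FLAG_XDP_XMIT,     /* index 0 */
    RQ_FLAG_XDP_REDIRECT, /* index 1 */
};

/* Stand-in for a helper that takes a bit index. */
static void set_flag(unsigned long *flags, unsigned int bit)
{
    assert(bit < sizeof(*flags) * CHAR_BIT);
    *flags |= BIT(bit);
}

/* Stand-in for the matching test helper, also index-based. */
static bool test_flag(const unsigned long *flags, unsigned int bit)
{
    assert(bit < sizeof(*flags) * CHAR_BIT);
    return (*flags & BIT(bit)) != 0;
}
```

Calling `set_flag(&flags, RQ_FLAG_XDP_REDIRECT)` sets bit 1. Calling `set_flag(&flags, BIT(RQ_FLAG_XDP_REDIRECT))` passes the value 2 as an index and sets bit 2 instead. As long as set, test, and clear all use the same wrapped value, the behaviour stays consistent, which is why the commit message says no functional bug is fixed.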
    • Tariq Toukan's avatar
      net/mlx5e: RX, Support multiple outstanding UMR posts · fd9b4be8
      Tariq Toukan authored
      The buffer mapping of the Multi-Packet WQEs (of Striding RQ)
      is done via UMR posts, one UMR WQE per RX MPWQE.
      
      A single MPWQE is capable of serving many incoming packets,
      usually more than the budget of a single napi cycle.
      Hence, posting a single UMR WQE per napi cycle (and handling its
      completion in the next cycle) works fine in many common cases,
      but not always.
      
      When an XDP program is loaded, every MPWQE is capable of serving fewer
      packets, to satisfy the packet-per-page requirement.
      Thus, for the same number of packets more MPWQEs (and UMR posts)
      are needed (twice as many for the default MTU), giving less latency
      room for the UMR completions.
      
      In this patch, we add support for multiple outstanding UMR posts,
      to allow faster gap closure between consuming MPWQEs and reposting
      them back into the WQ.
      
      For better SW and HW locality, we combine the UMR posts in bulks of
      (at least) two.
      
      This is expected to improve packet rate in high CPU scale.
      
      Performance test:
      As expected, huge improvement in large-scale (48 cores).
      
      xdp_redirect_map, 64B UDP multi-stream.
      Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps.
      CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz.
      
      Before: Unstable, 7 to 30 Mpps
      After:  Stable,   at 70.5 Mpps
      
      No degradation in other tested scenarios.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      fd9b4be8
    • David S. Miller's avatar
      Merge branch 'net-phy-mscc-Improvements-to-VSC8514-PHY-driver' · 539b593d
      David S. Miller authored
      Kavya Sree Kotagiri says:
      
      ====================
      net: phy: mscc: Improvements to VSC8514 PHY driver.
      
          The VSC8514 PHY is a 4-port PHY that supports 10/100/1000BASE-T,
          100BASE-FX, and 1000BASE-X, and can communicate with the MAC via QSGMII.
          The MAC interface protocol for each port within QSGMII can
          be either 1000BASE-X or SGMII, if the QSGMII MAC that the VSC8514 is
          connecting to supports this functionality.
          VSC8514 also supports SGMII MAC-side autonegotiation on each individual
          port, downshifting, can set the blinking pattern of each of its 4 LEDs,
          SyncE, 1000BASE-T Ring Resiliency as well as HP Auto-MDIX detection.
      
          This patch series adds support for 10BASE-T, 100BASE-TX, and
          1000BASE-T, QSGMII link with the MAC, downshifting, HP Auto-MDIX
          detection and blinking pattern for its 4 LEDs.
      
          The GPIO register bank is a set of registers that are common to all
          PHYs in the package. So any modification in any register of this bank
          affects all PHYs of the package.
      
          If the PHYs haven't been reset before booting the Linux kernel and were
          configured to use interrupts for e.g. link status updates, it is
          required to clear the interrupts mask register of all PHYs before being
          able to use interrupts with any PHY. The first PHY of the package to
          be initialized takes care of clearing the interrupt mask registers of
          all PHYs. Thus, we need to keep track of whether the package init
          sequence has already been done.
      
          Most of the init sequence of a PHY of the package is common to all PHYs
          in the package, thus we use the SMI broadcast feature which enables us
          to propagate a write in one register of one PHY to all PHYs in the same
          package.
      
          This patch series adds support for VSC8514 in Microsemi driver(mscc.c)
          and removes support from Vitesse driver(vitesse.c).
      
      v8
      - mscc: Added appropriate code using phy_modify() in vsc8514_config_init().
      
      v7
      - mscc: Handled return values in vsc8514_config_init().
      
      v6
      - mscc: Added proper return value in vsc85xx_csr_ctrl_phy_read().
      - mscc: Replaced __mdiobus_write and __mdiobus_read with __phy_write and __phy_read respectively.
      - mscc: Replaced register addresses in 8514_config_init() with proper constants.
      
      v5
      - mscc: Added return error statements for a few function calls.
      - mscc: Added comments in vsc85xx_csr_ctrl_phy_read() and vsc85xx_csr_ctrl_phy_write().
      
      v4
      - mscc: Removed features settings
      - mscc: Removed aneg_done settings.
      
      v3
      - mscc: Used BIT(x) for PHY_MCB_S6G_WRITE and PHY_MCB_S6G_READ
              instead of hex.
      - mscc: Replaced magic numbers with proper constants.
      - mscc: Handled delays and timeouts at appropriate points.
      - mscc: Added comments/explanation where requested.
      
      v2
      - mscc: Sorted variable declarations in reverse christmas tree order.
      
      v1
      - Added 0/2 file.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      539b593d
    • Kavya Sree Kotagiri's avatar
      net: phy: vitesse: Remove support for VSC8514. · edeb207b
      Kavya Sree Kotagiri authored
      Support for VSC8514 moves to the Microsemi driver (mscc.c),
      which adds more features.
      Signed-off-by: Kavya Sree Kotagiri <kavyasree.kotagiri@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      edeb207b
    • Kavya Sree Kotagiri's avatar
      net: phy: mscc: add support for VSC8514 PHY. · e4f9ba64
      Kavya Sree Kotagiri authored
      The VSC8514 PHY is a 4-port PHY that supports 10/100/1000BASE-T,
      100BASE-FX, and 1000BASE-X, and can communicate with the MAC via QSGMII.
      The MAC interface protocol for each port within QSGMII can
      be either 1000BASE-X or SGMII, if the QSGMII MAC that the VSC8514 is
      connecting to supports this functionality.
      VSC8514 also supports SGMII MAC-side autonegotiation on each individual
      port, downshifting, can set the blinking pattern of each of its 4 LEDs,
      SyncE, 1000BASE-T Ring Resiliency as well as HP Auto-MDIX detection.
      
      This adds support for 10BASE-T, 100BASE-TX, and 1000BASE-T,
      QSGMII link with the MAC, downshifting, HP Auto-MDIX detection
      and blinking pattern for its 4 LEDs.
      
      The GPIO register bank is a set of registers that are common to all PHYs
      in the package. So any modification in any register of this bank affects
      all PHYs of the package.
      
      If the PHYs haven't been reset before booting the Linux kernel and were
      configured to use interrupts for e.g. link status updates, it is
      required to clear the interrupts mask register of all PHYs before being
      able to use interrupts with any PHY. The first PHY of the package to
      be initialized takes care of clearing the interrupt mask registers of
      all PHYs. Thus, we need to keep track of whether the package init
      sequence has already been done.
      
      Most of the init sequence of a PHY of the package is common to all PHYs
      in the package, thus we use the SMI broadcast feature which enables us
      to propagate a write in one register of one PHY to all PHYs in the same
      package.
      Signed-off-by: Kavya Sree Kotagiri <kavyasree.kotagiri@microchip.com>
      Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
      Co-developed-by: Quentin Schulz <quentin.schulz@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e4f9ba64
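The "first PHY does the package-wide init" idea from this commit message can be sketched with a shared flag. This is a simplified model, not the mscc driver's code: `struct phy_package`, `phy_init`, and the `irq_mask` array are hypothetical stand-ins, and the loop over the array merely imitates what the message says is done by clearing each PHY's interrupt mask register.

```c
#include <assert.h>
#include <stdbool.h>

#define PHYS_PER_PACKAGE 4 /* the VSC8514 is described as a 4-port PHY */

/* Hypothetical state shared by all PHYs in one package. */
struct phy_package {
    bool init_done;                          /* has the common sequence run? */
    unsigned int irq_mask[PHYS_PER_PACKAGE]; /* stand-in for per-PHY mask registers */
};

/* Per-PHY init entry point. Returns true if this call performed the
 * package-wide part, false if an earlier PHY had already done it. */
static bool phy_init(struct phy_package *pkg, unsigned int phy_index)
{
    assert(phy_index < PHYS_PER_PACKAGE);

    if (pkg->init_done)
        return false;

    /* First PHY to be initialized: clear the interrupt masks of all PHYs. */
    for (unsigned int i = 0; i < PHYS_PER_PACKAGE; i++)
        pkg->irq_mask[i] = 0;

    pkg->init_done = true;
    return true;
}
```

A real driver would also need locking around the shared state and would write hardware registers over the MDIO bus; both are left out here to keep the tracking logic visible.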
    • Jian Shen's avatar
      net: phy: marvell: add new default led configure for m88e151x · a93f7fe1
      Jian Shen authored
      The default m88e151x LED configuration is 0x1177, which uses LED[0]
      for 1000M link, LED[1] for 100M link, and LED[2] for activity.
      Some boards, however, use LED[0] for link and LED[1] for activity,
      and prefer the value 0x1040. To support this case, this patch
      defines a new dev_flag and sets it in the HNS3 driver before
      connecting the PHY. During PHY initialization, the new LED
      configuration is used if this dev_flag is set.
      Signed-off-by: Jian Shen <shenjian15@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a93f7fe1
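The flag-based selection described in this commit message amounts to a single conditional. The sketch below assumes a simplified `struct phy` with a `dev_flags` field and a made-up flag bit `DEV_FLAG_ALT_LED`; neither the names nor the bit position are taken from the Marvell driver. Only the two configuration values, 0x1177 and 0x1040, come from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LED_CFG_DEFAULT 0x1177u /* LED[0] 1000M link, LED[1] 100M link, LED[2] activity */
#define LED_CFG_ALT     0x1040u /* LED[0] link, LED[1] activity */

#define DEV_FLAG_ALT_LED (1u << 0) /* hypothetical flag bit, set by the MAC driver */

/* Simplified stand-in for a PHY device. */
struct phy {
    uint32_t dev_flags;
};

/* Pick the LED register value to program during PHY initialization. */
static uint16_t led_config_for(const struct phy *phy)
{
    assert(phy != NULL);
    return (phy->dev_flags & DEV_FLAG_ALT_LED) ? LED_CFG_ALT : LED_CFG_DEFAULT;
}
```

Because the flag has to be set before the PHY is connected, boards that never set it keep the existing default.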
    • Florian Fainelli's avatar
      net: systemport: Remove need for DMA descriptor · 7e6e185c
      Florian Fainelli authored
      All we do is write the length/status and address bits to a DMA
      descriptor only to write its contents into on-chip registers right
      after. Eliminate this unnecessary step.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7e6e185c
    • Ido Schimmel's avatar
      bridge: Fix possible use-after-free when deleting bridge port · 697cd36c
      Ido Schimmel authored
      When a bridge port is being deleted, do not dereference it later in
      br_vlan_port_event() as it can result in a use-after-free [1] if the RCU
      callback was executed before invoking the function.
      
      [1]
      [  129.638551] ==================================================================
      [  129.646904] BUG: KASAN: use-after-free in br_vlan_port_event+0x53c/0x5fd
      [  129.654406] Read of size 8 at addr ffff8881e4aa1ae8 by task ip/483
      [  129.663008] CPU: 0 PID: 483 Comm: ip Not tainted 5.1.0-rc5-custom-02265-ga946bd73daac #1383
      [  129.672359] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
      [  129.682484] Call Trace:
      [  129.685242]  dump_stack+0xa9/0x10e
      [  129.689068]  print_address_description.cold.2+0x9/0x25e
      [  129.694930]  kasan_report.cold.3+0x78/0x9d
      [  129.704420]  br_vlan_port_event+0x53c/0x5fd
      [  129.728300]  br_device_event+0x2c7/0x7a0
      [  129.741505]  notifier_call_chain+0xb5/0x1c0
      [  129.746202]  rollback_registered_many+0x895/0xe90
      [  129.793119]  unregister_netdevice_many+0x48/0x210
      [  129.803384]  rtnl_delete_link+0xe1/0x140
      [  129.815906]  rtnl_dellink+0x2a3/0x820
      [  129.844166]  rtnetlink_rcv_msg+0x397/0x910
      [  129.868517]  netlink_rcv_skb+0x137/0x3a0
      [  129.882013]  netlink_unicast+0x49b/0x660
      [  129.900019]  netlink_sendmsg+0x755/0xc90
      [  129.915758]  ___sys_sendmsg+0x761/0x8e0
      [  129.966315]  __sys_sendmsg+0xf0/0x1c0
      [  129.988918]  do_syscall_64+0xa4/0x470
      [  129.993032]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  129.998696] RIP: 0033:0x7ff578104b58
      ...
      [  130.073811] Allocated by task 479:
      [  130.077633]  __kasan_kmalloc.constprop.5+0xc1/0xd0
      [  130.083008]  kmem_cache_alloc_trace+0x152/0x320
      [  130.088090]  br_add_if+0x39c/0x1580
      [  130.092005]  do_set_master+0x1aa/0x210
      [  130.096211]  do_setlink+0x985/0x3100
      [  130.100224]  __rtnl_newlink+0xc52/0x1380
      [  130.104625]  rtnl_newlink+0x6b/0xa0
      [  130.108541]  rtnetlink_rcv_msg+0x397/0x910
      [  130.113136]  netlink_rcv_skb+0x137/0x3a0
      [  130.117538]  netlink_unicast+0x49b/0x660
      [  130.121939]  netlink_sendmsg+0x755/0xc90
      [  130.126340]  ___sys_sendmsg+0x761/0x8e0
      [  130.130645]  __sys_sendmsg+0xf0/0x1c0
      [  130.134753]  do_syscall_64+0xa4/0x470
      [  130.138864]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      [  130.146195] Freed by task 0:
      [  130.149421]  __kasan_slab_free+0x125/0x170
      [  130.154016]  kfree+0xf3/0x310
      [  130.157349]  kobject_put+0x1a8/0x4c0
      [  130.161363]  rcu_core+0x859/0x19b0
      [  130.165175]  __do_softirq+0x250/0xa26
      [  130.170956] The buggy address belongs to the object at ffff8881e4aa1ae8
                      which belongs to the cache kmalloc-1k of size 1024
      [  130.184972] The buggy address is located 0 bytes inside of
                      1024-byte region [ffff8881e4aa1ae8, ffff8881e4aa1ee8)
      
      Fixes: 9c0ec2e7 ("bridge: support binding vlan dev link state to vlan member bridge ports")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Cc: Mike Manning <mmanning@vyatta.att-mail.com>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: Mike Manning <mmanning@vyatta.att-mail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      697cd36c
    • Crag.Wang's avatar
      r8152: sync sa_family with the media type of network device · a6cbcb77
      Crag.Wang authored
      Without this patch, the socket address family sporadically gets a wrong
      value, which causes dev_set_mac_address() to fail to set the desired
      MAC address.
      
      Fixes: 25766271 ("r8152: Refresh MAC address during USBDEVFS_RESET")
      Signed-off-by: Crag.Wang <crag.wang@dell.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-By: Mario Limonciello <mario.limonciello@dell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a6cbcb77
    • David S. Miller's avatar
      Merge branch 'mlxsw-Shared-buffer-improvements' · 6f97955f
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Shared buffer improvements
      
      This patchset includes two improvements with regard to shared buffer
      configuration in mlxsw.
      
      The first part of this patchset forbids the user from performing illegal
      shared buffer configuration that can result in unnecessary packet loss.
      In order to better communicate these configuration failures to the user,
      extack is propagated from devlink towards drivers. This is done in
      patches #1-#8.
      
      The second part of the patchset deals with the shared buffer
      configuration of the CPU port. When a packet is trapped by the device,
      it is sent across the PCI bus to the attached host CPU. From the
      device's perspective, it is as if the packet is transmitted through the
      CPU port.
      
      While testing traffic directed at the CPU it became apparent that for
      certain packet sizes and certain burst sizes, the current shared buffer
      configuration of the CPU port is inadequate and results in packet drops.
      The configuration is adjusted by patches #9-#14 that create two new pools
      - ingress & egress - which are dedicated for CPU traffic.
      ====================
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6f97955f