1. 21 Dec, 2017 1 commit
  2. 20 Dec, 2017 14 commits
  3. 19 Dec, 2017 25 commits
    • net/mlx5: Stay in polling mode when command EQ destroy fails · a2fba188
      Moshe Shemesh authored
      During unload, in mlx5_stop_eqs we move the command interface from events
      mode to polling mode, but if destroying the command interface EQ fails we
      move back to events mode.
      That's wrong: even if we fail to destroy the command interface EQ, we
      still release its IRQ, so no interrupts will be received.
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
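A minimal user-space sketch of the corrected ordering, assuming a hypothetical `struct cmd_if` with a `polling` flag and a stub `destroy_cmd_eq()` standing in for the firmware command; none of these names are the real mlx5 API. It shows only the decision the commit changes: once the IRQ is released, a destroy failure is reported, but the interface stays in polling mode.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical command-interface state; not the real mlx5 structures. */
struct cmd_if {
	bool polling;      /* true: completions are polled, not interrupt driven */
	bool irq_released; /* true once the EQ's IRQ has been freed */
};

/* Stub for the DESTROY_EQ command. Models the commit's note that the IRQ
 * is released even when destroying the EQ fails. Returns 0 or -errno. */
int destroy_cmd_eq(struct cmd_if *cmd, int fw_status)
{
	cmd->irq_released = true;
	return fw_status;
}

/* Switch to polling, then destroy the EQ. On failure, only report it:
 * going back to events mode would wait for interrupts that never arrive. */
int stop_cmd_eq(struct cmd_if *cmd, int fw_status)
{
	int err;

	cmd->polling = true;
	err = destroy_cmd_eq(cmd, fw_status);
	if (err)
		fprintf(stderr, "failed to destroy command EQ: %d\n", err);
	return err;
}
```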
    • net/mlx5: Cleanup IRQs in case of unload failure · d6b2785c
      Moshe Shemesh authored
      When mlx5_stop_eqs fails to destroy any of the EQs, it returns with an error.
      In that failure flow, the function returns without releasing all of
      the EQs' IRQs, and then pci_free_irq_vectors fails.
      Fix by only warning on EQ destroy failure and continuing to release
      the other EQs and their IRQs.
      
      It fixes the following kernel trace:
      kernel: kernel BUG at drivers/pci/msi.c:352!
      ...
      ...
      kernel: Call Trace:
      kernel: pci_disable_msix+0xd3/0x100
      kernel: pci_free_irq_vectors+0xe/0x20
      kernel: mlx5_load_one.isra.17+0x9f5/0xec0 [mlx5_core]
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
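A sketch of the warn-and-continue teardown loop, using a hypothetical `struct eq` array in place of the driver's EQ table. Each EQ's IRQ is released whether or not its destroy succeeded, which is what lets a later pci_free_irq_vectors() find no vectors still in use. Returning a failure count is an illustrative choice, not the driver's signature.

```c
#include <stdio.h>

#define NUM_EQS 4

/* Hypothetical EQ bookkeeping; names are illustrative only. */
struct eq {
	int fw_destroy_status; /* 0 on success, negative errno on failure */
	int irq_freed;         /* set once the EQ's IRQ is released */
};

/* Destroy every EQ and always release its IRQ. A destroy failure is only
 * warned about, so teardown of the IRQ vectors can still proceed.
 * Returns the number of EQs whose destroy failed. */
int stop_eqs(struct eq *eqs, int n)
{
	int failures = 0;

	for (int i = 0; i < n; i++) {
		if (eqs[i].fw_destroy_status) {
			fprintf(stderr, "warn: failed to destroy EQ %d: %d\n",
				i, eqs[i].fw_destroy_status);
			failures++;
		}
		eqs[i].irq_freed = 1; /* continue: free the IRQ regardless */
	}
	return failures;
}
```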
    • net/mlx5: Fix steering memory leak · 139ed6c6
      Maor Gottlieb authored
      Flow steering priority and namespace are software-only objects that
      didn't have the proper destructors and were not freed during steering
      cleanup.
      
      Fix it by adding destructor functions for these objects.
      
      Fixes: bd71b08e ("net/mlx5: Support multiple updates of steering rules in parallel")
      Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
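A sketch of why a missing destructor leaks: a hypothetical `struct fs_node` tree is released by `tree_put()`, which frees a node only through its `del_sw_func` callback. The names illustrate the idea of per-object software destructors and are assumptions, not the exact mlx5 flow-steering structures.

```c
#include <stdlib.h>

/* Hypothetical software-only steering node (e.g. a namespace or priority). */
struct fs_node {
	struct fs_node *child;   /* first child */
	struct fs_node *sibling; /* next sibling */
	void (*del_sw_func)(struct fs_node *node); /* per-type destructor */
};

static int freed_count; /* counts destructor calls, for demonstration */

void del_sw_node(struct fs_node *node)
{
	freed_count++;
	free(node);
}

/* Release children first, then the node itself via its destructor.
 * A node type without a destructor (del_sw_func == NULL) leaks here,
 * which is the class of bug the commit describes. */
void tree_put(struct fs_node *node)
{
	struct fs_node *c = node->child;

	while (c) {
		struct fs_node *next = c->sibling; /* read before c is freed */

		tree_put(c);
		c = next;
	}
	if (node->del_sw_func)
		node->del_sw_func(node);
}

/* Allocate a node that has a destructor attached. */
struct fs_node *node_new(void)
{
	struct fs_node *n = calloc(1, sizeof(*n));

	if (n)
		n->del_sw_func = del_sw_node;
	return n;
}
```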
    • net/mlx5e: Prevent possible races in VXLAN control flow · 0c1cc8b2
      Gal Pressman authored
      When calling add/remove VXLAN port, a lock must be held in order to
      prevent race scenarios when more than one add/remove happens at the
      same time.
      Fix by holding our state_lock (mutex) as done by all other parts of the
      driver.
      Note that the spinlock protecting the radix-tree is still needed in
      order to synchronize radix-tree access from softirq context.
      
      Fixes: b3f63c3d ("net/mlx5e: Add netdev support for VXLAN tunneling")
      Signed-off-by: Gal Pressman <galp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Add refcount to VXLAN structure · 23f4cc2c
      Gal Pressman authored
      A refcount mechanism must be implemented in order to prevent unwanted
      scenarios such as:
      - Open an IPv4 VXLAN interface
      - Open an IPv6 VXLAN interface (different socket)
      - Remove one of the interfaces
      
      With the current implementation, the UDP port is removed from our VXLAN
      database, turning off the offloads for the other interface, which is
      still active.
      The reference count mechanism will only allow UDP port removals once all
      consumers are gone.
      
      Fixes: b3f63c3d ("net/mlx5e: Add netdev support for VXLAN tunneling")
      Signed-off-by: Gal Pressman <galp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
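A minimal refcounted port table, assuming a small fixed array in place of the driver's radix tree and omitting locking and the firmware calls; all names are hypothetical. The first user of a UDP port creates the entry, later users only take a reference, and the entry (and hence the offload) goes away only when the last user is removed, as in the IPv4/IPv6 scenario described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 8

/* Hypothetical VXLAN port entry; the real driver uses a radix tree. */
struct vxlan_port {
	uint16_t udp_port;
	int refcount; /* number of VXLAN interfaces using this port */
	bool in_use;
};

static struct vxlan_port db[MAX_PORTS];

static struct vxlan_port *lookup(uint16_t port)
{
	for (int i = 0; i < MAX_PORTS; i++)
		if (db[i].in_use && db[i].udp_port == port)
			return &db[i];
	return NULL;
}

/* First user creates the entry; later users just take a reference. */
int vxlan_add_port(uint16_t port)
{
	struct vxlan_port *p = lookup(port);

	if (p) {
		p->refcount++;
		return 0;
	}
	for (int i = 0; i < MAX_PORTS; i++) {
		if (!db[i].in_use) {
			db[i] = (struct vxlan_port){ .udp_port = port,
						     .refcount = 1,
						     .in_use = true };
			return 0;
		}
	}
	return -1; /* table full */
}

/* Drop one reference; remove the port only when the last user is gone. */
void vxlan_del_port(uint16_t port)
{
	struct vxlan_port *p = lookup(port);

	if (p && --p->refcount == 0)
		p->in_use = false;
}

bool vxlan_port_offloaded(uint16_t port)
{
	return lookup(port) != NULL;
}
```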
    • net/mlx5e: Fix possible deadlock of VXLAN lock · 63235141
      Gal Pressman authored
      mlx5e_vxlan_lookup_port is called both from mlx5e_add_vxlan_port (user
      context) and mlx5e_features_check (softirq), but the lock acquired does
      not disable bottom halves and might result in a deadlock. Fix it by simply
      replacing spin_lock() with spin_lock_bh().
      While at it, replace all unnecessary spin_lock_irq() calls with spin_lock_bh().
      
      lockdep's WARNING: inconsistent lock state
      [  654.028136] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [  654.028229] swapper/5/0 [HC0[0]:SC1[9]:HE1:SE0] takes:
      [  654.028321]  (&(&vxlan_db->lock)->rlock){+.?.}, at: [<ffffffffa06e7f0e>] mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core]
      [  654.028528] {SOFTIRQ-ON-W} state was registered at:
      [  654.028607]   _raw_spin_lock+0x3c/0x70
      [  654.028689]   mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core]
      [  654.028794]   mlx5e_vxlan_add_port+0x2e/0x120 [mlx5_core]
      [  654.028878]   process_one_work+0x1e9/0x640
      [  654.028942]   worker_thread+0x4a/0x3f0
      [  654.029002]   kthread+0x141/0x180
      [  654.029056]   ret_from_fork+0x24/0x30
      [  654.029114] irq event stamp: 579088
      [  654.029174] hardirqs last  enabled at (579088): [<ffffffff818f475a>] ip6_finish_output2+0x49a/0x8c0
      [  654.029309] hardirqs last disabled at (579087): [<ffffffff818f470e>] ip6_finish_output2+0x44e/0x8c0
      [  654.029446] softirqs last  enabled at (579030): [<ffffffff810b3b3d>] irq_enter+0x6d/0x80
      [  654.029567] softirqs last disabled at (579031): [<ffffffff810b3c05>] irq_exit+0xb5/0xc0
      [  654.029684] other info that might help us debug this:
      [  654.029781]  Possible unsafe locking scenario:
      
      [  654.029868]        CPU0
      [  654.029908]        ----
      [  654.029947]   lock(&(&vxlan_db->lock)->rlock);
      [  654.030045]   <Interrupt>
      [  654.030090]     lock(&(&vxlan_db->lock)->rlock);
      [  654.030162]
       *** DEADLOCK ***
      
      Fixes: b3f63c3d ("net/mlx5e: Add netdev support for VXLAN tunneling")
      Signed-off-by: Gal Pressman <galp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Fix error flow in CREATE_QP command · dbff26e4
      Moni Shoua authored
      In the error flow, when the DESTROY_QP command should be executed, the
      data was set in the wrong mailbox, not the one that is written to
      hardware. Fix that.
      
      Fixes: 09a7d9ec '{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc'
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Fix misspelling in the error message and comment · 777ec2b2
      Eugenia Emantayev authored
      Fix the misspelling of the word "syndrome".
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Fix defaulting RX ring size when not needed · 696a97cf
      Eugenia Emantayev authored
      Fix a bug where turning the CQE compression mechanism on or off
      resets the RX ring size to its default value even when that is not
      needed.
      
      Fixes: 2fc4bfb7 ("net/mlx5e: Dynamic RQ type infrastructure")
      Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Fix features check of IPv6 traffic · 2989ad1e
      Gal Pressman authored
      The assumption that the next header field contains the transport
      protocol is wrong for IPv6 packets with extension headers.
      Instead, we should look at the inner-most next header field in the buffer.
      This will fix TSO offload for tunnels over IPv6 with extension headers.
      
      Performance testing: 19.25x improvement, cool!
      Measuring bandwidth of 16 threads TCP traffic over IPv6 GRE tap.
      CPU: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
      NIC: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
      TSO: Enabled
      Before: 4,926.24  Mbps
      Now   : 94,827.91 Mbps
      
      Fixes: b3f63c3d ("net/mlx5e: Add netdev support for VXLAN tunneling")
      Signed-off-by: Gal Pressman <galp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
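A sketch of walking the IPv6 extension-header chain to find the inner-most next header, written as plain C over a byte buffer. It handles only Hop-by-Hop, Routing, Fragment and Destination Options headers (AH and ESP are left out), and `ipv6_inner_nexthdr()` is a hypothetical name; in-kernel code would normally rely on an existing helper such as ipv6_skip_exthdr() rather than its own parser.

```c
#include <stddef.h>
#include <stdint.h>

/* IPv6 Next Header values (IANA protocol numbers) used below. */
enum {
	NH_HOPOPTS  = 0,
	NH_TCP      = 6,
	NH_UDP      = 17,
	NH_ROUTING  = 43,
	NH_FRAGMENT = 44,
	NH_DSTOPTS  = 60,
};

/*
 * Hypothetical helper: follow the extension-header chain that starts right
 * after the fixed 40-byte IPv6 header.
 *   nexthdr  - Next Header field of the fixed header
 *   buf, len - the bytes that follow the fixed header
 * Returns the inner-most Next Header value (e.g. the transport protocol),
 * or -1 if the chain runs past the buffer.
 */
int ipv6_inner_nexthdr(uint8_t nexthdr, const uint8_t *buf, size_t len)
{
	size_t off = 0;

	for (;;) {
		size_t hdrlen;

		switch (nexthdr) {
		case NH_HOPOPTS:
		case NH_ROUTING:
		case NH_DSTOPTS:
			if (off + 2 > len)
				return -1;
			/* Hdr Ext Len counts 8-octet units, excluding the first 8. */
			hdrlen = ((size_t)buf[off + 1] + 1) * 8;
			break;
		case NH_FRAGMENT:
			hdrlen = 8; /* the Fragment header has a fixed size */
			break;
		default:
			return nexthdr; /* not an extension header: done */
		}
		if (off + hdrlen > len)
			return -1;
		nexthdr = buf[off]; /* every extension header starts with Next Header */
		off += hdrlen;
	}
}
```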
    • net/mlx5e: Fix ETS BW check · ff089191
      Huy Nguyen authored
      Fix a bug that allows the ETS bandwidth sum to be 0% when a TC of type ETS exists.
      
      Fixes: 08fb1dac ('net/mlx5e: Support DCBNL IEEE ETS')
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Reviewed-by: Huy Nguyen <huyn@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
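A sketch of the validation rule, assuming that when at least one TC uses ETS, the ETS bandwidth percentages must sum to exactly 100, so an all-zero configuration is rejected. The enum and function names are hypothetical and are not the DCBNL API.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_TC 8

/* Hypothetical transmission selection algorithms per TC. */
enum tsa { TSA_STRICT, TSA_ETS, TSA_VENDOR };

/* Returns 0 if the configuration is valid, -1 otherwise. Only TCs using
 * ETS contribute to the bandwidth sum. */
int ets_validate(const enum tsa *tsa, const uint8_t *bw)
{
	bool have_ets = false;
	int sum = 0;

	for (int i = 0; i < NUM_TC; i++) {
		if (tsa[i] == TSA_ETS) {
			have_ets = true;
			sum += bw[i];
		}
	}
	/* With ETS TCs present, a 0% total (or any total other than 100%)
	 * is rejected. */
	if (have_ets && sum != 100)
		return -1;
	return 0;
}
```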
    • net/mlx5: Fix rate limit packet pacing naming and struct · 37e92a9d
      Eran Ben Elisha authored
      In mlx5_ifc, the struct size was incomplete, so the driver was sending
      garbage after the last defined field. Fix it by adding a reserved field
      to complete the struct size.
      
      In addition, rename all set_rate_limit to set_pp_rate_limit to be
      compliant with the Firmware <-> Driver definition.
      
      Fixes: 7486216b ("{net,IB}/mlx5: mlx5_ifc updates")
      Fixes: 1466cc5b ("net/mlx5: Rate limit tables support")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • Revert "mlx5: move affinity hints assignments to generic code" · 231243c8
      Saeed Mahameed authored
      Before the offending commit, mlx5 core did the IRQ affinity itself,
      and it seems that the new generic code has some drawbacks; one of
      them is that the user cannot modify the IRQ affinity after the
      initial affinity values are assigned.
      
      The issue is still being discussed and a solution in the new generic
      code is required; until then, we need to revert this patch.
      
      This fixes the following issue:
      echo <new affinity> > /proc/irq/<x>/smp_affinity
      fails with -EIO
      
      This reverts commit a435393a.
      Note: kept mlx5_get_vector_affinity in include/linux/mlx5/driver.h since
      it is used in mlx5_ib driver.
      
      Fixes: a435393a ("mlx5: move affinity hints assignments to generic code")
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jes Sorensen <jsorensen@fb.com>
      Reported-by: Jes Sorensen <jsorensen@fb.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: FPGA, return -EINVAL if size is zero · bae115a2
      Kamal Heib authored
      Currently, if a size of zero is passed to
      mlx5_fpga_mem_{read|write}_i2c(), the "err" return value is not
      initialized, which triggers gcc warnings:
      
      [..]/mlx5/core/fpga/sdk.c:87 mlx5_fpga_mem_read_i2c() error:
      uninitialized symbol 'err'.
      [..]/mlx5/core/fpga/sdk.c:115 mlx5_fpga_mem_write_i2c() error:
      uninitialized symbol 'err'.
      
      Fix that.
      
      Fixes: a9956d35 ('net/mlx5: FPGA, Add SBU infrastructure')
      Signed-off-by: Kamal Heib <kamalh@mellanox.com>
      Reviewed-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
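A sketch of the fixed pattern with a hypothetical chunked reader: rejecting `size == 0` up front means the transfer loop always runs at least once, so `err` is always assigned before it is returned. `I2C_CHUNK` and `i2c_read_chunk()` are illustrative stand-ins, not the mlx5 FPGA SDK.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define I2C_CHUNK 4 /* hypothetical max bytes per I2C transaction */

/* Stub for one I2C transaction over a memory image; always succeeds. */
int i2c_read_chunk(const unsigned char *dev, size_t addr,
		   unsigned char *out, size_t n)
{
	memcpy(out, dev + addr, n);
	return 0;
}

/* Read size bytes in chunks. size > 0 is guaranteed after the check, so
 * the loop body runs at least once and err is always assigned. */
int mem_read_i2c(const unsigned char *dev, size_t addr,
		 unsigned char *buf, size_t size)
{
	size_t done = 0;
	int err;

	if (!size)
		return -EINVAL;

	do {
		size_t n = size - done < I2C_CHUNK ? size - done : I2C_CHUNK;

		err = i2c_read_chunk(dev, addr + done, buf + done, n);
		if (err)
			break;
		done += n;
	} while (done < size);

	return err;
}
```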
    • ipv4: fib: Fix metrics match when deleting a route · d03a4557
      Phil Sutter authored
      The recently added fib_metrics_match() causes a regression for routes
      with both RTAX_FEATURES and RTAX_CC_ALGO if the latter has
      TCP_CONG_NEEDS_ECN flag set:
      
      | # ip link add d0 type dummy
      | # ip link set d0 up
      | # ip route add 172.29.29.0/24 dev d0 features ecn congctl dctcp
      | # ip route del 172.29.29.0/24 dev d0 features ecn congctl dctcp
      | RTNETLINK answers: No such process
      
      During route insertion, fib_convert_metrics() detects that the given CC
      algo requires ECN and hence sets DST_FEATURE_ECN_CA bit in
      RTAX_FEATURES.
      
      During route deletion though, fib_metrics_match() compares stored
      RTAX_FEATURES value with that from userspace (which obviously has no
      knowledge about DST_FEATURE_ECN_CA) and fails.
      
      Fixes: 5f9ae3d9 ("ipv4: do metrics match when looking up and deleting a route")
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
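A sketch of a comparison that tolerates the kernel-set bit by masking it out of both values before comparing. The bit values below are illustrative assumptions chosen for the example; only the idea of ignoring an internal-only flag during the match comes from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values (assumed): a user-visible ECN feature bit and an
 * internal bit the kernel sets when the CC algorithm requires ECN. */
#define FEATURE_ECN    (1u << 0)
#define FEATURE_ECN_CA (1u << 31) /* internal only; userspace never sends it */

/* Compare the RTAX_FEATURES value stored at insertion with the one supplied
 * in a delete request, ignoring the internal-only bit. */
bool features_match(uint32_t stored, uint32_t from_user)
{
	return (stored & ~FEATURE_ECN_CA) == (from_user & ~FEATURE_ECN_CA);
}
```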
    • net: stmmac: Fix bad RX timestamp extraction · a1762456
      Fredrik Hallenberg authored
      As noted in dwmac4_wrback_get_rx_timestamp_status, the timestamp is
      found in the context descriptor following the current descriptor.
      However, the current code looks for the context descriptor in the
      current descriptor, which will always fail.
      Signed-off-by: Fredrik Hallenberg <megahallon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Fix TX timestamp calculation · 200922c9
      Fredrik Hallenberg authored
      When using GMAC4, the value written to PTP_SSIR should be shifted;
      however, the shifted value is also used in subsequent calculations,
      which results in a bad timestamp value.
      Signed-off-by: Fredrik Hallenberg <megahallon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
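A sketch of the separation the fix calls for, with a hypothetical `ssir_reg` variable standing in for the PTP_SSIR register and an assumed 16-bit shift for the GMAC4 field position: the shifted value is used only for the register write, and the caller gets the unshifted increment for its subsequent calculations.

```c
#include <stdint.h>

#define SSINC_SHIFT 16 /* assumed position of the increment field on GMAC4 */

static uint32_t ssir_reg; /* stands in for the PTP_SSIR hardware register */

/* Program the sub-second increment (ns per clock tick) and return it.
 * Only the register copy is shifted; the returned value is not. */
uint32_t config_sub_second_increment(uint32_t clk_rate_hz, int is_gmac4)
{
	uint32_t data = 1000000000u / clk_rate_hz; /* ns per tick */
	uint32_t reg_value = data;

	if (is_gmac4)
		reg_value <<= SSINC_SHIFT;

	ssir_reg = reg_value;
	return data; /* unshifted, safe for later timestamp math */
}
```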
    • tipc: fix list sorting bug in function tipc_group_update_member() · 3db09601
      Jon Maloy authored
      When, during a join operation, or during message transmission, a group
      member needs to be added to the group's 'congested' list, we sort it
      into the list in ascending order, according to its current advertised
      window size. However, we miss the case when the member is already on
      that list. This will have the result that the member, after the window
      size has been decremented, might be at the wrong position in that list.
      This in turn may cause us, during broadcast and multicast
      transmissions, to miss the fact that a destination is not yet ready
      for reception, so we end up sending anyway. From this point on, the
      behavior during the remaining session is unpredictable, e.g., with
      underflowing window sizes.
      
      We now correct this bug by unconditionally removing the member from
      the list before (re-)sorting it in.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
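A sketch of the corrected re-sort on a hypothetical doubly linked list ordered by ascending window: the member is removed unconditionally if it is already queued, then inserted before the first member with a larger window. The types and names are illustrative, not TIPC's.

```c
#include <stddef.h>

/* Hypothetical congested-list member, kept sorted by ascending window. */
struct member {
	int window;
	int on_list;
	struct member *prev, *next;
};

struct cong_list {
	struct member *head;
};

static void list_remove(struct cong_list *l, struct member *m)
{
	if (m->prev)
		m->prev->next = m->next;
	else
		l->head = m->next;
	if (m->next)
		m->next->prev = m->prev;
	m->prev = m->next = NULL;
	m->on_list = 0;
}

/* Re-sort m after its window changed: take it off the list first if it is
 * already there, then insert it at its sorted position. */
void update_member(struct cong_list *l, struct member *m)
{
	struct member *pos, *prev = NULL;

	if (m->on_list)
		list_remove(l, m);

	for (pos = l->head; pos && pos->window <= m->window; pos = pos->next)
		prev = pos;

	m->prev = prev;
	m->next = pos;
	if (prev)
		prev->next = m;
	else
		l->head = m;
	if (pos)
		pos->prev = m;
	m->on_list = 1;
}
```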
    • ip6_tunnel: get the min mtu properly in ip6_tnl_xmit · c9fefa08
      Xin Long authored
      Currently ip6_tnl_xmit uses IPV6_MIN_MTU as the min MTU, but
      IPV6_MIN_MTU only applies when the inner packet is IPv6.
      
      With IPV6_MIN_MTU applied to IPv4 packets, the new PMTU for the inner
      dst can't be set lower than 1280. This causes tx_err and the packet
      to be dropped when the outer dst PMTU is close to 1280.
      
      Jianlin found it by running ipv4 traffic with the topo:
      
        (client) gre6 <---> eth1 (route) eth2 <---> gre6 (server)
      
      After changing eth2 mtu to 1300, the performance became very
      low, or the connection was even broken. The issue also affects
      ip4ip6 and ip6ip6 tunnels.
      
      So if the inner packet is IPv4, 576 should be used as the min MTU.
      
      Note that for ip4ip6 and ip6ip6 tunnels, the inner packet can
      only be IPv4 or IPv6, but for a gre6 tunnel it may also be ARP.
      Using 576 as the min MTU for all non-IPv6 packets, as this patch
      does, works for all of those cases.
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
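The clamping rule can be sketched as a pure function. The name `inner_pmtu()` and the overhead parameter are assumptions for illustration; the minimums (1280 for an IPv6 inner packet, 576 otherwise) come from the description above.

```c
#include <stdbool.h>

/* Hypothetical helper: clamp the PMTU propagated to the inner packet's dst.
 * An IPv6 inner packet may not go below the IPv6 minimum MTU (1280);
 * anything else (IPv4, or ARP carried by gre6) may go down to 576. */
int inner_pmtu(int outer_pmtu, int tunnel_overhead, bool inner_is_ipv6)
{
	int min_mtu = inner_is_ipv6 ? 1280 : 576;
	int mtu = outer_pmtu - tunnel_overhead;

	return mtu < min_mtu ? min_mtu : mtu;
}
```

With an outer PMTU of 1300 and an assumed 48-byte overhead, an IPv4 inner packet now gets 1252 instead of being held at 1280.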
    • ip6_gre: remove the incorrect mtu limit for ipgre tap · 2c52129a
      Xin Long authored
      The same fix as the patch "ip_gre: remove the incorrect mtu limit for
      ipgre tap" is also needed for ip6_gre.
      
      Fixes: 61e84623 ("net: centralize net_device min/max MTU checking")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip_gre: remove the incorrect mtu limit for ipgre tap · cfddd4c3
      Xin Long authored
      The ipgre tap driver calls ether_setup(); after commit 61e84623
      ("net: centralize net_device min/max MTU checking"), the MTU range
      is [min_mtu, max_mtu], which is [68, 1500] by default.
      
      As a result, the MTU of an ipgre tap device cannot be set above
      1500, which is not a correct limit for this device.
      
      Besides, its .change_mtu already does the right check, so this
      patch simply sets max_mtu to 0 and leaves the check to its
      .change_mtu.
      
      Fixes: 61e84623 ("net: centralize net_device min/max MTU checking")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
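A sketch of how a zero `max_mtu` defers the upper bound to the driver, modeled on the core range check introduced by the centralizing commit; `struct netdev`, `set_mtu()` and the driver limit in `tap_change_mtu()` are hypothetical simplifications.

```c
#include <errno.h>

/* Hypothetical subset of net_device MTU fields. */
struct netdev {
	int mtu, min_mtu, max_mtu;
	int (*change_mtu)(struct netdev *dev, int new_mtu);
};

/* Core-side check: max_mtu == 0 means "no upper bound here". */
int set_mtu(struct netdev *dev, int new_mtu)
{
	if (new_mtu < dev->min_mtu)
		return -EINVAL;
	if (dev->max_mtu > 0 && new_mtu > dev->max_mtu)
		return -EINVAL;
	if (dev->change_mtu)
		return dev->change_mtu(dev, new_mtu);
	dev->mtu = new_mtu;
	return 0;
}

/* Hypothetical driver check; the limit is illustrative only. */
int tap_change_mtu(struct netdev *dev, int new_mtu)
{
	if (new_mtu > 0xFFF8 - 64)
		return -EINVAL;
	dev->mtu = new_mtu;
	return 0;
}
```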
    • vxlan: update skb dst pmtu on tx path · a93bf0ff
      Xin Long authored
      Unlike IP tunnels, vxlan currently doesn't update the upper dst
      PMTU, even when it no longer matches the lower dst PMTU.
      
      The problem can be reproduced by reducing the vxlan lower dev's
      PMTU while running netperf. In Jianlin's testing, the performance
      dropped to 1/7 of its previous value.
      
      This patch updates the upper dst PMTU to match the lower dst PMTU
      on the TX path, so that packets can still be sent out after the
      lower dev's PMTU has changed.
      
      It also works for metadata dst.
      
      Note that this patch doesn't process any PMTU ICMP packets, but
      future support for processing PMTU ICMP packets in UDP tunnels will
      also need this.
      
      The same thing will be done for geneve in another patch.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
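A sketch of the TX-path update under simplifying assumptions: a hypothetical `struct dst` holding only a cached PMTU, and a rule that only ever lowers it. For VXLAN over IPv4 the encapsulation overhead is 50 bytes (outer IPv4 20 + UDP 8 + VXLAN 8 + inner Ethernet 14).

```c
/* Hypothetical dst entry holding only a cached path MTU. */
struct dst {
	int pmtu;
};

/* On transmit, shrink the upper (inner) dst's PMTU when the lower PMTU
 * minus the encapsulation overhead no longer fits it. This sketch only
 * ever lowers the cached value. */
void update_upper_pmtu(struct dst *upper, int lower_pmtu, int encap_overhead)
{
	int mtu = lower_pmtu - encap_overhead;

	if (mtu < upper->pmtu)
		upper->pmtu = mtu;
}
```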
    • net: arc_emac: restart stalled EMAC · 78aa0975
      Alexander Kochetkov authored
      Under certain conditions, the EMAC stops receiving incoming packets and
      continuously increments the R_MISS register instead of saving data into
      the provided buffer. This commit implements a workaround for that
      situation: when a stall is detected, the EMAC is restarted.
      
      On the device, the stall looks as if the device lost its dynamic IP
      address. ifconfig shows the interface error counter incrementing rapidly.
      At the same time, the DHCP server sees continuous DHCP requests from
      the device.
      
      In a real network, stalls happen very rarely. To make them frequent,
      a broadcast storm [1] should be simulated. The simulation requires the
      following connections:
          1. connect radxarock to 1st port of switch
          2. connect some PC to 2nd port of switch
          3. connect two other free ports together using standard ethernet cable,
             in order to make a switching loop.
      
      Then a broadcast storm has to be created. For example, running 'ping'
      on the PC to some IP address triggers an ARP-request storm. After some
      time (~10 sec), the EMAC on the rk3188 stalls.
      
      Observed and tested on rk3188 radxarock.
      
      [1] https://en.wikipedia.org/wiki/Broadcast_radiation
      
      Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: arc_emac: fix arc_emac_rx() error paths · e688822d
      Alexander Kochetkov authored
      arc_emac_rx() has some issues found by code review.
      
      If netdev_alloc_skb_ip_align() or dma_map_single() fails, the
      RX FIFO entry is not returned to the EMAC.
      
      If dma_map_single() fails, the previously allocated skb is lost to
      the driver, and the address of the newly allocated skb is not
      provided to the EMAC.
      Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
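A sketch of error handling that avoids both problems, using a hypothetical `struct rx_slot` whose `owned_by_hw` flag plays the role of the descriptor ownership bit. The replacement buffer is allocated and mapped before the old one is passed up; on either failure the packet is dropped, the old buffer is kept, and the entry is still returned to the hardware. `alloc_ok()`, `alloc_fail()`, `map_ok()` and `map_fail()` are test stubs, not driver functions.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical RX ring slot. */
struct rx_slot {
	void *buf;
	bool owned_by_hw;
};

/* Returns the buffer to pass up the stack, or NULL if the packet was
 * dropped. Assumes alloc_buf() returns malloc()ed memory. */
void *rx_one(struct rx_slot *slot, void *(*alloc_buf)(void),
	     int (*map_buf)(void *buf))
{
	void *old = slot->buf;
	void *fresh = alloc_buf();

	if (!fresh)
		goto out_return_slot;
	if (map_buf(fresh)) {
		free(fresh); /* don't leak the new buffer */
		goto out_return_slot;
	}
	slot->buf = fresh;
	slot->owned_by_hw = true;
	return old;

out_return_slot:
	slot->owned_by_hw = true; /* keep old buffer, hand the entry back */
	return NULL;
}

/* Test stubs. */
void *alloc_ok(void) { return malloc(64); }
void *alloc_fail(void) { return NULL; }
int map_ok(void *buf) { (void)buf; return 0; }
int map_fail(void *buf) { (void)buf; return -1; }
```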
    • net: mediatek: setup proper state for disabled GMAC on the default · 7352e252
      Sean Wang authored
      By default, the current code sets up a fixed, forced 1Gbps link on both
      GMACs. However, a GMAC should always be put in the link-down state when
      it is disabled on a given target board. Otherwise, the driver may
      receive unexpected data from the floating hardware connection through
      the unused GMAC, even though the driver already has some protection in
      the RX path to keep such unexpected data from reaching the upper stack.
      Signed-off-by: Sean Wang <sean.wang@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>