1. 06 Sep, 2018 9 commits
    • Alaa Hleihel's avatar
      net/mlx5e: don't set CHECKSUM_COMPLETE on SCTP packets · fe1dc069
      Alaa Hleihel authored
      CHECKSUM_COMPLETE is not applicable to SCTP protocol.
      Setting it for SCTP packets leads to CRC32c validation failure.
      
      Fixes: bbceefce ("net/mlx5e: Support RX CHECKSUM_COMPLETE")
      Signed-off-by: default avatarAlaa Hleihel <alaa@mellanox.com>
      Reviewed-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      fe1dc069
    • Natali Shechtman's avatar
      net/mlx5e: Set ECN for received packets using CQE indication · f007c13d
      Natali Shechtman authored
      In multi-host (MH) NIC scheme, a single HW port serves multiple hosts
      or sockets on the same host.
      The HW uses a mechanism in the PCIe buffer which monitors
      the amount of consumed PCIe buffers per host.
      On a certain configuration, under congestion,
      the HW emulates a switch doing ECN marking on packets using ECN
      indication on the completion descriptor (CQE).
      
      The driver needs to set the ECN bits on the packet SKB,
      such that the network stack can react on that, this commit does that.
      Signed-off-by: default avatarNatali Shechtman <natali@mellanox.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      f007c13d
    • Shay Agroskin's avatar
      net/mlx5e: Replace PTP clock lock from RW lock to seq lock · 64109f1d
      Shay Agroskin authored
      Changed "priv.clock.lock" lock from 'rw_lock' to 'seq_lock'
      in order to improve packet rate performance.
      
      Tested on Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz.
      Sent 64b packets between two peers connected by ConnectX-5,
      and measured packet rate for the receiver in three modes:
      	no time-stamping (base rate)
      	time-stamping using rw_lock (old lock) for critical region
      	time-stamping using seq_lock (new lock) for critical region
      Only the receiver time stamped its packets.
      
      The measured packet rate improvements are:
      
      	Single flow (multiple TX rings to single RX ring):
      		without timestamping:	  4.26 (M packets)/sec
      		with rw-lock (old lock):  4.1  (M packets)/sec
      		with seq-lock (new lock): 4.16 (M packets)/sec
      		1.46% improvement
      
      	Multiple flows (multiple TX rings to six RX rings):
      		without timestamping: 	  22   (M packets)/sec
      		with rw-lock (old lock):  11.7 (M packets)/sec
      		with seq-lock (new lock): 21.3 (M packets)/sec
      		82.05% improvement
      
      The packet rate improvement is due to the lack of atomic operations
      for the 'readers' by the seq-lock.
      Since there are much more 'readers' than 'writers' contention
      on this lock, almost all atomic operations are saved.
      this results in a dramatic decrease in overall
      cache misses.
      Signed-off-by: default avatarShay Agroskin <shayag@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      64109f1d
    • Roi Dayan's avatar
      net/mlx5e: Move Q counters allocation and drop RQ to init_rx · 1462e48d
      Roi Dayan authored
      Not all profiles query the HW Q counters in update_stats() callback.
      HW Q couners are limited per device and in case of representors all
      their Q counters are allocated on the parent PF device.
      Avoid reundant allocation of HW Q counters by moving the allocation
      to init_rx profile callback.
      Signed-off-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      1462e48d
    • Kamal Heib's avatar
      net/mlx5e: Move mlx5e_priv_flags into en_ethtool.c · d2408205
      Kamal Heib authored
      Move the definition of mlx5e_priv_flags into en_ethtool.c because it's
      only used there.
      
      Fixes: 4e59e288 ("net/mlx5e: Introduce net device priv flags infrastructure")
      Signed-off-by: default avatarKamal Heib <kamalheib1@gmail.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      d2408205
    • Vlad Buslov's avatar
      net/mlx5: Add flow counters idr · 12d6066c
      Vlad Buslov authored
      Previous patch in series changed flow counter storage structure from
      rb_tree to linked list in order to improve flow counter traversal
      performance. The drawback of such solution is that flow counter lookup by
      id becomes linear in complexity.
      
      Store pointers to flow counters in idr in order to improve lookup
      performance to logarithmic again. Idr is non-intrusive data structure and
      doesn't require extending flow counter struct with new elements. This means
      that idr can be used for lookup, while linked list from previous patch is
      used for traversal, and struct mlx5_fc size is <= 2 cache lines.
      Signed-off-by: default avatarVlad Buslov <vladbu@mellanox.com>
      Acked-by: default avatarAmir Vadai <amir@vadai.me>
      Reviewed-by: default avatarPaul Blakey <paulb@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      12d6066c
    • Vlad Buslov's avatar
      net/mlx5: Store flow counters in a list · 9aff93d7
      Vlad Buslov authored
      In order to improve performance of flow counter stats query loop that
      traverses all configured flow counters, replace rb_tree with double-linked
      list. This change improves performance of traversing flow counters by
      removing the tree traversal. (profiling data showed that call to rb_next
      was most top CPU consumer)
      
      However, lookup of flow flow counter in list becomes linear, instead of
      logarithmic. This problem is fixed by next patch in series, which adds idr
      for fast lookup. Idr is to be used because it is not an intrusive data
      structure and doesn't require adding any new members to struct mlx5_fc,
      which allows its control data part to stay <= 1 cache line in size.
      Signed-off-by: default avatarVlad Buslov <vladbu@mellanox.com>
      Acked-by: default avatarAmir Vadai <amir@vadai.me>
      Reviewed-by: default avatarPaul Blakey <paulb@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      9aff93d7
    • Vlad Buslov's avatar
      net/mlx5: Add new list to store deleted flow counters · 6e5e2283
      Vlad Buslov authored
      In order to prevent flow counters stats work function from traversing whole
      flow counters tree while searching for deleted flow counters, new list to
      store deleted flow counters is added to struct mlx5_fc_stats. Lockless
      NULL-terminated single linked list data type is used due to following
      reasons:
       - This use case only needs to add single element to list and
       remove/iterate whole list. Lockless list doesn't require any additional
       synchronization for these operations.
       - First cache line of flow counter data structure only has space to store
       single additional pointer, which precludes usage of double linked list.
      
      Remove flow counter 'deleted' flag that is no longer needed.
      Signed-off-by: default avatarVlad Buslov <vladbu@mellanox.com>
      Acked-by: default avatarAmir Vadai <amir@vadai.me>
      Reviewed-by: default avatarPaul Blakey <paulb@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      6e5e2283
    • Vlad Buslov's avatar
      net/mlx5: Change flow counters addlist type to single linked list · 83033688
      Vlad Buslov authored
      In order to prevent flow counters stats work function from traversing whole
      flow counters tree while searching for deleted flow counters, new list to
      store deleted flow counters will be added to struct mlx5_fc_stats. However,
      the flow counter structure itself has no space left to store any more data
      in first cache line. To free space that is needed to store additional list
      node, convert current addlist double linked list (two pointers per node) to
      atomic single linked list (one pointer per node).
      
      Lockless NULL-terminated single linked list data type doesn't require any
      additional external synchronization for operations used by flow counters
      module (add single new element, remove all elements from list and traverse
      them). Remove addlist_lock that is no longer needed.
      Signed-off-by: default avatarVlad Buslov <vladbu@mellanox.com>
      Acked-by: default avatarAmir Vadai <amir@vadai.me>
      Reviewed-by: default avatarPaul Blakey <paulb@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      83033688
  2. 05 Sep, 2018 27 commits
  3. 04 Sep, 2018 4 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 28619527
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Must perform TXQ teardown before unregistering interfaces in
          mac80211, from Toke Høiland-Jørgensen.
      
       2) Don't allow creating mac80211_hwsim with less than one channel, from
          Johannes Berg.
      
       3) Division by zero in cfg80211, fix from Johannes Berg.
      
       4) Fix endian issue in tipc, from Haiqing Bai.
      
       5) BPF sockmap use-after-free fixes from Daniel Borkmann.
      
       6) Spectre-v1 in mac80211_hwsim, from Jinbum Park.
      
       7) Missing rhashtable_walk_exit() in tipc, from Cong Wang.
      
       8) Revert kvzalloc() conversion of AF_PACKET, it breaks mmap() when
          kvzalloc() tries to use kmalloc() pages. From Eric Dumazet.
      
       9) Fix deadlock in hv_netvsc, from Dexuan Cui.
      
      10) Do not restart timewait timer on RST, from Florian Westphal.
      
      11) Fix double lwstate refcount grab in ipv6, from Alexey Kodanev.
      
      12) Unsolicit report count handling is off-by-one, fix from Hangbin Liu.
      
      13) Sleep-in-atomic in cadence driver, from Jia-Ju Bai.
      
      14) Respect ttl-inherit in ip6 tunnel driver, from Hangbin Liu.
      
      15) Use-after-free in act_ife, fix from Cong Wang.
      
      16) Missing hold to meta module in act_ife, from Vlad Buslov.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits)
        net: phy: sfp: Handle unimplemented hwmon limits and alarms
        net: sched: action_ife: take reference to meta module
        act_ife: fix a potential use-after-free
        net/mlx5: Fix SQ offset in QPs with small RQ
        tipc: correct spelling errors for tipc_topsrv_queue_evt() comments
        tipc: correct spelling errors for struct tipc_bc_base's comment
        bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA.
        bnxt_en: Clean up unused functions.
        bnxt_en: Fix firmware signaled resource change logic in open.
        sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel
        sctp: fix invalid reference to the index variable of the iterator
        net/ibm/emac: wrong emac_calc_base call was used by typo
        net: sched: null actions array pointer before releasing action
        vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition
        r8169: add support for NCube 8168 network card
        ip6_tunnel: respect ttl inherit for ip6tnl
        mac80211: shorten the IBSS debug messages
        mac80211: don't Tx a deauth frame if the AP forbade Tx
        mac80211: Fix station bandwidth setting after channel switch
        mac80211: fix a race between restart and CSA flows
        ...
      28619527
    • Andrew Lunn's avatar
      net: phy: sfp: Handle unimplemented hwmon limits and alarms · a33710bd
      Andrew Lunn authored
      Not all SFPs implement the registers containing sensor limits and
      alarms. Luckily, there is a bit indicating if they are implemented or
      not. Add checking for this bit, when deciding if the hwmon attributes
      should be visible.
      
      Fixes: 1323061a ("net: phy: sfp: Add HWMON support for module sensors")
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a33710bd
    • Vlad Buslov's avatar
      net: sched: action_ife: take reference to meta module · 84cb8eb2
      Vlad Buslov authored
      Recent refactoring of add_metainfo() caused use_all_metadata() to add
      metainfo to ife action metalist without taking reference to module. This
      causes warning in module_put called from ife action cleanup function.
      
      Implement add_metainfo_and_get_ops() function that returns with reference
      to module taken if metainfo was added successfully, and call it from
      use_all_metadata(), instead of calling __add_metainfo() directly.
      
      Example warning:
      
      [  646.344393] WARNING: CPU: 1 PID: 2278 at kernel/module.c:1139 module_put+0x1cb/0x230
      [  646.352437] Modules linked in: act_meta_skbtcindex act_meta_mark act_meta_skbprio act_ife ife veth nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c tun ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc mlx5_ib ib_uverbs ib_core intel_rapl sb_edac x86_pkg_temp_thermal mlx5_core coretemp kvm_intel kvm nfsd igb irqbypass crct10dif_pclmul devlink crc32_pclmul mei_me joydev ses crc32c_intel enclosure auth_rpcgss i2c_algo_bit ioatdma ptp mei pps_core ghash_clmulni_intel iTCO_wdt iTCO_vendor_support pcspkr dca ipmi_ssif lpc_ich target_core_mod i2c_i801 ipmi_si ipmi_devintf pcc_cpufreq wmi ipmi_msghandler nfs_acl lockd acpi_pad acpi_power_meter grace sunrpc mpt3sas raid_class scsi_transport_sas
      [  646.425631] CPU: 1 PID: 2278 Comm: tc Not tainted 4.19.0-rc1+ #799
      [  646.432187] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [  646.440595] RIP: 0010:module_put+0x1cb/0x230
      [  646.445238] Code: f3 66 94 02 e8 26 ff fa ff 85 c0 74 11 0f b6 1d 51 30 94 02 80 fb 01 77 60 83 e3 01 74 13 65 ff 0d 3a 83 db 73 e9 2b ff ff ff <0f> 0b e9 00 ff ff ff e8 59 01 fb ff 85 c0 75 e4 48 c7 c2 20 62 6b
      [  646.464997] RSP: 0018:ffff880354d37068 EFLAGS: 00010286
      [  646.470599] RAX: 0000000000000000 RBX: ffffffffc0a52518 RCX: ffffffff8c2668db
      [  646.478118] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffc0a52518
      [  646.485641] RBP: ffffffffc0a52180 R08: fffffbfff814a4a4 R09: fffffbfff814a4a3
      [  646.493164] R10: ffffffffc0a5251b R11: fffffbfff814a4a4 R12: 1ffff1006a9a6e0d
      [  646.500687] R13: 00000000ffffffff R14: ffff880362bab890 R15: dead000000000100
      [  646.508213] FS:  00007f4164c99800(0000) GS:ffff88036fe40000(0000) knlGS:0000000000000000
      [  646.516961] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  646.523080] CR2: 00007f41638b8420 CR3: 0000000351df0004 CR4: 00000000001606e0
      [  646.530595] Call Trace:
      [  646.533408]  ? find_symbol_in_section+0x260/0x260
      [  646.538509]  tcf_ife_cleanup+0x11b/0x200 [act_ife]
      [  646.543695]  tcf_action_cleanup+0x29/0xa0
      [  646.548078]  __tcf_action_put+0x5a/0xb0
      [  646.552289]  ? nla_put+0x65/0xe0
      [  646.555889]  __tcf_idr_release+0x48/0x60
      [  646.560187]  tcf_generic_walker+0x448/0x6b0
      [  646.564764]  ? tcf_action_dump_1+0x450/0x450
      [  646.569411]  ? __lock_is_held+0x84/0x110
      [  646.573720]  ? tcf_ife_walker+0x10c/0x20f [act_ife]
      [  646.578982]  tca_action_gd+0x972/0xc40
      [  646.583129]  ? tca_get_fill.constprop.17+0x250/0x250
      [  646.588471]  ? mark_lock+0xcf/0x980
      [  646.592324]  ? check_chain_key+0x140/0x1f0
      [  646.596832]  ? debug_show_all_locks+0x240/0x240
      [  646.601839]  ? memset+0x1f/0x40
      [  646.605350]  ? nla_parse+0xca/0x1a0
      [  646.609217]  tc_ctl_action+0x215/0x230
      [  646.613339]  ? tcf_action_add+0x220/0x220
      [  646.617748]  rtnetlink_rcv_msg+0x56a/0x6d0
      [  646.622227]  ? rtnl_fdb_del+0x3f0/0x3f0
      [  646.626466]  netlink_rcv_skb+0x18d/0x200
      [  646.630752]  ? rtnl_fdb_del+0x3f0/0x3f0
      [  646.634959]  ? netlink_ack+0x500/0x500
      [  646.639106]  netlink_unicast+0x2d0/0x370
      [  646.643409]  ? netlink_attachskb+0x340/0x340
      [  646.648050]  ? _copy_from_iter_full+0xe9/0x3e0
      [  646.652870]  ? import_iovec+0x11e/0x1c0
      [  646.657083]  netlink_sendmsg+0x3b9/0x6a0
      [  646.661388]  ? netlink_unicast+0x370/0x370
      [  646.665877]  ? netlink_unicast+0x370/0x370
      [  646.670351]  sock_sendmsg+0x6b/0x80
      [  646.674212]  ___sys_sendmsg+0x4a1/0x520
      [  646.678443]  ? copy_msghdr_from_user+0x210/0x210
      [  646.683463]  ? lock_downgrade+0x320/0x320
      [  646.687849]  ? debug_show_all_locks+0x240/0x240
      [  646.692760]  ? do_raw_spin_unlock+0xa2/0x130
      [  646.697418]  ? _raw_spin_unlock+0x24/0x30
      [  646.701798]  ? __handle_mm_fault+0x1819/0x1c10
      [  646.706619]  ? __pmd_alloc+0x320/0x320
      [  646.710738]  ? debug_show_all_locks+0x240/0x240
      [  646.715649]  ? restore_nameidata+0x7b/0xa0
      [  646.720117]  ? check_chain_key+0x140/0x1f0
      [  646.724590]  ? check_chain_key+0x140/0x1f0
      [  646.729070]  ? __fget_light+0xbc/0xd0
      [  646.733121]  ? __sys_sendmsg+0xd7/0x150
      [  646.737329]  __sys_sendmsg+0xd7/0x150
      [  646.741359]  ? __ia32_sys_shutdown+0x30/0x30
      [  646.746003]  ? up_read+0x53/0x90
      [  646.749601]  ? __do_page_fault+0x484/0x780
      [  646.754105]  ? do_syscall_64+0x1e/0x2c0
      [  646.758320]  do_syscall_64+0x72/0x2c0
      [  646.762353]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  646.767776] RIP: 0033:0x7f4163872150
      [  646.771713] Code: 8b 15 3c 7d 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 83 3d b9 d5 2b 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be cd 00 00 48 89 04 24
      [  646.791474] RSP: 002b:00007ffdef7d6b58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  646.799721] RAX: ffffffffffffffda RBX: 0000000000000024 RCX: 00007f4163872150
      [  646.807240] RDX: 0000000000000000 RSI: 00007ffdef7d6bd0 RDI: 0000000000000003
      [  646.814760] RBP: 000000005b8b9482 R08: 0000000000000001 R09: 0000000000000000
      [  646.822286] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007ffdef7dad20
      [  646.829807] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000679bc0
      [  646.837360] irq event stamp: 6083
      [  646.841043] hardirqs last  enabled at (6081): [<ffffffff8c220a7d>] __call_rcu+0x17d/0x500
      [  646.849882] hardirqs last disabled at (6083): [<ffffffff8c004f06>] trace_hardirqs_off_thunk+0x1a/0x1c
      [  646.859775] softirqs last  enabled at (5968): [<ffffffff8d4004a1>] __do_softirq+0x4a1/0x6ee
      [  646.868784] softirqs last disabled at (6082): [<ffffffffc0a78759>] tcf_ife_cleanup+0x39/0x200 [act_ife]
      [  646.878845] ---[ end trace b1b8c12ffe51e657 ]---
      
      Fixes: 5ffe57da ("act_ife: fix a potential deadlock")
      Signed-off-by: default avatarVlad Buslov <vladbu@mellanox.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      84cb8eb2
    • Gustavo A. R. Silva's avatar
      net: usbnet: mark expected switch fall-through · 2fc4aa59
      Gustavo A. R. Silva authored
      In preparation to enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      
      Addresses-Coverity-ID: 1077614 ("Missing break in switch")
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2fc4aa59