1. 07 Sep, 2018 3 commits
    • xdp: split code for map vs non-map redirect · 47b123ed
      Jesper Dangaard Brouer authored
      The compiler does an efficient job of inlining static C functions.
      Perf top clearly shows that almost everything gets inlined into the
      function call xdp_do_redirect.
      
      The function xdp_do_redirect ends up containing and interleaving the
      map and non-map redirect code.  This is sub-optimal, as it would be
      strange for an XDP program to use both types of redirect in the same
      program. The two use-cases are separate, and interleaving the code
      just causes more instruction-cache pressure.
      
      I would like to stress (again) that the non-map variant bpf_redirect
      is very slow compared to the bpf_redirect_map variant, approx half the
      speed.  Measured with driver i40e the difference is:
      
      - map     redirect: 13,250,350 pps
      - non-map redirect:  7,491,425 pps
      
      For this reason, the non-map variant of redirect has been named
      xdp_do_redirect_slow.  This hopefully gives a hint, when using perf,
      that this is not the optimal XDP redirect operating mode.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
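
To put the two measurements above in per-packet terms, the following small sketch converts packets per second into nanoseconds per packet. Only the two pps figures come from the commit message; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a packets-per-second rate into nanoseconds spent per packet. */
static double ns_per_packet(double pps)
{
    assert(pps > 0.0);
    return 1e9 / pps;
}

/* Print the per-packet cost for both redirect variants (hypothetical helper). */
static void print_redirect_cost(void)
{
    const double map_pps = 13250350.0;    /* bpf_redirect_map, from the commit */
    const double nonmap_pps = 7491425.0;  /* bpf_redirect, from the commit */

    printf("map:     %.2f ns/packet\n", ns_per_packet(map_pps));
    printf("non-map: %.2f ns/packet\n", ns_per_packet(nonmap_pps));
    printf("non-map/map throughput ratio: %.2f\n", nonmap_pps / map_pps);
}
```

By this arithmetic the map variant costs roughly 75.5 ns per packet and the non-map variant roughly 133.5 ns, so the non-map path spends about 58 ns more on every packet.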
    • xdp: explicit inline __xdp_map_lookup_elem · 2a68d85f
      Jesper Dangaard Brouer authored
      The compiler chooses not to inline the function __xdp_map_lookup_elem,
      because it can see that it is used by both the Generic-XDP and native-XDP
      redirect calls (xdp_do_generic_redirect_map and xdp_do_redirect_map).
      
      The compiler cannot know that this is a bad choice, as it cannot know
      that a net device cannot run both XDP modes (Generic or Native) at the
      same time.  Thus, mark this function inline, even though we normally
      leave this up to the compiler.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
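
The pattern described above, a small lookup helper shared by two callers that never run together, might look like the following userspace sketch. Every name here (toy_map, toy_map_lookup, native_path, generic_path) is hypothetical and none of this is the kernel code. Note that in standard C `inline` is only a hint to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a map: a fixed array of integer slots. */
struct toy_map {
    const int *slots;
    size_t len;
};

/* Shared helper, explicitly marked inline so each caller can get its own copy. */
static inline const int *toy_map_lookup(const struct toy_map *map, size_t index)
{
    assert(map != NULL);
    if (index >= map->len)
        return NULL;
    return &map->slots[index];
}

/* First caller (stand-in for the native path). */
static int native_path(const struct toy_map *map, size_t index)
{
    const int *slot = toy_map_lookup(map, index);
    return slot ? *slot : -1;
}

/* Second caller (stand-in for the generic path). */
static int generic_path(const struct toy_map *map, size_t index)
{
    const int *slot = toy_map_lookup(map, index);
    return slot ? *slot : -1;
}
```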
    • xdp: unlikely instrumentation for xdp map redirect · e1302542
      Jesper Dangaard Brouer authored
      Noticed that the compiler-generated ASM code layout was suboptimal.  It
      assumed map enqueue errors were the likely case, which it shouldn't.
      It also assumed that xdp_do_flush_map() was a likely case, due to maps
      changing between packets, which should be very unlikely.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
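
As a rough illustration of this kind of branch hint, the sketch below defines userspace stand-ins for the kernel's likely()/unlikely() macros using the GCC/Clang builtin __builtin_expect. The toy queue and its names are assumptions made up for the example, not the XDP code.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's likely()/unlikely() (GCC/Clang only). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

#define TOY_QUEUE_LEN 4

/* Hypothetical fixed-size queue used only for this example. */
struct toy_queue {
    int slots[TOY_QUEUE_LEN];
    size_t used;
};

/* Enqueue one packet id; returns 0 on success, -1 when the queue is full. */
static int toy_enqueue(struct toy_queue *q, int pkt)
{
    assert(q != NULL);
    /* A full queue is the rare error case, so hint that this branch is cold. */
    if (unlikely(q->used == TOY_QUEUE_LEN))
        return -1;
    q->slots[q->used++] = pkt;
    return 0;
}
```

The hint does not change what the function returns. It only tells the compiler which side of the branch to lay out as the straight-line path.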
  2. 06 Sep, 2018 4 commits
    • bpf/verifier: fix verifier instability · a9c676bc
      Alexei Starovoitov authored
      Edward Cree says:
      In check_mem_access(), for the PTR_TO_CTX case, after check_ctx_access()
      has supplied a reg_type, the other members of the register state are set
      appropriately.  Previously reg.range was set to 0, but as it is in a
      union with reg.map_ptr, which is larger, upper bytes of the latter were
      left in place.  This then caused the memcmp() in regsafe() to fail,
      preventing some branches from being pruned (and occasionally causing the
      same program to take a varying number of processed insns on repeated
      verifier runs).
      
      Fix the instability by clearing bpf_reg_state in __mark_reg_[un]known().
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Debugged-by: Edward Cree <ecree@solarflare.com>
      Acked-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
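
The failure mode described above can be reproduced in miniature: writing only the smaller member of a union leaves, in practice, the remaining bytes of the larger member untouched, so a byte-wise memcmp() sees two "equal" states as different. The struct and function names below are invented for the sketch and are far simpler than the verifier's bpf_reg_state.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy register state: a 2-byte member shares storage with a pointer. */
struct toy_reg {
    int type;
    union {
        uint16_t range;   /* small member */
        void *map_ptr;    /* larger member overlapping range */
    } u;
};

/* Buggy reset: only the small member is written, stale bytes may remain. */
static void reset_range_only(struct toy_reg *reg)
{
    assert(reg != NULL);
    reg->type = 0;
    reg->u.range = 0;
}

/* Fixed reset: clear the whole structure before use. */
static void reset_whole(struct toy_reg *reg)
{
    assert(reg != NULL);
    memset(reg, 0, sizeof(*reg));
}

/* Byte-wise comparison, in the spirit of the memcmp() mentioned above. */
static int same_state(const struct toy_reg *a, const struct toy_reg *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

For example, if the union of one register is first filled with non-zero bytes and then passed to reset_range_only(), same_state() against a zeroed register typically reports a mismatch, while reset_whole() makes the two compare equal.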
    • libbpf: Remove the duplicate checking of function storage · 69495d2a
      Taeung Song authored
      After the commit eac7d845 ("tools: libbpf: don't return '.text'
      as a program for multi-function programs"), bpf_program__next()
      in bpf_object__for_each_program skips the function storage such as .text,
      so eliminate the duplicate checking.
      
      Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • netlink: Make groups check less stupid in netlink_bind() · 428f944b
      Dmitry Safonov authored
      As Linus noted, the test for 0 is needless, groups type can follow the
      usual kernel style and 8*sizeof(unsigned long) is BITS_PER_LONG:
      
      > The code [..] isn't technically incorrect...
      > But it is stupid.
      > Why stupid? Because the test for 0 is pointless.
      >
      > Just doing
      >        if (nlk->ngroups < 8*sizeof(groups))
      >                groups &= (1UL << nlk->ngroups) - 1;
      >
      > would have been fine and more understandable, since the "mask by shift
      > count" already does the right thing for a ngroups value of 0. Now that
      > test for zero makes me go "what's special about zero?". It turns out
      > that the answer to that is "nothing".
      [..]
      > The type of "groups" is kind of silly too.
      >
      > Yeah, "long unsigned int" isn't _technically_ wrong. But we normally
      > call that type "unsigned long".
      
      Clean up my piece of pointlessness.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: netdev@vger.kernel.org
      Fairly-blamed-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
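
The point of the quoted review can be checked with a standalone sketch of the mask computation. BITS_PER_LONG is redefined locally for userspace, and mask_groups() is a hypothetical wrapper around the two lines quoted above, not the body of netlink_bind().

```c
#include <assert.h>
#include <limits.h>

/* Userspace stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG ((unsigned int)(CHAR_BIT * sizeof(unsigned long)))

/* Keep only the low ngroups bits of groups. */
static unsigned long mask_groups(unsigned long groups, unsigned int ngroups)
{
    /* Shifting by the full width is undefined in C, so mask only below it. */
    if (ngroups < BITS_PER_LONG)
        groups &= (1UL << ngroups) - 1;
    return groups;
}

/* A few worked cases, including the ngroups == 0 case discussed above. */
static void mask_groups_examples(void)
{
    assert(mask_groups(~0UL, 0) == 0UL);              /* (1UL << 0) - 1 == 0 */
    assert(mask_groups(0xffUL, 4) == 0xfUL);
    assert(mask_groups(~0UL, BITS_PER_LONG) == ~0UL); /* nothing to mask */
}
```

With ngroups equal to 0 the mask is `(1UL << 0) - 1`, which is 0, so no separate test for zero is needed.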
    • packet: add sockopt to ignore outgoing packets · fa788d98
      Vincent Whitchurch authored
      Currently, the only way to ignore outgoing packets on a packet socket is
      via the BPF filter.  With MSG_ZEROCOPY, packets that are looped into
      AF_PACKET are copied in dev_queue_xmit_nit(), and this copy happens even
      if the filter run from packet_rcv() would reject them.  So the presence
      of a packet socket on the interface takes away the benefits of
      MSG_ZEROCOPY, even if the packet socket is not interested in outgoing
      packets.  (Even when MSG_ZEROCOPY is not used, the skb is unnecessarily
      cloned, but the cost for that is much lower.)
      
      Add a socket option to allow AF_PACKET sockets to ignore outgoing
      packets to solve this.  Note that the *BSDs already have something
      similar: BIOCSSEESENT/BIOCSDIRECTION and BIOCSDIRFILT.
      
      The first intended user is lldpd.
      Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
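
A userspace caller might enable such an option roughly as follows. The commit message above does not name the option; this sketch assumes it is the PACKET_IGNORE_OUTGOING option found in the Linux UAPI header linux/if_packet.h, and the fallback defines and the wrapper name are assumptions for building against older headers. Linux only.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Fallbacks for older headers (assumed values from the Linux UAPI headers). */
#ifndef SOL_PACKET
#define SOL_PACKET 263
#endif
#ifndef PACKET_IGNORE_OUTGOING
#define PACKET_IGNORE_OUTGOING 23
#endif

/*
 * Ask the kernel not to deliver outgoing packets to an AF_PACKET socket.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int packet_ignore_outgoing(int fd)
{
    int one = 1;

    return setsockopt(fd, SOL_PACKET, PACKET_IGNORE_OUTGOING, &one, sizeof(one));
}

/* Calling the wrapper on an invalid descriptor fails cleanly with EBADF. */
static void check_bad_descriptor(void)
{
    errno = 0;
    assert(packet_ignore_outgoing(-1) == -1);
    assert(errno == EBADF);
}
```

In real use, fd would come from `socket(AF_PACKET, ...)`, which normally requires CAP_NET_RAW, and a kernel without the option would be expected to reject the call with an error.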
  3. 05 Sep, 2018 27 commits
  4. 04 Sep, 2018 6 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 28619527
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Must perform TXQ teardown before unregistering interfaces in
          mac80211, from Toke Høiland-Jørgensen.
      
       2) Don't allow creating mac80211_hwsim with less than one channel, from
          Johannes Berg.
      
       3) Division by zero in cfg80211, fix from Johannes Berg.
      
       4) Fix endian issue in tipc, from Haiqing Bai.
      
       5) BPF sockmap use-after-free fixes from Daniel Borkmann.
      
       6) Spectre-v1 in mac80211_hwsim, from Jinbum Park.
      
       7) Missing rhashtable_walk_exit() in tipc, from Cong Wang.
      
       8) Revert kvzalloc() conversion of AF_PACKET, it breaks mmap() when
          kvzalloc() tries to use kmalloc() pages. From Eric Dumazet.
      
       9) Fix deadlock in hv_netvsc, from Dexuan Cui.
      
      10) Do not restart timewait timer on RST, from Florian Westphal.
      
      11) Fix double lwstate refcount grab in ipv6, from Alexey Kodanev.
      
      12) Unsolicited report count handling is off-by-one, fix from Hangbin Liu.
      
      13) Sleep-in-atomic in cadence driver, from Jia-Ju Bai.
      
      14) Respect ttl-inherit in ip6 tunnel driver, from Hangbin Liu.
      
      15) Use-after-free in act_ife, fix from Cong Wang.
      
      16) Missing hold to meta module in act_ife, from Vlad Buslov.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits)
        net: phy: sfp: Handle unimplemented hwmon limits and alarms
        net: sched: action_ife: take reference to meta module
        act_ife: fix a potential use-after-free
        net/mlx5: Fix SQ offset in QPs with small RQ
        tipc: correct spelling errors for tipc_topsrv_queue_evt() comments
        tipc: correct spelling errors for struct tipc_bc_base's comment
        bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA.
        bnxt_en: Clean up unused functions.
        bnxt_en: Fix firmware signaled resource change logic in open.
        sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel
        sctp: fix invalid reference to the index variable of the iterator
        net/ibm/emac: wrong emac_calc_base call was used by typo
        net: sched: null actions array pointer before releasing action
        vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition
        r8169: add support for NCube 8168 network card
        ip6_tunnel: respect ttl inherit for ip6tnl
        mac80211: shorten the IBSS debug messages
        mac80211: don't Tx a deauth frame if the AP forbade Tx
        mac80211: Fix station bandwidth setting after channel switch
        mac80211: fix a race between restart and CSA flows
        ...
    • net: phy: sfp: Handle unimplemented hwmon limits and alarms · a33710bd
      Andrew Lunn authored
      Not all SFPs implement the registers containing sensor limits and
      alarms. Luckily, there is a bit indicating if they are implemented or
      not. Add checking for this bit, when deciding if the hwmon attributes
      should be visible.
      
      Fixes: 1323061a ("net: phy: sfp: Add HWMON support for module sensors")
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: action_ife: take reference to meta module · 84cb8eb2
      Vlad Buslov authored
      Recent refactoring of add_metainfo() caused use_all_metadata() to add
      metainfo to the ife action metalist without taking a reference to the
      module. This causes a warning in module_put() called from the ife action
      cleanup function.
      
      Implement add_metainfo_and_get_ops() function that returns with reference
      to module taken if metainfo was added successfully, and call it from
      use_all_metadata(), instead of calling __add_metainfo() directly.
      
      Example warning:
      
      [  646.344393] WARNING: CPU: 1 PID: 2278 at kernel/module.c:1139 module_put+0x1cb/0x230
      [  646.352437] Modules linked in: act_meta_skbtcindex act_meta_mark act_meta_skbprio act_ife ife veth nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c tun ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc mlx5_ib ib_uverbs ib_core intel_rapl sb_edac x86_pkg_temp_thermal mlx5_core coretemp kvm_intel kvm nfsd igb irqbypass crct10dif_pclmul devlink crc32_pclmul mei_me joydev ses crc32c_intel enclosure auth_rpcgss i2c_algo_bit ioatdma ptp mei pps_core ghash_clmulni_intel iTCO_wdt iTCO_vendor_support pcspkr dca ipmi_ssif lpc_ich target_core_mod i2c_i801 ipmi_si ipmi_devintf pcc_cpufreq wmi ipmi_msghandler nfs_acl lockd acpi_pad acpi_power_meter grace sunrpc mpt3sas raid_class scsi_transport_sas
      [  646.425631] CPU: 1 PID: 2278 Comm: tc Not tainted 4.19.0-rc1+ #799
      [  646.432187] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [  646.440595] RIP: 0010:module_put+0x1cb/0x230
      [  646.445238] Code: f3 66 94 02 e8 26 ff fa ff 85 c0 74 11 0f b6 1d 51 30 94 02 80 fb 01 77 60 83 e3 01 74 13 65 ff 0d 3a 83 db 73 e9 2b ff ff ff <0f> 0b e9 00 ff ff ff e8 59 01 fb ff 85 c0 75 e4 48 c7 c2 20 62 6b
      [  646.464997] RSP: 0018:ffff880354d37068 EFLAGS: 00010286
      [  646.470599] RAX: 0000000000000000 RBX: ffffffffc0a52518 RCX: ffffffff8c2668db
      [  646.478118] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffc0a52518
      [  646.485641] RBP: ffffffffc0a52180 R08: fffffbfff814a4a4 R09: fffffbfff814a4a3
      [  646.493164] R10: ffffffffc0a5251b R11: fffffbfff814a4a4 R12: 1ffff1006a9a6e0d
      [  646.500687] R13: 00000000ffffffff R14: ffff880362bab890 R15: dead000000000100
      [  646.508213] FS:  00007f4164c99800(0000) GS:ffff88036fe40000(0000) knlGS:0000000000000000
      [  646.516961] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  646.523080] CR2: 00007f41638b8420 CR3: 0000000351df0004 CR4: 00000000001606e0
      [  646.530595] Call Trace:
      [  646.533408]  ? find_symbol_in_section+0x260/0x260
      [  646.538509]  tcf_ife_cleanup+0x11b/0x200 [act_ife]
      [  646.543695]  tcf_action_cleanup+0x29/0xa0
      [  646.548078]  __tcf_action_put+0x5a/0xb0
      [  646.552289]  ? nla_put+0x65/0xe0
      [  646.555889]  __tcf_idr_release+0x48/0x60
      [  646.560187]  tcf_generic_walker+0x448/0x6b0
      [  646.564764]  ? tcf_action_dump_1+0x450/0x450
      [  646.569411]  ? __lock_is_held+0x84/0x110
      [  646.573720]  ? tcf_ife_walker+0x10c/0x20f [act_ife]
      [  646.578982]  tca_action_gd+0x972/0xc40
      [  646.583129]  ? tca_get_fill.constprop.17+0x250/0x250
      [  646.588471]  ? mark_lock+0xcf/0x980
      [  646.592324]  ? check_chain_key+0x140/0x1f0
      [  646.596832]  ? debug_show_all_locks+0x240/0x240
      [  646.601839]  ? memset+0x1f/0x40
      [  646.605350]  ? nla_parse+0xca/0x1a0
      [  646.609217]  tc_ctl_action+0x215/0x230
      [  646.613339]  ? tcf_action_add+0x220/0x220
      [  646.617748]  rtnetlink_rcv_msg+0x56a/0x6d0
      [  646.622227]  ? rtnl_fdb_del+0x3f0/0x3f0
      [  646.626466]  netlink_rcv_skb+0x18d/0x200
      [  646.630752]  ? rtnl_fdb_del+0x3f0/0x3f0
      [  646.634959]  ? netlink_ack+0x500/0x500
      [  646.639106]  netlink_unicast+0x2d0/0x370
      [  646.643409]  ? netlink_attachskb+0x340/0x340
      [  646.648050]  ? _copy_from_iter_full+0xe9/0x3e0
      [  646.652870]  ? import_iovec+0x11e/0x1c0
      [  646.657083]  netlink_sendmsg+0x3b9/0x6a0
      [  646.661388]  ? netlink_unicast+0x370/0x370
      [  646.665877]  ? netlink_unicast+0x370/0x370
      [  646.670351]  sock_sendmsg+0x6b/0x80
      [  646.674212]  ___sys_sendmsg+0x4a1/0x520
      [  646.678443]  ? copy_msghdr_from_user+0x210/0x210
      [  646.683463]  ? lock_downgrade+0x320/0x320
      [  646.687849]  ? debug_show_all_locks+0x240/0x240
      [  646.692760]  ? do_raw_spin_unlock+0xa2/0x130
      [  646.697418]  ? _raw_spin_unlock+0x24/0x30
      [  646.701798]  ? __handle_mm_fault+0x1819/0x1c10
      [  646.706619]  ? __pmd_alloc+0x320/0x320
      [  646.710738]  ? debug_show_all_locks+0x240/0x240
      [  646.715649]  ? restore_nameidata+0x7b/0xa0
      [  646.720117]  ? check_chain_key+0x140/0x1f0
      [  646.724590]  ? check_chain_key+0x140/0x1f0
      [  646.729070]  ? __fget_light+0xbc/0xd0
      [  646.733121]  ? __sys_sendmsg+0xd7/0x150
      [  646.737329]  __sys_sendmsg+0xd7/0x150
      [  646.741359]  ? __ia32_sys_shutdown+0x30/0x30
      [  646.746003]  ? up_read+0x53/0x90
      [  646.749601]  ? __do_page_fault+0x484/0x780
      [  646.754105]  ? do_syscall_64+0x1e/0x2c0
      [  646.758320]  do_syscall_64+0x72/0x2c0
      [  646.762353]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  646.767776] RIP: 0033:0x7f4163872150
      [  646.771713] Code: 8b 15 3c 7d 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 83 3d b9 d5 2b 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be cd 00 00 48 89 04 24
      [  646.791474] RSP: 002b:00007ffdef7d6b58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  646.799721] RAX: ffffffffffffffda RBX: 0000000000000024 RCX: 00007f4163872150
      [  646.807240] RDX: 0000000000000000 RSI: 00007ffdef7d6bd0 RDI: 0000000000000003
      [  646.814760] RBP: 000000005b8b9482 R08: 0000000000000001 R09: 0000000000000000
      [  646.822286] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007ffdef7dad20
      [  646.829807] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000679bc0
      [  646.837360] irq event stamp: 6083
      [  646.841043] hardirqs last  enabled at (6081): [<ffffffff8c220a7d>] __call_rcu+0x17d/0x500
      [  646.849882] hardirqs last disabled at (6083): [<ffffffff8c004f06>] trace_hardirqs_off_thunk+0x1a/0x1c
      [  646.859775] softirqs last  enabled at (5968): [<ffffffff8d4004a1>] __do_softirq+0x4a1/0x6ee
      [  646.868784] softirqs last disabled at (6082): [<ffffffffc0a78759>] tcf_ife_cleanup+0x39/0x200 [act_ife]
      [  646.878845] ---[ end trace b1b8c12ffe51e657 ]---
      
      Fixes: 5ffe57da ("act_ife: fix a potential deadlock")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: usbnet: mark expected switch fall-through · 2fc4aa59
      Gustavo A. R. Silva authored
      In preparation for enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      
      Addresses-Coverity-ID: 1077614 ("Missing break in switch")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
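
A marked fall-through of the kind described above can look like the sketch below. The enum and function are invented for illustration and are unrelated to the usbnet code. GCC accepts a comment such as `/* fall through */` placed directly before the next case label as the marker at its default -Wimplicit-fallthrough level.

```c
#include <assert.h>

/* Hypothetical event type used only for this example. */
enum toy_event { EV_RESET, EV_START, EV_STOP };

/* Returns how many setup steps ran for the given event. */
static int handle_event(enum toy_event ev)
{
    int steps = 0;

    switch (ev) {
    case EV_RESET:
        steps++;            /* reset-only work */
        /* fall through */
    case EV_START:
        steps++;            /* work shared by reset and start */
        break;
    case EV_STOP:
        break;
    }
    return steps;
}

/* A reset runs both steps, a start runs one, a stop runs none. */
static void handle_event_examples(void)
{
    assert(handle_event(EV_RESET) == 2);
    assert(handle_event(EV_START) == 1);
    assert(handle_event(EV_STOP) == 0);
}
```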
    • act_ife: fix a potential use-after-free · 6d784f16
      Cong Wang authored
      Immediately after module_put(), a user could delete this
      module, so e->ops could already be freed before we call
      e->ops->release().
      
      Fix this by moving module_put() after ops->release().
      
      Fixes: ef6980b6 ("introduce IFE action")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
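
The ordering rule behind this fix, use the object first and drop the reference last, can be shown with a toy reference-counted object in userspace. All names here are hypothetical, and free() merely stands in for the module going away once its last reference is dropped.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy ops object kept alive by a reference count. */
struct toy_ops {
    int refcount;
    int *released;   /* counter bumped by toy_release() */
};

/* Stand-in for ops->release(): must only run while ops is still alive. */
static void toy_release(struct toy_ops *ops)
{
    (*ops->released)++;
}

/* Drop one reference; frees ops on the last one. Returns 1 if it was freed. */
static int toy_put(struct toy_ops *ops)
{
    assert(ops->refcount > 0);
    if (--ops->refcount == 0) {
        free(ops);
        return 1;
    }
    return 0;
}

/* Correct teardown order: call release first, put the reference last. */
static int toy_teardown(struct toy_ops *ops)
{
    toy_release(ops);      /* ops is still guaranteed to be alive here */
    return toy_put(ops);   /* after this call, ops must not be touched */
}
```

Swapping the two calls in toy_teardown() would make toy_release() read freed memory whenever the put drops the last reference, which is the bug pattern the commit describes.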
    • net/mlx5: Fix SQ offset in QPs with small RQ · 639505d4
      Tariq Toukan authored
      Correct the formula for calculating the RQ page remainder,
      which should be in byte granularity.  The result will be
      non-zero only for RQs smaller than PAGE_SIZE, as an RQ size
      is a power of 2.
      
      Divide this by the SQ stride (MLX5_SEND_WQE_BB) to get the
      SQ offset in strides granularity.
      
      Fixes: d7037ad7 ("net/mlx5: Fix QP fragmented buffer allocation")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
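
The calculation described above can be worked through with a small sketch. The page size of 4096 bytes and the stride of 64 bytes are assumptions chosen for the example (the latter standing in for MLX5_SEND_WQE_BB), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE   4096u  /* assumed page size in bytes */
#define TOY_SEND_WQE_BB 64u    /* assumed SQ stride in bytes */

/* SQ offset, in strides, for an SQ that follows an RQ of the given byte size. */
static uint32_t sq_strides_offset(uint32_t rq_byte_size)
{
    uint32_t rq_pg_remainder;

    /* The commit notes that an RQ size is always a power of two. */
    assert(rq_byte_size != 0 && (rq_byte_size & (rq_byte_size - 1)) == 0);

    rq_pg_remainder = rq_byte_size % TOY_PAGE_SIZE;  /* byte granularity */
    return rq_pg_remainder / TOY_SEND_WQE_BB;        /* stride granularity */
}
```

Under these assumptions a 1024-byte RQ leaves a remainder of 1024 bytes, or 16 strides, while an RQ of 4096 bytes or more leaves a remainder of 0.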