1. 03 Apr, 2019 8 commits
    • Alexei Starovoitov's avatar
      libbpf: teach libbpf about log_level bit 2 · da11b417
      Alexei Starovoitov authored
      Allow bpf_prog_load_xattr() to specify log_level for program loading.
      
      Teach libbpf to accept log_level with bit 2 set.
      
      Increase default BPF_LOG_BUF_SIZE from 256k to 16M.
      There is no downside to increase it to a maximum allowed by old kernels.
      Existing 256k limit caused ENOSPC errors and users were not able to see
      verifier error which is printed at the end of the verifier log.
      
      If ENOSPC is hit, double the verifier log and try again to capture
      the verifier error.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      da11b417
    • Alexei Starovoitov's avatar
      bpf: increase verifier log limit · 7a9f5c65
      Alexei Starovoitov authored
      The existing 16Mbyte verifier log limit is not enough for log_level=2
      even for small programs. Increase it to 1Gbyte.
      Note it's not a kernel memory limit.
      It's an amount of memory user space provides to store
      the verifier log. The kernel populates it 1k at a time.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      7a9f5c65
    • Alexei Starovoitov's avatar
      bpf: increase complexity limit and maximum program size · c04c0d2b
      Alexei Starovoitov authored
      Large verifier speed improvements allow to increase
      verifier complexity limit.
      Now regardless of the program composition and its size it takes
      little time for the verifier to hit insn_processed limit.
      On typical x86 machine non-debug kernel processes 1M instructions
      in 1/10 of a second.
      (before these speed improvements specially crafted programs
      could be hitting multi-second verification times)
      Full kasan kernel with debug takes ~1 second for the same 1M insns.
      Hence bump the BPF_COMPLEXITY_LIMIT_INSNS limit to 1M.
      Also increase the number of instructions per program
      from 4k to internal BPF_COMPLEXITY_LIMIT_INSNS limit.
      4k limit was confusing to users, since small programs with hundreds
      of insns could be hitting BPF_COMPLEXITY_LIMIT_INSNS limit.
      Sometimes adding more insns and bpf_trace_printk debug statements
      would make the verifier accept the program while removing
      code would make the verifier reject it.
      Some user space application started to add #define MAX_FOO to
      their programs and do:
        MAX_FOO=100;
      again:
        compile with MAX_FOO;
        try to load;
        if (fails_to_load) { reduce MAX_FOO; goto again; }
      to be able to fit maximum amount of processing into single program.
      Other users artificially split their single program into a set of programs
      and use all 32 iterations of tail_calls to increase compute limits.
      And the most advanced folks used unlimited tc-bpf filter list
      to execute many bpf programs.
      Essentially the users managed to workaround 4k insn limit.
      This patch removes the limit for root programs from uapi.
      BPF_COMPLEXITY_LIMIT_INSNS is the kernel internal limit
      and success to load the program no longer depends on program size,
      but on 'smartness' of the verifier only.
      The verifier will continue to get smarter with every kernel release.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      c04c0d2b
    • Alexei Starovoitov's avatar
      bpf: verbose jump offset overflow check · 4f73379e
      Alexei Starovoitov authored
      Larger programs may trigger 16-bit jump offset overflow check
      during instruction patching. Make this error verbose otherwise
      users cannot decipher error code without printks in the verifier.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      4f73379e
    • Alexei Starovoitov's avatar
      bpf: convert temp arrays to kvcalloc · 71dde681
      Alexei Starovoitov authored
      Temporary arrays used during program verification need to be vmalloc-ed
      to support large bpf programs.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      71dde681
    • Alexei Starovoitov's avatar
      bpf: improve verification speed by not remarking live_read · 25af32da
      Alexei Starovoitov authored
      With large verifier speed improvement brought by the previous patch
      mark_reg_read() becomes the hottest function during verification.
      On a typical program it consumes 40% of cpu.
      mark_reg_read() walks parentage chain of registers to mark parents as LIVE_READ.
      Once the register is marked there is no need to remark it again in the future.
      Hence stop walking the chain once first LIVE_READ is seen.
      This optimization drops mark_reg_read() time from 40% of cpu to <1%
      and overall 2x improvement of verification speed.
      For some programs the longest_mark_read_walk counter improves from ~500 to ~5
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      25af32da
    • Alexei Starovoitov's avatar
      bpf: improve verification speed by droping states · 9f4686c4
      Alexei Starovoitov authored
      Branch instructions, branch targets and calls in a bpf program are
      the places where the verifier remembers states that led to successful
      verification of the program.
      These states are used to prune brute force program analysis.
      For unprivileged programs there is a limit of 64 states per such
      'branching' instructions (maximum length is tracked by max_states_per_insn
      counter introduced in the previous patch).
      Simply reducing this threshold to 32 or lower increases insn_processed
      metric to the point that small valid programs get rejected.
      For root programs there is no limit and cilium programs can have
      max_states_per_insn to be 100 or higher.
      Walking 100+ states multiplied by number of 'branching' insns during
      verification consumes significant amount of cpu time.
      Turned out simple LRU-like mechanism can be used to remove states
      that unlikely will be helpful in future search pruning.
      This patch introduces hit_cnt and miss_cnt counters:
      hit_cnt - this many times this state successfully pruned the search
      miss_cnt - this many times this state was not equivalent to other states
      (and that other states were added to state list)
      
      The heuristic introduced in this patch is:
      if (sl->miss_cnt > sl->hit_cnt * 3 + 3)
        /* drop this state from future considerations */
      
      Higher numbers increase max_states_per_insn (allow more states to be
      considered for pruning) and slow verification speed, but do not meaningfully
      reduce insn_processed metric.
      Lower numbers drop too many states and insn_processed increases too much.
      Many different formulas were considered.
      This one is simple and works well enough in practice.
      (the analysis was done on selftests/progs/* and on cilium programs)
      
      The end result is this heuristic improves verification speed by 10 times.
      Large synthetic programs that used to take a second more now take
      1/10 of a second.
      In cases where max_states_per_insn used to be 100 or more, now it's ~10.
      
      There is a slight increase in insn_processed for cilium progs:
                             before   after
      bpf_lb-DLB_L3.o 	1831	1838
      bpf_lb-DLB_L4.o 	3029	3218
      bpf_lb-DUNKNOWN.o 	1064	1064
      bpf_lxc-DDROP_ALL.o	26309	26935
      bpf_lxc-DUNKNOWN.o	33517	34439
      bpf_netdev.o		9713	9721
      bpf_overlay.o		6184	6184
      bpf_lcx_jit.o		37335	39389
      And 2-3 times improvement in the verification speed.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      9f4686c4
    • Alexei Starovoitov's avatar
      bpf: add verifier stats and log_level bit 2 · 06ee7115
      Alexei Starovoitov authored
      In order to understand the verifier bottlenecks add various stats
      and extend log_level:
      log_level 1 and 2 are kept as-is:
      bit 0 - level=1 - print every insn and verifier state at branch points
      bit 1 - level=2 - print every insn and verifier state at every insn
      bit 2 - level=4 - print verifier error and stats at the end of verification
      
      When verifier rejects the program the libbpf is trying to load the program twice.
      Once with log_level=0 (no messages, only error code is reported to user space)
      and second time with log_level=1 to tell the user why the verifier rejected it.
      
      With introduction of bit 2 - level=4 the libbpf can choose to always use that
      level and load programs once, since the verification speed is not affected and
      in case of error the verbose message will be available.
      
      Note that the verifier stats are not part of uapi just like all other
      verbose messages. They're expected to change in the future.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      06ee7115
  2. 02 Apr, 2019 6 commits
  3. 01 Apr, 2019 1 commit
    • Yonghong Song's avatar
      bpf: add bpffs multi-dimensional array tests in test_btf · 9de2640b
      Yonghong Song authored
      For multiple dimensional arrays like below,
        int a[2][3]
      both llvm and pahole generated one BTF_KIND_ARRAY type like
        . element_type: int
        . index_type: unsigned int
        . number of elements: 6
      
      Such a collapsed BTF_KIND_ARRAY type will cause the divergence
      in BTF vs. the user code. In the compile-once-run-everywhere
      project, the header file is generated from BTF and used for bpf
      program, and the definition in the header file will be different
      from what user expects.
      
      But the kernel actually supports chained multi-dimensional array
      types properly. The above "int a[2][3]" can be represented as
        Type #n:
          . element_type: int
          . index_type: unsigned int
          . number of elements: 3
        Type #(n+1):
          . element_type: type #n
          . index_type: unsigned int
          . number of elements: 2
      
      The following llvm commit
        https://reviews.llvm.org/rL357215
      also enables llvm to generated proper chained multi-dimensional arrays.
      
      The test_btf already has a raw test ("struct test #1") for chained
      multi-dimensional arrays. This patch added amended bpffs test for
      chained multi-dimensional arrays.
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      9de2640b
  4. 29 Mar, 2019 3 commits
    • Alexei Starovoitov's avatar
      Merge branch 'variable-stack-access' · c3969de8
      Alexei Starovoitov authored
      Andrey Ignatov says:
      
      ====================
      The patch set adds support for stack access with variable offset from helpers.
      
      Patch 1 is the main patch in the set and provides more details.
      Patch 2 adds selftests for new functionality.
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c3969de8
    • Andrey Ignatov's avatar
      selftests/bpf: Test variable offset stack access · 8ff80e96
      Andrey Ignatov authored
      Test different scenarios of indirect variable-offset stack access: out of
      bound access (>0), min_off below initialized part of the stack,
      max_off+size above initialized part of the stack, initialized stack.
      
      Example of output:
        ...
        #856/p indirect variable-offset stack access, out of bound OK
        #857/p indirect variable-offset stack access, max_off+size > max_initialized OK
        #858/p indirect variable-offset stack access, min_off < min_initialized OK
        #859/p indirect variable-offset stack access, ok OK
        ...
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      8ff80e96
    • Andrey Ignatov's avatar
      bpf: Support variable offset stack access from helpers · 2011fccf
      Andrey Ignatov authored
      Currently there is a difference in how verifier checks memory access for
      helper arguments for PTR_TO_MAP_VALUE and PTR_TO_STACK with regard to
      variable part of offset.
      
      check_map_access, that is used for PTR_TO_MAP_VALUE, can handle variable
      offsets just fine, so that BPF program can call a helper like this:
      
        some_helper(map_value_ptr + off, size);
      
      , where offset is unknown at load time, but is checked by program to be
      in a safe rage (off >= 0 && off + size < map_value_size).
      
      But it's not the case for check_stack_boundary, that is used for
      PTR_TO_STACK, and same code with pointer to stack is rejected by
      verifier:
      
        some_helper(stack_value_ptr + off, size);
      
      For example:
        0: (7a) *(u64 *)(r10 -16) = 0
        1: (7a) *(u64 *)(r10 -8) = 0
        2: (61) r2 = *(u32 *)(r1 +0)
        3: (57) r2 &= 4
        4: (17) r2 -= 16
        5: (0f) r2 += r10
        6: (18) r1 = 0xffff888111343a80
        8: (85) call bpf_map_lookup_elem#1
        invalid variable stack read R2 var_off=(0xfffffffffffffff0; 0x4)
      
      Add support for variable offset access to check_stack_boundary so that
      if offset is checked by program to be in a safe range it's accepted by
      verifier.
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      2011fccf
  5. 28 Mar, 2019 2 commits
  6. 27 Mar, 2019 20 commits
    • Eric Dumazet's avatar
      inet: switch IP ID generator to siphash · df453700
      Eric Dumazet authored
      According to Amit Klein and Benny Pinkas, IP ID generation is too weak
      and might be used by attackers.
      
      Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix())
      having 64bit key and Jenkins hash is risky.
      
      It is time to switch to siphash and its 128bit keys.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAmit Klein <aksecurity@gmail.com>
      Reported-by: default avatarBenny Pinkas <benny@pinkas.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df453700
    • Colin Ian King's avatar
      net: phy: mdio-bcm-unimac: remove redundant !timeout check · 180a8c3d
      Colin Ian King authored
      The check for zero timeout is always true at the end of the proceeding
      while loop; the only other exit path in the loop is if the unimac MDIO
      is not busy.  Remove the redundant zero timeout check and always
      return -ETIMEDOUT on this timeout return path.
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      180a8c3d
    • Eric Dumazet's avatar
      tcp: fix zerocopy and notsent_lowat issues · 4f661542
      Eric Dumazet authored
      My recent patch had at least three problems :
      
      1) TX zerocopy wants notification when skb is acknowledged,
         thus we need to call skb_zcopy_clear() if the skb is
         cached into sk->sk_tx_skb_cache
      
      2) Some applications might expect precise EPOLLOUT
         notifications, so we need to update sk->sk_wmem_queued
         and call sk_mem_uncharge() from sk_wmem_free_skb()
         in all cases. The SOCK_QUEUE_SHRUNK flag must also be set.
      
      3) Reuse of saved skb should have used skb_cloned() instead
        of simply checking if the fast clone has been freed.
      
      Fixes: 472c2e07 ("tcp: add one skb cache for tx")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Tested-by: default avatarHolger Hoffstätte <holger@applied-asynchrony.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4f661542
    • Numan Siddique's avatar
      net: openvswitch: Add a new action check_pkt_len · 4d5ec89f
      Numan Siddique authored
      This patch adds a new action - 'check_pkt_len' which checks the
      packet length and executes a set of actions if the packet
      length is greater than the specified length or executes
      another set of actions if the packet length is lesser or equal to.
      
      This action takes below nlattrs
        * OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for
      
        * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER - Nested actions
          to apply if the packet length is greater than the specified 'pkt_len'
      
        * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL - Nested
          actions to apply if the packet length is lesser or equal to the
          specified 'pkt_len'.
      
      The main use case for adding this action is to solve the packet
      drops because of MTU mismatch in OVN virtual networking solution.
      When a VM (which belongs to a logical switch of OVN) sends a packet
      destined to go via the gateway router and if the nic which provides
      external connectivity, has a lesser MTU, OVS drops the packet
      if the packet length is greater than this MTU.
      
      With the help of this action, OVN will check the packet length
      and if it is greater than the MTU size, it will generate an
      ICMP packet (type 3, code 4) and includes the next hop mtu in it
      so that the sender can fragment the packets.
      
      Reported-at:
      https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.htmlSuggested-by: default avatarBen Pfaff <blp@ovn.org>
      Signed-off-by: default avatarNuman Siddique <nusiddiq@redhat.com>
      CC: Gregory Rose <gvrose8192@gmail.com>
      CC: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Tested-by: default avatarGreg Rose <gvrose8192@gmail.com>
      Reviewed-by: default avatarGreg Rose <gvrose8192@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4d5ec89f
    • David S. Miller's avatar
      Merge branch 'ethtool-add-support-for-Fast-Link-Down-as-new-PHY-tunable' · d7aa0338
      David S. Miller authored
      Heiner Kallweit says:
      
      ====================
      ethtool: add support for Fast Link Down as new PHY tunable
      
      This adds support for Fast Link Down as new PHY tunable.
      Fast Link Down reduces the time until a link down event is reported
      for 1000BaseT. According to the standard it's 750ms what is too long
      for several use cases.
      
      This is the kernel-related series, the ethtool userspace extension
      I'd submit once the kernel part has been applied.
      
      v2:
      - add describing comment in patch 1
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d7aa0338
    • Heiner Kallweit's avatar
      net: phy: marvell: add PHY tunable fast link down support for 88E1540 · 69f42be8
      Heiner Kallweit authored
      1000BaseT standard requires that a link is reported as down earliest
      after 750ms. Several use case however require a much faster detecion
      of a broken link. Fast Link Down supports this by intentionally
      violating a the standard. This patch exposes the Fast Link Down
      feature of 88E1540 and 88E6390. These PHY's can be found as internal
      PHY's in several switches: 88E6352, 88E6240, 88E6176, 88E6172,
      and 88E6390(X). Fast Link Down and EEE are mutually exclusive.
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      69f42be8
    • Heiner Kallweit's avatar
      ethtool: add PHY Fast Link Down support · 3aeb0803
      Heiner Kallweit authored
      This adds support for Fast Link Down as new PHY tunable.
      Fast Link Down reduces the time until a link down event is reported
      for 1000BaseT. According to the standard it's 750ms what is too long
      for several use cases.
      
      v2:
      - add comment describing the constants
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3aeb0803
    • Bart Van Assche's avatar
      net/core: Allow the compiler to verify declaration and definition consistency · 7b7ed885
      Bart Van Assche authored
      Instead of declaring a function in a .c file, declare it in a header
      file and include that header file from the source files that define
      and that use the function. That allows the compiler to verify
      consistency of declaration and definition. See also commit
      52267790 ("sock: add MSG_ZEROCOPY") # v4.14.
      
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7b7ed885
    • Bart Van Assche's avatar
      net/core: Fix rtnetlink kernel-doc headers · a986967e
      Bart Van Assche authored
      This patch avoids that the following warnings are reported when building
      with W=1:
      
      net/core/rtnetlink.c:3580: warning: Function parameter or member 'ndm' not described in 'ndo_dflt_fdb_add'
      net/core/rtnetlink.c:3580: warning: Function parameter or member 'tb' not described in 'ndo_dflt_fdb_add'
      net/core/rtnetlink.c:3580: warning: Function parameter or member 'dev' not described in 'ndo_dflt_fdb_add'
      net/core/rtnetlink.c:3580: warning: Function parameter or member 'addr' not described in 'ndo_dflt_fdb_add'
      net/core/rtnetlink.c:3580: warning: Function parameter or member 'vid' not described in 'ndo_dflt_fdb_add'
      net/core/rtnetlink.c:3580: warning: Function parameter or member 'flags' not described in 'ndo_dflt_fdb_add'
      net/core/rtnetlink.c:3718: warning: Function parameter or member 'ndm' not described in 'ndo_dflt_fdb_del'
      net/core/rtnetlink.c:3718: warning: Function parameter or member 'tb' not described in 'ndo_dflt_fdb_del'
      net/core/rtnetlink.c:3718: warning: Function parameter or member 'dev' not described in 'ndo_dflt_fdb_del'
      net/core/rtnetlink.c:3718: warning: Function parameter or member 'addr' not described in 'ndo_dflt_fdb_del'
      net/core/rtnetlink.c:3718: warning: Function parameter or member 'vid' not described in 'ndo_dflt_fdb_del'
      net/core/rtnetlink.c:3861: warning: Function parameter or member 'skb' not described in 'ndo_dflt_fdb_dump'
      net/core/rtnetlink.c:3861: warning: Function parameter or member 'cb' not described in 'ndo_dflt_fdb_dump'
      net/core/rtnetlink.c:3861: warning: Function parameter or member 'filter_dev' not described in 'ndo_dflt_fdb_dump'
      net/core/rtnetlink.c:3861: warning: Function parameter or member 'idx' not described in 'ndo_dflt_fdb_dump'
      net/core/rtnetlink.c:3861: warning: Excess function parameter 'nlh' description in 'ndo_dflt_fdb_dump'
      
      Cc: Hubert Sokolowski <hubert.sokolowski@intel.com>
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a986967e
    • Bart Van Assche's avatar
      net/core: Document __skb_flow_dissect() flags argument · d79b3baf
      Bart Van Assche authored
      This patch avoids that the following warning is reported when building
      with W=1:
      
      warning: Function parameter or member 'flags' not described in '__skb_flow_dissect'
      
      Cc: Tom Herbert <tom@herbertland.com>
      Fixes: cd79a238 ("flow_dissector: Add flags argument to skb_flow_dissector functions") # v4.3.
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d79b3baf
    • Bart Van Assche's avatar
      net/core: Document all dev_ioctl() arguments · b3c0fd61
      Bart Van Assche authored
      This patch avoids that the following warnings are reported when building
      with W=1:
      
      net/core/dev_ioctl.c:378: warning: Function parameter or member 'ifr' not described in 'dev_ioctl'
      net/core/dev_ioctl.c:378: warning: Function parameter or member 'need_copyout' not described in 'dev_ioctl'
      net/core/dev_ioctl.c:378: warning: Excess function parameter 'arg' description in 'dev_ioctl'
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Fixes: 44c02a2c ("dev_ioctl(): move copyin/copyout to callers") # v4.16.
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b3c0fd61
    • Bart Van Assche's avatar
      net/core: Document reuseport_add_sock() bind_inany argument · 37f3c421
      Bart Van Assche authored
      This patch avoids that the following warning is reported when building
      with W=1:
      
      warning: Function parameter or member 'bind_inany' not described in 'reuseport_add_sock'
      
      Cc: Martin KaFai Lau <kafai@fb.com>
      Fixes: 2dbb9b9e ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT") # v4.19.
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      37f3c421
    • Heiner Kallweit's avatar
      net: dsa: mv88e6xxx: remove unneeded cmode initialization · 863d1a8d
      Heiner Kallweit authored
      This partially reverts ed8fe202 ("net: dsa: mv88e6xxx: prevent
      interrupt storm caused by mv88e6390x_port_set_cmode"). I missed
      that chip->ports[].cmode is overwritten anyway by the cmode
      caching in mv88e6xxx_setup().
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      863d1a8d
    • Sudarsana Reddy Kalluru's avatar
      bnx2x: Utilize FW 7.13.11.0. · 32705592
      Sudarsana Reddy Kalluru authored
      Commit 8fcf0ec44c11f "bnx2x: Add FW 7.13.11.0" added said .bin FW to
      linux-firmware; This patch incorporates the FW in the bnx2x driver.
      This introduces few FW fixes and the support for Tx VLAN filtering.
      
      Please consider applying it to 'net-next' tree.
      Signed-off-by: default avatarSudarsana Reddy Kalluru <skalluru@marvell.com>
      Signed-off-by: default avatarAriel Elior <aelior@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      32705592
    • Kristian Evensen's avatar
      fou: Support binding FoU socket · 1713cb37
      Kristian Evensen authored
      An FoU socket is currently bound to the wildcard-address. While this
      works fine, there are several use-cases where the use of the
      wildcard-address is not desirable. For example, I use FoU on some
      multi-homed servers and would like to use FoU on only one of the
      interfaces.
      
      This commit adds support for binding FoU sockets to a given source
      address/interface, as well as connecting the socket to a given
      destination address/port. udp_tunnel already provides the required
      infrastructure, so most of the code added is for exposing and setting
      the different attributes (local address, peer address, etc.).
      
      The lookups performed when we add, delete or get an FoU-socket has also
      been updated to compare all the attributes a user can set. Since the
      comparison now involves several elements, I have added a separate
      comparison-function instead of open-coding.
      
      In order to test the code and ensure that the new comparison code works
      correctly, I started by creating a wildcard socket bound to port 1234 on
      my machine. I then tried to create a non-wildcarded socket bound to the
      same port, as well as fetching and deleting the socket (including source
      address, peer address or interface index in the netlink request).  Both
      the create, fetch and delete request failed. Deleting/fetching the
      socket was only successful when my netlink request attributes matched
      those used to create the socket.
      
      I then repeated the tests, but with a socket bound to a local ip
      address, a socket bound to a local address + interface, and a bound
      socket that was also «connected» to a peer. Add only worked when no
      socket with the matching source address/interface (or wildcard) existed,
      while fetch/delete was only successful when all attributes matched.
      
      In addition to testing that the new code work, I also checked that the
      current behavior is kept. If none of the new attributes are provided,
      then an FoU-socket is configured as before (i.e., wildcarded).  If any
      of the new attributes are provided, the FoU-socket is configured as
      expected.
      
      v1->v2:
      * Fixed building with IPv6 disabled (kbuild).
      * Fixed a return type warning and make the ugly comparison function more
      readable (kbuild).
      * Describe more in detail what has been tested (thanks David Miller).
      * Make peer port required if peer address is specified.
      Signed-off-by: default avatarKristian Evensen <kristian.evensen@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1713cb37
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 1a9df9e2
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Fixes here and there, a couple new device IDs, as usual:
      
         1) Fix BQL race in dpaa2-eth driver, from Ioana Ciornei.
      
         2) Fix 64-bit division in iwlwifi, from Arnd Bergmann.
      
         3) Fix documentation for some eBPF helpers, from Quentin Monnet.
      
         4) Some UAPI bpf header sync with tools, also from Quentin Monnet.
      
         5) Set descriptor ownership bit at the right time for jumbo frames in
            stmmac driver, from Aaro Koskinen.
      
         6) Set IFF_UP properly in tun driver, from Eric Dumazet.
      
         7) Fix load/store doubleword instruction generation in powerpc eBPF
            JIT, from Naveen N. Rao.
      
         8) nla_nest_start() return value checks all over, from Kangjie Lu.
      
         9) Fix asoc_id handling in SCTP after the SCTP_*_ASSOC changes this
            merge window. From Marcelo Ricardo Leitner and Xin Long.
      
        10) Fix memory corruption with large MTUs in stmmac, from Aaro
            Koskinen.
      
        11) Do not use ipv4 header for ipv6 flows in TCP and DCCP, from Eric
            Dumazet.
      
        12) Fix topology subscription cancellation in tipc, from Erik Hugne.
      
        13) Memory leak in genetlink error path, from Yue Haibing.
      
        14) Valid control actions properly in packet scheduler, from Davide
            Caratti.
      
        15) Even if we get EEXIST, we still need to rehash if a shrink was
            delayed. From Herbert Xu.
      
        16) Fix interrupt mask handling in interrupt handler of r8169, from
            Heiner Kallweit.
      
        17) Fix leak in ehea driver, from Wen Yang"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (168 commits)
        dpaa2-eth: fix race condition with bql frame accounting
        chelsio: use BUG() instead of BUG_ON(1)
        net: devlink: skip info_get op call if it is not defined in dumpit
        net: phy: bcm54xx: Encode link speed and activity into LEDs
        tipc: change to check tipc_own_id to return in tipc_net_stop
        net: usb: aqc111: Extend HWID table by QNAP device
        net: sched: Kconfig: update reference link for PIE
        net: dsa: qca8k: extend slave-bus implementations
        net: dsa: qca8k: remove leftover phy accessors
        dt-bindings: net: dsa: qca8k: support internal mdio-bus
        dt-bindings: net: dsa: qca8k: fix example
        net: phy: don't clear BMCR in genphy_soft_reset
        bpf, libbpf: clarify bump in libbpf version info
        bpf, libbpf: fix version info and add it to shared object
        rxrpc: avoid clang -Wuninitialized warning
        tipc: tipc clang warning
        net: sched: fix cleanup NULL pointer exception in act_mirr
        r8169: fix cable re-plugging issue
        net: ethernet: ti: fix possible object reference leak
        net: ibm: fix possible object reference leak
        ...
      1a9df9e2
    • David S. Miller's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · eec7e295
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      100GbE Intel Wired LAN Driver Updates 2019-03-26
      
      This series contains more updates to the ice driver only.
      
      Jeremiah provides his first patch to the Linux kernel to clean up
      un-necessary newlines in driver log messages.
      
      Mitch updates the ice driver to use existing status codes in the iavf
      driver so that when errors occur, it will not report nonsensical
      results.  Adds support for VF admin queue interrupts by programming the
      VPINT_MBX_CTL register array.
      
      Brett adds a check for a bit that we set while preparing for a reset, to
      ensure we are prepared to do a proper reset.  Also implemented PCI error
      handling operations.  Went through and audited the hot path with pahole
      and made modifications based on the results since 2 structures were
      taking up more space than necessary due to cache alignment issues.
      Fixed an issue where when flow control was disabled, the state of flow
      control was being displayed as "Unknown".
      
      Anirudh fixes adaptive interrupt moderation changes by adding code that
      was missed, that should have been added in the initial patch to add that
      support.  Cleaned up a function prototype that was never implemented.
      Did additional code cleanup by removing unneeded braces and redundant
      code comments.
      
      Akeem fixes an issue that occurs when the VF is attempting to remove the
      default LAN/MAC address, which is programmed by the administrator by
      updating the error message to explicitly say that the VF cannot change
      the MAC programmed by the PF.
      
      Preethi fixes the driver to not fall into the error path when a added
      filter already exists, but instead continue to process the rest of the
      function and add appropriate checks after adding MAC filters.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eec7e295
    • David S. Miller's avatar
      Merge branch 'net-mvpp2-Classifier-updates-and-cleanups' · b0be25c5
      David S. Miller authored
      Maxime Chevallier says:
      
      ====================
      net: mvpp2: Classifier updates and cleanups
      
      In preparation for the future addition of classification offload
      support, this series cleans-up the current code dealing with the
      classification engines.
      
      As of today, the classification code only deals with RSS, which is a
      very narrow use-case when considering all of the classifier's features.
      
      This lead to design and naming decisions that don't really stand when
      considering using more classification features.
      
      This series therefore includes quite a lot a renaming of functions and
      macros, and tries to make the naming schemes more consistent and the
      code more readable.
      
      The debugfs interface that allows reading the various Parsing and
      Classification engines has been cleaned-up and made more generic,
      allowing to read the Flow Table and C2 engine hit counters on a
      per-entry basis.
      
      The Classifier itself has been made more robust by introducing the
      lu_type field in classification lookups, that prevents false-positive
      matches in the future. We also initialise the various engine in a more
      extensive and less error-prone way by assining default values to all
      entries in the C2 and Flow table.
      
      This is a pretty big series considering it's mostly cleanup, but my goal
      is to make the future series that will contain new big features easier
      to review, focusing on the real logic.
      
      Besides the debugfs interface, this series doesn't intend to introduce any
      new features or change in behaviours.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b0be25c5
    • Maxime Chevallier's avatar
      net: mvpp2: cls: Rework C2 engine macros · c2d3d8ee
      Maxime Chevallier authored
      The C2 classification engine has a 256 entry TCAM, used for ternary
      matches on an 8 byte Header Extracted Key. For now, we compute the
      various indices for classification and RSS that use this engine thanks
      to a set of macros.
      
      This commit mainly renames the macros used to make it clear that they
      should be used with the C2 engine, but also make use of the full 256
      entries in the engine. For now, the C2 entries are only used for RSS.
      
      These entries are put at the end of the TCAM range, in case we want to
      add higher priority matches later on.
      Signed-off-by: default avatarMaxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c2d3d8ee
    • Maxime Chevallier's avatar
      net: mvpp2: cls: Initialize lookup priorities for all entries in the flow · 693131db
      Maxime Chevallier authored
      When classifying a packet pertaining to a given flow, the classifier
      will issue multiple lookup commands until it finds one with the 'last'
      bit set. It expects all prorities to be assign continuously (although
      not necessarily in an ordered fashion) from 0 to the number of lookups.
      
      We can initialize this once, and make sure unused lookups are given an
      empty port map. This avoids having to maintain priorities and the
      information of which lookup is the last.
      Signed-off-by: default avatarMaxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      693131db