1. 31 Aug, 2021 25 commits
  2. 30 Aug, 2021 15 commits
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 19a31d79
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      bpf-next 2021-08-31
      
      We've added 116 non-merge commits during the last 17 day(s) which contain
      a total of 126 files changed, 6813 insertions(+), 4027 deletions(-).
      
      The main changes are:
      
      1) Add opaque bpf_cookie to perf link which the program can read out again,
         to be used in libbpf-based USDT library, from Andrii Nakryiko.
      
      2) Add bpf_task_pt_regs() helper to access userspace pt_regs, from Daniel Xu.
      
      3) Add support for UNIX stream type sockets for BPF sockmap, from Jiang Wang.
      
      4) Allow BPF TCP congestion control progs to call bpf_setsockopt() e.g. to switch
         to another congestion control algorithm during init, from Martin KaFai Lau.
      
      5) Extend BPF iterator support for UNIX domain sockets, from Kuniyuki Iwashima.
      
      6) Allow bpf_{set,get}sockopt() calls from setsockopt progs, from Prankur Gupta.
      
      7) Add bpf_get_netns_cookie() helper for BPF_PROG_TYPE_{SOCK_OPS,CGROUP_SOCKOPT}
         progs, from Xu Liu and Stanislav Fomichev.
      
      8) Support for __weak typed ksyms in libbpf, from Hao Luo.
      
      9) Shrink struct cgroup_bpf by 504 bytes through refactoring, from Dave Marchevsky.
      
      10) Fix a smatch complaint in verifier's narrow load handling, from Andrey Ignatov.
      
      11) Fix BPF interpreter's tail call count limit, from Daniel Borkmann.
      
      12) Big batch of improvements to BPF selftests, from Magnus Karlsson, Li Zhijian,
          Yucong Sun, Yonghong Song, Ilya Leoshkevich, Jussi Maki, and others.
      
      13) Another big batch to revamp XDP samples in order to give them consistent look
          and feel, from Kumar Kartikeya Dwivedi.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (116 commits)
        MAINTAINERS: Remove self from powerpc BPF JIT
        selftests/bpf: Fix potential unreleased lock
        samples: bpf: Fix uninitialized variable in xdp_redirect_cpu
        selftests/bpf: Reduce more flakyness in sockmap_listen
        bpf: Fix bpf-next builds without CONFIG_BPF_EVENTS
        bpf: selftests: Add dctcp fallback test
        bpf: selftests: Add connect_to_fd_opts to network_helpers
        bpf: selftests: Add sk_state to bpf_tcp_helpers.h
        bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt
        selftests: xsk: Preface options with opt
        selftests: xsk: Make enums lower case
        selftests: xsk: Generate packets from specification
        selftests: xsk: Generate packet directly in umem
        selftests: xsk: Simplify cleanup of ifobjects
        selftests: xsk: Decrease sending speed
        selftests: xsk: Validate tx stats on tx thread
        selftests: xsk: Simplify packet validation in xsk tests
        selftests: xsk: Rename worker_* functions that are not thread entry points
        selftests: xsk: Disassociate umem size with packets sent
        selftests: xsk: Remove end-of-test packet
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20210830225618.11634-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      19a31d79
    • sch_htb: Fix inconsistency when leaf qdisc creation fails · ca49bfd9
      Maxim Mikityanskiy authored
      In HTB offload mode, qdiscs of leaf classes are grafted to netdev
      queues. sch_htb expects the dev_queue field of these qdiscs to point to
      the corresponding queues. However, qdisc creation may fail, and in that
      case noop_qdisc is used instead. Its dev_queue doesn't point to the
      right queue, so sch_htb can lose track of used netdev queues, which will
      cause internal inconsistencies.
      
      This commit fixes this bug by keeping track of the netdev queue inside
      struct htb_class. All reads of cl->leaf.q->dev_queue are replaced by the
      new field, the two values are synced on writes, and WARNs are added to
      assert equality of the two values.
      
      The driver API has changed: when TC_HTB_LEAF_DEL needs to move a queue,
      the driver used to pass the old and new queue IDs to sch_htb. Now that
      there is a new field (offload_queue) in struct htb_class that needs to
      be updated on this operation, the driver will pass the old class ID to
      sch_htb instead (it already knows the new class ID).
      
      Fixes: d03b195b ("sch_htb: Hierarchical QoS hardware offload")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20210826115425.1744053-1-maximmi@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ca49bfd9
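The bookkeeping described in the sch_htb commit above can be illustrated with a minimal user-space sketch. All type and function names here (`struct leaf_class`, `leaf_attach`, `leaf_queue`) are hypothetical stand-ins, not the kernel's definitions; only the idea of an `offload_queue` field kept in the class comes from the commit message, and `assert()` merely models the added WARNs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct queue { int id; };
struct qdisc { struct queue *dev_queue; };

struct leaf_class {
    struct qdisc *q;              /* may be a shared no-op qdisc after a failed creation */
    struct queue *offload_queue;  /* authoritative record of the queue in use */
};

/* Shared placeholder used when qdisc creation fails; its dev_queue is unrelated. */
static struct qdisc noop = { NULL };

/* Attach a qdisc (or the placeholder on failure) while always recording the queue. */
static void leaf_attach(struct leaf_class *cl, struct qdisc *new_q, struct queue *txq)
{
    cl->offload_queue = txq;
    if (new_q) {
        new_q->dev_queue = txq;   /* keep the two values in sync on writes */
        cl->q = new_q;
    } else {
        cl->q = &noop;
    }
}

/* Readers use the class field rather than cl->q->dev_queue. */
static struct queue *leaf_queue(const struct leaf_class *cl)
{
    /* Models the added WARNs: a real qdisc must agree with the class field. */
    assert(cl->q == &noop || cl->q->dev_queue == cl->offload_queue);
    return cl->offload_queue;
}
```

With this layout, a class whose qdisc creation failed still reports the right queue, because the lookup never goes through the placeholder qdisc.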
    • MAINTAINERS: Remove self from powerpc BPF JIT · fca35b11
      Sandipan Das authored
      Stepping down as I haven't had a chance to look into the powerpc
      BPF JIT compilers for a while.
      Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210827111905.396145-1-sandipan@linux.ibm.com
      fca35b11
    • net: ipv4: Fix the warning for dereference · 1b9fbe81
      Yajun Deng authored
      Add an if statement to avoid the warning.
      
      Dan Carpenter reports:
      The patch faf482ca: "net: ipv4: Move ip_options_fragment() out of
      loop" from Aug 23, 2021, leads to the following Smatch complaint:
      
          net/ipv4/ip_output.c:833 ip_do_fragment()
          warn: variable dereferenced before check 'iter.frag' (see line 828)
      
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: faf482ca ("net: ipv4: Move ip_options_fragment() out of loop")
      Link: https://lore.kernel.org/netdev/20210830073802.GR7722@kadam/T/#t
      Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1b9fbe81
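The "dereferenced before check" pattern that Smatch flags, and the guard that avoids it, can be sketched as follows. The types and helpers (`struct frag_iter`, `clear_options`, `prepare_fragments`) are hypothetical and only mimic the shape of the problem; this is not the code from ip_do_fragment().

```c
#include <assert.h>
#include <stddef.h>

struct frag { int options_cleared; };     /* hypothetical fragment type */
struct frag_iter { struct frag *frag; };  /* frag may be NULL when nothing remains */

/* Stand-in for per-fragment work that was hoisted out of a loop. */
static void clear_options(struct frag *f)
{
    f->options_cleared = 1;
}

/* Returns 1 if the helper ran, 0 if there was nothing to process. */
static int prepare_fragments(struct frag_iter *iter)
{
    if (!iter->frag)            /* check first, so the helper never sees NULL */
        return 0;
    clear_options(iter->frag);
    return 1;
}
```

Placing the check ahead of the first use means any later NULL test on the same pointer is consistent with how it was used earlier.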
    • net: qrtr: make checks in qrtr_endpoint_post() stricter · aaa8e492
      Dan Carpenter authored
      These checks are still not strict enough.  The main problem is that if
      "cb->type == QRTR_TYPE_NEW_SERVER" is true then "len - hdrlen" is
      guaranteed to be 4 but we need it to be at least 16 bytes.  In fact, we
      can reject everything smaller than sizeof(*pkt) which is 20 bytes.
      
      Also I don't like the ALIGN(size, 4).  It's better to just insist that
      the data is aligned at the start.
      
      Fixes: 0baa99ee ("net: qrtr: Allow non-immediate node routing")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aaa8e492
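A rough user-space sketch of the stricter validation described in the qrtr commit above. The function name `packet_ok`, the constant `CTRL_PKT_SIZE` and the enum values are assumptions made for illustration; only the 20-byte figure and the "insist on alignment instead of rounding up" idea come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CTRL_PKT_SIZE 20u   /* sizeof(*pkt) as stated in the commit message */

enum { TYPE_DATA = 1, TYPE_NEW_SERVER = 4 };  /* illustrative values only */

/*
 * len:    total bytes received
 * hdrlen: bytes taken by the header
 * size:   payload size claimed by the header
 */
static bool packet_ok(size_t len, size_t hdrlen, size_t size, int type)
{
    if (size == 0 || (size & 3) != 0)          /* require 4-byte alignment, no rounding */
        return false;
    if (len < hdrlen || len - hdrlen != size)  /* claimed size must match exactly */
        return false;
    if (type == TYPE_NEW_SERVER && size < CTRL_PKT_SIZE)
        return false;                          /* too small to hold a control packet */
    return true;
}
```

Rejecting unaligned sizes outright, rather than rounding them with an ALIGN-style macro, keeps the length comparison exact.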
    • fix array-index-out-of-bounds in taprio_change · efe487fc
      Haimin Zhang authored
      syzbot reports an array-index-out-of-bounds in taprio_change:
      index 16 is out of range for type '__u16 [16]'.
      That's because mqprio->num_tc is larger than TC_MAX_QUEUE, so we check
      the return value of netdev_set_num_tc.
      
      Reported-by: syzbot+2b3e5fb6c7ef285a94f6@syzkaller.appspotmail.com
      Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      efe487fc
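The fix above amounts to honouring an error return before indexing fixed-size arrays. A minimal sketch, assuming hypothetical names (`struct dev`, `set_num_tc`, `apply_mapping`) that only imitate the shape of the kernel code:

```c
#include <assert.h>
#include <errno.h>

#define MAX_TC 16   /* mirrors the '__u16 [16]' arrays in the report */

struct dev {
    int num_tc;
    unsigned short count[MAX_TC];
    unsigned short offset[MAX_TC];
};

/* Stand-in for netdev_set_num_tc(): rejects counts that do not fit. */
static int set_num_tc(struct dev *d, int num_tc)
{
    if (num_tc < 0 || num_tc > MAX_TC)
        return -EINVAL;
    d->num_tc = num_tc;
    return 0;
}

/* Copy a per-class queue mapping, bailing out before any array write on error. */
static int apply_mapping(struct dev *d, int num_tc,
                         const unsigned short *count,
                         const unsigned short *offset)
{
    int err = set_num_tc(d, num_tc);

    if (err)
        return err;   /* the check that was missing: never reach the loop */
    for (int i = 0; i < num_tc; i++) {
        d->count[i] = count[i];
        d->offset[i] = offset[i];
    }
    return 0;
}
```

Without the early return, a request for 17 classes would write to index 16 of a 16-element array, which is the access syzbot reported.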
    • net: fix NULL pointer reference in cipso_v4_doi_free · e842cb60
      王贇 authored
      In netlbl_cipsov4_add_std(), when the allocation of 'doi_def->map.std'
      fails, we sometimes observe a panic:
      
        BUG: kernel NULL pointer dereference, address:
        ...
        RIP: 0010:cipso_v4_doi_free+0x3a/0x80
        ...
        Call Trace:
         netlbl_cipsov4_add_std+0xf4/0x8c0
         netlbl_cipsov4_add+0x13f/0x1b0
         genl_family_rcv_msg_doit.isra.15+0x132/0x170
         genl_rcv_msg+0x125/0x240
      
      This is because cipso_v4_doi_free() does not check
      'doi_def->map.std' when doi_def->type has the value 1, which
      is possible, since netlbl_cipsov4_add_std() hasn't initialized
      it before allocating 'doi_def->map.std'.
      
      This patch just adds the check to prevent a panic in similar
      cases.
      Reported-by: Abaci <abaci@linux.alibaba.com>
      Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e842cb60
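The failure mode above is a free routine that trusts a nested pointer because of a type tag. A minimal sketch of the guarded version, assuming simplified hypothetical structures (`struct doi_def`, `struct std_map`) and plain `calloc()`/`free()` in place of the kernel allocators:

```c
#include <assert.h>
#include <stdlib.h>

enum { MAP_WITH_TABLES = 1 };   /* illustrative tag; the message only says "value 1" */

struct std_map { int *lvl; int *cat; };
struct doi_def { int type; struct std_map *std; };

/* Free a definition; safe even if the nested map was never allocated. */
static void doi_free(struct doi_def *d)
{
    if (!d)
        return;
    if (d->type == MAP_WITH_TABLES && d->std) {   /* the added NULL check */
        free(d->std->lvl);
        free(d->std->cat);
        free(d->std);
    }
    free(d);
}

/* Allocation with an injectable failure, to exercise the error path. */
static struct doi_def *doi_alloc(int fail_map)
{
    struct doi_def *d = calloc(1, sizeof(*d));

    if (!d)
        return NULL;
    d->type = MAP_WITH_TABLES;                    /* tag is set before the map exists */
    d->std = fail_map ? NULL : calloc(1, sizeof(*d->std));
    if (!d->std) {
        doi_free(d);                              /* must tolerate d->std == NULL */
        return NULL;
    }
    return d;
}
```

Calling `doi_alloc(1)` walks the error path that used to crash: the tag says tables exist, but the pointer is still NULL when the cleanup runs.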
    • Merge branch 'inet-exceptions-less-predictable' · 63cad4c7
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      inet: make exception handling less predictible
      
      This second round of patches addresses Keyu Man's recommendations
      to make Linux hosts more robust against a class of brute force attacks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      63cad4c7
    • ipv4: make exception cache less predictible · 67d6d681
      Eric Dumazet authored
      Even after commit 6457378f ("ipv4: use siphash instead of Jenkins in
      fnhe_hashfun()"), an attacker can still use brute force to learn
      some secrets from a victim Linux host.
      
      One way to defeat these attacks is to make the max depth of the hash
      table bucket a random value.
      
      Before this patch, each bucket of the hash table used to store exceptions
      could contain 6 items under attack.
      
      After the patch, each bucket contains a random number of items,
      between 6 and 10. The attacker can no longer infer secrets.
      
      This slightly increases the memory used by the hash table,
      by 50% on average; we do not expect this to be a problem.
      
      This patch is more complex than the prior one (IPv6 equivalent),
      because IPv4 was reusing the oldest entry.
      Since we need to be able to evict more than one entry per
      update_or_create_fnhe() call, I had to replace
      fnhe_oldest() with fnhe_remove_oldest().
      
      Also note that we will queue extra kfree_rcu() calls under stress,
      which hopefully won't be too big an issue.
      
      Fixes: 4895c771 ("ipv4: Add FIB nexthop exceptions.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Keyu Man <kman001@ucr.edu>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Tested-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      67d6d681
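The randomized bucket depth described above can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel code: `RECLAIM_DEPTH`, `struct entry`, `remove_oldest` and `insert_entry` are hypothetical names, and `rand()` stands in for the kernel's random source (it is not suitable for security purposes). The limit of 6 to 10 items follows the commit message.

```c
#include <assert.h>
#include <stdlib.h>

#define RECLAIM_DEPTH 5   /* hypothetical base; the random limit is 6..10 */

struct entry { unsigned long stamp; struct entry *next; };

/* Unlink and free the entry with the smallest stamp (the oldest one). */
static void remove_oldest(struct entry **head)
{
    struct entry **oldest = head;

    for (struct entry **p = head; *p; p = &(*p)->next)
        if ((*p)->stamp < (*oldest)->stamp)
            oldest = p;

    struct entry *victim = *oldest;
    if (!victim)
        return;               /* empty chain */
    *oldest = victim->next;
    free(victim);
}

static int chain_len(const struct entry *e)
{
    int n = 0;

    for (; e; e = e->next)
        n++;
    return n;
}

/* Insert at the head, then trim the chain to a freshly drawn random limit.
 * Returns the resulting chain length, or -1 if allocation fails. */
static int insert_entry(struct entry **head, unsigned long stamp)
{
    struct entry *e = malloc(sizeof(*e));

    if (!e)
        return -1;
    e->stamp = stamp;
    e->next = *head;
    *head = e;

    int depth = chain_len(*head);
    if (depth > RECLAIM_DEPTH) {
        int max_depth = RECLAIM_DEPTH + 1 + rand() % RECLAIM_DEPTH;  /* 6..10 */

        while (depth > max_depth) {   /* may evict more than one entry */
            remove_oldest(head);
            depth--;
        }
    }
    return depth;
}
```

Because the limit is redrawn on every insertion, an observer cannot rely on a fixed number of insertions evicting a given entry, and a single insertion may evict several old entries, which is why a remove-oldest helper is used instead of reusing one entry.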
    • ipv6: make exception cache less predictible · a00df2ca
      Eric Dumazet authored
      Even after commit 4785305c ("ipv6: use siphash in rt6_exception_hash()"),
      an attacker can still use brute force to learn some secrets from a victim
      Linux host.
      
      One way to defeat these attacks is to make the max depth of the hash
      table bucket a random value.
      
      Before this patch, each bucket of the hash table used to store exceptions
      could contain 6 items under attack.
      
      After the patch, each bucket contains a random number of items,
      between 6 and 10. The attacker can no longer infer secrets.
      
      This slightly increases the memory used by the hash table;
      we do not expect this to be a problem.
      
      The following patch deals with the same issue in IPv4.
      
      Fixes: 35732d01 ("ipv6: introduce a hash table to store dst cache")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Keyu Man <kman001@ucr.edu>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a00df2ca
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 9dfa859d
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for net-next:
      
      1) Clean up and consolidate ct ecache infrastructure by merging ct and
         expect notifiers, from Florian Westphal.
      
      2) Missing counters and timestamp in nfnetlink_queue and _log conntrack
         information.
      
      3) Missing error check for xt_register_template() in iptables mangle,
         as an incremental fix for the previous pull request, also from
         Florian Westphal.
      
      4) Add netfilter hooks for the SRv6 lightweight tunnel driver, from
         Ryoga Sato. The hooks are enabled via nf_hooks_lwtunnel sysctl
         to make sure existing netfilter rulesets do not break. There is
         a static key to disable the hooks by default.
      
         The pktgen_bench_xmit_mode_netif_receive.sh shows no noticeable
         impact in the seg6_input path for non-netfilter users: similar
         numbers with and without this patch.
      
         This is a sample of the perf report output:
      
          11.67%  kpktgend_0       [ipv6]                    [k] ipv6_get_saddr_eval
           7.89%  kpktgend_0       [ipv6]                    [k] __ipv6_addr_label
           7.52%  kpktgend_0       [ipv6]                    [k] __ipv6_dev_get_saddr
           6.63%  kpktgend_0       [kernel.vmlinux]          [k] asm_exc_nmi
           4.74%  kpktgend_0       [ipv6]                    [k] fib6_node_lookup_1
           3.48%  kpktgend_0       [kernel.vmlinux]          [k] pskb_expand_head
           3.33%  kpktgend_0       [ipv6]                    [k] ip6_rcv_core.isra.29
           3.33%  kpktgend_0       [ipv6]                    [k] seg6_do_srh_encap
           2.53%  kpktgend_0       [ipv6]                    [k] ipv6_dev_get_saddr
           2.45%  kpktgend_0       [ipv6]                    [k] fib6_table_lookup
           2.24%  kpktgend_0       [kernel.vmlinux]          [k] ___cache_free
           2.16%  kpktgend_0       [ipv6]                    [k] ip6_pol_route
           2.11%  kpktgend_0       [kernel.vmlinux]          [k] __ipv6_addr_type
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9dfa859d
    • Merge branch 'IXP46x-PTP-Timer' · 724812d8
      David S. Miller authored
      Linus Walleij says:
      
      ====================
      IXP46x PTP Timer clean-up and DT
      
      ChangeLog v2->v3:
      
      - Dropped the patch enabling compile tests: we are still dependent
        on some machine-specific headers. The plan is to get rid of this
        after device tree conversion. We include one of the compile testing
        fixes anyway, because it is nice to have fixed.
      
      - Rebased on the latest net-next
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      724812d8
    • ixp4xx_eth: Probe the PTP module from the device tree · e9e50622
      Linus Walleij authored
      This adds device tree probing support for the PTP module
      adjacent to the ethernet module. It is pretty
      straightforward: all resources are in the device tree as they
      come to the platform device.
      
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e9e50622
    • ixp4xx_eth: Add devicetree bindings · 323fb75d
      Linus Walleij authored
      This adds device tree bindings for the IXP46x PTP Timer, a companion
      to the IXP4xx ethernet in newer platforms.
      
      Cc: devicetree@vger.kernel.org
      Cc: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      323fb75d
    • ixp4xx_eth: Stop referring to GPIOs · 13dc9319
      Linus Walleij authored
      The driver is being passed interrupts, then looking up the
      same interrupts as GPIOs a second time to convert them into
      interrupts and set properties on them.
      
      This is pointless: the GPIO and irqchip APIs of a GPIO chip
      are orthogonal. Just request the interrupts and be done
      with it, drop reliance on any GPIO functions or definitions.
      
      Use devres-managed functions and add a small devres quirk
      to unregister the clock as well and we can rely on devres
      to handle all the resources and cut down a bunch of
      boilerplate in the process.
      
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      13dc9319