1. 15 May, 2020 1 commit
    • bpf: Restrict bpf_probe_read{, str}() only to archs where they work · 0ebeea8c
      Daniel Borkmann authored
      Given the legacy bpf_probe_read{,str}() BPF helpers are broken on archs
      with overlapping address ranges, we should really take the next step to
      disable them from BPF use there.
      
      To generally fix the situation, we've recently added new helper variants
      bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str().
      For details on them, see 6ae08ae3 ("bpf: Add probe_read_{user, kernel}
      and probe_read_{user,kernel}_str helpers").
      
      Given bpf_probe_read{,str}() have been around for ~5 years by now, there
      are plenty of users at least on x86 still relying on them today, so we
      cannot remove them entirely w/o breaking the BPF tracing ecosystem.
      
      However, their use should be restricted to archs with non-overlapping
      address ranges where they are working in their current form. Therefore,
      move this behind a CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE and
      have x86, arm64, arm select it (other archs supporting it can follow-up
      on it as well).
      
      The remaining archs can easily work around this by relying on the
      feature probe from bpftool, which emits defines that BPF C code can
      use to implement a drop-in replacement for old/new kernels
      via: bpftool feature probe macro
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/bpf/20200515101118.6508-2-daniel@iogearbox.net
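The drop-in replacement idea from the commit above can be roughly sketched in plain user-space C. This is only an illustration of selecting a helper at compile time: the macro name HAVE_PROBE_READ_KERNEL and both stub functions are hypothetical stand-ins. The real define names come from the output of "bpftool feature probe macro", and the real helpers are only callable from BPF programs.

```c
#include <string.h>

/* Hypothetical stand-ins for the BPF helpers. Each copies the data and
 * returns a tag so the caller can see which variant was selected; the
 * real helpers return 0 or a negative error instead. */
static inline int stub_probe_read_kernel(void *dst, unsigned int size,
					 const void *src)
{
	memcpy(dst, src, size);
	return 1; /* tag: new helper variant */
}

static inline int stub_probe_read_legacy(void *dst, unsigned int size,
					 const void *src)
{
	memcpy(dst, src, size);
	return 2; /* tag: legacy helper */
}

/* HAVE_PROBE_READ_KERNEL is an assumed feature define, not a name taken
 * from bpftool's actual output. */
#ifdef HAVE_PROBE_READ_KERNEL
#define compat_probe_read(dst, size, src) \
	stub_probe_read_kernel(dst, size, src)
#else
#define compat_probe_read(dst, size, src) \
	stub_probe_read_legacy(dst, size, src)
#endif
```

Callers use only compat_probe_read(), so the same source builds against kernels with and without the new helpers once the feature defines are included.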
  2. 14 May, 2020 6 commits
    • selftests/bpf: Enforce returning 0 for fentry/fexit programs · 6d74f64b
      Yonghong Song authored
      There are a few fentry/fexit programs returning non-0.
      The tests with these programs will break with the previous
      patch which enforced return-0 rules. Fix them properly.
      
      Fixes: ac065870 ("selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200514053207.1298479-1-yhs@fb.com
    • bpf: Enforce returning 0 for fentry/fexit progs · e92888c7
      Yonghong Song authored
      Currently, tracing/fentry and tracing/fexit prog
      return values are not enforced. In trampoline code,
      the fentry/fexit prog return values are ignored.
      Let us enforce them to be 0 to avoid confusion and
      allow potential future extension.
      
      This patch also explicitly adds return value
      checking for tracing/raw_tp, tracing/fmod_ret,
      and freplace programs such that these program
      return values can be anything. The purpose is
      twofold:
       1. to make return value expectations for these
          programs explicit in the verifier.
       2. for the tracing prog_type, if a future attach type
          is added, the default is -ENOTSUPP, which forces
          return value ranges to be specified explicitly.
      
      Fixes: fec56f58 ("bpf: Introduce BPF trampoline")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200514053206.1298415-1-yhs@fb.com
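The return value policy described in the commit above could look roughly like the following user-space sketch. The enum, its member names, and check_return_value() are hypothetical and only mirror the rules stated in the message; this is not the verifier's actual code.

```c
#include <errno.h>

#ifndef ENOTSUPP
#define ENOTSUPP 524 /* kernel-internal errno, absent from user-space headers */
#endif

/* Hypothetical attach types; ATTACH_FUTURE models a type added later
 * without an explicit return value rule. */
enum attach_type {
	ATTACH_FENTRY,
	ATTACH_FEXIT,
	ATTACH_RAW_TP,
	ATTACH_FMOD_RET,
	ATTACH_FUTURE,
};

/* Returns 0 if `ret` is acceptable for the attach type, -EINVAL if it
 * is out of range, and -ENOTSUPP if no rule has been specified. */
static inline int check_return_value(enum attach_type type, long ret)
{
	switch (type) {
	case ATTACH_FENTRY:
	case ATTACH_FEXIT:
		return ret == 0 ? 0 : -EINVAL; /* must return 0 */
	case ATTACH_RAW_TP:
	case ATTACH_FMOD_RET:
		return 0; /* any value is allowed */
	default:
		return -ENOTSUPP; /* forces an explicit rule */
	}
}
```

Making the default case an error means a newly added attach type cannot silently inherit the "anything goes" behaviour.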
    • security: Fix the default value of secid_to_secctx hook · 625236ba
      Anders Roxell authored
      security_secid_to_secctx is called by the bpf_lsm hook and a successful
      return value (i.e. 0) implies that the parameter will be consumed by the
      LSM framework. The current behaviour returns success even though the
      pointer isn't initialized when CONFIG_BPF_LSM is enabled, with the
      default return from kernel/bpf/bpf_lsm.c.
      
      This is the internal error:
      
      [ 1229.341488][ T2659] usercopy: Kernel memory exposure attempt detected from null address (offset 0, size 280)!
      [ 1229.374977][ T2659] ------------[ cut here ]------------
      [ 1229.376813][ T2659] kernel BUG at mm/usercopy.c:99!
      [ 1229.378398][ T2659] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
      [ 1229.380348][ T2659] Modules linked in:
      [ 1229.381654][ T2659] CPU: 0 PID: 2659 Comm: systemd-journal Tainted: G    B   W         5.7.0-rc5-next-20200511-00019-g864e0c6319b8-dirty #13
      [ 1229.385429][ T2659] Hardware name: linux,dummy-virt (DT)
      [ 1229.387143][ T2659] pstate: 80400005 (Nzcv daif +PAN -UAO BTYPE=--)
      [ 1229.389165][ T2659] pc : usercopy_abort+0xc8/0xcc
      [ 1229.390705][ T2659] lr : usercopy_abort+0xc8/0xcc
      [ 1229.392225][ T2659] sp : ffff000064247450
      [ 1229.393533][ T2659] x29: ffff000064247460 x28: 0000000000000000
      [ 1229.395449][ T2659] x27: 0000000000000118 x26: 0000000000000000
      [ 1229.397384][ T2659] x25: ffffa000127049e0 x24: ffffa000127049e0
      [ 1229.399306][ T2659] x23: ffffa000127048e0 x22: ffffa000127048a0
      [ 1229.401241][ T2659] x21: ffffa00012704b80 x20: ffffa000127049e0
      [ 1229.403163][ T2659] x19: ffffa00012704820 x18: 0000000000000000
      [ 1229.405094][ T2659] x17: 0000000000000000 x16: 0000000000000000
      [ 1229.407008][ T2659] x15: 0000000000000000 x14: 003d090000000000
      [ 1229.408942][ T2659] x13: ffff80000d5b25b2 x12: 1fffe0000d5b25b1
      [ 1229.410859][ T2659] x11: 1fffe0000d5b25b1 x10: ffff80000d5b25b1
      [ 1229.412791][ T2659] x9 : ffffa0001034bee0 x8 : ffff00006ad92d8f
      [ 1229.414707][ T2659] x7 : 0000000000000000 x6 : ffffa00015eacb20
      [ 1229.416642][ T2659] x5 : ffff0000693c8040 x4 : 0000000000000000
      [ 1229.418558][ T2659] x3 : ffffa0001034befc x2 : d57a7483a01c6300
      [ 1229.420610][ T2659] x1 : 0000000000000000 x0 : 0000000000000059
      [ 1229.422526][ T2659] Call trace:
      [ 1229.423631][ T2659]  usercopy_abort+0xc8/0xcc
      [ 1229.425091][ T2659]  __check_object_size+0xdc/0x7d4
      [ 1229.426729][ T2659]  put_cmsg+0xa30/0xa90
      [ 1229.428132][ T2659]  unix_dgram_recvmsg+0x80c/0x930
      [ 1229.429731][ T2659]  sock_recvmsg+0x9c/0xc0
      [ 1229.431123][ T2659]  ____sys_recvmsg+0x1cc/0x5f8
      [ 1229.432663][ T2659]  ___sys_recvmsg+0x100/0x160
      [ 1229.434151][ T2659]  __sys_recvmsg+0x110/0x1a8
      [ 1229.435623][ T2659]  __arm64_sys_recvmsg+0x58/0x70
      [ 1229.437218][ T2659]  el0_svc_common.constprop.1+0x29c/0x340
      [ 1229.438994][ T2659]  do_el0_svc+0xe8/0x108
      [ 1229.440587][ T2659]  el0_svc+0x74/0x88
      [ 1229.441917][ T2659]  el0_sync_handler+0xe4/0x8b4
      [ 1229.443464][ T2659]  el0_sync+0x17c/0x180
      [ 1229.444920][ T2659] Code: aa1703e2 aa1603e1 910a8260 97ecc860 (d4210000)
      [ 1229.447070][ T2659] ---[ end trace 400497d91baeaf51 ]---
      [ 1229.448791][ T2659] Kernel panic - not syncing: Fatal exception
      [ 1229.450692][ T2659] Kernel Offset: disabled
      [ 1229.452061][ T2659] CPU features: 0x240002,20002004
      [ 1229.453647][ T2659] Memory Limit: none
      [ 1229.455015][ T2659] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      Rework this so the default return value is -EOPNOTSUPP.
      
      There are likely other callbacks, such as security_inode_getsecctx(),
      that may have the same problem; someone who understands the code
      better needs to audit them.
      
      Thank you Arnd for helping me figure out what went wrong.
      
      Fixes: 98e828a0 ("security: Refactor declaration of LSM hooks")
      Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: James Morris <jamorris@linux.microsoft.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Link: https://lore.kernel.org/bpf/20200512174607.9630-1-anders.roxell@linaro.org
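To illustrate why the default return value matters in the commit above, here is a minimal user-space sketch. All names (secid_to_secctx_fn, default_secid_to_secctx, demo_secid_to_secctx, copy_label) are hypothetical. The point is only that a hook which does not fill in its output pointer has to return an error, so the caller never copies from an uninitialized pointer.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef int (*secid_to_secctx_fn)(unsigned int secid, char **secdata,
				  unsigned int *seclen);

/* Default hook: fills in nothing, so it must not report success. */
static inline int default_secid_to_secctx(unsigned int secid, char **secdata,
					  unsigned int *seclen)
{
	(void)secid;
	(void)secdata;
	(void)seclen;
	return -EOPNOTSUPP;
}

/* A hook that really provides data (for illustration only). */
static inline int demo_secid_to_secctx(unsigned int secid, char **secdata,
				       unsigned int *seclen)
{
	static char label[] = "label";

	(void)secid;
	*secdata = label;
	*seclen = 5;
	return 0;
}

/* Caller: consumes secdata only when the hook returned 0. Returns the
 * number of bytes copied, or a negative error. */
static inline int copy_label(secid_to_secctx_fn hook, unsigned int secid,
			     char *buf, size_t buflen)
{
	char *secdata = NULL;
	unsigned int seclen = 0;
	int err = hook(secid, &secdata, &seclen);

	if (err)
		return err; /* buf is left untouched */
	if (seclen > buflen)
		return -ERANGE;
	memcpy(buf, secdata, seclen);
	return (int)seclen;
}
```

If the default hook returned 0 instead, copy_label() would go on to copy from a NULL secdata, which is the kind of access the usercopy check in the trace above caught.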
    • libbpf: Fix register naming in PT_REGS s390 macros · 516d8d49
      Sumanth Korikkar authored
      Fix register naming in PT_REGS s390 macros
      
      Fixes: b8ebce86 ("libbpf: Provide CO-RE variants of PT_REGS macros")
      Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200513154414.29972-1-sumanthk@linux.ibm.com
    • bpf: Fix bug in mmap() implementation for BPF array map · 333291ce
      Andrii Nakryiko authored
      The mmap() subsystem allows a user-space application to memory-map a
      region with an initial page offset. This wasn't taken into account in the
      initial implementation of BPF array memory-mapping. As a result, the wrong
      pages, ignoring the requested page shift, were memory-mapped into
      user-space. This patch fixes this gap and adds a test for such a scenario.
      
      Fixes: fc970227 ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200512235925.3817805-1-andriin@fb.com
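What "memory-map a region with an initial page offset" means from the user-space side can be sketched with an ordinary temporary file rather than a BPF array map. The helper names below are hypothetical; the sketch only shows that a mapping requested at page offset 1 must expose the second page, which is the behaviour the fix above restores for array maps.

```c
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map one page of `fd` starting at page index `page` and return the
 * first byte of that page, or -1 if the mapping fails. */
static inline int first_byte_of_page(int fd, long page)
{
	long psz = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	int byte;

	p = mmap(NULL, (size_t)psz, PROT_READ, MAP_SHARED, fd,
		 (off_t)(page * psz));
	if (p == MAP_FAILED)
		return -1;
	byte = p[0];
	munmap(p, (size_t)psz);
	return byte;
}

/* Create an anonymous temporary file whose page 0 is filled with 'A'
 * and page 1 with 'B'. Returns a file descriptor, or -1 on failure. */
static inline int make_two_page_file(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	FILE *f = tmpfile();
	long i;
	int fd;

	if (!f)
		return -1;
	for (i = 0; i < 2 * psz; i++)
		fputc(i < psz ? 'A' : 'B', f);
	if (fflush(f) != 0) {
		fclose(f);
		return -1;
	}
	fd = dup(fileno(f)); /* keep the file alive after fclose() */
	fclose(f);
	return fd;
}
```

With this file, first_byte_of_page(fd, 1) should see data from the second page; an implementation that ignored the offset would hand back the first page instead.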
    • samples: bpf: Fix build error · 23ad0466
      Matteo Croce authored
      GCC 10 is very strict about symbol clash, and lwt_len_hist_user contains
      a symbol which clashes with libbpf:
      
      /usr/bin/ld: samples/bpf/lwt_len_hist_user.o:(.bss+0x0): multiple definition of `bpf_log_buf'; samples/bpf/bpf_load.o:(.bss+0x8c0): first defined here
      collect2: error: ld returned 1 exit status
      
      bpf_log_buf here seems to be a leftover, so remove it.
      Signed-off-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200511113234.80722-1-mcroce@redhat.com
  3. 13 May, 2020 10 commits
  4. 12 May, 2020 7 commits
  5. 11 May, 2020 4 commits
    • Merge branch 'net-ipa-fix-cleanup-after-modem-crash' · 1abfb181
      David S. Miller authored
      Alex Elder says:
      
      ====================
      net: ipa: fix cleanup after modem crash
      
      The first patch in this series fixes a bug where the size of a data
      transfer request was never set, meaning it was 0.  The consequence
      of this was that such a transfer request would never complete if
      attempted, and led to a hung task timeout.
      
      This data transfer is required for cleaning up IPA hardware state
      when recovering from a modem crash.  The code to implement this
      cleanup is already present, but its use was commented out because
      it hit the bug described above.  So the second patch in this series
      enables the use of that "tag process" cleanup code.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: use tag process on modem crash · 2c4bb809
      Alex Elder authored
      One part of recovering from a modem crash is performing a "tag
      sequence" of several IPA immediate commands, to clear the hardware
      pipeline.  The sequence ends with a data transfer request on the
      command endpoint (which is not otherwise done).  Unfortunately,
      attempting to do the data transfer led to a hang, so that request
      plus two other commands were commented out.
      
      The previous commit fixes the bug that was causing that hang.  And
      with that bug fixed we can properly issue the tag sequence when the
      modem crashes, to return the hardware to a known state.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: set DMA length in gsi_trans_cmd_add() · c781e1d4
      Alex Elder authored
      When a command gets added to a transaction for the AP->command
      channel we set the DMA address of its scatterlist entry, but not
      its DMA length.  Fix this bug.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
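The bug described in the commit above comes down to filling in only one of two fields of a transfer descriptor. A minimal sketch, using a hypothetical struct sg_entry and cmd_add() rather than the kernel's scatterlist API:

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a scatterlist entry. */
struct sg_entry {
	uint64_t dma_address;
	uint32_t dma_length;
};

/* Record a command buffer in the entry. Both fields must be set; a
 * length left at 0 describes an empty transfer. */
static inline void cmd_add(struct sg_entry *sg, uint64_t addr, uint32_t size)
{
	sg->dma_address = addr;
	sg->dma_length = size; /* the step the commit says was missing */
}
```

Per the merge message for this series, a transfer whose size stayed 0 would never complete, which led to the hung task timeout.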
    • hinic: fix a bug of ndo_stop · e8a1b0ef
      Luo bin authored
      If some function in the ndo_stop interface returns failure because of
      a hardware fault, we must go on executing the remaining steps rather
      than returning failure directly, otherwise memory will leak. Also bump
      the timeout for SET_FUNC_STATE to ensure that the cmd won't return
      failure when hw is busy. Otherwise hw may stomp host memory if we free
      memory regardless of the return value of SET_FUNC_STATE.
      
      Fixes: 51ba902a ("net-next/hinic: Initialize hw interface")
      Signed-off-by: Luo bin <luobin9@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
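The "keep going after a failed step" pattern from the commit above can be sketched as follows. The step functions, the steps_run counter, and stop_device() are hypothetical; recording and returning the first error is a choice made for this illustration and is not taken from the hinic driver.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*teardown_step)(void);

static int steps_run; /* counts how many teardown steps were executed */

static inline int step_ok(void)
{
	steps_run++;
	return 0;
}

static inline int step_hw_fault(void)
{
	steps_run++;
	return -EIO; /* simulated hardware failure */
}

/* Run every teardown step even if an earlier one failed, so later
 * steps (such as freeing resources) are never skipped. Returns the
 * first error seen, or 0 if all steps succeeded. */
static inline int stop_device(const teardown_step *steps, size_t n)
{
	int first_err = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int err = steps[i]();

		if (err && !first_err)
			first_err = err;
	}
	return first_err;
}
```

Returning early on the first failure would skip the remaining steps, which is how the leak described in the commit message arises.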
  6. 10 May, 2020 2 commits
    • net: dsa: loop: Add module soft dependency · 3047211c
      Florian Fainelli authored
      There is a soft dependency on dsa_loop_bdinfo.ko, which sets up the
      MDIO device registration. Since dsa_loop.ko references none of its
      symbols, dsa_loop_bdinfo.ko, which is needed, is not loaded
      automatically.
      
      Fixes: 98cd1552 ("net: dsa: Mock-up driver")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netprio_cgroup: Fix unlimited memory leak of v2 cgroups · 090e28b2
      Zefan Li authored
      If systemd is configured to use hybrid mode, which enables the use of
      both cgroup v1 and v2, systemd will create a new cgroup on both the
      default root (v2) and the netprio_cgroup hierarchy (v1) for a new session
      and attach the task to the two cgroups. If the task does any networking,
      the v2 cgroup can never be freed after the session exits.
      
      One of our machines ran into OOM due to this memory leak.
      
      In the scenario described above, when sk_alloc() is called,
      cgroup_sk_alloc() thinks it's in v2 mode, so it stores
      the cgroup pointer in sk->sk_cgrp_data and increments
      the cgroup refcnt. But then sock_update_netprioidx()
      thinks it's in v1 mode, so it stores the netprioidx value
      in sk->sk_cgrp_data, so the cgroup refcnt will never be released.
      
      Currently we do the mode switch when someone writes to the ifpriomap
      cgroup control file. The easiest fix is to also do the switch when
      a task is attached to a new cgroup.
      
      Fixes: bd1060a1 ("sock, cgroup: add sock->sk_cgroup")
      Reported-by: Yang Yingliang <yangyingliang@huawei.com>
      Tested-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  7. 09 May, 2020 8 commits
  8. 08 May, 2020 2 commits
    • net: tcp: fix rx timestamp behavior for tcp_recvmsg · cc4de047
      Kelly Littlepage authored
      The stated intent of the original commit is to "return the timestamp
      corresponding to the highest sequence number data returned." The current
      implementation returns the timestamp for the last byte of the last fully
      read skb, which is not necessarily the last byte in the recv buffer. This
      patch converts behavior to the original definition, and to the behavior of
      the previous draft versions of commit 98aaa913 ("tcp: Extend
      SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg") which also match this
      behavior.
      
      Fixes: 98aaa913 ("tcp: Extend SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg")
      Co-developed-by: Iris Liu <iris@onechronos.com>
      Signed-off-by: Iris Liu <iris@onechronos.com>
      Signed-off-by: Kelly Littlepage <kelly@onechronos.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: fix a potential recursive NETDEV_FEAT_CHANGE · dd912306
      Cong Wang authored
      syzbot managed to trigger a recursive NETDEV_FEAT_CHANGE event
      between bonding master and slave. I managed to find a reproducer
      for this:
      
        ip li set bond0 up
        ifenslave bond0 eth0
        brctl addbr br0
        ethtool -K eth0 lro off
        brctl addif br0 bond0
        ip li set br0 up
      
      When a NETDEV_FEAT_CHANGE event is triggered on a bonding slave,
      it captures this and calls bond_compute_features() to fixup its
      master's and other slaves' features. However, when syncing with
      its lower devices by netdev_sync_lower_features() this event is
      triggered again on slaves when the LRO feature fails to change,
      so it goes back and forth recursively until the kernel stack is
      exhausted.
      
      Commit 17b85d29 intentionally lets __netdev_update_features()
      return -1 for such a failure case, so we have to just rely on
      the existing check inside netdev_sync_lower_features() and skip
      NETDEV_FEAT_CHANGE event only for this specific failure case.
      
      Fixes: fd867d51 ("net/core: generic support for disabling netdev features down stack")
      Reported-by: syzbot+e73ceacfd8560cc8a3ca@syzkaller.appspotmail.com
      Reported-by: syzbot+c2fb6f9ddcea95ba49b5@syzkaller.appspotmail.com
      Cc: Jarod Wilson <jarod@redhat.com>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>