  1. 12 Aug, 2020 2 commits
    • vsock: fix potential null pointer dereference in vsock_poll() · 1980c058
      Stefano Garzarella authored
      syzbot reported an issue where vsock_poll() finds the socket state at
      TCP_ESTABLISHED while 'transport' is still NULL:
        general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN
        KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
        CPU: 0 PID: 8227 Comm: syz-executor.2 Not tainted 5.8.0-rc7-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:vsock_poll+0x75a/0x8e0 net/vmw_vsock/af_vsock.c:1038
        Call Trace:
         sock_poll+0x159/0x460 net/socket.c:1266
         vfs_poll include/linux/poll.h:90 [inline]
         do_pollfd fs/select.c:869 [inline]
         do_poll fs/select.c:917 [inline]
         do_sys_poll+0x607/0xd40 fs/select.c:1011
         __do_sys_poll fs/select.c:1069 [inline]
         __se_sys_poll fs/select.c:1057 [inline]
         __x64_sys_poll+0x18c/0x440 fs/select.c:1057
         do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This issue can happen if the TCP_ESTABLISHED state is set after we read
      vsk->transport in vsock_poll().
      
      We could put barriers to synchronize, but this can only happen during
      connection setup, so we can simply check that 'transport' is valid.
      
      Fixes: c0cfa2d8 ("vsock: add multi-transports support")
      Reported-and-tested-by: syzbot+a61bac2fcc1a7c6623fe@syzkaller.appspotmail.com
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
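The "simply check that 'transport' is valid" approach from the vsock_poll() commit can be sketched in user-space C as follows. This is a minimal illustration, not the kernel code: all `demo_*` and `DEMO_*` names are hypothetical stand-ins for the real structures and constants.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
enum { DEMO_CLOSED = 0, DEMO_ESTABLISHED = 1 };
#define DEMO_POLLIN 0x1u

struct demo_transport {
	bool (*has_data)(void);
};

struct demo_vsock {
	int state;                              /* DEMO_CLOSED or DEMO_ESTABLISHED */
	const struct demo_transport *transport; /* may still be NULL during setup */
};

/* Example callback used only for illustration. */
bool demo_always_has_data(void)
{
	return true;
}

/* Report readability only if a transport has actually been assigned. */
unsigned int demo_poll(const struct demo_vsock *vsk)
{
	const struct demo_transport *transport = vsk->transport;
	unsigned int mask = 0;

	/* The state may have become ESTABLISHED after transport was read,
	 * so the pointer is checked before it is dereferenced. */
	if (transport && vsk->state == DEMO_ESTABLISHED) {
		if (transport->has_data())
			mask |= DEMO_POLLIN;
	}
	return mask;
}
```

In this sketch a socket that looks established but has no transport yet simply reports no events, which is acceptable because the window only exists during connection setup.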
    • sfc: fix ef100 design-param checking · 41077c99
      Edward Cree authored
      The handling of the RXQ/TXQ size granularity design-params had two
      problems: it had a 64-bit divide that didn't build on 32-bit platforms,
      and it could divide by zero if the NIC supplied 0 as the value of the
      design-param.  Fix both by checking for 0 and for a granularity bigger
      than our min-size; if the granularity <= EFX_MIN_DMAQ_SIZE then it fits
      in 32 bits, so we can cast it to u32 for the divide.
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
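The ordering of the checks described in the sfc commit (zero first, then the upper bound, then a 32-bit divide) might look roughly like this. It is a sketch under assumptions: `DEMO_MIN_DMAQ_SIZE` is an illustrative value, and the divisibility test assumes the intent is that the minimum queue size must be a multiple of the granularity.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MIN_DMAQ_SIZE 512u /* assumed value, for illustration only */

/* Validate a 64-bit granularity value supplied by the device. */
bool demo_granularity_ok(uint64_t granularity)
{
	if (granularity == 0)
		return false;   /* would otherwise divide by zero */
	if (granularity > DEMO_MIN_DMAQ_SIZE)
		return false;   /* cannot divide the minimum size evenly */
	/* At this point the value fits in 32 bits, so the narrowing cast is
	 * lossless and no 64-bit division is needed. */
	return DEMO_MIN_DMAQ_SIZE % (uint32_t)granularity == 0;
}
```

Rejecting out-of-range values before the cast is what makes the narrowing safe; casting first would silently truncate large values.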
  2. 11 Aug, 2020 11 commits
    • Merge branch 'net-initialize-fastreuse-on-inet_inherit_port' · 633f5b6b
      David S. Miller authored
      Tim Froidcoeur says:
      
      ====================
      net: initialize fastreuse on inet_inherit_port
      
      In the case of TPROXY, bind_conflict optimizations for SO_REUSEADDR or
      SO_REUSEPORT are broken, possibly resulting in O(n) instead of O(1) bind
      behaviour or in the incorrect reuse of a bind.
      
      The kernel tracks, for each bind_bucket, whether all sockets in the
      bind_bucket support SO_REUSEADDR or SO_REUSEPORT, using two fastreuse flags.
      These flags allow skipping the costly bind_conflict check when possible
      (meaning when all sockets have the proper SO_REUSE option).
      
      For every socket added to a bind_bucket, these flags need to be updated.
      As soon as a socket that does not support reuse is added, the flag is
      set to false and will never go back to true, unless the bind_bucket is
      deleted.
      
      Note that there is no mechanism to re-evaluate these flags when a socket
      is removed (this might make sense when removing a socket that would not
      allow reuse; this leaves room for a future patch).
      
      For this optimization to work, it is mandatory that these flags are
      properly initialized and updated.
      
      When a child socket is created from a listen socket in
      __inet_inherit_port, the TPROXY case could create a new bind bucket
      without properly initializing these flags, thus preventing the
      optimization from working. Alternatively, a socket not allowing reuse could
      be added to an existing bind bucket without updating the flags, causing
      bind_conflict to never be called as it should.
      
      Patch 1/2 refactors the fastreuse update code in inet_csk_get_port into a
      small helper function, making the actual fix tiny and easier to understand.
      
      Patch 2/2 calls this new helper when __inet_inherit_port decides to create
      a new bind_bucket or use a different bind_bucket than the one of the listen
      socket.
      
      v4: - rebase on latest linux/net master branch
      v3: - remove company disclaimer from automatic signature
      v2: - remove unnecessary cast
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: initialize fastreuse on inet_inherit_port · d76f3351
      Tim Froidcoeur authored
      In the case of TPROXY, bind_conflict optimizations for SO_REUSEADDR or
      SO_REUSEPORT are broken, possibly resulting in O(n) instead of O(1) bind
      behaviour or in the incorrect reuse of a bind.
      
      The kernel tracks, for each bind_bucket, whether all sockets in the
      bind_bucket support SO_REUSEADDR or SO_REUSEPORT, using two fastreuse flags.
      These flags allow skipping the costly bind_conflict check when possible
      (meaning when all sockets have the proper SO_REUSE option).
      
      For every socket added to a bind_bucket, these flags need to be updated.
      As soon as a socket that does not support reuse is added, the flag is
      set to false and will never go back to true, unless the bind_bucket is
      deleted.
      
      Note that there is no mechanism to re-evaluate these flags when a socket
      is removed (this might make sense when removing a socket that would not
      allow reuse; this leaves room for a future patch).
      
      For this optimization to work, it is mandatory that these flags are
      properly initialized and updated.
      
      When a child socket is created from a listen socket in
      __inet_inherit_port, the TPROXY case could create a new bind bucket
      without properly initializing these flags, thus preventing the
      optimization from working. Alternatively, a socket not allowing reuse could
      be added to an existing bind bucket without updating the flags, causing
      bind_conflict to never be called as it should.
      
      Call inet_csk_update_fastreuse when __inet_inherit_port decides to create
      a new bind_bucket or use a different bind_bucket than the one of the
      listen socket.
      
      Fixes: 093d2823 ("tproxy: fix hash locking issue when using port redirection in __inet_inherit_port()")
      Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
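The flag bookkeeping described in the fastreuse commit can be modelled with a single flag per bucket. The following is a simplified sketch, not the kernel helper: `demo_bucket` and both functions are hypothetical, and the real code tracks SO_REUSEADDR and SO_REUSEPORT separately with additional conditions.

```c
#include <stdbool.h>

/* Hypothetical model of a bind bucket with one reuse flag. */
struct demo_bucket {
	int  owners;     /* number of sockets currently in the bucket */
	bool fastreuse;  /* true while every owner allows reuse */
};

/* Must run for every socket added, including inherited child sockets. */
void demo_bucket_add(struct demo_bucket *b, bool sk_allows_reuse)
{
	if (b->owners == 0)
		b->fastreuse = sk_allows_reuse;  /* initialise on first socket */
	else if (!sk_allows_reuse)
		b->fastreuse = false;            /* sticky until bucket is deleted */
	b->owners++;
}

/* The costly conflict walk may be skipped only when the flag is trustworthy. */
bool demo_can_skip_conflict_check(const struct demo_bucket *b,
				  bool sk_allows_reuse)
{
	return b->owners > 0 && b->fastreuse && sk_allows_reuse;
}
```

If any code path adds a socket without calling the update function, the flag no longer reflects the bucket contents, which is the failure mode the commit describes for the TPROXY case.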
    • net: refactor bind_bucket fastreuse into helper · 62ffc589
      Tim Froidcoeur authored
      Refactor the fastreuse update code in inet_csk_get_port into a small
      helper function that can be called from other places.
      Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: marvell10g: fix null pointer dereference · 1b8ef142
      Marek Behún authored
      Commit c3e302ed ("net: phy: marvell10g: fix temperature sensor on 2110")
      added a check for PHY ID via phydev->drv->phy_id in a function which is
      called by devres at a time when phydev->drv is already set to null by
      phy_remove function.
      
      This null pointer dereference can be triggered via SFP subsystem with a
      SFP module containing this Marvell PHY. When the SFP interface is put
      down, the SFP subsystem removes the PHY.
      
      Fixes: c3e302ed ("net: phy: marvell10g: fix temperature sensor on 2110")
      Signed-off-by: Marek Behún <marek.behun@nic.cz>
      Cc: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Baruch Siach <baruch@tkos.co.il>
      Cc: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix potential memory leak in proto_register() · 0f5907af
      Miaohe Lin authored
      If we failed to assign proto idx, we free the twsk_slab_name but forget to
      free the twsk_slab. Add a helper function tw_prot_cleanup() to free these
      together and also use this helper function in proto_unregister().
      
      Fixes: b45ce321 ("sock: fix potential memory leak in proto_register()")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qcom/emac: add missed clk_disable_unprepare in error path of emac_clks_phase1_init · 50caa777
      Wang Hai authored
      Fix the missing clk_disable_unprepare() before return
      from emac_clks_phase1_init() in the error handling case.
      
      Fixes: b9b17deb ("net: emac: emac gigabit ethernet controller driver")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Acked-by: Timur Tabi <timur@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ionic_lif: Use devm_kcalloc() in ionic_qcq_alloc() · e7164200
      Xu Wang authored
      A multiplication for the size determination of a memory allocation
      indicated that an array data structure should be processed.
      Thus use the corresponding function "devm_kcalloc".
      Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
      Acked-by: Shannon Nelson <snelson@pensando.io>
      Signed-off-by: David S. Miller <davem@davemloft.net>
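The ionic commit message does not spell out why an array-style allocator is preferable to an open-coded multiplication. One commonly cited reason is that calloc-style allocators take the element count and element size separately, so the product can be checked for overflow. A user-space sketch of that idea, with the hypothetical helper `demo_alloc_array`:

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate a zeroed array of n elements, refusing sizes whose product
 * would wrap around instead of silently allocating a short buffer. */
void *demo_alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;      /* n * size would overflow size_t */
	return calloc(n, size);
}
```

A plain `malloc(n * size)` with the same oversized arguments would compute a wrapped, too-small size and return a buffer that later writes could overrun.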
    • net/nfc/rawsock.c: add CAP_NET_RAW check. · 26896f01
      Qingyu Li authored
      When creating a raw AF_NFC socket, CAP_NET_RAW needs to be checked first.
      Signed-off-by: Qingyu Li <ieatmuttonchuan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hinic: fix strncpy output truncated compile warnings · 1dab5877
      Luo bin authored
      Fix the compile warnings of the form "'strncpy' output truncated before
      terminating nul copying N bytes from a string of the same length".
      Signed-off-by: Luo bin <luobin9@huawei.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers/net/wan/x25_asy: Added needed_headroom and a skb->len check · c79f428d
      Xie He authored
      1. Added a skb->len check
      
      This driver expects upper layers to include a pseudo header of 1 byte
      when passing down a skb for transmission. This driver will read this
      1-byte header. This patch added a skb->len check before reading the
      header to make sure the header exists.
      
      2. Added needed_headroom
      
      When this driver transmits data,
        first this driver will remove a pseudo header of 1 byte,
        then the lapb module will prepend the LAPB header of 2 or 3 bytes.
      So the value of needed_headroom in this driver should be 3 - 1.
      
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Cc: Martin Schiller <ms@dev.tdt.de>
      Signed-off-by: Xie He <xie.he.0141@gmail.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
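Both parts of the x25_asy change, the length check before reading the 1-byte pseudo header and the "3 - 1" headroom arithmetic, can be illustrated in plain C. The `DEMO_*` macros and `demo_read_pseudo_header` are hypothetical names; the sketch operates on a raw byte buffer rather than an skb.

```c
#include <stddef.h>

#define DEMO_PSEUDO_HDR_LEN  1  /* header stripped by the driver */
#define DEMO_LAPB_HDR_MAX    3  /* largest header LAPB may prepend */

/* Extra headroom needed: bytes added minus bytes removed beforehand. */
#define DEMO_NEEDED_HEADROOM (DEMO_LAPB_HDR_MAX - DEMO_PSEUDO_HDR_LEN)

/* Read the pseudo header only if the frame is long enough to contain it.
 * Returns 0 and stores the header byte in *type, or -1 for a short frame. */
int demo_read_pseudo_header(const unsigned char *data, size_t len,
			    unsigned char *type)
{
	if (len < DEMO_PSEUDO_HDR_LEN)
		return -1;
	*type = data[0];
	return 0;
}
```

Removing one byte frees one byte of headroom before LAPB prepends up to three, so only two additional bytes have to be requested.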
    • net/tls: Fix kmap usage · b06c19d9
      Ira Weiny authored
      When MSG_OOB is specified to tls_device_sendpage() the mapped page is
      never unmapped.
      
      Hold off mapping the page until after the flags are checked and the page
      is actually needed.
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 10 Aug, 2020 5 commits
  4. 08 Aug, 2020 12 commits
  5. 07 Aug, 2020 4 commits
    • bpf: Delete repeated words in comments · b8c1a309
      Randy Dunlap authored
      Drop repeated words in kernel/bpf/: {has, the}
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200807033141.10437-1-rdunlap@infradead.org
    • selftests/bpf: Fix silent Makefile output · d5ca5905
      Andrii Nakryiko authored
      99aacebe ("selftests: do not use .ONESHELL") removed .ONESHELL, which
      changes how Makefile "silences" multi-command target recipes. selftests/bpf's
      Makefile relied (somewhat unknowingly) on the .ONESHELL behavior of silencing
      all commands within the recipe if the first command contains the @ symbol.
      Removing .ONESHELL exposed this hack.
      
      This patch fixes the issue by explicitly silencing each command with $(Q).
      
      Also explicitly define fallback rule for building *.o from *.c, instead of
      relying on non-silent inherited rule. This was causing a non-silent output for
      bench.o object file.
      
      Fixes: 92f7440e ("selftests/bpf: More succinct Makefile output")
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200807033058.848677-1-andriin@fb.com
    • bpf, doc: Remove references to warning message when using bpf_trace_printk() · 7fb20f9e
      Alan Maguire authored
      The BPF helper bpf_trace_printk() no longer uses trace_printk();
      it now triggers a dedicated trace event.  Hence the described
      warning is no longer present, so remove the discussion of it as
      it may confuse people.
      
      Fixes: ac5a72ea ("bpf: Use dedicated bpf_trace_printk event instead of trace_printk()")
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/1596801029-32395-1-git-send-email-alan.maguire@oracle.com
    • drivers/net/wan/lapbether: Added needed_headroom and a skb->len check · c7ca03c2
      Xie He authored
      1. Added a skb->len check
      
      This driver expects upper layers to include a pseudo header of 1 byte
      when passing down a skb for transmission. This driver will read this
      1-byte header. This patch added a skb->len check before reading the
      header to make sure the header exists.
      
      2. Changed to use needed_headroom instead of hard_header_len to request
      necessary headroom to be allocated
      
      In net/packet/af_packet.c, the function packet_snd first reserves a
      headroom of length (dev->hard_header_len + dev->needed_headroom).
      Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header,
      which calls dev->header_ops->create, to create the link layer header.
      If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of
      length (dev->hard_header_len), and expects the user to provide the
      appropriate link layer header.
      
      So according to the logic of af_packet.c, dev->hard_header_len should
      be the length of the header that would be created by
      dev->header_ops->create.
      
      However, this driver doesn't provide dev->header_ops, so logically
      dev->hard_header_len should be 0.
      
      So we should use dev->needed_headroom instead of dev->hard_header_len
      to request necessary headroom to be allocated.
      
      This change fixes kernel panic when this driver is used with AF_PACKET
      SOCK_RAW sockets.
      
      Call stack when panic:
      
      [  168.399197] skbuff: skb_under_panic: text:ffffffff819d95fb len:20
      put:14 head:ffff8882704c0a00 data:ffff8882704c09fd tail:0x11 end:0xc0
      dev:veth0
      ...
      [  168.399255] Call Trace:
      [  168.399259]  skb_push.cold+0x14/0x24
      [  168.399262]  eth_header+0x2b/0xc0
      [  168.399267]  lapbeth_data_transmit+0x9a/0xb0 [lapbether]
      [  168.399275]  lapb_data_transmit+0x22/0x2c [lapb]
      [  168.399277]  lapb_transmit_buffer+0x71/0xb0 [lapb]
      [  168.399279]  lapb_kick+0xe3/0x1c0 [lapb]
      [  168.399281]  lapb_data_request+0x76/0xc0 [lapb]
      [  168.399283]  lapbeth_xmit+0x56/0x90 [lapbether]
      [  168.399286]  dev_hard_start_xmit+0x91/0x1f0
      [  168.399289]  ? irq_init_percpu_irqstack+0xc0/0x100
      [  168.399291]  __dev_queue_xmit+0x721/0x8e0
      [  168.399295]  ? packet_parse_headers.isra.0+0xd2/0x110
      [  168.399297]  dev_queue_xmit+0x10/0x20
      [  168.399298]  packet_sendmsg+0xbf0/0x19b0
      ......
      
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Cc: Martin Schiller <ms@dev.tdt.de>
      Cc: Brian Norris <briannorris@chromium.org>
      Signed-off-by: Xie He <xie.he.0141@gmail.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
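The headroom accounting that the lapbether commit attributes to packet_snd can be modelled with a few lines of arithmetic. This is a sketch of the described logic only: `demo_dev`, `demo_sock_type` and `demo_reserved_headroom` are hypothetical, and the value 17 used in the usage note is an arbitrary illustrative header size.

```c
/* Hypothetical model of the two device fields discussed in the commit. */
struct demo_dev {
	unsigned int hard_header_len;
	unsigned int needed_headroom;
};

enum demo_sock_type { DEMO_SOCK_DGRAM, DEMO_SOCK_RAW };

/* Headroom left in front of the payload when the frame reaches the driver,
 * following the reserve / un-reserve steps described for packet_snd. */
unsigned int demo_reserved_headroom(const struct demo_dev *dev,
				    enum demo_sock_type type)
{
	unsigned int reserved = dev->hard_header_len + dev->needed_headroom;

	/* For SOCK_RAW the user supplies the link layer header, so the
	 * hard_header_len portion is handed back. */
	if (type == DEMO_SOCK_RAW)
		reserved -= dev->hard_header_len;
	return reserved;
}
```

With a device declaring `{ .hard_header_len = 17, .needed_headroom = 0 }`, a SOCK_RAW send is left with no headroom, so a later header push underruns the buffer. Declaring `{ .hard_header_len = 0, .needed_headroom = 17 }` keeps the space reserved for both socket types.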
  6. 06 Aug, 2020 6 commits
    • bpf: Fix compilation warning of selftests · 929e54a9
      Jianlin Lv authored
      Clang compiler version: 12.0.0
      The following warning appears during the selftests/bpf compilation:
      
      prog_tests/send_signal.c:51:3: warning: ignoring return value of ‘write’,
      declared with attribute warn_unused_result [-Wunused-result]
         51 |   write(pipe_c2p[1], buf, 1);
            |   ^~~~~~~~~~~~~~~~~~~~~~~~~~
      prog_tests/send_signal.c:54:3: warning: ignoring return value of ‘read’,
      declared with attribute warn_unused_result [-Wunused-result]
         54 |   read(pipe_p2c[0], buf, 1);
            |   ^~~~~~~~~~~~~~~~~~~~~~~~~
      ......
      
      prog_tests/stacktrace_build_id_nmi.c:13:2: warning: ignoring return value
      of ‘fscanf’, declared with attribute warn_unused_result [-Wunused-result]
         13 |  fscanf(f, "%llu", &sample_freq);
            |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      test_tcpnotify_user.c:133:2: warning: ignoring return value of ‘system’,
      declared with attribute warn_unused_result [-Wunused-result]
        133 |  system(test_script);
            |  ^~~~~~~~~~~~~~~~~~~
      test_tcpnotify_user.c:138:2: warning: ignoring return value of ‘system’,
      declared with attribute warn_unused_result [-Wunused-result]
        138 |  system(test_script);
            |  ^~~~~~~~~~~~~~~~~~~
      test_tcpnotify_user.c:143:2: warning: ignoring return value of ‘system’,
      declared with attribute warn_unused_result [-Wunused-result]
        143 |  system(test_script);
            |  ^~~~~~~~~~~~~~~~~~~
      
      Add code that fixes the compilation warnings about ignored return values
      and handles any errors. Checking the return values of library APIs makes
      the code more secure.
      Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200806104224.95306-1-Jianlin.Lv@arm.com
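A warning of the "ignoring return value of 'write'" kind is usually resolved by consuming the result and acting on it. The sketch below shows one possible shape for a single-byte pipe write like the one in the quoted warning; `demo_write_byte` is a hypothetical helper, not the code added by the patch.

```c
#include <errno.h>
#include <unistd.h>

/* Write one byte to fd, retrying on EINTR.
 * Returns 0 on success and -1 if the byte could not be written. */
int demo_write_byte(int fd, char c)
{
	ssize_t n;

	do {
		n = write(fd, &c, 1);
	} while (n < 0 && errno == EINTR);

	return n == 1 ? 0 : -1;
}
```

Callers can then fail the test early on a broken pipe instead of continuing with a peer that never received the byte.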
    • selftests: bpf: Switch off timeout · 6fc5916c
      Jiri Benc authored
      Several bpf tests are interrupted by the default timeout of 45 seconds added
      by commit 852c8cbf ("selftests/kselftest/runner.sh: Add 45 second
      timeout per test"). In my case it was test_progs, test_tunnel.sh,
      test_lwt_ip_encap.sh and test_xdping.sh.
      
      There's not much value in having a timeout for bpf tests, so switch it off.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/7a9198ed10917f4ecab4a3dd74bcda1200791efd.1596739059.git.jbenc@redhat.com
    • bpf: Remove inline from bpf_do_trace_printk · 0d360d64
      Stanislav Fomichev authored
      I get the following error during compilation on my side:
      kernel/trace/bpf_trace.c: In function 'bpf_do_trace_printk':
      kernel/trace/bpf_trace.c:386:34: error: function 'bpf_do_trace_printk' can never be inlined because it uses variable argument lists
       static inline __printf(1, 0) int bpf_do_trace_printk(const char *fmt, ...)
                                        ^
      
      Fixes: ac5a72ea ("bpf: Use dedicated bpf_trace_printk event instead of trace_printk()")
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200806182612.1390883-1-sdf@google.com
    • bpf: Add missing return to resolve_btfids · d48556f4
      Stanislav Fomichev authored
      int sets_patch(struct object *obj) doesn't have a 'return 0' at the end.
      
      Fixes: fbbb68de ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object")
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200806155225.637202-1-sdf@google.com
    • libbpf: Fix uninitialized pointer at btf__parse_raw() · 932ac54a
      Daniel T. Lee authored
      Commit 94a1fedd ("libbpf: Add btf__parse_raw() and generic
      btf__parse() APIs") recently added a new libbpf API, btf__parse_raw(),
      that parses BTF from a raw data file.

      That commit causes a build failure of samples/bpf due to improper access
      of an uninitialized pointer in btf__parse_raw().
      
          btf.c: In function btf__parse_raw:
          btf.c:625:28: error: btf may be used uninitialized in this function
            625 |  return err ? ERR_PTR(err) : btf;
                |         ~~~~~~~~~~~~~~~~~~~^~~~~
      
      This commit fixes the build failure of samples/bpf by initializing the
      btf pointer to NULL.
      
      Fixes: 94a1fedd ("libbpf: Add btf__parse_raw() and generic btf__parse() APIs")
      Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200805223359.32109-1-danieltimlee@gmail.com
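The "may be used uninitialized" error quoted in the btf__parse_raw() commit typically arises when an early error path jumps to a shared exit label before the result pointer has been assigned. A reduced user-space sketch of the pattern and the fix follows; `demo_load` is a hypothetical function, not the libbpf code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical loader: returns a heap buffer, or NULL with *err_out set. */
char *demo_load(const char *path, int *err_out)
{
	char *buf = NULL;  /* the fix: defined on every path to "done" */
	FILE *f = NULL;
	int err = 0;

	f = fopen(path, "rb");
	if (!f) {
		err = -1;
		goto done;     /* jumps past the only assignment to buf */
	}

	buf = calloc(1, 64);
	if (!buf)
		err = -2;

done:
	if (f)
		fclose(f);
	*err_out = err;
	return err ? NULL : buf;
}
```

Even when the error branch of the final expression never reads the pointer, the compiler cannot always prove that, so initializing at the declaration is the simplest way to keep the build clean.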
    • Merge branch 'bpf_iter-uapi-fix' · 0ac10dc1
      Alexei Starovoitov authored
      Yonghong Song says:
      
      ====================
      Andrii raised a concern that current uapi for bpf iterator map
      element is a little restrictive and not suitable for future potential
      complex customization. This is a valid suggestion, considering people
      may indeed add more complex customization to the iterator, e.g.,
      cgroup_id + user_id, etc. for task or task_file. Another example might
      be map_id plus additional control so that the bpf iterator may bail
      out a bucket earlier if a bucket has too many elements which may hold
      lock too long and impact other parts of systems.
      
      Patch #1 modified uapi with kernel changes. Patch #2
      adjusted libbpf api accordingly.
      
      Changelogs:
        v3 -> v4:
          . add a forward declaration of bpf_iter_link_info in
            tools/lib/bpf/bpf.h in case that libbpf is built against
            not-latest uapi bpf.h.
          . target the patch set to "bpf" instead of "bpf-next"
        v2 -> v3:
          . undo "not reject iter_info.map.map_fd == 0" from v1.
            In the future map_fd may become optional, so let us use map_fd == 0
            indicating the map_fd is not set by user space.
          . add link_info_len to bpf_iter_attach_opts to ensure always correct
            link_info_len from user. Otherwise, libbpf may deduce incorrect
            link_info_len if it uses different uapi header than the user app.
        v1 -> v2:
          . ensure link_create target_fd/flags == 0 since they are not used. (Andrii)
          . if either of iter_info ptr == 0 or iter_info_len == 0, but not both,
            return error to user space. (Andrii)
          . do not reject iter_info.map.map_fd == 0, go ahead to use it trying to
            get a map reference since the map_fd is required for map_elem iterator.
          . use bpf_iter_link_info in bpf_iter_attach_opts instead of map_fd.
            this way, user space is responsible to set up bpf_iter_link_info and
            libbpf just passes the data to the kernel, simplifying libbpf design.
            (Andrii)
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
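The v1 -> v2 rule in the changelog above ("if either of iter_info ptr == 0 or iter_info_len == 0, but not both, return error") is a consistency check between a pointer and its length. It can be sketched as below; `demo_check_link_info` is a hypothetical helper, not the kernel function.

```c
#include <errno.h>
#include <stddef.h>

/* Accept "no info at all" or "pointer plus non-zero length";
 * reject the two mixed cases. */
int demo_check_link_info(const void *info, size_t info_len)
{
	if ((info == NULL) != (info_len == 0))
		return -EINVAL;
	return 0;
}
```

Comparing the two boolean conditions for inequality expresses "exactly one of them holds" without enumerating each mixed case separately.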