1. 01 Jun, 2019 2 commits
    • Alan Maguire's avatar
      selftests/bpf: measure RTT from xdp using xdping · cd538502
      Alan Maguire authored
      xdping allows us to get latency estimates from XDP.  Output looks
      like this:
      
      ./xdping -I eth4 192.168.55.8
      Setting up XDP for eth4, please wait...
      XDP setup disrupts network connectivity, hit Ctrl+C to quit
      
      Normal ping RTT data
      [Ignore final RTT; it is distorted by XDP using the reply]
      PING 192.168.55.8 (192.168.55.8) from 192.168.55.7 eth4: 56(84) bytes of data.
      64 bytes from 192.168.55.8: icmp_seq=1 ttl=64 time=0.302 ms
      64 bytes from 192.168.55.8: icmp_seq=2 ttl=64 time=0.208 ms
      64 bytes from 192.168.55.8: icmp_seq=3 ttl=64 time=0.163 ms
      64 bytes from 192.168.55.8: icmp_seq=8 ttl=64 time=0.275 ms
      
      4 packets transmitted, 4 received, 0% packet loss, time 3079ms
      rtt min/avg/max/mdev = 0.163/0.237/0.302/0.054 ms
      
      XDP RTT data:
      64 bytes from 192.168.55.8: icmp_seq=5 ttl=64 time=0.02808 ms
      64 bytes from 192.168.55.8: icmp_seq=6 ttl=64 time=0.02804 ms
      64 bytes from 192.168.55.8: icmp_seq=7 ttl=64 time=0.02815 ms
      64 bytes from 192.168.55.8: icmp_seq=8 ttl=64 time=0.02805 ms
      
      The xdping program loads the associated xdping_kern.o BPF program
      and attaches it to the specified interface.  If run in client
      mode (the default), it will add a map entry keyed by the
      target IP address; this map will store RTT measurements, current
      sequence number etc.  Finally in client mode the ping command
      is executed, and the xdping BPF program will use the last ICMP
      reply, reformulate it as an ICMP request with the next sequence
      number and XDP_TX it.  After the reply to that request is received
      we can measure RTT and repeat until the desired number of
      measurements is made.  This is why the sequence numbers in the
      normal ping are 1, 2, 3 and 8.  We XDP_TX a modified version
      of ICMP reply 4 and keep doing this until we get the 4 replies
      we need; hence the networking stack only sees reply 8, where
      we have XDP_PASSed it upstream since we are done.
      
      In server mode (-s), xdping simply takes ICMP requests and replies
      to them in XDP rather than passing the request up to the networking
      stack.  No map entry is required.
      
      xdping can be run in native XDP mode (the default, or specified
      via -N) or in skb mode (-S).
      
      A test program test_xdping.sh exercises some of these options.
      
      Note that native XDP does not seem to XDP_TX for veths, hence -N
      is not tested.  Looking at the code, it looks like XDP_TX is
      supported so I'm not sure if that's expected.  Running xdping in
      native mode for ixgbe as both client and server works fine.
      
      Changes since v4
      
      - close fds on cleanup (Song Liu)
      
      Changes since v3
      
      - fixed seq to be __be16 (Song Liu)
      - fixed fd checks in xdping.c (Song Liu)
      
      Changes since v2
      
      - updated commit message to explain why seq number of last
        ICMP reply is 8 not 4 (Song Liu)
      - updated types of seq number, raddr and eliminated csum variable
        in xdpclient/xdpserver functions as it was not needed (Song Liu)
      - added XDPING_DEFAULT_COUNT definition and usage specification of
        default/max counts (Song Liu)
      
      Changes since v1
       - moved from RFC to PATCH
       - removed unused variable in ipv4_csum() (Song Liu)
       - refactored ICMP checks into icmp_check() function called by client
         and server programs and reworked client and server programs due
         to lack of shared code (Song Liu)
       - added checks to ensure that SKB and native mode are not requested
         together (Song Liu)
      Signed-off-by: default avatarAlan Maguire <alan.maguire@oracle.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      cd538502
    • Jiong Wang's avatar
      bpf: doc: update answer for 32-bit subregister question · c231c22a
      Jiong Wang authored
      There has been quite a few progress around the two steps mentioned in the
      answer to the following question:
      
        Q: BPF 32-bit subregister requirements
      
      This patch updates the answer to reflect what has been done.
      
      v2:
       - Add missing full stop. (Song Liu)
       - Minor tweak on one sentence. (Song Liu)
      
      v1:
       - Integrated rephrase from Quentin and Jakub
      Reviewed-by: default avatarQuentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c231c22a
  2. 31 May, 2019 13 commits
    • Alexei Starovoitov's avatar
      Merge branch 'map-charge-cleanup' · d168286d
      Alexei Starovoitov authored
      Roman Gushchin says:
      
      ====================
      During my work on memcg-based memory accounting for bpf maps
      I've done some cleanups and refactorings of the existing
      memlock rlimit-based code. It makes it more robust, unifies
      size to pages conversion, size checks and corresponding error
      codes. Also it adds coverage for cgroup local storage and
      socket local storage maps.
      
      It looks like some preliminary work on the mm side might be
      required to start working on the memcg-based accounting,
      so I'm sending these patches as a separate patchset.
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d168286d
    • Roman Gushchin's avatar
      bpf: move memory size checks to bpf_map_charge_init() · c85d6913
      Roman Gushchin authored
      Most bpf map types doing similar checks and bytes to pages
      conversion during memory allocation and charging.
      
      Let's unify these checks by moving them into bpf_map_charge_init().
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c85d6913
    • Roman Gushchin's avatar
      bpf: rework memlock-based memory accounting for maps · b936ca64
      Roman Gushchin authored
      In order to unify the existing memlock charging code with the
      memcg-based memory accounting, which will be added later, let's
      rework the current scheme.
      
      Currently the following design is used:
        1) .alloc() callback optionally checks if the allocation will likely
           succeed using bpf_map_precharge_memlock()
        2) .alloc() performs actual allocations
        3) .alloc() callback calculates map cost and sets map.memory.pages
        4) map_create() calls bpf_map_init_memlock() which sets map.memory.user
           and performs actual charging; in case of failure the map is
           destroyed
        <map is in use>
        1) bpf_map_free_deferred() calls bpf_map_release_memlock(), which
           performs uncharge and releases the user
        2) .map_free() callback releases the memory
      
      The scheme can be simplified and made more robust:
        1) .alloc() calculates map cost and calls bpf_map_charge_init()
        2) bpf_map_charge_init() sets map.memory.user and performs actual
          charge
        3) .alloc() performs actual allocations
        <map is in use>
        1) .map_free() callback releases the memory
        2) bpf_map_charge_finish() performs uncharge and releases the user
      
      The new scheme also allows to reuse bpf_map_charge_init()/finish()
      functions for memcg-based accounting. Because charges are performed
      before actual allocations and uncharges after freeing the memory,
      no bogus memory pressure can be created.
      
      In cases when the map structure is not available (e.g. it's not
      created yet, or is already destroyed), on-stack bpf_map_memory
      structure is used. The charge can be transferred with the
      bpf_map_charge_move() function.
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      b936ca64
    • Roman Gushchin's avatar
      bpf: group memory related fields in struct bpf_map_memory · 3539b96e
      Roman Gushchin authored
      Group "user" and "pages" fields of bpf_map into the bpf_map_memory
      structure. Later it can be extended with "memcg" and other related
      information.
      
      The main reason for a such change (beside cosmetics) is to pass
      bpf_map_memory structure to charging functions before the actual
      allocation of bpf_map.
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      3539b96e
    • Roman Gushchin's avatar
      bpf: add memlock precharge for socket local storage · d50836cd
      Roman Gushchin authored
      Socket local storage maps lack the memlock precharge check,
      which is performed before the memory allocation for
      most other bpf map types.
      
      Let's add it in order to unify all map types.
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d50836cd
    • Roman Gushchin's avatar
      bpf: add memlock precharge check for cgroup_local_storage · ffc8b144
      Roman Gushchin authored
      Cgroup local storage maps lack the memlock precharge check,
      which is performed before the memory allocation for
      most other bpf map types.
      
      Let's add it in order to unify all map types.
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      ffc8b144
    • Alexei Starovoitov's avatar
      Merge branch 'propagate-cn-to-tcp' · 576240cf
      Alexei Starovoitov authored
      Lawrence Brakmo says:
      
      ====================
      This patchset adds support for propagating congestion notifications (cn)
      to TCP from cgroup inet skb egress BPF programs.
      
      Current cgroup skb BPF programs cannot trigger TCP congestion window
      reductions, even when they drop a packet. This patch-set adds support
      for cgroup skb BPF programs to send congestion notifications in the
      return value when the packets are TCP packets. Rather than the
      current 1 for keeping the packet and 0 for dropping it, they can
      now return:
          NET_XMIT_SUCCESS    (0)    - continue with packet output
          NET_XMIT_DROP       (1)    - drop packet and do cn
          NET_XMIT_CN         (2)    - continue with packet output and do cn
          -EPERM                     - drop packet
      
      Finally, HBM programs are modified to collect and return more
      statistics.
      
      There has been some discussion regarding the best place to manage
      bandwidths. Some believe this should be done in the qdisc where it can
      also be managed with a BPF program. We believe there are advantages
      for doing it with a BPF program in the cgroup/skb callback. For example,
      it reduces overheads in the cases where there is on primary workload and
      one or more secondary workloads, where each workload is running on its
      own cgroupv2. In this scenario, we only need to throttle the secondary
      workloads and there is no overhead for the primary workload since there
      will be no BPF program attached to its cgroup.
      
      Regardless, we agree that this mechanism should not penalize those that
      are not using it. We tested this by doing 1 byte req/reply RPCs over
      loopback. Each test consists of 30 sec of back-to-back 1 byte RPCs.
      Each test was repeated 50 times with a 1 minute delay between each set
      of 10. We then calculated the average RPCs/sec over the 50 tests. We
      compare upstream with upstream + patchset and no BPF program as well
      as upstream + patchset and a BPF program that just returns ALLOW_PKT.
      Here are the results:
      
      upstream                           80937 RPCs/sec
      upstream + patches, no BPF program 80894 RPCs/sec
      upstream + patches, BPF program    80634 RPCs/sec
      
      These numbers indicate that there is no penalty for these patches
      
      The use of congestion notifications improves the performance of HBM when
      using Cubic. Without congestion notifications, Cubic will not decrease its
      cwnd and HBM will need to drop a large percentage of the packets.
      
      The following results are obtained for rate limits of 1Gbps,
      between two servers using netperf, and only one flow. We also show how
      reducing the max delayed ACK timer can improve the performance when
      using Cubic.
      
      Command used was:
        ./do_hbm_test.sh -l -D --stats -N -r=<rate> [--no_cn] [dctcp] \
                         -s=<server running netserver>
        where:
           <rate>   is 1000
           --no_cn  specifies no cwr notifications
           dctcp    uses dctcp
      
                             Cubic                    DCTCP
      Lim, DA      Mbps cwnd cred drops  Mbps cwnd cred drops
      --------     ---- ---- ---- -----  ---- ---- ---- -----
        1G, 40       35  462 -320 67%     995    1 -212  0.05%
        1G, 40,cn   736    9  -78  0.07   995    1 -212  0.05
        1G,  5,cn   941    2 -189  0.13   995    1 -212  0.05
      
      Notes:
        --no_cn has no effect with DCTCP
        Lim = rate limit
        DA = maximum delay ack timer
        cred = credit in packets
        drops = % packets dropped
      
      v1->v2: Insures that only BPF_CGROUP_INET_EGRESS can return values 2 and 3
              New egress values apply to all protocols, not just TCP
              Cleaned up patch 4, Update BPF_CGROUP_RUN_PROG_INET_EGRESS callers
              Removed changes to __tcp_transmit_skb (patch 5), no longer needed
              Removed sample use of EDT
      v2->v3: Removed the probe timer related changes
      v3->v4: Replaced preempt_enable_no_resched() by preempt_enable()
              in BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY() macro
      ====================
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      576240cf
    • brakmo's avatar
      bpf: Add more stats to HBM · d58c6f72
      brakmo authored
      Adds more stats to HBM, including average cwnd and rtt of all TCP
      flows, percents of packets that are ecn ce marked and distribution
      of return values.
      Signed-off-by: default avatarLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d58c6f72
    • brakmo's avatar
      bpf: Add cn support to hbm_out_kern.c · ffd81558
      brakmo authored
      Update hbm_out_kern.c to support returning cn notifications.
      Also updates relevant files to allow disabling cn notifications.
      Signed-off-by: default avatarLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      ffd81558
    • brakmo's avatar
      bpf: Update BPF_CGROUP_RUN_PROG_INET_EGRESS calls · 956fe219
      brakmo authored
      Update BPF_CGROUP_RUN_PROG_INET_EGRESS() callers to support returning
      congestion notifications from the BPF programs.
      Signed-off-by: default avatarLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      956fe219
    • brakmo's avatar
      bpf: Update __cgroup_bpf_run_filter_skb with cn · e7a3160d
      brakmo authored
      For egress packets, __cgroup_bpf_fun_filter_skb() will now call
      BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY() instead of PROG_CGROUP_RUN_ARRAY()
      in order to propagate congestion notifications (cn) requests to TCP
      callers.
      
      For egress packets, this function can return:
         NET_XMIT_SUCCESS    (0)    - continue with packet output
         NET_XMIT_DROP       (1)    - drop packet and notify TCP to call cwr
         NET_XMIT_CN         (2)    - continue with packet output and notify TCP
                                      to call cwr
         -EPERM                     - drop packet
      
      For ingress packets, this function will return -EPERM if any attached
      program was found and if it returned != 1 during execution. Otherwise 0
      is returned.
      Signed-off-by: default avatarLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      e7a3160d
    • brakmo's avatar
      bpf: cgroup inet skb programs can return 0 to 3 · 5cf1e914
      brakmo authored
      Allows cgroup inet skb programs to return values in the range [0, 3].
      The second bit is used to deterine if congestion occurred and higher
      level protocol should decrease rate. E.g. TCP would call tcp_enter_cwr()
      
      The bpf_prog must set expected_attach_type to BPF_CGROUP_INET_EGRESS
      at load time if it uses the new return values (i.e. 2 or 3).
      
      The expected_attach_type is currently not enforced for
      BPF_PROG_TYPE_CGROUP_SKB.  e.g Meaning the current bpf_prog with
      expected_attach_type setting to BPF_CGROUP_INET_EGRESS can attach to
      BPF_CGROUP_INET_INGRESS.  Blindly enforcing expected_attach_type will
      break backward compatibility.
      
      This patch adds a enforce_expected_attach_type bit to only
      enforce the expected_attach_type when it uses the new
      return value.
      Signed-off-by: default avatarLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      5cf1e914
    • brakmo's avatar
      bpf: Create BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY · 1f52f6c0
      brakmo authored
      Create new macro BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY() to be used by
      __cgroup_bpf_run_filter_skb for EGRESS BPF progs so BPF programs can
      request cwr for TCP packets.
      
      Current cgroup skb programs can only return 0 or 1 (0 to drop the
      packet. This macro changes the behavior so the low order bit
      indicates whether the packet should be dropped (0) or not (1)
      and the next bit is used for congestion notification (cn).
      
      Hence, new allowed return values of CGROUP EGRESS BPF programs are:
        0: drop packet
        1: keep packet
        2: drop packet and call cwr
        3: keep packet and call cwr
      
      This macro then converts it to one of NET_XMIT values or -EPERM
      that has the effect of dropping the packet with no cn.
        0: NET_XMIT_SUCCESS  skb should be transmitted (no cn)
        1: NET_XMIT_DROP     skb should be dropped and cwr called
        2: NET_XMIT_CN       skb should be transmitted and cwr called
        3: -EPERM            skb should be dropped (no cn)
      
      Note that when more than one BPF program is called, the packet is
      dropped if at least one of programs requests it be dropped, and
      there is cn if at least one program returns cn.
      Signed-off-by: default avatarLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      1f52f6c0
  3. 29 May, 2019 15 commits
  4. 28 May, 2019 10 commits
    • Alexei Starovoitov's avatar
      Merge branch 'cgroup-auto-detach' · d0a3a4b2
      Alexei Starovoitov authored
      Roman Gushchin says:
      
      ====================
      This patchset implements a cgroup bpf auto-detachment functionality:
      bpf programs are detached as soon as possible after removal of the
      cgroup, without waiting for the release of all associated resources.
      
      Patches 2 and 3 are required to implement a corresponding kselftest
      in patch 4.
      
      v5:
        1) rebase
      
      v4:
        1) release cgroup bpf data using a workqueue
        2) add test_cgroup_attach to .gitignore
      
      v3:
        1) some minor changes and typo fixes
      
      v2:
        1) removed a bogus check in patch 4
        2) moved buf[len] = 0 in patch 2
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d0a3a4b2
    • Roman Gushchin's avatar
      selftests/bpf: add auto-detach test · d5506591
      Roman Gushchin authored
      Add a kselftest to cover bpf auto-detachment functionality.
      The test creates a cgroup, associates some resources with it,
      attaches a couple of bpf programs and deletes the cgroup.
      
      Then it checks that bpf programs are going away in 5 seconds.
      
      Expected output:
        $ ./test_cgroup_attach
        #override:PASS
        #multi:PASS
        #autodetach:PASS
        test_cgroup_attach:PASS
      
      On a kernel without auto-detaching:
        $ ./test_cgroup_attach
        #override:PASS
        #multi:PASS
        #autodetach:FAIL
        test_cgroup_attach:FAIL
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d5506591
    • Roman Gushchin's avatar
      selftests/bpf: enable all available cgroup v2 controllers · 596092ef
      Roman Gushchin authored
      Enable all available cgroup v2 controllers when setting up
      the environment for the bpf kselftests. It's required to properly test
      the bpf prog auto-detach feature. Also it will generally increase
      the code coverage.
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      596092ef
    • Roman Gushchin's avatar
      selftests/bpf: convert test_cgrp2_attach2 example into kselftest · ba0c0cc0
      Roman Gushchin authored
      Convert test_cgrp2_attach2 example into a proper test_cgroup_attach
      kselftest. It's better because we do run kselftest on a constant
      basis, so there are better chances to spot a potential regression.
      
      Also make it slightly less verbose to conform kselftests output style.
      
      Output example:
        $ ./test_cgroup_attach
        #override:PASS
        #multi:PASS
        test_cgroup_attach:PASS
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      ba0c0cc0
    • Roman Gushchin's avatar
      bpf: decouple the lifetime of cgroup_bpf from cgroup itself · 4bfc0bb2
      Roman Gushchin authored
      Currently the lifetime of bpf programs attached to a cgroup is bound
      to the lifetime of the cgroup itself. It means that if a user
      forgets (or intentionally avoids) to detach a bpf program before
      removing the cgroup, it will stay attached up to the release of the
      cgroup. Since the cgroup can stay in the dying state (the state
      between being rmdir()'ed and being released) for a very long time, it
      leads to a waste of memory. Also, it blocks a possibility to implement
      the memcg-based memory accounting for bpf objects, because a circular
      reference dependency will occur. Charged memory pages are pinning the
      corresponding memory cgroup, and if the memory cgroup is pinning
      the attached bpf program, nothing will be ever released.
      
      A dying cgroup can not contain any processes, so the only chance for
      an attached bpf program to be executed is a live socket associated
      with the cgroup. So in order to release all bpf data early, let's
      count associated sockets using a new percpu refcounter. On cgroup
      removal the counter is transitioned to the atomic mode, and as soon
      as it reaches 0, all bpf programs are detached.
      
      Because cgroup_bpf_release() can block, it can't be called from
      the percpu ref counter callback directly, so instead an asynchronous
      work is scheduled.
      
      The reference counter is not socket specific, and can be used for any
      other types of programs, which can be executed from a cgroup-bpf hook
      outside of the process context, had such a need arise in the future.
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Cc: jolsa@redhat.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      4bfc0bb2
    • Daniel T. Lee's avatar
      samples/bpf: fix a couple of style issues in bpf_load · 37b54aed
      Daniel T. Lee authored
      This commit fixes a few style problems in samples/bpf/bpf_load.c:
      
       - Magic string use of 'DEBUGFS'
       - Useless zero initialization of a global variable
       - Minor style fix with whitespace
      Signed-off-by: default avatarDaniel T. Lee <danieltimlee@gmail.com>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      37b54aed
    • Stanislav Fomichev's avatar
      selftests/bpf: fail test_tunnel.sh if subtests fail · 486d3f22
      Stanislav Fomichev authored
      Right now test_tunnel.sh always exits with success even if some
      of the subtests fail. Since the output is very verbose, it's
      hard to spot the issues with subtests. Let's fail the script
      if any subtest fails.
      Signed-off-by: default avatarStanislav Fomichev <sdf@google.com>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      486d3f22
    • Daniel Borkmann's avatar
      Merge branch 'bpf-bpftool-dbg-output' · 463910a5
      Daniel Borkmann authored
      Quentin Monnet says:
      
      ====================
      This series adds an option to bpftool to make it print additional
      information via libbpf and the kernel verifier when attempting to load
      programs.
      
      A new API function is added to libbpf in order to pass the log_level
      from bpftool with the bpf_object__* part of the API.
      
      Options for a finer control over the log levels to use for libbpf and
      the verifier could be added in the future, if desired.
      
      v3:
      - Fix and clarify commit logs.
      
      v2:
      - Do not add distinct options for libbpf and verifier logs, just keep the
        one that sets all log levels to their maximum. Rename the option.
      - Do not offer a way to pick desired log levels. The choice is "use the
        option to print all logs" or "stick with the defaults".
      - Do not export BPF_LOG_* flags to user header.
      - Update all man pages (most bpftool operations use libbpf and may print
        libbpf logs). Verifier logs are only used when attempting to load
        programs for now, so bpftool-prog.rst and bpftool.rst remain the only
        pages updated in that regard.
      
      Previous discussion available at:
      
      https://lore.kernel.org/bpf/20190523105426.3938-1-quentin.monnet@netronome.com/
      https://lore.kernel.org/bpf/20190429095227.9745-1-quentin.monnet@netronome.com/
      ====================
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      463910a5
    • Quentin Monnet's avatar
      tools: bpftool: make -d option print debug output from verifier · 55d77807
      Quentin Monnet authored
      The "-d" option is used to require all logs available for bpftool. So
      far it meant telling libbpf to print even debug-level information. But
      there is another source of info that can be made more verbose: when we
      attemt to load programs with bpftool, we can pass a log_level parameter
      to the verifier in order to control the amount of information that is
      printed to the console.
      
      Reuse the "-d" option to print all information the verifier can tell. At
      this time, this means logs related to BPF_LOG_LEVEL1, BPF_LOG_LEVEL2 and
      BPF_LOG_STATS. As mentioned in the discussion on the first version of
      this set, these macros are internal to the kernel
      (include/linux/bpf_verifier.h) and are not meant to be part of the
      stable user API, therefore we simply use the related constants to print
      whatever we can at this time, without trying to tell users what is
      log_level1 or what is statistics.
      
      Verifier logs are only used when loading programs for now (In the
      future: for loading BTF objects with bpftool? Although libbpf does not
      currently offer to print verifier info at debug level if no error
      occurred when loading BTF objects), so bpftool.rst and bpftool-prog.rst
      are the only man pages to get the update.
      
      v3:
      - Add details on log level and BTF loading at the end of commit log.
      
      v2:
      - Remove the possibility to select the log levels to use (v1 offered a
        combination of "log_level1", "log_level2" and "stats").
      - The macros from kernel header bpf_verifier.h are not used (and
        therefore not moved to UAPI header).
      - In v1 this was a distinct option, but is now merged in the only "-d"
        switch to activate libbpf and verifier debug-level logs all at the
        same time.
      Signed-off-by: default avatarQuentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      55d77807
    • Quentin Monnet's avatar
      libbpf: add bpf_object__load_xattr() API function to pass log_level · 60276f98
      Quentin Monnet authored
      libbpf was recently made aware of the log_level attribute for programs,
      used to specify the level of information expected to be dumped by the
      verifier. Function bpf_prog_load_xattr() got support for this log_level
      parameter.
      
      But some applications using libbpf rely on another function to load
      programs, bpf_object__load(), which does accept any parameter for log
      level. Create an API function based on bpf_object__load(), but accepting
      an "attr" object as a parameter. Then add a log_level field to that
      object, so that applications calling the new bpf_object__load_xattr()
      can pick the desired log level.
      
      v3:
      - Rewrite commit log.
      
      v2:
      - We are in a new cycle, bump libbpf extraversion number.
      Signed-off-by: default avatarQuentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      60276f98