1. 12 Dec, 2021 12 commits
    • net: dsa: tag_sja1105: split sja1105_tagger_data into private and public sections · 950a419d
      Vladimir Oltean authored
      The sja1105 driver messes with the tagging protocol's state when PTP RX
      timestamping is enabled/disabled. This is fundamentally necessary
      because the tagger needs to know what to do when it receives a PTP
      packet. If RX timestamping is enabled, then a metadata follow-up frame
      is expected, and this holds the (partial) timestamp. So the tagger plays
      hide-and-seek with the network stack until it also gets the metadata
      frame, and then presents a single packet, the timestamped PTP packet.
      But when RX timestamping isn't enabled, there is no metadata frame
      expected, so the hide-and-seek game must be turned off and the packet
      must be delivered right away to the network stack.
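
      As an illustration only, the hide-and-seek behavior can be modelled as
      a small receive state machine. This is a user-space sketch with
      hypothetical types and names (struct frame, struct rx_state, rcv()); it
      stashes a single PTP frame and copies the partial timestamp verbatim,
      whereas the real tagger also has to reconstruct the full timestamp and
      cope with lost or reordered frames.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical, simplified frame: a PTP packet or a metadata follow-up. */
      struct frame {
          bool is_ptp;
          bool is_meta;
          uint32_t partial_ts;   /* carried by metadata frames */
          uint32_t rx_tstamp;    /* filled in on the PTP frame */
      };

      struct rx_state {
          bool hwts_rx_en;       /* is RX timestamping enabled? */
          struct frame *stashed; /* PTP frame hidden until its metadata arrives */
      };

      /* Returns the frame to deliver to the network stack, or NULL if the
       * frame was consumed (a stashed PTP frame or a metadata frame). */
      static struct frame *rcv(struct rx_state *st, struct frame *f)
      {
          if (!st->hwts_rx_en)
              /* No metadata expected: deliver PTP frames right away. */
              return f->is_meta ? NULL : f;

          if (f->is_ptp) {
              /* Sketch simplification: an older stashed frame is overwritten. */
              st->stashed = f;
              return NULL;
          }
          if (f->is_meta) {
              struct frame *ptp = st->stashed;

              st->stashed = NULL;
              if (!ptp)
                  return NULL;               /* orphan metadata: drop it */
              ptp->rx_tstamp = f->partial_ts;
              return ptp;                    /* one timestamped PTP packet */
          }
          return f;                          /* ordinary traffic passes through */
      }
      ```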
      
      Considering this, we create a pseudo isolation by devising two tagger
      methods callable by the switch: one to get the RX timestamping state,
      and one to set it. Since we can't export symbols between the tagger and
      the switch driver, these methods are exposed through function pointers.
      
      After this change, the public portion of the sja1105_tagger_data
      contains only function pointers.
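
      A minimal user-space sketch of the public/private split, assuming
      simplified stand-in types: switch_model, tagger_public, tagger_private
      and every function name below are hypothetical, not the driver's actual
      definitions. The switch side reaches the RX timestamping state only
      through the two function pointers installed by the tagger.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdlib.h>

      /* Hypothetical stand-in for struct dsa_switch: just the tagger_data slot. */
      struct switch_model {
          void *tagger_data;
      };

      /* Public half: only function pointers, visible to the switch driver. */
      struct tagger_public {
          bool (*rxtstamp_get_state)(struct switch_model *ds);
          void (*rxtstamp_set_state)(struct switch_model *ds, bool on);
      };

      /* Private half: tagger-only state; the public half is the first member. */
      struct tagger_private {
          struct tagger_public data;
          bool hwts_rx_en;
      };

      static struct tagger_private *tagger_priv(struct switch_model *ds)
      {
          return (struct tagger_private *)ds->tagger_data;
      }

      static bool rxtstamp_get_state(struct switch_model *ds)
      {
          return tagger_priv(ds)->hwts_rx_en;
      }

      static void rxtstamp_set_state(struct switch_model *ds, bool on)
      {
          tagger_priv(ds)->hwts_rx_en = on;
      }

      /* Tagger connect step: allocate the storage and install the methods. */
      static int tagger_connect(struct switch_model *ds)
      {
          struct tagger_private *priv = calloc(1, sizeof(*priv));

          if (!priv)
              return -1;
          priv->data.rxtstamp_get_state = rxtstamp_get_state;
          priv->data.rxtstamp_set_state = rxtstamp_set_state;
          ds->tagger_data = priv;
          return 0;
      }

      /* Switch-driver view: only the public half (it sits at offset 0). */
      static struct tagger_public *tagger_public(struct switch_model *ds)
      {
          return ds->tagger_data;
      }

      /* Tagger RX decision: must a PTP frame wait for a metadata follow-up? */
      static bool expect_meta_frame(struct switch_model *ds)
      {
          return tagger_public(ds)->rxtstamp_get_state(ds);
      }
      ```
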
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver" · fcbf979a
      Vladimir Oltean authored
      This reverts commit 6d709cad.
      
      The above change was done to avoid calling symbols exported by the
      switch driver from the tagging protocol driver.
      
      With the tagger-owned storage model, we have a new option on our hands,
      and that is for the switch driver to provide a data consumer handler in
      the form of a function pointer inside the ->connect_tag_protocol()
      method. Having a function pointer avoids the problems of the exported
      symbols approach.
      
      By creating a handler for metadata frames holding TX timestamps on
      SJA1110, we are able to eliminate an skb queue from the tagger data, and
      replace it with a simple, and stateless, function pointer. This skb
      queue is now handled exclusively by sja1105_ptp.c, which makes the code
      easier to follow, as it used to be before the reverted patch.
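
      The handler direction can be sketched as follows. For brevity the
      tagger data is embedded in a hypothetical switch stand-in (struct sw)
      rather than reached through ds->tagger_data, and all names and the
      handler signature are assumptions. The tagger keeps no queue; it only
      calls whatever consumer the switch driver installed when connecting.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      struct sw; /* hypothetical stand-in for struct dsa_switch */

      /* Tagger data: a stateless function pointer instead of an skb queue. */
      struct tagger_data {
          void (*meta_tstamp_handler)(struct sw *ds, int port, uint8_t ts_id,
                                      uint64_t tstamp);
      };

      struct sw {
          struct tagger_data tagger_data;
          /* Switch-driver-owned state: the queue now lives on this side. */
          uint64_t last_tstamp;
          uint8_t last_ts_id;
          int handled;
      };

      /* Switch driver side: consumes the TX timestamp from a metadata frame. */
      static void sw_process_meta_tstamp(struct sw *ds, int port, uint8_t ts_id,
                                         uint64_t tstamp)
      {
          (void)port;
          ds->last_ts_id = ts_id;
          ds->last_tstamp = tstamp;
          ds->handled++;
      }

      /* Switch driver ->connect_tag_protocol(): install the consumer. */
      static int sw_connect_tag_protocol(struct sw *ds)
      {
          ds->tagger_data.meta_tstamp_handler = sw_process_meta_tstamp;
          return 0;
      }

      /* Tagger RX path for a metadata frame: hand it off, keep no state. */
      static void tagger_rcv_meta(struct sw *ds, int port, uint8_t ts_id,
                                  uint64_t tstamp)
      {
          if (ds->tagger_data.meta_tstamp_handler)
              ds->tagger_data.meta_tstamp_handler(ds, port, ts_id, tstamp);
      }
      ```
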
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_sja1105: convert to tagger-owned data · c79e8486
      Vladimir Oltean authored
      Currently, struct sja1105_tagger_data is a part of struct
      sja1105_private, and is used by the sja1105 driver to populate dp->priv.
      
      With the movement towards tagger-owned storage, the sja1105 driver
      should not be the owner of this memory.
      
      This change implements the connection between the sja1105 switch driver
      and its tagging protocol, which means that sja1105_tagger_data no longer
      stays in dp->priv but in ds->tagger_data, and that the sja1105 driver
      now only populates the sja1105_port_deferred_xmit callback pointer.
      The kthread worker is now the responsibility of the tagger.
      
      The sja1105 driver also alters the tagger's state some more, especially
      with regard to the PTP RX timestamping state. This will be fixed up a
      bit in further changes.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: move ts_id from sja1105_tagger_data · 22ee9f8e
      Vladimir Oltean authored
      The TX timestamp ID is incremented by the SJA1110 PTP timestamping
      callback (->port_tx_timestamp) for every packet, when cloning it.
      It isn't used by the tagger at all, even though it sits inside the
      struct sja1105_tagger_data.
      
      Also, access to this field is currently serialized through
      tagger_data->meta_lock, which is a cheap hack because the meta_lock
      isn't used for anything else on SJA1110 (sja1105_rcv_meta_state_machine
      isn't called).
      
      This change moves ts_id from sja1105_tagger_data to sja1105_private and
      introduces a dedicated spinlock for it, also in sja1105_private.
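
      A user-space sketch of the new arrangement, assuming a tiny spinlock
      built on C11 atomic_flag as a stand-in for the kernel's spinlock_t.
      struct priv_model, next_ts_id() and the 8-bit width of ts_id are
      hypothetical; the point is only that the counter and its dedicated lock
      now live together on the switch-driver side.

      ```c
      #include <assert.h>
      #include <stdatomic.h>
      #include <stdint.h>

      /* Minimal spinlock on C11 atomic_flag (stand-in for spinlock_t). */
      typedef struct { atomic_flag locked; } spinlock_model;

      static void spin_lock_model(spinlock_model *l)
      {
          while (atomic_flag_test_and_set_explicit(&l->locked,
                                                   memory_order_acquire))
              ; /* busy-wait until the holder releases it */
      }

      static void spin_unlock_model(spinlock_model *l)
      {
          atomic_flag_clear_explicit(&l->locked, memory_order_release);
      }

      /* Hypothetical subset of sja1105_private after the change. */
      struct priv_model {
          uint8_t ts_id;              /* TX timestamp ID, switch-driver-owned */
          spinlock_model ts_id_lock;  /* dedicated lock instead of meta_lock */
      };

      static void priv_init(struct priv_model *priv)
      {
          priv->ts_id = 0;
          atomic_flag_clear(&priv->ts_id_lock.locked);
      }

      /* Called once per cloned packet: returns the ID to tag the clone with. */
      static uint8_t next_ts_id(struct priv_model *priv)
      {
          uint8_t id;

          spin_lock_model(&priv->ts_id_lock);
          id = priv->ts_id++;
          spin_unlock_model(&priv->ts_id_lock);
          return id;
      }
      ```
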
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: make dp->priv point directly to sja1105_tagger_data · bfcf1425
      Vladimir Oltean authored
      The design of the sja1105 tagger dp->priv is that each port has a
      separate struct sja1105_port, and the sp->data pointer points to a
      common struct sja1105_tagger_data.
      
      We have removed all per-port members accessible by the tagger, and now
      only struct sja1105_tagger_data remains. Make dp->priv point directly to
      this.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: remove hwts_tx_en from tagger data · 6f6770ab
      Vladimir Oltean authored
      This tagger property is in fact not used at all by the tagger, only by
      the switch driver. Therefore it makes sense to move it to
      sja1105_private.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: bring deferred xmit implementation in line with ocelot-8021q · d38049bb
      Vladimir Oltean authored
      When the ocelot-8021q driver was converted to deferred xmit as part of
      commit 8d5f7954 ("net: dsa: felix: break at first CPU port during
      init and teardown"), the deferred implementation was deliberately made
      subtly different from what sja1105 has.
      
      The implementation differences were based on the following observations:
      
      - There might be a race between these two lines in tag_sja1105.c:
      
             skb_queue_tail(&sp->xmit_queue, skb_get(skb));
             kthread_queue_work(sp->xmit_worker, &sp->xmit_work);
      
        and the skb dequeue logic in sja1105_port_deferred_xmit(). For
        example, the xmit_work might already be queued at a moment when the
        work item has just finished walking through the skb queue. Because we
        don't check the return code from kthread_queue_work(), we don't do
        anything if the work item is already queued.
      
        However, nobody will take that skb and send it, at least until the
        next timestampable skb is sent. This creates additional (and
        avoidable) TX timestamping latency.
      
        To close that race, the ocelot-8021q driver does not keep a single
        work item per port and an skb timestamping queue, but rather
        dynamically allocates a work item per packet.
      
      - It is also unnecessary to have more than one kthread that does the
        work. So delete the per-port kthread allocations and replace them with
        a single kthread which is global to the switch.
      
      This change brings the two implementations in line by applying those
      observations to the sja1105 driver as well.
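
      The resulting ownership structure can be sketched in user space as
      below. This is a single-threaded model with hypothetical names
      (struct pkt, struct xmit_work, struct worker); it does not reproduce
      the race itself, the kthread API, or locking. It shows one worker for
      the whole switch and one freshly allocated work item per packet, so
      queueing never hits an "already queued" case that could strand a packet.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      /* Hypothetical stand-ins for an skb and a per-packet work item. */
      struct pkt { int id; };

      struct xmit_work {
          struct pkt *skb;          /* the single packet this item owns */
          struct xmit_work *next;
      };

      /* One worker for the whole switch, rather than one per port. */
      struct worker {
          struct xmit_work *head;
          struct xmit_work **tail;
          int sent;
          int last_id;
      };

      static void worker_init(struct worker *w)
      {
          w->head = NULL;
          w->tail = &w->head;
          w->sent = 0;
          w->last_id = -1;
      }

      /* Xmit path: allocate a fresh work item for every packet and queue it. */
      static int defer_xmit(struct worker *w, struct pkt *skb)
      {
          struct xmit_work *xw = malloc(sizeof(*xw));

          if (!xw)
              return -1;
          xw->skb = skb;
          xw->next = NULL;
          *w->tail = xw;
          w->tail = &xw->next;
          return 0;
      }

      /* Worker: each item transmits exactly its own packet, then is freed. */
      static void worker_run(struct worker *w)
      {
          while (w->head) {
              struct xmit_work *xw = w->head;

              w->head = xw->next;
              w->last_id = xw->skb->id;   /* stand-in for the real deferred xmit */
              w->sent++;
              free(xw);
          }
          w->tail = &w->head;
      }
      ```
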
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: let deferred packets time out when sent to ports going down · a3d74295
      Vladimir Oltean authored
      This code is not necessary and complicates the conversion of this driver
      to tagger-owned memory. If there is a PTP packet that is sent
      concurrently with the port getting disabled, the deferred xmit mechanism
      is robust enough to time out when it sees that it hasn't been delivered,
      and recovers.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_ocelot: convert to tagger-owned data · 35d97680
      Vladimir Oltean authored
      The felix driver makes very light use of dp->priv, and the tagger is
      effectively stateless. dp->priv is practically only needed to set up a
      callback to perform deferred xmit of PTP and STP packets using the
      ocelot-8021q tagging protocol (the main ocelot tagging protocol makes no
      use of dp->priv, although this driver sets up dp->priv irrespective of
      actual tagging protocol in use).
      
      struct felix_port (what used to be pointed to by dp->priv) is removed
      and replaced with a two-sided structure. The public side of this
      structure, visible to the switch driver, is ocelot_8021q_tagger_data.
      The private side is ocelot_8021q_tagger_private, and the latter
      structure physically encapsulates the former. The public half of the
      tagger data structure can be accessed through a helper of the same name
      (ocelot_8021q_tagger_data) which also sanity-checks the protocol
      currently in use by the switch. The public/private split was requested
      by Andrew Lunn.
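
      A user-space sketch of the two-sided layout, assuming hypothetical
      fields and a simplified switch stand-in (struct sw). The struct names
      follow the text; the member names are invented, and returning NULL on a
      protocol mismatch is this sketch's choice (the in-tree helper may handle
      a mismatch differently). The tagger recovers its private half from the
      public pointer with container_of().

      ```c
      #include <assert.h>
      #include <stddef.h>

      #define container_of(ptr, type, member) \
          ((type *)((char *)(ptr) - offsetof(type, member)))

      /* Hypothetical stand-ins for the protocol IDs and the switch. */
      enum tag_proto { TAG_PROTO_OCELOT, TAG_PROTO_OCELOT_8021Q };

      struct sw {
          enum tag_proto proto;
          void *tagger_data;          /* owned by the tagger */
      };

      /* Public half: what the switch driver may touch. */
      struct ocelot_8021q_tagger_data {
          void (*xmit_work_fn)(void *work);
      };

      /* Private half: physically encapsulates the public half. */
      struct ocelot_8021q_tagger_private {
          struct ocelot_8021q_tagger_data data;
          int xmit_worker_id;         /* hypothetical tagger-only state */
      };

      /* Switch-side helper: hands out only the public half, and only when
       * the expected protocol is in use (NULL otherwise, in this sketch). */
      static struct ocelot_8021q_tagger_data *ocelot_8021q_tagger_data(struct sw *ds)
      {
          struct ocelot_8021q_tagger_private *priv = ds->tagger_data;

          if (ds->proto != TAG_PROTO_OCELOT_8021Q || !priv)
              return NULL;
          return &priv->data;
      }

      /* Tagger side: recover the private half from the public pointer. */
      static struct ocelot_8021q_tagger_private *
      to_private(struct ocelot_8021q_tagger_data *data)
      {
          return container_of(data, struct ocelot_8021q_tagger_private, data);
      }
      ```
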
      Suggested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: introduce tagger-owned storage for private and shared data · dc452a47
      Vladimir Oltean authored
      Ansuel is working on register access over Ethernet for the qca8k switch
      family. This requires the qca8k tagging protocol driver to receive
      frames which aren't intended for the network stack, but instead for the
      qca8k switch driver itself.
      
      The dp->priv is currently the prevailing method for passing data back
      and forth between the tagging protocol driver and the switch driver.
      However, this method is riddled with caveats.
      
      The DSA design allows in principle for any switch driver to return any
      protocol it desires in ->get_tag_protocol(). The dsa_loop driver can be
      modified to do just that. But in the current design, the memory behind
      dp->priv has to be allocated by the switch driver, so if the tagging
      protocol is paired with an unexpected switch driver, we may end up in NULL
      pointer dereferences inside the kernel, or worse (a switch driver may
      allocate dp->priv according to the expectations of a different tagger).
      
      The latter possibility is even more plausible considering that DSA
      switches can dynamically change tagging protocols in certain cases
      (dsa <-> edsa, ocelot <-> ocelot-8021q), and the current design lends
      itself to mistakes that are all too easy to make.
      
      This patch proposes that the tagging protocol driver should manage its
      own memory, instead of relying on the switch driver to do so.
      After analyzing the different in-tree needs, it can be observed that the
      required tagger storage is per switch, therefore a ds->tagger_data
      pointer is introduced. In principle, per-port storage could also be
      introduced, although there is no need for it at the moment. Future
      changes will replace the current usage of dp->priv with ds->tagger_data.
      
      We define a "binding" event between the DSA switch tree and the tagging
      protocol. During this binding event, the tagging protocol's ->connect()
      method is called first, and this may allocate some memory for each
      switch of the tree. Then a cross-chip notifier is emitted for the
      switches within that tree, and they are given the opportunity to fix up
      the tagger's memory (for example, they might set up some function
      pointers that represent virtual methods for consuming packets).
      Because the memory is owned by the tagger, there exists a ->disconnect()
      method for the tagger (which is the place to free the resources), but
      there doesn't exist a ->disconnect() method for the switch driver.
      This is part of the design. The switch driver should make minimal use of
      the public part of the tagger data, and only after type-checking it
      using the supplied "proto" argument.
      
      In the code there are in fact two binding events. The first is the
      initial event in dsa_switch_setup_tag_protocol(). At this stage, the
      cross-chip notifier chains aren't initialized, so we call each switch's
      connect() method by hand. The second is dsa_tree_bind_tag_proto()
      during dsa_tree_change_tag_proto(), where we have an old protocol and a
      new one. We first connect to the new one before disconnecting from the
      old one, to simplify error handling a bit and to ensure we remain in a
      valid state at all times.
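
      The "connect new, then disconnect old" ordering can be sketched for a
      single switch as below. This is a user-space model with hypothetical
      names (struct tagger_ops, struct sw, bind_tag_proto(), the demo taggers);
      the real code walks every switch in the tree through cross-chip
      notifiers. On any failure the switch stays bound to the old tagger.

      ```c
      #include <assert.h>
      #include <stddef.h>

      struct sw;

      /* Hypothetical tagger ops: the tagger owns (and frees) its storage. */
      struct tagger_ops {
          int (*connect)(struct sw *ds);       /* may allocate tagger memory */
          void (*disconnect)(struct sw *ds);   /* no switch-side counterpart */
      };

      struct sw {
          const struct tagger_ops *tag_ops;    /* currently bound tagger */
          int (*connect_tag_protocol)(struct sw *ds, const struct tagger_ops *ops);
      };

      /* Bind to new_ops first; only then drop the old binding. */
      static int bind_tag_proto(struct sw *ds, const struct tagger_ops *new_ops)
      {
          const struct tagger_ops *old_ops = ds->tag_ops;
          int err;

          err = new_ops->connect(ds);
          if (err)
              return err;                  /* old binding untouched */

          err = ds->connect_tag_protocol(ds, new_ops);
          if (err) {
              new_ops->disconnect(ds);     /* undo only the new connection */
              return err;
          }

          ds->tag_ops = new_ops;
          if (old_ops)
              old_ops->disconnect(ds);
          return 0;
      }

      /* Demo taggers that count their live connections. */
      static int live_a, live_b;
      static int a_connect(struct sw *ds) { (void)ds; live_a++; return 0; }
      static void a_disconnect(struct sw *ds) { (void)ds; live_a--; }
      static int b_connect(struct sw *ds) { (void)ds; live_b++; return 0; }
      static void b_disconnect(struct sw *ds) { (void)ds; live_b--; }
      static const struct tagger_ops tagger_a = { a_connect, a_disconnect };
      static const struct tagger_ops tagger_b = { b_connect, b_disconnect };

      /* Demo switch drivers: one accepts anything, one refuses tagger_b. */
      static int switch_accepts(struct sw *ds, const struct tagger_ops *ops)
      {
          (void)ds; (void)ops;
          return 0;
      }

      static int switch_rejects_b(struct sw *ds, const struct tagger_ops *ops)
      {
          (void)ds;
          return ops == &tagger_b ? -1 : 0;
      }
      ```
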
      Co-developed-by: Ansuel Smith <ansuelsmth@gmail.com>
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Add tx fwd offload PVT on intermediate devices · e0068620
      Tobias Waldekranz authored
      In a typical mv88e6xxx switch tree like this:
      
        CPU
         |    .----.
      .--0--. | .--0--.
      | sw0 | | | sw1 |
      '-1-2-' | '-1-2-'
          '---'
      
      If sw1p{1,2} are added to a bridge that sw0p1 is not a part of, sw0
      still needs to add a crosschip PVT entry for the virtual DSA device
      assigned to represent the bridge.
      
      Fixes: ce5df689 ("net: dsa: mv88e6xxx: map virtual bridges with forwarding offload in the PVT")
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Enable neighbor sysctls that are safe for userns root · 8c8b7aa7
      xu xin authored
      Inside a netns owned by a non-init userns, the ARP/neighbor sysctls are
      currently neither visible nor configurable.

      For the attributes these sysctls correspond to, any modification
      affects networking performance (ARP, especially) only within the scope
      of that netns; it does not affect other netns.

      In fact, some tools can already modify these attributes via netlink;
      iproute2 is one example:
      
      $ unshare -ur -n
      $ cat /proc/sys/net/ipv4/neigh/lo/retrans_time
      cat: can't open '/proc/sys/net/ipv4/neigh/lo/retrans_time': No such file or directory
      $ ip ntable show dev lo
      inet arp_cache
          dev lo
          refcnt 1 reachable 19494 base_reachable 30000 retrans 1000
          gc_stale 60000 delay_probe 5000 queue 101
          app_probes 0 ucast_probes 3 mcast_probes 3
          anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 1000
      
      inet6 ndisc_cache
          dev lo
          refcnt 1 reachable 42394 base_reachable 30000 retrans 1000
          gc_stale 60000 delay_probe 5000 queue 101
          app_probes 0 ucast_probes 3 mcast_probes 3
          anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 0
      $ ip ntable change name arp_cache dev <if> retrans 2000
      inet arp_cache
          dev lo
          refcnt 1 reachable 22917 base_reachable 30000 retrans 2000
          gc_stale 60000 delay_probe 5000 queue 101
          app_probes 0 ucast_probes 3 mcast_probes 3
          anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 1000
      
      inet6 ndisc_cache
          dev lo
          refcnt 1 reachable 35524 base_reachable 30000 retrans 1000
          gc_stale 60000 delay_probe 5000 queue 101
          app_probes 0 ucast_probes 3 mcast_probes 3
          anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 0
      Reported-by: Zeal Robot <zealci@zte.com.cn>
      Signed-off-by: xu xin <xu.xin16@zte.com.cn>
      Acked-by: Joanne Koong <joannekoong@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 11 Dec, 2021 13 commits
  3. 10 Dec, 2021 15 commits
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · be315829
      Jakub Kicinski authored
      Andrii Nakryiko says:
      
      ====================
      bpf-next 2021-12-10 v2
      
      We've added 115 non-merge commits during the last 26 day(s) which contain
      a total of 182 files changed, 5747 insertions(+), 2564 deletions(-).
      
      The main changes are:
      
      1) Various samples fixes, from Alexander Lobakin.
      
      2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov.
      
      3) A batch of new unified APIs for libbpf, logging improvements, version
         querying, etc. Also a batch of old deprecations for old APIs and various
         bug fixes, in preparation for libbpf 1.0, from Andrii Nakryiko.
      
      4) BPF documentation reorganization and improvements, from Christoph Hellwig
         and Dave Tucker.
      
      5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in
         libbpf, from Hengqi Chen.
      
      6) Verifier log fixes, from Hou Tao.
      
      7) Runtime-bounded loops support with bpf_loop() helper, from Joanne Koong.
      
      8) Extend branch record capturing to all platforms that support it,
         from Kajol Jain.
      
      9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi.
      
      10) bpftool doc-generating script improvements, from Quentin Monnet.
      
      11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet.
      
      12) Deprecation warning fix for perf/bpf_counter, from Song Liu.
      
      13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf,
          from Tiezhu Yang.
      
      14) BTF_KIND_TYPE_TAG follow-up fixes, from Yonghong Song.
      
      15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe
          Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun,
          and others.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (115 commits)
        libbpf: Add "bool skipped" to struct bpf_map
        libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition
        bpftool: Switch bpf_object__load_xattr() to bpf_object__load()
        selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr()
        selftests/bpf: Add test for libbpf's custom log_buf behavior
        selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load()
        libbpf: Deprecate bpf_object__load_xattr()
        libbpf: Add per-program log buffer setter and getter
        libbpf: Preserve kernel error code and remove kprobe prog type guessing
        libbpf: Improve logging around BPF program loading
        libbpf: Allow passing user log setting through bpf_object_open_opts
        libbpf: Allow passing preallocated log_buf when loading BTF into kernel
        libbpf: Add OPTS-based bpf_btf_load() API
        libbpf: Fix bpf_prog_load() log_buf logic for log_level 0
        samples/bpf: Remove unneeded variable
        bpf: Remove redundant assignment to pointer t
        selftests/bpf: Fix a compilation warning
        perf/bpf_counter: Use bpf_map_create instead of bpf_create_map
        samples: bpf: Fix 'unknown warning group' build warning on Clang
        samples: bpf: Fix xdp_sample_user.o linking with Clang
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20211210234746.2100561-1-andrii@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • libbpf: Add "bool skipped" to struct bpf_map · 229fae38
      Shuyi Cheng authored
      Fix error: "failed to pin map: Bad file descriptor, path:
      /sys/fs/bpf/_rodata_str1_1."
      
      On old kernels, the global data map is not created (see [0]), so we
      should skip pinning it to avoid bpf_object__pin_maps() returning an
      error. Therefore, when the map is not created, we mark "map->skipped" as
      true and then check it during relocation and during pinning.
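
      A minimal user-space model of the flag, assuming a heavily reduced,
      hypothetical map structure (struct map_model, create_maps(), pin_maps()
      are not libbpf's internals). Without the check, pinning a map that was
      never created fails with -EBADF, matching the "Bad file descriptor"
      error quoted above.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical, heavily reduced stand-in for libbpf's map state. */
      struct map_model {
          const char *name;
          bool is_global_data;   /* .data/.rodata/.bss-style map */
          int fd;                /* -1 while no kernel object exists */
          bool skipped;          /* creation deliberately skipped */
      };

      /* Create maps; without kernel global data support, skip those maps. */
      static void create_maps(struct map_model *maps, size_t n,
                              bool kernel_has_global_data)
      {
          for (size_t i = 0; i < n; i++) {
              if (maps[i].is_global_data && !kernel_has_global_data) {
                  maps[i].skipped = true;
                  continue;
              }
              maps[i].fd = 100 + (int)i;   /* pretend the kernel returned an fd */
          }
      }

      static int pin_map(const struct map_model *m)
      {
          return m->fd >= 0 ? 0 : -EBADF;  /* "Bad file descriptor" */
      }

      /* Pin every created map, ignoring the ones that were skipped. */
      static int pin_maps(struct map_model *maps, size_t n)
      {
          for (size_t i = 0; i < n; i++) {
              int err;

              if (maps[i].skipped)
                  continue;
              err = pin_map(&maps[i]);
              if (err)
                  return err;
          }
          return 0;
      }
      ```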
      
      Fixes: 16e0c35c ("libbpf: Load global data maps lazily on legacy kernels")
      Signed-off-by: Shuyi Cheng <chengshuyi@linux.alibaba.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition · b69c5c07
      Vincent Minet authored
      The btf__dedup_deprecated name was misspelled in the definition of the
      compat symbol for btf__dedup, which caused the symbol to be missing from
      the shared library.
      
      This fixes it.
      
      Fixes: 957d350a ("libbpf: Turn btf_dedup_opts into OPTS-based struct")
      Signed-off-by: Vincent Minet <vincent@vincent-minet.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211210063112.80047-1-vincent@vincent-minet.net
    • Merge branch 'Enhance and rework logging controls in libbpf' · bd6b3b35
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      
      Add new open options and per-program setters to control BTF and program
      loading log verbosity and allow providing custom log buffers to capture
      logs of interest. Note how custom log_buf and log_level are orthogonal,
      which matches the previous (alas, less customizable) behavior of libbpf,
      even though it sort of worked by accident: if someone specified
      log_level = 1 in bpf_object__load_xattr(), the first attempt to load any
      BPF program resulted in a wasted bpf() syscall with -EINVAL due to
      !!log_buf != !!log_level. Then on retry libbpf would allocate a log
      buffer and try again, after which program loading would succeed and
      libbpf would print the verbose program loading log through its print
      callback.
      
      This behavior is now documented and made more efficient, avoiding the
      unnecessary syscall. Additionally, log_level can be controlled globally
      at the bpf_object level through bpf_object_open_opts, as well as on
      a per-program basis with the bpf_program__set_log_buf() and
      bpf_program__set_log_level() APIs.
      
      Now that we have a more future-proof way to set log_level, deprecate
      bpf_object__load_xattr().
      
      v2->v3:
        - added log_buf selftests for bpf_prog_load() and bpf_btf_load();
        - fix !log_buf in bpf_prog_load (John);
        - fix log_level==0 in bpf_btf_load (thanks selftest!);
      
      v1->v2:
        - fix log_level == 0 handling of bpf_prog_load, add as patch #1 (Alexei);
        - add comments explaining log_buf_size overflow prevention (Alexei).
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • bpftool: Switch bpf_object__load_xattr() to bpf_object__load() · b59e4ce8
      Andrii Nakryiko authored
      Switch all uses of the to-be-deprecated bpf_object__load_xattr() to
      simple bpf_object__load() calls, with an optional log_level passed
      through open_opts.kernel_log_level if the -d option is specified.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211209193840.1248570-13-andrii@kernel.org
    • selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr() · 3fc5fdcc
      Andrii Nakryiko authored
      Switch from bpf_object__load_xattr() to bpf_object__load() and
      kernel_log_level in bpf_object_open_opts.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211209193840.1248570-12-andrii@kernel.org
    • selftests/bpf: Add test for libbpf's custom log_buf behavior · 57e88926
      Andrii Nakryiko authored
      Add a selftest that validates that per-program and per-object log_buf
      overrides work as expected. Also test same logic for low-level
      bpf_prog_load() and bpf_btf_load() APIs.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211209193840.1248570-11-andrii@kernel.org
    • selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load() · dc94121b
      Andrii Nakryiko authored
      Replace all selftests uses of the to-be-deprecated bpf_load_btf() with
      equivalent bpf_btf_load() calls.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211209193840.1248570-10-andrii@kernel.org
    • libbpf: Deprecate bpf_object__load_xattr() · e7b924ca
      Andrii Nakryiko authored
      Deprecate non-extensible bpf_object__load_xattr() in v0.8 ([0]).
      
      With log_level control through bpf_object_open_opts or
      bpf_program__set_log_level(), we are finally at the point where
      bpf_object__load_xattr() doesn't provide any functionality that can't be
      accessed through other (better) ways. The other feature,
      target_btf_path, is also controllable through bpf_object_open_opts.
      
        [0] Closes: https://github.com/libbpf/libbpf/issues/289
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211209193840.1248570-9-andrii@kernel.org
    • libbpf: Add per-program log buffer setter and getter · b3ce9079
      Andrii Nakryiko authored
      Allow setting a user-provided log buffer on a per-program basis ([0]).
      This gives a great deal of flexibility in terms of which programs are
      loaded with logging enabled and where the corresponding logs go.
      
      Log buffer set with bpf_program__set_log_buf() overrides kernel_log_buf
      and kernel_log_size settings set at bpf_object open time through
      bpf_object_open_opts, if any.
      
      Adjust bpf_object_load_prog_instance() logic to not perform its own log
      buffer allocation and load retry if a custom log buffer is provided by
      the user.
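
      The precedence described above can be modelled as a small resolution
      step. This is a sketch with hypothetical structs and names
      (obj_log_settings, prog_log_settings, resolve_log()), not libbpf's
      internal code: the per-program buffer wins over the object-level one,
      and libbpf falls back to allocating its own buffer (and retrying) only
      when the user supplied none.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical reduced stand-ins for the two levels of settings. */
      struct obj_log_settings {      /* set at bpf_object open time */
          char *kernel_log_buf;
          size_t kernel_log_size;
      };

      struct prog_log_settings {     /* set through a per-program setter */
          char *log_buf;
          size_t log_size;
      };

      struct effective_log {
          char *buf;
          size_t size;
          bool libbpf_allocates_and_retries;
      };

      /* Per-program buffer overrides the object-level buffer, if any. */
      static struct effective_log resolve_log(const struct obj_log_settings *obj,
                                              const struct prog_log_settings *prog)
      {
          struct effective_log e = { obj->kernel_log_buf, obj->kernel_log_size,
                                     false };

          if (prog->log_buf) {
              e.buf = prog->log_buf;
              e.size = prog->log_size;
          }
          /* Own allocation + retry only when no user buffer exists at all. */
          e.libbpf_allocates_and_retries = (e.buf == NULL);
          return e;
      }
      ```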
      
        [0] Closes: https://github.com/libbpf/libbpf/issues/418
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211209193840.1248570-8-andrii@kernel.org
    • libbpf: Preserve kernel error code and remove kprobe prog type guessing · 2eda2145
      Andrii Nakryiko authored
      Instead of rewriting the error code returned by the kernel for program
      load with libbpf-specific variants, pass the original error through.
      
      There is now also no need to have a backup generic -LIBBPF_ERRNO__LOAD
      fallback error as bpf_prog_load() guarantees that errno will be properly
      set no matter what.
      
      Also drop the completely outdated and pretty useless BPF_PROG_TYPE_KPROBE
      guessing logic. It is neither necessary nor helpful in modern BPF
      applications.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211209193840.1248570-7-andrii@kernel.org
    • libbpf: Improve logging around BPF program loading · ad9a7f96
      Andrii Nakryiko authored
      Add missing "prog '%s': " prefixes in a few places and consistently use
      markers for the beginning and end of program load logs. Here's an
      example of log output:
      
      libbpf: prog 'handler': BPF program load failed: Permission denied
      libbpf: -- BEGIN PROG LOAD LOG ---
      arg#0 reference type('UNKNOWN ') size cannot be determined: -22
      ; out1 = in1;
      0: (18) r1 = 0xffffc9000cdcc000
      2: (61) r1 = *(u32 *)(r1 +0)
      
      ...
      
      81: (63) *(u32 *)(r4 +0) = r5
       R1_w=map_value(id=0,off=16,ks=4,vs=20,imm=0) R4=map_value(id=0,off=400,ks=4,vs=16,imm=0)
      invalid access to map value, value_size=16 off=400 size=4
      R4 min value is outside of the allowed memory range
      processed 63 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
       -- END PROG LOAD LOG --
      libbpf: failed to load program 'handler'
      libbpf: failed to load object 'test_skeleton'
      
      The entire verifier log, including the BEGIN and END markers, is now
      always output during a single print callback call. This should make it
      much easier to post-process or parse, if necessary. It's not an explicit
      API guarantee, but it can reasonably be expected to stay that way.
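
      The "one callback call per log" property can be illustrated with a
      hypothetical print callback (print_cb(), dump_prog_load_log()); the
      marker text below is illustrative and is not claimed to be libbpf's
      exact format string. Because the markers and the log body go through a
      single format call, a consumer never observes a partial log.

      ```c
      #include <assert.h>
      #include <stdarg.h>
      #include <stdio.h>
      #include <string.h>

      /* Hypothetical print callback: counts calls, keeps the last message. */
      static int print_calls;
      static char last_msg[512];

      static void print_cb(const char *fmt, ...)
      {
          va_list ap;

          va_start(ap, fmt);
          vsnprintf(last_msg, sizeof(last_msg), fmt, ap);
          va_end(ap);
          print_calls++;
      }

      /* Emit the whole verifier log, markers included, in ONE callback call. */
      static void dump_prog_load_log(const char *prog, const char *log)
      {
          print_cb("prog '%s': -- BEGIN PROG LOAD LOG --\n%s-- END PROG LOAD LOG --\n",
                   prog, log);
      }
      ```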
      
      Also, __bpf_object__open is renamed to bpf_object_open(): it's always
      an adventure to find the exact function that implements bpf_object's
      open phase, so drop the double underscores and use the internal libbpf
      naming convention.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211209193840.1248570-6-andrii@kernel.org
    • libbpf: Allow passing user log setting through bpf_object_open_opts · e0e3ea88
      Andrii Nakryiko authored
      Allow users to provide their own custom log_buf, log_size, and log_level
      at the bpf_object level through bpf_object_open_opts. This log_buf will
      be used during BTF loading. A subsequent patch will use the same log_buf
      during BPF program loading, unless overridden at the per-bpf_program
      level.
      
      When such a custom log_buf is provided, libbpf won't retry BTF loading
      with its own log buffer to capture the kernel's error log output. The
      user is responsible for providing a big enough buffer; otherwise they
      run the risk of getting an -ENOSPC error from the bpf() syscall.
      
      See also comments in bpf_object_open_opts regarding log_level and
      log_buf interactions.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211209193840.1248570-5-andrii@kernel.org
    • libbpf: Allow passing preallocated log_buf when loading BTF into kernel · 1a190d1e
      Andrii Nakryiko authored
      Add libbpf-internal btf_load_into_kernel() that allows passing a
      preallocated log_buf and a custom log_level into the kernel during the
      BPF_BTF_LOAD call. When a custom log_buf is provided,
      btf_load_into_kernel() won't attempt a retry with an automatically
      allocated internal temporary buffer to capture the BTF validation log.
      
      It's important to note the relation between log_buf and log_level, which
      slightly deviates from the stricter kernel logic. From the kernel's POV,
      if log_buf is specified, log_level has to be > 0, and vice versa. While
      the kernel has good reasons to request such sanity, in practice this is
      a bit inconvenient and restrictive for libbpf's high-level bpf_object
      APIs.

      So libbpf will allow setting a non-NULL log_buf with log_level == 0.
      This is fine and means: attempt to load BTF without logging requested,
      but if it fails, retry the load with the custom log_buf and log_level 1.
      Similar logic will be implemented for program loading. In practice this
      means that users can provide a custom log buffer just in case an error
      happens, without requesting slower verbose logging all the time. This is
      also consistent with libbpf behavior when a custom log_buf is not set:
      libbpf first tries to load everything with log_level=0, and only if an
      error happens does it allocate an internal log buffer and retry with
      log_level=1.
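
      The log_buf/log_level policy described above can be sketched against a
      fake kernel. All names here (struct fake_kernel, fake_btf_load(),
      load_btf_with_log()) are hypothetical; the fake only enforces the
      kernel rule quoted above (log_buf and log_level > 0 go together). The
      case with no user buffer, where libbpf would allocate its own, is not
      modelled.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdio.h>

      /* Hypothetical stand-in for the kernel's BPF_BTF_LOAD handling. */
      struct fake_kernel {
          int accepts;   /* 1: load succeeds, 0: validation fails */
          int calls;     /* number of load attempts seen */
      };

      static int fake_btf_load(struct fake_kernel *k, char *log_buf,
                               size_t log_size, unsigned log_level)
      {
          k->calls++;
          /* Kernel rule: log_buf and log_level > 0 must come together. */
          if (!!log_buf != !!log_level)
              return -22;                      /* -EINVAL */
          if (k->accepts)
              return 3;                        /* pretend fd */
          if (log_buf && log_size)
              snprintf(log_buf, log_size, "Total section length too long\n");
          return -22;
      }

      /* Non-NULL log_buf with log_level 0: load quietly first, and only on
       * failure retry once with log_level 1 to capture the log. */
      static int load_btf_with_log(struct fake_kernel *k, char *log_buf,
                                   size_t log_size, unsigned log_level)
      {
          int ret;

          ret = fake_btf_load(k, log_level ? log_buf : NULL,
                              log_level ? log_size : 0, log_level);
          if (ret >= 0 || !log_buf || log_level)
              return ret;
          return fake_btf_load(k, log_buf, log_size, 1);
      }
      ```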
      
      Also, while at it, make BTF validation log more obvious and follow the log
      pattern libbpf is using for dumping BPF verifier log during
      BPF_PROG_LOAD. BTF loading resulting in an error will look like this:
      
      libbpf: BTF loading error: -22
      libbpf: -- BEGIN BTF LOAD LOG ---
      magic: 0xeb9f
      version: 1
      flags: 0x0
      hdr_len: 24
      type_off: 0
      type_len: 1040
      str_off: 1040
      str_len: 2063598257
      btf_total_size: 1753
      Total section length too long
      -- END BTF LOAD LOG --
      libbpf: Error loading .BTF into kernel: -22. BTF is optional, ignoring.
      
      This makes it much easier to find relevant parts in libbpf log output.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211209193840.1248570-4-andrii@kernel.org
    • libbpf: Add OPTS-based bpf_btf_load() API · 0ed08d67
      Andrii Nakryiko authored
      Similar to the previous bpf_prog_load() and bpf_map_create() APIs, add
      a bpf_btf_load() API which takes an optional OPTS struct. Schedule
      bpf_load_btf() for deprecation in v0.8 ([0]).
      
      This makes naming consistent with the BPF_BTF_LOAD command, sets up the
      API for future extensibility, moves optional parameters (log-related
      fields) into the options struct, and also allows passing log_level
      directly.
      
      It also removes log buffer auto-allocation logic from low-level API
      (consistent with bpf_prog_load() behavior), but preserves a special
      treatment of log_level == 0 with non-NULL log_buf, which matches
      low-level bpf_prog_load() and high-level libbpf APIs for BTF and program
      loading behaviors.
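
      The extensibility of an OPTS-style struct rests on its leading size
      field. Below is a generic sketch of that convention, assuming a
      hypothetical struct (btf_load_opts_model) and hypothetical helper macros
      (OPTS_FIELD_PRESENT, OPTS_FIELD_GET); it does not reproduce libbpf's own
      macros. A field is honored only if it lies within the size the caller
      declared, so callers built against an older, shorter struct still work.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical OPTS-style struct: the caller stores sizeof() in sz. */
      struct btf_load_opts_model {
          size_t sz;
          char *log_buf;
          unsigned log_level;
          unsigned log_size;
      };

      /* Hypothetical helpers: is `field` inside the caller-declared size? */
      #define OPTS_FIELD_PRESENT(opts, type, field) \
          ((opts) != NULL && \
           (opts)->sz >= offsetof(type, field) + sizeof(((type *)0)->field))
      #define OPTS_FIELD_GET(opts, type, field, fallback) \
          (OPTS_FIELD_PRESENT(opts, type, field) ? (opts)->field : (fallback))

      /* A missing struct or a too-short struct both mean "no logging". */
      static unsigned effective_log_level(const struct btf_load_opts_model *opts)
      {
          return OPTS_FIELD_GET(opts, struct btf_load_opts_model, log_level, 0u);
      }
      ```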
      
        [0] Closes: https://github.com/libbpf/libbpf/issues/419
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211209193840.1248570-3-andrii@kernel.org