1. 06 Apr, 2023 14 commits
  2. 05 Apr, 2023 8 commits
  3. 04 Apr, 2023 10 commits
  4. 03 Apr, 2023 2 commits
  5. 02 Apr, 2023 1 commit
  6. 01 Apr, 2023 5 commits
    • Anton Protopopov's avatar
      bpf: optimize hashmap lookups when key_size is divisible by 4 · 5b85575a
      Anton Protopopov authored
      The BPF hashmap uses the jhash() hash function. There is an optimized version
      of this hash function which may be used if hash size is a multiple of 4. Apply
      this optimization to the hashmap in a similar way as it is done in the bloom
      filter map.
      
      On practice the optimization is only noticeable for smaller key sizes, which,
      however, is sufficient for many applications. An example is listed in the
      following table of measurements (a hashmap of 65536 elements was used):
      
          --------------------------------------------------------------------
          | key_size | fullness | lookups /sec | lookups (opt) /sec |   gain |
          --------------------------------------------------------------------
          |        4 |      25% |      42.990M |            46.000M |   7.0% |
          |        4 |      50% |      37.910M |            39.094M |   3.1% |
          |        4 |      75% |      34.486M |            36.124M |   4.7% |
          |        4 |     100% |      31.760M |            32.719M |   3.0% |
          --------------------------------------------------------------------
          |        8 |      25% |      43.855M |            49.626M |  13.2% |
          |        8 |      50% |      38.328M |            42.152M |  10.0% |
          |        8 |      75% |      34.483M |            38.088M |  10.5% |
          |        8 |     100% |      31.306M |            34.686M |  10.8% |
          --------------------------------------------------------------------
          |       12 |      25% |      38.398M |            43.770M |  14.0% |
          |       12 |      50% |      33.336M |            37.712M |  13.1% |
          |       12 |      75% |      29.917M |            34.440M |  15.1% |
          |       12 |     100% |      27.322M |            30.480M |  11.6% |
          --------------------------------------------------------------------
          |       16 |      25% |      41.491M |            41.921M |   1.0% |
          |       16 |      50% |      36.206M |            36.474M |   0.7% |
          |       16 |      75% |      32.529M |            33.027M |   1.5% |
          |       16 |     100% |      29.581M |            30.325M |   2.5% |
          --------------------------------------------------------------------
          |       20 |      25% |      34.240M |            36.787M |   7.4% |
          |       20 |      50% |      30.328M |            32.663M |   7.7% |
          |       20 |      75% |      27.536M |            29.354M |   6.6% |
          |       20 |     100% |      24.847M |            26.505M |   6.7% |
          --------------------------------------------------------------------
          |       24 |      25% |      36.329M |            40.608M |  11.8% |
          |       24 |      50% |      31.444M |            35.059M |  11.5% |
          |       24 |      75% |      28.426M |            31.452M |  10.6% |
          |       24 |     100% |      26.278M |            28.741M |   9.4% |
          --------------------------------------------------------------------
          |       28 |      25% |      31.540M |            31.944M |   1.3% |
          |       28 |      50% |      27.739M |            28.063M |   1.2% |
          |       28 |      75% |      24.993M |            25.814M |   3.3% |
          |       28 |     100% |      23.513M |            23.500M |  -0.1% |
          --------------------------------------------------------------------
          |       32 |      25% |      32.116M |            33.953M |   5.7% |
          |       32 |      50% |      28.879M |            29.859M |   3.4% |
          |       32 |      75% |      26.227M |            26.948M |   2.7% |
          |       32 |     100% |      23.829M |            24.613M |   3.3% |
          --------------------------------------------------------------------
          |       64 |      25% |      22.535M |            22.554M |   0.1% |
          |       64 |      50% |      20.471M |            20.675M |   1.0% |
          |       64 |      75% |      19.077M |            19.146M |   0.4% |
          |       64 |     100% |      17.710M |            18.131M |   2.4% |
          --------------------------------------------------------------------
      
      The following script was used to gather the results (SMT & frequency off):
      
          cd tools/testing/selftests/bpf
          for key_size in 4 8 12 16 20 24 28 32 64; do
                  for nr_entries in `seq 16384 16384 65536`; do
                          fullness=$(printf '%3s' $((nr_entries*100/65536)))
                          echo -n "key_size=$key_size: $fullness% full: "
                          sudo ./bench -d2 -a bpf-hashmap-lookup --key_size=$key_size --nr_entries=$nr_entries --max_entries=65536 --nr_loops=2000000 --map_flags=0x40 | grep cpu
                  done
                  echo
          done
      Signed-off-by: default avatarAnton Protopopov <aspsk@isovalent.com>
      Link: https://lore.kernel.org/r/20230401200602.3275-1-aspsk@isovalent.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      5b85575a
    • Alexei Starovoitov's avatar
      Merge branch 'Enable RCU semantics for task kptrs' · a033907e
      Alexei Starovoitov authored
      David Vernet says:
      
      ====================
      
      In commit 22df776a ("tasks: Extract rcu_users out of union"), the
      'refcount_t rcu_users' field was extracted out of a union with the
      'struct rcu_head rcu' field. This allows us to use the field for
      refcounting struct task_struct with RCU protection, as the RCU callback
      no longer flips rcu_users to be nonzero after the callback is scheduled.
      
      This patch set leverages this to do a few things:
      
      1. Marks struct task_struct as RCU safe in the verifier, allowing
         referenced kptr tasks stored in maps to be accessed in an RCU
         read region without acquiring a reference (with just a NULL check).
      2. Makes bpf_task_acquire() a KF_ACQUIRE | KF_RCU | KF_RET_NULL kfunc.
      3. Removes bpf_task_kptr_get() and bpf_task_acquire_not_zero(), as
         they're now redundant with the above two changes.
      4. Updates selftests and documentation accordingly.
      ---
      Changelog:
      v1: https://lore.kernel.org/all/20230331005733.406202-1-void@manifault.com/
      v1 -> v2:
      - Remove testcases validating nested trust inheritance. The first
        version used 'struct task_struct __rcu *parent', but because that
        field has the __rcu tag it functions differently on gcc and llvm and
        causes gcc selftests to fail. Alexei is reworking nested trust,
        anyways so let's leave it off for now (Alexei).
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      a033907e
    • David Vernet's avatar
      bpf,docs: Update documentation to reflect new task kfuncs · db9d479a
      David Vernet authored
      Now that struct task_struct objects are RCU safe, and bpf_task_acquire()
      can return NULL, we should update the BPF task kfunc documentation to
      reflect the current state of the API.
      Signed-off-by: default avatarDavid Vernet <void@manifault.com>
      Link: https://lore.kernel.org/r/20230331195733.699708-4-void@manifault.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      db9d479a
    • David Vernet's avatar
      bpf: Remove now-defunct task kfuncs · f85671c6
      David Vernet authored
      In commit 22df776a ("tasks: Extract rcu_users out of union"), the
      'refcount_t rcu_users' field was extracted out of a union with the
      'struct rcu_head rcu' field. This allows us to safely perform a
      refcount_inc_not_zero() on task->rcu_users when acquiring a reference on
      a task struct. A prior patch leveraged this by making struct task_struct
      an RCU-protected object in the verifier, and by bpf_task_acquire() to
      use the task->rcu_users field for synchronization.
      
      Now that we can use RCU to protect tasks, we no longer need
      bpf_task_kptr_get(), or bpf_task_acquire_not_zero(). bpf_task_kptr_get()
      is truly completely unnecessary, as we can just use RCU to get the
      object. bpf_task_acquire_not_zero() is now equivalent to
      bpf_task_acquire().
      
      In addition to these changes, this patch also updates the associated
      selftests to no longer use these kfuncs.
      Signed-off-by: default avatarDavid Vernet <void@manifault.com>
      Link: https://lore.kernel.org/r/20230331195733.699708-3-void@manifault.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      f85671c6
    • David Vernet's avatar
      bpf: Make struct task_struct an RCU-safe type · d02c48fa
      David Vernet authored
      struct task_struct objects are a bit interesting in terms of how their
      lifetime is protected by refcounts. task structs have two refcount
      fields:
      
      1. refcount_t usage: Protects the memory backing the task struct. When
         this refcount drops to 0, the task is immediately freed, without
         waiting for an RCU grace period to elapse. This is the field that
         most callers in the kernel currently use to ensure that a task
         remains valid while it's being referenced, and is what's currently
         tracked with bpf_task_acquire() and bpf_task_release().
      
      2. refcount_t rcu_users: A refcount field which, when it drops to 0,
         schedules an RCU callback that drops a reference held on the 'usage'
         field above (which is acquired when the task is first created). This
         field therefore provides a form of RCU protection on the task by
         ensuring that at least one 'usage' refcount will be held until an RCU
         grace period has elapsed. The qualifier "a form of" is important
         here, as a task can remain valid after task->rcu_users has dropped to
         0 and the subsequent RCU gp has elapsed.
      
      In terms of BPF, we want to use task->rcu_users to protect tasks that
      function as referenced kptrs, and to allow tasks stored as referenced
      kptrs in maps to be accessed with RCU protection.
      
      Let's first determine whether we can safely use task->rcu_users to
      protect tasks stored in maps. All of the bpf_task* kfuncs can only be
      called from tracepoint, struct_ops, or BPF_PROG_TYPE_SCHED_CLS, program
      types. For tracepoint and struct_ops programs, the struct task_struct
      passed to a program handler will always be trusted, so it will always be
      safe to call bpf_task_acquire() with any task passed to a program.
      Note, however, that we must update bpf_task_acquire() to be KF_RET_NULL,
      as it is possible that the task has exited by the time the program is
      invoked, even if the pointer is still currently valid because the main
      kernel holds a task->usage refcount. For BPF_PROG_TYPE_SCHED_CLS, tasks
      should never be passed as an argument to the any program handlers, so it
      should not be relevant.
      
      The second question is whether it's safe to use RCU to access a task
      that was acquired with bpf_task_acquire(), and stored in a map. Because
      bpf_task_acquire() now uses task->rcu_users, it follows that if the task
      is present in the map, that it must have had at least one
      task->rcu_users refcount by the time the current RCU cs was started.
      Therefore, it's safe to access that task until the end of the current
      RCU cs.
      
      With all that said, this patch makes struct task_struct is an
      RCU-protected object. In doing so, we also change bpf_task_acquire() to
      be KF_ACQUIRE | KF_RCU | KF_RET_NULL, and adjust any selftests as
      necessary. A subsequent patch will remove bpf_task_kptr_get(), and
      bpf_task_acquire_not_zero() respectively.
      Signed-off-by: default avatarDavid Vernet <void@manifault.com>
      Link: https://lore.kernel.org/r/20230331195733.699708-2-void@manifault.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d02c48fa