    bpf, lru: avoid messing with eviction heuristics upon syscall lookup · 50b045a8
    Daniel Borkmann authored
    One of the biggest issues we face right now with picking an LRU map
    over a regular hash table is that a map walk from user space, for
    example, to just dump the existing entries or to remove certain ones,
    will completely mess up the LRU eviction heuristics, and wrong entries
    such as just-created ones will get evicted instead. The reason for
    this is that we mark an entry as "in use" via bpf_lru_node_set_ref()
    from the system call lookup side as well. Thus, upon a walk, all
    entries are marked, so the information about which ones are actually
    least recently used is "lost".
    
    In the case of Cilium, where an LRU map can be used (among other
    things) as a BPF-based connection tracker, the current behavior
    causes disruption upon control plane changes that need to walk the
    map from user space to evict certain entries. The outcome of the
    discussion at bpfconf [0] was that we should simply remove the
    marking from the system call side, as no good use case could be
    found where it is actually needed there. Therefore, this patch
    removes the marking for both the regular LRU and the per-CPU flavor.
    Should there ever be a need in the future, the behavior could be
    selected via a map creation flag, but for the reason mentioned above
    we avoid this here.
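
    As a rough illustration of the effect (a user space toy model, not
    the kernel code), the sketch below represents each entry's reference
    mark as a plain boolean. The "mark" parameter is a hypothetical
    stand-in for the difference between the BPF program lookup path,
    which keeps marking, and the system call lookup path, which no
    longer does. All toy_* names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_ENTRIES 4

/* Toy entry: "ref" plays the role of the mark set by bpf_lru_node_set_ref(). */
struct toy_node {
	int key;
	bool ref;
};

struct toy_lru {
	struct toy_node nodes[NR_ENTRIES];
};

/* Fill the map with keys 0..NR_ENTRIES-1, none of them marked. */
static void toy_init(struct toy_lru *lru)
{
	assert(lru != NULL);
	for (size_t i = 0; i < NR_ENTRIES; i++) {
		lru->nodes[i].key = (int)i;
		lru->nodes[i].ref = false;
	}
}

/*
 * Look up a key. mark == true models a lookup from the BPF program side,
 * mark == false models a system call lookup without marking (assumption:
 * this flag only exists in the toy model).
 */
static struct toy_node *toy_lookup(struct toy_lru *lru, int key, bool mark)
{
	assert(lru != NULL);
	for (size_t i = 0; i < NR_ENTRIES; i++) {
		if (lru->nodes[i].key == key) {
			if (mark)
				lru->nodes[i].ref = true;
			return &lru->nodes[i];
		}
	}
	return NULL;
}

/* Walk every entry, as a user space dump of the map would. */
static void toy_walk(struct toy_lru *lru, bool mark)
{
	for (int key = 0; key < NR_ENTRIES; key++)
		toy_lookup(lru, key, mark);
}

/* Count entries that can still be told apart as "not recently used". */
static size_t toy_unreferenced(const struct toy_lru *lru)
{
	size_t n = 0;

	for (size_t i = 0; i < NR_ENTRIES; i++)
		if (!lru->nodes[i].ref)
			n++;
	return n;
}

/*
 * Scenario: key 0 is hit once from the "program" side, then user space
 * walks the whole map. Returns how many unmarked entries remain.
 */
static size_t toy_candidates_after_walk(bool walk_marks)
{
	struct toy_lru lru;

	toy_init(&lru);
	toy_lookup(&lru, 0, true);
	toy_walk(&lru, walk_marks);
	return toy_unreferenced(&lru);
}
```

    With walk_marks set to true (marking on the walk),
    toy_candidates_after_walk() returns 0, since every entry looks in
    use after the dump. With false it returns 3, so the one hot entry
    stays distinguishable from the three cold ones.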
    
      [0] http://vger.kernel.org/bpfconf.html
    
    Fixes: 29ba732a ("bpf: Add BPF_MAP_TYPE_LRU_HASH")
    Fixes: 8f844938 ("bpf: Add BPF_MAP_TYPE_LRU_PERCPU_HASH")
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
hashtab.c 38.5 KB