1. 16 Mar, 2015 9 commits
    • rhashtable: Fix rhashtable_remove failures · 565e8640
      Herbert Xu authored
      The commit 9d901bc0 ("rhashtable:
      Free bucket tables asynchronously after rehash") causes gratuitous
      failures in rhashtable_remove.
      
      The reason is that it inadvertently introduced multiple rehashing
      from the perspective of readers.  IOW it is now possible to see
      more than two tables during a single RCU critical section.
      
      Fortunately the other reader rhashtable_lookup already deals with
      this correctly thanks to c4db8848
      ("rhashtable: Move future_tbl into struct bucket_table")
      so only rhashtable_remove is broken by this change.
      
      This patch fixes this by looping over every table from the first
      one to the last or until we find the element that we were trying
      to delete.
      
      Incidentally the simple test for detecting rehashing to prevent
      starting another shrinking no longer works.  Since it isn't needed
      anyway (the work queue and the mutex serve as a natural barrier
      to unnecessary rehashes) I've simply killed the test.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
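      The "loop over every table from the first one to the last" idea can be
      sketched in userspace as below. This is only a toy model, not the kernel
      code: struct node, struct table, remove_from_table() and chain_remove()
      are hypothetical names, there is no RCU or locking, and the only detail
      taken from the commit message is that each table points at its successor
      through future_tbl.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; all names are illustrative. */
struct node {
    int key;
    struct node *next;
};

#define NBUCKETS_MAX 8

struct table {
    size_t size;                        /* number of buckets in use */
    struct node *buckets[NBUCKETS_MAX];
    struct table *future_tbl;           /* next table in the rehash chain, or NULL */
};

/* Link a node into the bucket its key hashes to in this table. */
static void table_insert(struct table *tbl, struct node *n)
{
    assert(tbl->size > 0 && tbl->size <= NBUCKETS_MAX);
    struct node **head = &tbl->buckets[(unsigned)n->key % tbl->size];

    n->next = *head;
    *head = n;
}

/* Unlink `key` from one table; returns 1 if it was found there. */
static int remove_from_table(struct table *tbl, int key)
{
    struct node **pprev = &tbl->buckets[(unsigned)key % tbl->size];

    for (; *pprev; pprev = &(*pprev)->next) {
        if ((*pprev)->key == key) {
            *pprev = (*pprev)->next;
            return 1;
        }
    }
    return 0;
}

/* Try every table from the oldest to the newest, stopping at the first hit. */
static int chain_remove(struct table *first, int key)
{
    for (struct table *tbl = first; tbl; tbl = tbl->future_tbl)
        if (remove_from_table(tbl, key))
            return 1;
    return 0;
}
```

      With three chained tables, an element that lives only in the last table
      is still found, which is the case a "check at most two tables" removal
      would miss.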
    • rhashtable: Fix use-after-free in rhashtable_walk_stop · 963ecbd4
      Herbert Xu authored
      The commit c4db8848 ("rhashtable:
      Move future_tbl into struct bucket_table") introduced a
      use-after-free bug in rhashtable_walk_stop because it dereferences
      tbl after dropping the RCU read lock.
      
      This patch fixes it by moving the RCU read unlock down to the bottom
      of rhashtable_walk_stop.  In fact this was how I had it originally
      but it got dropped while rearranging patches because this one
      depended on the async freeing of bucket_table.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bcmgenet: add support for Hardware Filter Block · 0034de41
      Petri Gynther authored
      Add support for Hardware Filter Block (HFB) so that incoming Rx traffic
      can be matched and directed to desired Rx queues.
      Signed-off-by: Petri Gynther <pgynther@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ebpf_skb_fields' · 70006af9
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      bpf: allow eBPF access skb fields
      
      V1->V2:
      - refactored field access converter into common helper convert_skb_access()
        used in both classic and extended BPF
      - added missing build_bug_on for field 'len'
      - added comment to uapi/linux/bpf.h as suggested by Daniel
      - dropped exposing 'ifindex' field for now
      
      classic BPF has a way to access skb fields, whereas extended BPF didn't.
      This patch introduces this ability.
      
      Classic BPF can access fields via negative SKF_AD_OFF offset.
      Positive bpf_ld_abs N is treated as load from packet, whereas
      bpf_ld_abs -0x1000 + N is treated as skb fields access.
      Many offsets were hard coded over years: SKF_AD_PROTOCOL, SKF_AD_PKTTYPE, etc.
      The problem with this approach was that for every new field classic bpf
      assembler had to be tweaked.
      
      I've considered doing the same for extended, but for every new field the LLVM
      compiler would have to be modified, since it would need a new intrinsic.
      It could be done with a single intrinsic and a magic offset, or with inline
      assembler, but neither is clean from the compiler backend's point of view, since
      they look like calls but shouldn't scratch caller-saved registers.
      
      Another approach was to introduce new helper functions like bpf_get_pkt_type()
      for every field that we want to access, but that is equally ugly for the kernel
      and slow, since helpers are calls and calls are slower than plain loads.
      In theory helper calls can be 'inlined' inside the kernel into direct loads, but
      since they were calls for user space, the compiler would have to spill registers
      around such calls anyway. Teaching the compiler to treat such helpers differently
      is even uglier.
      
      A few other ideas were considered. In the end the best seems to be to
      introduce a user accessible mirror of the in-kernel sk_buff structure:
      
      struct __sk_buff {
          __u32 len;
          __u32 pkt_type;
          __u32 mark;
          __u32 queue_mapping;
      };
      
      bpf programs will do:
      
      int bpf_prog1(struct __sk_buff *skb)
      {
          __u32 var = skb->pkt_type;
      
      which will be compiled to bpf assembler as:
      
      dst_reg = *(u32 *)(src_reg + 4) // 4 == offsetof(struct __sk_buff, pkt_type)
      
      bpf verifier will check validity of access and will convert it to:
      
      dst_reg = *(u8 *)(src_reg + offsetof(struct sk_buff, __pkt_type_offset))
      dst_reg &= 7
      
      since 'pkt_type' is a bitfield.
      
      No new instructions added. LLVM doesn't need to be modified.
      JITs don't change and verifier already knows when it accesses 'ctx' pointer.
      The only thing needed was to convert user visible offset within __sk_buff
      to kernel internal offset within sk_buff.
      For 'len' and other fields conversion is trivial.
      Converting 'pkt_type' takes 2 or 3 instructions depending on endianness.
      More fields can be exposed by adding them to the end of 'struct __sk_buff'.
      Fields such as vlan_tci can be added later.
      
      When the pkt_type field is moved around, goes into a different structure, is removed or
      changes size, the function convert_skb_access() would need to be updated and
      it will cover both classic and extended.
      
      Patch 2 updates the examples to demonstrate how fields are accessed and
      adds new tests for the verifier, since it needs to detect a corner case where
      an attacker uses a single bpf instruction in two branches with different
      register types.
      
      The 4 fields of __sk_buff are already exposed to user space via classic bpf and
      I believe they're useful in extended as well.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
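      The offset conversion described in the cover letter can be modelled in
      plain C roughly as follows. This is a hedged sketch, not the kernel's
      convert_skb_access(): struct mirror_skb stands in for struct __sk_buff,
      struct fake_skb is an invented layout standing in for struct sk_buff,
      and load_field() is a hypothetical helper. The real code rewrites BPF
      instructions rather than executing loads, and on some endiannesses the
      pkt_type bits need an extra shift, which this toy ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the user-visible mirror (struct __sk_buff in the cover letter). */
struct mirror_skb {
    uint32_t len;
    uint32_t pkt_type;
    uint32_t mark;
    uint32_t queue_mapping;
};

/* Invented "kernel internal" layout; field sizes and order are assumptions. */
struct fake_skb {
    uint32_t len;
    uint8_t  flags;          /* low 3 bits hold pkt_type in this toy layout */
    uint32_t mark;
    uint16_t queue_mapping;
};

/*
 * Translate a load at a user-visible offset into a load from the internal
 * structure. Returns 0 on success, -1 for an offset that names no field.
 */
static int load_field(const struct fake_skb *skb, size_t off, uint32_t *dst)
{
    switch (off) {
    case offsetof(struct mirror_skb, len):
        *dst = skb->len;
        return 0;
    case offsetof(struct mirror_skb, pkt_type):
        *dst = skb->flags & 7;   /* byte load plus mask, since it is a bitfield */
        return 0;
    case offsetof(struct mirror_skb, mark):
        *dst = skb->mark;
        return 0;
    case offsetof(struct mirror_skb, queue_mapping):
        *dst = skb->queue_mapping;
        return 0;
    default:
        return -1;
    }
}
```

      The user-visible offsets stay fixed even if the internal layout changes;
      only the switch body would need updating.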
    • samples: bpf: add skb->field examples and tests · 614cd3bd
      Alexei Starovoitov authored
      - modify sockex1 example to count number of bytes in outgoing packets
      - modify sockex2 example to count number of bytes and packets per flow
      - add 4 stress tests that exercise 'skb->field' code path of verifier
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: allow extended BPF programs access skb fields · 9bac3d6d
      Alexei Starovoitov authored
      introduce user accessible mirror of in-kernel 'struct sk_buff':
      struct __sk_buff {
          __u32 len;
          __u32 pkt_type;
          __u32 mark;
          __u32 queue_mapping;
      };
      
      bpf programs can do:
      
      int bpf_prog(struct __sk_buff *skb)
      {
          __u32 var = skb->pkt_type;
      
      which will be compiled to bpf assembler as:
      
      dst_reg = *(u32 *)(src_reg + 4) // 4 == offsetof(struct __sk_buff, pkt_type)
      
      bpf verifier will check validity of access and will convert it to:
      
      dst_reg = *(u8 *)(src_reg + offsetof(struct sk_buff, __pkt_type_offset))
      dst_reg &= 7
      
      since skb->pkt_type is a bitfield.
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
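      The commit message says the verifier checks the validity of each access
      but does not spell out the rules. One plausible set of rules, stated
      here purely as an assumption, is: reads only, exactly 4 bytes wide,
      naturally aligned, and inside the mirror structure. The names
      struct mirror_skb and is_valid_access() below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the user-visible struct __sk_buff from the commit message. */
struct mirror_skb {
    uint32_t len;
    uint32_t pkt_type;
    uint32_t mark;
    uint32_t queue_mapping;
};

/*
 * Assumed policy, not taken from the commit message:
 *  - writes are rejected,
 *  - the access must be 4 bytes wide,
 *  - the offset must be 4-byte aligned and fall inside the structure.
 * Returns 1 if the access is acceptable, 0 otherwise.
 */
static int is_valid_access(int off, int size, int is_write)
{
    if (is_write)
        return 0;
    if (size != 4)
        return 0;
    if (off < 0 || off >= (int)sizeof(struct mirror_skb))
        return 0;
    if (off % size != 0)
        return 0;
    return 1;
}
```

      Under these assumed rules a load at offset 4 (pkt_type) passes, while a
      load at offset 6 or at offset 16 is rejected before any conversion runs.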
    • Merge branch 'ebpf_helpers' · a498cfe9
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      eBPF updates
      
      Two small eBPF helper additions to better match up with ancillary
      classic BPF functionality.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ebpf: add helper for obtaining current processor id · c04167ce
      Daniel Borkmann authored
      This patch adds the possibility to obtain raw_smp_processor_id() in
      eBPF. Currently, this is only possible in classic BPF where commit
      da2033c2 ("filter: add SKF_AD_RXHASH and SKF_AD_CPU") has added
      facilities for this.
      
      Perhaps most importantly, this would also allow us to track per CPU
      statistics with eBPF maps, or to implement a poor-man's per CPU data
      structure through eBPF maps.
      
      Example function prototype looks like:
      
        u32 (*smp_processor_id)(void) = (void *)BPF_FUNC_get_smp_processor_id;
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
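      The "poor-man's per CPU data structure" idea amounts to using the CPU id
      as the key into a map. A userspace model of that pattern might look like
      this; it is a sketch only. The array stands in for an eBPF array map,
      get_smp_processor_id_stub() and the fake_cpu knob stand in for the new
      helper, and MAX_CPUS is an arbitrary assumption.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS 4               /* arbitrary size for the toy */

static uint32_t fake_cpu;        /* test knob standing in for the real CPU id */

/* Hypothetical stand-in for the helper behind BPF_FUNC_get_smp_processor_id. */
static uint32_t get_smp_processor_id_stub(void)
{
    return fake_cpu;
}

struct cpu_stats {
    uint64_t packets;
    uint64_t bytes;
};

/* Stands in for an array map keyed by CPU id. */
static struct cpu_stats stats[MAX_CPUS];

/* Account one packet against the slot of the CPU that is handling it. */
static void account_packet(uint32_t len)
{
    uint32_t cpu = get_smp_processor_id_stub();

    if (cpu >= MAX_CPUS)         /* out-of-range key: drop the update */
        return;
    stats[cpu].packets++;
    stats[cpu].bytes += len;
}

/* A reader sums the per-CPU slots to get the global figure. */
static uint64_t total_bytes(void)
{
    uint64_t sum = 0;

    for (uint32_t i = 0; i < MAX_CPUS; i++)
        sum += stats[i].bytes;
    return sum;
}
```

      Because each CPU only touches its own slot, the update path needs no
      shared counter; the cost moves to the reader, which has to sum the slots.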
    • ebpf: add prandom helper for packet sampling · 03e69b50
      Daniel Borkmann authored
      This work is similar to commit 4cd3675e ("filter: added BPF
      random opcode") and adds a possibility for packet sampling in eBPF.
      
      Currently, this is only possible in classic BPF and useful to
      combine sampling with e.g. packet sockets, possibly also with tc.
      
      Example function prototype looks like:
      
        u32 (*prandom_u32)(void) = (void *)BPF_FUNC_get_prandom_u32;
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
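      Packet sampling with a random helper usually reduces to "keep the packet
      when the random value hits a 1-in-N condition". A small userspace model
      of that decision is sketched below. should_sample(), count_sampled() and
      counter_rng() are hypothetical names; counter_rng() is a deterministic
      stand-in used so the behaviour is reproducible, where a real program
      would call the helper behind BPF_FUNC_get_prandom_u32 instead.

```c
#include <assert.h>
#include <stdint.h>

/* Keep roughly one packet in `rate`, given one random 32-bit value. */
static int should_sample(uint32_t rnd, uint32_t rate)
{
    assert(rate != 0);
    return rnd % rate == 0;
}

/* Deterministic stand-in for a random source: returns 0, 1, 2, ... */
static uint32_t counter_state;

static uint32_t counter_rng(void)
{
    return counter_state++;
}

/* Run the sampling decision over `npackets` packets and count the hits. */
static uint32_t count_sampled(uint32_t (*rng)(void), uint32_t npackets,
                              uint32_t rate)
{
    uint32_t hits = 0;

    for (uint32_t i = 0; i < npackets; i++)
        if (should_sample(rng(), rate))
            hits++;
    return hits;
}
```

      Passing the random source as a function pointer keeps the decision logic
      separate from where the randomness comes from, which also makes it easy
      to check with a predictable sequence.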
  2. 15 Mar, 2015 11 commits
  3. 14 Mar, 2015 20 commits