1. 27 Jan, 2020 1 commit
    • Cong Wang's avatar
      net_sched: fix ops->bind_class() implementations · 2e24cd75
      Cong Wang authored
      The current implementations of ops->bind_class() are merely
      searching for classid and updating class in the struct tcf_result,
      without invoking either of cl_ops->bind_tcf() or
      cl_ops->unbind_tcf(). This breaks the design of them as qdisc's
      like cbq use them to count filters too. This is why syzbot triggered
      the warning in cbq_destroy_class().
      
      In order to fix this, we have to call cl_ops->bind_tcf() and
      cl_ops->unbind_tcf() like the filter binding path. This patch does
      so by refactoring out two helper functions __tcf_bind_filter()
      and __tcf_unbind_filter(), which are lockless and accept a Qdisc
      pointer, then teaching each implementation to call them correctly.
      
      Note, we merely pass the Qdisc pointer as an opaque pointer to
      each filter, they only need to pass it down to the helper
      functions without understanding it at all.
      
      Fixes: 07d79fc7
      
       ("net_sched: add reverse binding for tc class")
      Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2e24cd75
  2. 15 Jun, 2019 1 commit
  3. 30 May, 2019 1 commit
  4. 27 Apr, 2019 2 commits
    • Johannes Berg's avatar
      netlink: make validation more configurable for future strictness · 8cb08174
      Johannes Berg authored
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
      
      Additionally it allows us to select...
      8cb08174
    • Michal Kubecek's avatar
      netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de
      Michal Kubecek authored
      
      Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
      netlink based interfaces (including recently added ones) are still not
      setting it in kernel generated messages. Without the flag, message parsers
      not aware of attribute semantics (e.g. wireshark dissector or libmnl's
      mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
      the structure of their contents.
      
      Unfortunately we cannot just add the flag everywhere as there may be
      userspace applications which check nlattr::nla_type directly rather than
      through a helper masking out the flags. Therefore the patch renames
      nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
      as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
      are rewritten to use nla_nest_start().
      
      Except for changes in include/net/netlink.h, the patch was generated using
      this semantic patch:
      
      @@ expression E1, E2; @@
      -nla_nest_start(E1, E2)
      +nla_nest_start_noflag(E1, E2)
      
      @@ expression E1, E2; @@
      -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
      +nla_nest_start(E1, E2)
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ae0be8de
  5. 27 Feb, 2019 1 commit
  6. 22 Feb, 2019 1 commit
  7. 17 Feb, 2019 1 commit
  8. 12 Feb, 2019 2 commits
  9. 25 May, 2018 1 commit
  10. 24 Jan, 2018 1 commit
  11. 19 Jan, 2018 4 commits
  12. 09 Nov, 2017 1 commit
  13. 29 Oct, 2017 1 commit
  14. 16 Oct, 2017 1 commit
  15. 31 Aug, 2017 1 commit
    • Cong Wang's avatar
      net_sched: add reverse binding for tc class · 07d79fc7
      Cong Wang authored
      
      TC filters when used as classifiers are bound to TC classes.
      However, there is a hidden difference when adding them in different
      orders:
      
      1. If we add tc classes before its filters, everything is fine.
         Logically, the classes exist before we specify their ID's in
         filters, it is easy to bind them together, just as in the current
         code base.
      
      2. If we add tc filters before the tc classes they bind, we have to
         do dynamic lookup in fast path. What's worse, this happens all
         the time not just once, because on fast path tcf_result is passed
         on stack, there is no way to propagate back to the one in tc filters.
      
      This hidden difference hurts performance silently if we have many tc
      classes in hierarchy.
      
      This patch intends to close this gap by doing the reverse binding when
      we create a new class, in this case we can actually search all the
      filters in its parent, match and fixup by classid. And because
      tcf_result is specific to each type of tc filter, we have to introduce
      a new ops for each filter to tell how to bind the class.
      
      Note, we still can NOT totally get rid of those class lookup in
      ->enqueue() because cgroup and flow filters have no way to determine
      the classid at setup time, they still have to go through dynamic lookup.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      07d79fc7
  16. 07 Aug, 2017 1 commit
  17. 04 Aug, 2017 3 commits
  18. 21 Apr, 2017 2 commits
    • WANG Cong's avatar
      net_sched: remove useless NULL to tp->root · 43920538
      WANG Cong authored
      
      There is no need to NULL tp->root in ->destroy(), since tp is
      going to be freed very soon, and existing readers are still
      safe to read them.
      
      For cls_route, we always init its tp->root, so it can't be NULL,
      we can drop more useless code.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      43920538
    • WANG Cong's avatar
      net_sched: move the empty tp check from ->destroy() to ->delete() · 763dbf63
      WANG Cong authored
      We could have a race condition where in ->classify() path we
      dereference tp->root and meanwhile a parallel ->destroy() makes it
      a NULL. Daniel cured this bug in commit d9363774
      ("net, sched: respect rcu grace period on cls destruction").
      
      This happens when ->destroy() is called for deleting a filter to
      check if we are the last one in tp, this tp is still linked and
      visible at that time. The root cause of this problem is the semantic
      of ->destroy(), it does two things (for non-force case):
      
      1) check if tp is empty
      2) if tp is empty we could really destroy it
      
      and its caller, if cares, needs to check its return value to see if it
      is really destroyed. Therefore we can't unlink tp unless we know it is
      empty.
      
      As suggested by Daniel, we could actually move the test logic to ->delete()
      so that we can safely unlink tp after ->delete() tells us the last one is
      just deleted and before ->destroy().
      
      Fixes: 1e052be6
      
       ("net_sched: destroy proto tp when all filters are gone")
      Cc: Roi Dayan <roid@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      763dbf63
  19. 13 Apr, 2017 1 commit
  20. 20 Sep, 2016 1 commit
  21. 23 Aug, 2016 1 commit
  22. 24 Sep, 2015 1 commit
  23. 09 Mar, 2015 1 commit
    • Cong Wang's avatar
      net_sched: destroy proto tp when all filters are gone · 1e052be6
      Cong Wang authored
      
      Kernel automatically creates a tp for each
      (kind, protocol, priority) tuple, which has handle 0,
      when we add a new filter, but it still is left there
      after we remove our own, unless we don't specify the
      handle (literally means all the filters under
      the tuple). For example this one is left:
      
        # tc filter show dev eth0
        filter parent 8001: protocol arp pref 49152 basic
      
      The user-space is hard to clean up these for kernel
      because filters like u32 are organized in a complex way.
      So kernel is responsible to remove it after all filters
      are gone.  Each type of filter has its own way to
      store the filters, so each type has to provide its
      way to check if all filters are gone.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarCong Wang <cwang@twopensource.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jamal Hadi Salim<jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1e052be6
  24. 06 Mar, 2015 1 commit
  25. 10 Dec, 2014 1 commit
  26. 09 Dec, 2014 1 commit
  27. 06 Oct, 2014 1 commit
    • John Fastabend's avatar
      net: sched: do not use tcf_proto 'tp' argument from call_rcu · 18cdb37e
      John Fastabend authored
      
      Using the tcf_proto pointer 'tp' from inside the classifiers callback
      is not valid because it may have been cleaned up by another call_rcu
      occuring on another CPU.
      
      'tp' is currently being used by tcf_unbind_filter() in this patch we
      move instances of tcf_unbind_filter outside of the call_rcu() context.
      This is safe to do because any running schedulers will either read the
      valid class field or it will be zeroed.
      
      And all schedulers today when the class is 0 do a lookup using the
      same call used by the tcf_exts_bind(). So even if we have a running
      classifier hit the null class pointer it will do a lookup and get
      to the same result. This is particularly fragile at the moment because
      the only way to verify this is to audit the schedulers call sites.
      Reported-by: default avatarCong Wang <xiyou.wangconf@gmail.com>
      Signed-off-by: default avatarJohn Fastabend <john.r.fastabend@intel.com>
      Acked-by: default avatarCong Wang <cwang@twopensource.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      18cdb37e
  28. 28 Sep, 2014 1 commit
  29. 16 Sep, 2014 1 commit
  30. 13 Sep, 2014 2 commits
  31. 28 Apr, 2014 1 commit