1. 30 Aug, 2016 5 commits
      netfilter: conntrack: add gc worker to remove timed-out entries · b87a2f91
      Florian Westphal authored
      Conntrack gc worker to evict stale entries.
      
      GC happens once every 5 seconds, but we only scan at most 1/64th of the
      table (and not more than 8k buckets) to avoid hogging the cpu.
      
      This means that a complete scan of the table will take several minutes
      of wall-clock time.
      
      Considering that the gc run will never have to evict any entries
      during normal operation, because eviction will happen from the packet
      path, this should be fine.
      
      We only need gc to make sure userspace (conntrack event listeners)
      eventually learn of the timeout, and for resource reclaim in case the
      system becomes idle.
      
      We do not disable BH, and we call cond_resched for every bucket, so
      this should not introduce noticeable latencies either.
      
      A followup patch will add a small change to speed up GC for the extreme
      case where most entries are timed out on an otherwise idle system.
      
      v2: Use cond_resched_rcu_qs & add comment wrt. missing restart on
      nulls value change in gc worker, suggested by Eric Dumazet.
      
      v3: don't call cancel_delayed_work_sync twice (again, Eric).
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
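The scan budget described in the commit message above (at most 1/64th of the table, capped at 8k buckets, once every 5 seconds) can be sketched as plain arithmetic. This is a minimal user-space sketch, not the kernel code: the macro and function names are illustrative, and clamping the budget to at least one bucket for very small tables is an assumption.

```c
#include <stdint.h>

/* Illustrative constants taken from the commit message text. */
#define GC_MAX_BUCKETS_DIV 64u   /* scan at most 1/64th of the table per run */
#define GC_MAX_BUCKETS     8192u /* ...and never more than 8k buckets */
#define GC_INTERVAL_SECS   5u    /* one gc run every 5 seconds */

/* Number of buckets a single gc run is allowed to scan. */
static uint32_t gc_buckets_per_run(uint32_t table_size)
{
    uint32_t goal = table_size / GC_MAX_BUCKETS_DIV;

    if (goal < 1)                /* assumption: always make some progress */
        goal = 1;
    if (goal > GC_MAX_BUCKETS)
        goal = GC_MAX_BUCKETS;
    return goal;
}

/* Wall-clock seconds needed until every bucket has been visited once. */
static uint32_t gc_full_scan_secs(uint32_t table_size)
{
    uint32_t per_run = gc_buckets_per_run(table_size);
    uint32_t runs = table_size / per_run + (table_size % per_run != 0);

    return runs * GC_INTERVAL_SECS;
}
```

For a table of 65536 buckets this gives 1024 buckets per run and 64 runs, i.e. 320 seconds for a full pass, which matches the "several minutes" estimate in the message.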
      netfilter: evict stale entries on netlink dumps · 2344d64e
      Florian Westphal authored
      When dumping we already have to look at the entire table, so we might
      as well toss those entries whose timeout value is in the past.
      
      We also look at every entry during resize operations.
      However, eviction there is not as simple because we hold the
      global resize lock so we can't evict without adding an 'expired' list
      to drop from later.  Considering that resizes are very rare it doesn't
      seem worth doing it.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      netfilter: conntrack: get rid of conntrack timer · f330a7fd
      Florian Westphal authored
      With stats enabled this eats 80 bytes on x86_64 per nf_conn entry, as
      Eric Dumazet pointed out during netfilter workshop 2016.
      
      Eric also says: "Another reason was the fact that Thomas was about to
      change max timer range [..]" (500462a9, 'timers: Switch to
      a non-cascading wheel').
      
      Remove the timer and use a 32-bit jiffies value containing the
      timestamp until which the entry is valid.
      
      During conntrack lookup, even before doing tuple comparison, check
      the timeout value and evict the entry in case it is too old.
      
      The dying bit is used as a synchronization point to avoid races where
      multiple cpus try to evict the same entry.
      
      Because lookup is always lockless, we need to bump the refcnt once
      when we evict, else we could try to evict an already-dead entry that
      is being recycled.
      
      This is the standard/expected way when conntrack entries are destroyed.
      
      Followup patches will introduce garbage collection via work queue
      and further places where we can reap obsoleted entries (e.g. during
      netlink dumps). This is needed to keep expired conntracks from hanging
      around for too long when lookup rate is low after a busy period.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
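The timer-less expiry check described above can be illustrated with a small user-space sketch. It assumes the entry stores a 32-bit tick value marking the end of its validity and compares it against the current tick count using a signed difference, so the test keeps working when the counter wraps. The function names are hypothetical and do not claim to be the kernel helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute the "valid until" stamp for an entry; unsigned wraparound is fine. */
static uint32_t entry_set_timeout(uint32_t now, uint32_t lifetime)
{
    return now + lifetime;
}

/*
 * An entry is expired once 'now' has reached or passed 'timeout'.
 * The unsigned subtraction wraps modulo 2^32; interpreting the result as
 * signed gives the right answer as long as the two stamps are less than
 * 2^31 ticks apart. (The conversion to int32_t of large values is
 * implementation-defined in older C standards, but behaves as two's
 * complement on common compilers.)
 */
static bool entry_is_expired(uint32_t timeout, uint32_t now)
{
    return (int32_t)(timeout - now) <= 0;
}
```

A lookup path would call such a check before comparing tuples and skip (or evict) the entry when it returns true.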
      netfilter: don't rely on DYING bit to detect when destroy event was sent · 616b14b4
      Florian Westphal authored
      The reliable event delivery mode currently (ab)uses the DYING bit to
      detect which entries on the dying list have to be skipped when
      re-delivering events from the ecache worker in reliable event mode.
      
      Currently when we delete the conntrack from main table we only set this
      bit if we could also deliver the netlink destroy event to userspace.
      
      If we fail, we move it to the dying list; the ecache worker will then
      reattempt event delivery for all confirmed conntracks on the dying list
      that do not have the DYING bit set.
      
      Once the timer is gone, we can no longer use if (del_timer()) to detect
      when we 'stole' the reference count owned by the timer/hash entry, so
      we need some other way to avoid racing with another cpu.
      
      Pablo suggested to add a marker in the ecache extension that skips
      entries that have been unhashed from main table but are still waiting
      for the last reference count to be dropped (e.g. because one skb waiting
      on nfqueue verdict still holds a reference).
      
      We do this by adding a tristate.
      If we fail to deliver the destroy event, make a note of this in the
      ecache extension.  The worker can then skip all entries that are in
      a different state.  Either they never delivered a destroy event,
      e.g. because the netlink backend was not loaded, or redelivery took
      place already.
      
      Once the conntrack timer is removed, we will be able to replace the
      del_timer() test with test_and_set_bit(DYING, &ct->status) to avoid
      racing with another cpu that tries to evict the same conntrack.
      
      Because DYING will then be set right before we report the destroy event
      we can no longer skip event reporting when dying bit is set.
      Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
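The test_and_set_bit(DYING, &ct->status) idea mentioned above, where exactly one cpu wins the right to evict an entry, can be sketched in user space with C11 atomics. This is only an analogy under stated assumptions: the struct, the bit value and the function name are hypothetical, and atomic_fetch_or stands in for the kernel's test_and_set_bit.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define ENTRY_DYING_BIT 0x1u /* hypothetical flag value */

struct entry {
    atomic_uint status;
};

/*
 * Atomically set the DYING flag and report whether this caller was the one
 * that set it. Only the caller that sees the flag previously clear may go
 * on to evict the entry; every later caller must back off.
 */
static bool entry_claim_for_eviction(struct entry *e)
{
    unsigned int old = atomic_fetch_or(&e->status, ENTRY_DYING_BIT);

    return (old & ENTRY_DYING_BIT) == 0;
}
```

Because the flag is set before the destroy event is reported, a set flag alone no longer says whether the event was delivered, which is why the message above tracks delivery state separately in the ecache extension.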
      netfilter: restart search if moved to other chain · 95a8d19f
      Florian Westphal authored
      In case nf_conntrack_tuple_taken did not find a conflicting entry,
      check that all entries in this hash slot were tested and restart
      in case an entry was moved to another chain.
      Reported-by: Eric Dumazet <edumazet@google.com>
      Fixes: ea781f19 ("netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and get rid of call_rcu()")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
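The restart rule described above can be illustrated with a simplified model of a lockless chain walk. The sketch assumes each chain ends in a marker node that records which bucket the chain belongs to; if a walk that started in one bucket ends on another bucket's marker, an entry was moved mid-walk and the result cannot be trusted. All types and names here are hypothetical stand-ins, not the kernel's hlist_nulls API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical chain node; an end marker carries the owning bucket number. */
struct node {
    const struct node *next;
    bool is_end;
    unsigned int end_bucket; /* meaningful only when is_end is true */
    int key;                 /* meaningful only when is_end is false */
};

enum scan_result { SCAN_FOUND, SCAN_NOT_FOUND, SCAN_RESTART };

/* Walk one chain once; report a restart if we ended on a foreign marker. */
static enum scan_result scan_chain_once(const struct node *head,
                                        unsigned int bucket, int key)
{
    const struct node *n;

    for (n = head; !n->is_end; n = n->next)
        if (n->key == key)
            return SCAN_FOUND;

    /* "Not found" is only trustworthy if we finished in our own bucket. */
    return n->end_bucket == bucket ? SCAN_NOT_FOUND : SCAN_RESTART;
}

/* Repeat the walk until it completes within the bucket it started in. */
static bool key_taken(const struct node *const table[], unsigned int bucket,
                      int key)
{
    enum scan_result r;

    do {
        r = scan_chain_once(table[bucket], bucket, key);
    } while (r == SCAN_RESTART);

    return r == SCAN_FOUND;
}
```

In a real lockless table the chain changes between attempts, so the loop terminates; in this single-threaded model key_taken should only be called on consistent chains.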
  2. 26 Aug, 2016 3 commits
  3. 23 Aug, 2016 3 commits
  4. 22 Aug, 2016 4 commits
  5. 18 Aug, 2016 1 commit
  6. 17 Aug, 2016 1 commit
  7. 13 Aug, 2016 1 commit
      netfilter: remove ip_conntrack* sysctl compat code · adf05168
      Pablo Neira Ayuso authored
      This backward compatibility has been around for more than ten years,
      since Yasuyuki Kozakai introduced IPv6 in conntrack. These days, we have
      alternate /proc/net/nf_conntrack* entries, the ctnetlink interface and
      the conntrack utility got adopted by many people in the user community
      according to what I observed on the netfilter user mailing list.
      
      So let's get rid of this.
      
      Note that nf_conntrack_htable_size and nf_conntrack_max no longer
      need to be exported as symbols.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  8. 12 Aug, 2016 1 commit
  9. 11 Aug, 2016 21 commits