1. 02 Jun, 2014 18 commits
    • netfilter: nf_tables: allow to delete several objects from a batch · 4fefee57
      Pablo Neira Ayuso authored
      Three changes allow the deletion of several objects with dependencies
      in one transaction:
      
      1) Introduce speculative counter increment/decrement that is undone in
         the abort path if required, so we avoid hitting -EBUSY when deleting
         the chain.
      
      2) Increment/decrement the table/chain use counter for each set/rule. We
         need this to rely fully on the use counters instead of the list
         content, e.g. !list_empty(&chain->rules), which evaluates to true in
         the middle of the transaction.
      
      3) Decrement the table use counter when an anonymous set is bound to the
         rule in the commit path. This avoids hitting -EBUSY when deleting
         the table that contains anonymous sets. The anonymous sets are released
         in the nf_tables_rule_destroy path. This should not be a problem: the
         bound anonymous set reflects its dependencies through the rule object,
         which already bumps the chain use counter.
      
      So the general assumption after this patch is that the use counters are
      bumped by direct object dependencies.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
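
The speculative use-counter scheme from the commit above can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the nf_tables code: `struct chain`, `struct trans`, `del_rule()`, `del_chain()` and `abort_batch()` are hypothetical names, and only the rule-to-chain dependency is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an nf_tables chain; only the use counter matters. */
struct chain {
    unsigned int use;   /* number of rules that reference this chain */
};

enum trans_type { TRANS_DEL_RULE, TRANS_DEL_CHAIN };

/* One entry of the transaction log built while the batch is parsed. */
struct trans {
    enum trans_type type;
    struct chain *chain;
};

/* Deleting a rule drops the chain's use counter right away (speculatively). */
static void del_rule(struct trans *t, struct chain *c)
{
    assert(c->use > 0);
    c->use--;
    t->type = TRANS_DEL_RULE;
    t->chain = c;
}

/* Deleting a chain relies on the counter only, not on list contents. */
static int del_chain(struct trans *t, struct chain *c)
{
    if (c->use > 0)
        return -EBUSY;
    t->type = TRANS_DEL_CHAIN;
    t->chain = c;
    return 0;
}

/* Abort path: walk the log backwards and revert the speculative updates. */
static void abort_batch(const struct trans *log, size_t n)
{
    while (n-- > 0) {
        if (log[n].type == TRANS_DEL_RULE)
            log[n].chain->use++;
    }
}
```

With a chain that holds two rules, deleting both rules and then the chain in one batch succeeds, and calling `abort_batch()` afterwards restores the counter to 2.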
    • netfilter: nft_rbtree: introduce locking · 7632667d
      Pablo Neira Ayuso authored
      There's no RCU version of rbtree yet, so let's fall back on a spinlock
      to protect concurrent access to this structure from both userspace
      (to update the set content) and kernel space (in the packet path).
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: release objects in reverse order in the abort path · a1cee076
      Pablo Neira Ayuso authored
      The patch c7c32e72 ("netfilter: nf_tables: defer all object release via
      rcu") indicates that we always release deleted objects in the reverse
      order, but that is only needed in the abort path. These are the two
      possible scenarios when releasing objects:
      
      1) Deletion scenario in the commit path: there is no need to release
      objects in the reverse order since userspace already ensures that
      dependencies are fulfilled, i.e. userspace tells us to delete rule ->
      ... -> rule -> chain -> table. In this case, we have to release the
      objects in the *same order* as userspace provided.
      
      2) Deletion scenario in the abort path: we have to iterate in the reverse
      order to undo what cannot be added, i.e. userspace sent us a batch
      that includes: table -> chain -> rule -> ... -> rule, and that needs to
      be partially undone. In this case, we have to release objects in the
      reverse order to ensure that the set and chain objects point to valid
      rule and table objects.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
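
The two release orders described in the commit above can be sketched as follows. The names `enum obj_kind` and `release_order()` are assumptions made for this illustration; the kernel walks its own transaction list and does not use such a helper.

```c
#include <stddef.h>

enum obj_kind { OBJ_TABLE, OBJ_CHAIN, OBJ_RULE };

/*
 * Write the release order for a batch of n objects into out[].
 * Commit path: same order as userspace sent the deletions.
 * Abort path: reverse order, so dependants are released before
 * the objects they point to.
 */
static void release_order(const enum obj_kind *batch, size_t n,
                          int abort_path, enum obj_kind *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = abort_path ? batch[n - 1 - i] : batch[i];
}
```

For an aborted batch that added table -> chain -> rule, the sketch releases rule, chain, table. For a committed batch that deleted rule -> chain -> table, it keeps that order.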
    • netfilter: nf_tables: fix wrong transaction ordering in set elements · 46bbafce
      Pablo Neira Ayuso authored
      The transaction needs to be placed at the end of the commit list,
      otherwise event notifications are reordered and we may crash when
      releasing objects via call_rcu.
      
      This problem was introduced in 60319eb1 ("netfilter: nf_tables: use new
      transaction infrastructure to handle elements").
      Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nfnetlink_acct: Fix memory leak · 4c552a64
      Mathieu Poirier authored
      Memory needs to be allocated only once, after the proper checks
      on the NFACCT_FLAGS have been done.  Otherwise the code can return
      without freeing already allocated memory.
      Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
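
The "validate first, allocate afterwards" pattern behind this fix can be sketched in plain C. The flag names, the validity rule and `acct_new()` are assumptions for the example; the real NFACCT_FLAGS checks are not shown in this log.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical flag bits, invented for this sketch. */
#define ACCT_F_QUOTA_PKTS  0x1u
#define ACCT_F_QUOTA_BYTES 0x2u
#define ACCT_F_QUOTA       (ACCT_F_QUOTA_PKTS | ACCT_F_QUOTA_BYTES)

struct acct {
    unsigned int flags;
};

/* Validate first, allocate once afterwards: no early return can leak. */
static int acct_new(unsigned int flags, struct acct **out)
{
    struct acct *a;

    *out = NULL;

    /* Assumed rule: reject unknown bits and both quota bits set together. */
    if ((flags & ~ACCT_F_QUOTA) || (flags & ACCT_F_QUOTA) == ACCT_F_QUOTA)
        return -EINVAL;

    a = calloc(1, sizeof(*a));
    if (!a)
        return -ENOMEM;

    a->flags = flags;
    *out = a;
    return 0;
}
```

Because every error return happens before `calloc()`, there is nothing to free on the failure paths.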
    • Revert "net/mlx4_en: Use affinity hint" · 96b2e73c
      David S. Miller authored
      This reverts commit 70a640d0.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ks8851: Don't use regulator_get_optional() · d64eed1d
      Stephen Boyd authored
      We shouldn't be using regulator_get_optional() here. These
      regulators are always present as part of the physical design and
      there isn't any way to use an internal regulator or change the
      source of the reference voltage via software. Given that the only
      users of this driver in the kernel are DT based, this change
      should be transparent to them even if they don't specify any
      supplies because the regulator framework will insert dummy
      supplies as needed.
      
      Cc: Nishanth Menon <nm@ti.com>
      Cc: Mark Brown <broonie@kernel.org>
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Reviewed-by: Mark Brown <broonie@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'filter-next' · c532cea9
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      BPF + test suite updates
      
      These are the last bigger BPF changes that I had in my todo
      queue for now. As the first two patches from this series
      contain additional test cases for the test suite, I have
      rebased them on top of current net-next with the set from [1]
      applied to avoid introducing any unnecessary merge conflicts.
      
      For details, please refer to the individual patches. Test
      suite runs fine with the set applied.
      
       [1] http://patchwork.ozlabs.org/patch/352599/
           http://patchwork.ozlabs.org/patch/352600/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: filter: improve filter block macros · f8f6d679
      Daniel Borkmann authored
      Commit 9739eef1 ("net: filter: make BPF conversion more readable")
      started to introduce helper macros similar to BPF_STMT()/BPF_JUMP()
      macros from classic BPF.
      
      However, quite a few statements in the filter conversion functions
      remained in the old style, which gives a mixture of block macros and
      non-block macros in the code. This patch makes the block macros themselves
      more readable by using explicit member initialization, and converts
      the remaining statements where possible so the code is more consistent.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
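
A block macro with explicit member initialization, as the commit above describes, might look like the sketch below. The instruction layout, the opcode values and the `INSN_*` macro names are hypothetical and chosen only to show the style; they are not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical instruction layout, for illustration only. */
struct insn {
    uint8_t code;   /* opcode */
    uint8_t dst;    /* destination register */
    uint8_t src;    /* source register */
    int16_t off;    /* signed offset */
    int32_t imm;    /* signed immediate */
};

#define OP_MOV_IMM 0x01   /* made-up opcode values */
#define OP_ADD_REG 0x02

/* Each macro names every member, so no field depends on positional order. */
#define INSN_MOV_IMM(DST, IMM)      \
    ((struct insn) {                \
        .code = OP_MOV_IMM,         \
        .dst  = (DST),              \
        .src  = 0,                  \
        .off  = 0,                  \
        .imm  = (IMM) })

#define INSN_ADD_REG(DST, SRC)      \
    ((struct insn) {                \
        .code = OP_ADD_REG,         \
        .dst  = (DST),              \
        .src  = (SRC),              \
        .off  = 0,                  \
        .imm  = 0 })

/* Emit a two-instruction program into out[] and return its length. */
static int build_prog(struct insn *out)
{
    out[0] = INSN_MOV_IMM(1, 42);
    out[1] = INSN_ADD_REG(1, 2);
    return 2;
}
```

Designated initializers make each macro self-documenting, and a reordering of the struct members would not silently change what a macro emits.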
    • net: filter: get rid of BPF_S_* enum · 34805931
      Daniel Borkmann authored
      This patch finally allows us to get rid of the BPF_S_* enum.
      Currently, the code performs unnecessary encode and decode
      workarounds in seccomp and in filter migration itself when a filter
      is being attached, in order to overcome the BPF_S_* encoding, which
      is no longer used by the new interpreter or the JIT compilers.
      
      Keeping it around would mean that in future we would also need
      to extend and maintain this enum and the related encoders/decoders.
      We can get rid of all that and save ourselves these operations during
      filter attaching. Naturally, the JIT compilers also need to be updated
      for this.
      
      Before JIT conversion is done, each compiler checks whether A
      is being loaded at startup to find out if it needs to
      emit instructions to clear A first. Since BPF extensions are a
      subset of BPF_LD | BPF_{W,H,B} | BPF_ABS variants, case statements
      for extensions can be removed at that point. To ease and minimize
      code changes in the classic JITs, we have introduced bpf_anc_helper().
      
      Tested with test_bpf on x86_64 (JIT, int), s390x (JIT, int),
      arm (JIT, int), i386 (int), ppc64 (JIT, int); for sparc we
      unfortunately didn't have access, but changes are analogous to
      the rest.
      
      Joint work with Alexei Starovoitov.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mircea Gherzan <mgherzan@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Acked-by: Chema Gonzalez <chemag@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
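
The idea that BPF extensions are a subset of the BPF_LD | BPF_{W,H,B} | BPF_ABS loads can be sketched as a small classifier. This is an illustration in the spirit of bpf_anc_helper(), not its actual code: `struct filter_insn`, `anc_helper()` and the `ANC_FLAG` marker bit are assumptions, only two extensions are handled, and the opcode and SKF_AD_* constants are defined locally to mirror the classic BPF uapi values.

```c
#include <stdint.h>

/* Local copies of classic BPF constants, so the sketch needs no kernel headers. */
#define BPF_LD   0x00
#define BPF_W    0x00
#define BPF_H    0x08
#define BPF_B    0x10
#define BPF_ABS  0x20

#define SKF_AD_OFF       (-0x1000)
#define SKF_AD_PROTOCOL  0
#define SKF_AD_PKTTYPE   4

#define ANC_FLAG 0x8000u  /* hypothetical marker bit for "ancillary load" */

/* Hypothetical, reduced instruction: opcode plus the generic k field. */
struct filter_insn {
    uint16_t code;
    uint32_t k;
};

/*
 * Return ANC_FLAG | extension for an absolute load whose k selects a known
 * extension, otherwise return the opcode unchanged.
 */
static uint16_t anc_helper(const struct filter_insn *f)
{
    switch (f->code) {
    case BPF_LD | BPF_W | BPF_ABS:
    case BPF_LD | BPF_H | BPF_ABS:
    case BPF_LD | BPF_B | BPF_ABS:
        switch (f->k) {
        case (uint32_t)(SKF_AD_OFF + SKF_AD_PROTOCOL):
            return ANC_FLAG | SKF_AD_PROTOCOL;
        case (uint32_t)(SKF_AD_OFF + SKF_AD_PKTTYPE):
            return ANC_FLAG | SKF_AD_PKTTYPE;
        }
        /* fall through */
    default:
        return f->code;
    }
}
```

A JIT that switches on the value returned by such a helper sees one case per extension and needs no separate encoding of the opcode.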
    • net: filter: add test for loading SKF_AD_OFF limits · d50bc157
      Daniel Borkmann authored
      This test checks that overloading BPF_LD | BPF_ABS with an
      always-invalid BPF extension, namely SKF_AD_MAX, fails. This
      ensures that classic BPF behaviour is correct in the filter checker.
      
      Also, we add a test for loading at packet offset SKF_AD_OFF-1,
      which should pass the filter checker but later fail at runtime.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: filter: add slot overlapping test with fully filled M[] · 9fe13baa
      Daniel Borkmann authored
      Also add a test for the scratch memory store that first fills
      all slots and then successively reads all of them back, adding
      them up in A and eventually returning A. This and the previous
      M[] test with alternating fill/spill will detect possible JIT
      errors on M[].
      Suggested-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
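
The result such a fill-then-read-back test expects can be modelled in plain C. This is only a model of the program's semantics, assuming the slot values 1..16; the values used by the real test are not given in this log, and `fill_then_sum()` is a hypothetical name.

```c
#include <stdint.h>

#define MEMWORDS 16   /* classic BPF provides 16 scratch slots (BPF_MEMWORDS) */

/* Fill every slot, then read all slots back and accumulate them into A. */
static uint32_t fill_then_sum(void)
{
    uint32_t M[MEMWORDS];
    uint32_t A = 0, X;

    for (uint32_t i = 0; i < MEMWORDS; i++)
        M[i] = i + 1;   /* models "ld #imm; st M[i]"; the values are assumed */

    for (uint32_t i = 0; i < MEMWORDS; i++) {
        X = M[i];       /* models "ldx M[i]" */
        A += X;         /* models "add x" */
    }

    return A;           /* models "ret a" */
}
```

If a JIT mapped two slots onto the same storage, the accumulated value would differ from the interpreter's result, which is how the test catches overlapping slots.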
    • bridge: fix the unbalanced promiscuous count when add_if failed · 019ee792
      wangweidong authored
      Commit 2796d0c6 ("bridge: Automatically manage port
      promiscuous mode.") made add_if use dev_set_allmulti()
      instead of dev_set_promiscuity(), so when add_if fails we
      should call dev_set_allmulti(dev, -1).
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Reviewed-by: Amos Kong <akong@redhat.com>
      Acked-by: Vlad Yasevich <vyasevic@redhat.com>
      Acked-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
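
The balanced error path this fix restores can be sketched with a counter in place of the device state. `struct net_dev`, `set_allmulti()` and this reduced `add_if()` are hypothetical stand-ins, and the `fail_later` parameter simply simulates a later setup step failing.

```c
#include <errno.h>

/* Hypothetical stand-in for a net device; only the allmulti count is kept. */
struct net_dev {
    int allmulti;
};

/* Models dev_set_allmulti(dev, inc): adjust the reference-style counter. */
static void set_allmulti(struct net_dev *dev, int inc)
{
    dev->allmulti += inc;
}

/* Every exit after the increment must undo it with the matching call. */
static int add_if(struct net_dev *dev, int fail_later)
{
    set_allmulti(dev, 1);

    if (fail_later)
        goto err;

    return 0;

err:
    set_allmulti(dev, -1);   /* same helper as the increment, so it balances */
    return -ENOMEM;
}
```

After a failed `add_if()` the counter is back at its previous value, which is the invariant the commit is about.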
    • net: Revert mlx4 cpumask changes. · ee39facb
      David S. Miller authored
      This reverts commit 70a640d0 ("net/mlx4_en: Use affinity hint") and
      commit c8865b64 ("cpumask: Utility function to set n'th cpu - local
      cpu first") because, among other things, these changes break the
      build when SMP is disabled.
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ks8851: Don't use regulator_get_optional() · 2a82e40d
      Stephen Boyd authored
      We shouldn't be using regulator_get_optional() here. These
      regulators are always present as part of the physical design and
      there isn't any way to use an internal regulator or change the
      source of the reference voltage via software. Given that the only
      users of this driver in the kernel are DT based, this change
      should be transparent to them even if they don't specify any
      supplies because the regulator framework will insert dummy
      supplies as needed.
      
      Cc: Nishanth Menon <nm@ti.com>
      Cc: Mark Brown <broonie@kernel.org>
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Reviewed-by: Mark Brown <broonie@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlx4-next' · b07166b2
      David S. Miller authored
      Amir Vadai says:
      
      ====================
      cpumask,net: Affinity hint helper function
      
      This patchset sets an affinity hint to influence IRQs to be allocated on the
      same NUMA node as the one where the card resides, as discussed in
      http://www.spinics.net/lists/netdev/msg271497.html
      
      If the number of IRQs allocated is greater than the number of local NUMA cores, all
      local cores will be used first, and the rest of the IRQs will be on a remote
      NUMA node.
      If there is no NUMA support, IRQs and cores will be mapped 1:1.
      
      Since the utility function to calculate the mapping could be useful in other mq
      drivers in the kernel, it was added to cpumask.[ch]
      
      This patchset was tested and applied on top of net-next, since the first
      consumer is a network device (mlx4_en), over commit 506724c4: "tg3: Override
      clock, link aware and link idle mode during NVRAM dump"
      
      I couldn't find a maintainer for cpumask.c, so I only added the kernel mailing
      list.
      
      Amir
      
      Changes from V5:
      - Moved the utility function from kernel/irq/manage.c to lib/cpumask.c, and
        renamed it accordingly to cpumask_set_cpu_local_first()
      - Added some comments as Thomas Gleixner suggested
      - Changed -EINVAL to -EAGAIN, which describes the error situation better.
      
      Changes from V4:
      - Patch 1/2: irq: Utility function to get affinity_hint by policy
        Thank you Ben for the great review:
        - Moved the function to kernel/irq/manage.c since it could be useful for
          block mq devices
        - Fixed typos
        - Use cpumask_t * instead of cpumask_var_t in function header
        - Restructured the function to remove NULL assignment in a cpumask_var_t
        - Fix for offline local CPUs
      
      Changes from V3:
      - Patch 2/2: net/mlx4_en: Use affinity hint
        - somehow patch file was corrupted
      
      Changes from V2:
      - Patch 1/2: net: Utility function to get affinity_hint by policy
        - Fixed style issues
      
      Changes from V1:
      - Patch 1/2: net: Utility function to get affinity_hint by policy
        - Fixed error flow to return -EINVAL on error (thanks govind)
      - Patch 2/2: net/mlx4_en: Use affinity hint
        - Set ring->affinity_hint to NULL on error
      
      Changes from V0:
      - Fixed small style issues
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_en: Use affinity hint · 70a640d0
      Yuval Atias authored
      The “affinity hint” mechanism is used by the user space
      daemon, irqbalancer, to indicate a preferred CPU mask for irqs.
      Irqbalancer can use this hint to balance the irqs between the
      cpus indicated by the mask.
      
      We wish the HCA to preferentially map the IRQs it uses to NUMA cores
      close to it.  To accomplish this, we use cpumask_set_cpu_local_first(), which
      sets the affinity hint according to the following policy:
      First it maps IRQs to “close” NUMA cores.  If these are exhausted, the
      remaining IRQs are mapped to “far” NUMA cores.
      Signed-off-by: Yuval Atias <yuvala@mellanox.com>
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cpumask: Utility function to set n'th cpu - local cpu first · c8865b64
      Amir Vadai authored
      This function sets the n'th cpu, counting local cpus first.
      For example, on a 16-core server where the even cpus are local, it gives the
      following values:
      cpumask_set_cpu_local_first(0, numa, cpumask) => cpu 0 is set
      cpumask_set_cpu_local_first(1, numa, cpumask) => cpu 2 is set
      ...
      cpumask_set_cpu_local_first(7, numa, cpumask) => cpu 14 is set
      cpumask_set_cpu_local_first(8, numa, cpumask) => cpu 1 is set
      cpumask_set_cpu_local_first(9, numa, cpumask) => cpu 3 is set
      ...
      cpumask_set_cpu_local_first(15, numa, cpumask) => cpu 15 is set
      
      Currently this function will be used by multi queue networking devices to
      calculate the irq affinity mask, such that as many local cpus as
      possible will be utilized to handle the mq device irqs.
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
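
The "local cpus first" ordering from the commit above can be sketched without the kernel's cpumask types. `nth_cpu_local_first()` and the `is_local[]` array are assumptions for this example; the real cpumask_set_cpu_local_first() sets a bit in a cpumask rather than returning an index.

```c
#include <stdbool.h>

/*
 * Return the n'th cpu (0-based) when all local cpus are counted before all
 * remote ones, or -1 if n is out of range. is_local[cpu] tells whether the
 * cpu sits on the device's NUMA node.
 */
static int nth_cpu_local_first(int n, const bool *is_local, int ncpus)
{
    if (n < 0)
        return -1;

    /* pass 0 visits the local cpus, pass 1 visits the remote ones */
    for (int pass = 0; pass < 2; pass++) {
        for (int cpu = 0; cpu < ncpus; cpu++) {
            if (is_local[cpu] != (pass == 0))
                continue;
            if (n-- == 0)
                return cpu;
        }
    }

    return -1;
}
```

With 16 cpus where the even ones are local, this reproduces the values listed in the commit message: n = 0 gives cpu 0, n = 7 gives cpu 14, n = 8 gives cpu 1 and n = 15 gives cpu 15.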
  2. 31 May, 2014 22 commits