1. 17 Oct, 2013 19 commits
    • xen-netback: enable IPv6 TCP GSO to the guest · 82cada22
      Paul Durrant authored
      This patch adds code to handle SKB_GSO_TCPV6 skbs and construct appropriate
      extra or prefix segments to pass the large packet to the frontend. New
      xenstore flags, feature-gso-tcpv6 and feature-gso-tcpv6-prefix, are sampled
      to determine if the frontend is capable of handling such packets.
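The flag sampling above amounts to a per-packet capability check. Below is a minimal userspace sketch, not netback's code. The `GSO_TCPV6` bit, `struct vif_caps`, and both functions are hypothetical. Preferring the prefix path when both flags are set is also an assumption.

```c
/* Hypothetical stand-in for the kernel's SKB_GSO_TCPV6 bit. */
#define GSO_TCPV6 (1u << 1)

enum gso_path { GSO_PATH_NONE, GSO_PATH_EXTRA, GSO_PATH_PREFIX };

/* GSO types the frontend accepts, split by how the metadata is passed. */
struct vif_caps {
    unsigned int gso_mask;        /* metadata in an extra segment */
    unsigned int gso_prefix_mask; /* metadata in a prefix segment */
};

/* Build masks from the sampled xenstore values (non-zero means "set"). */
struct vif_caps caps_from_flags(int feature_gso_tcpv6,
                                int feature_gso_tcpv6_prefix)
{
    struct vif_caps c = { 0, 0 };

    if (feature_gso_tcpv6)
        c.gso_mask |= GSO_TCPV6;
    if (feature_gso_tcpv6_prefix)
        c.gso_prefix_mask |= GSO_TCPV6;
    return c;
}

/* Decide how a large packet of the given GSO type reaches the guest.
 * GSO_PATH_NONE means it would have to be segmented before delivery. */
enum gso_path pick_gso_path(const struct vif_caps *c, unsigned int gso_type)
{
    if (gso_type & c->gso_prefix_mask)
        return GSO_PATH_PREFIX;
    if (gso_type & c->gso_mask)
        return GSO_PATH_EXTRA;
    return GSO_PATH_NONE;
}
```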
      Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-netback: handle IPv6 TCP GSO packets from the guest · a9468587
      Paul Durrant authored
      This patch adds a xenstore feature flag, feature-gso-tcpv6, to advertise
      that netback can handle IPv6 TCP GSO packets. It creates SKB_GSO_TCPV6 skbs
      if the frontend passes an extra segment with the new type
      XEN_NETIF_GSO_TYPE_TCPV6 added to netif.h.
      Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-netback: Unconditionally set NETIF_F_RXCSUM · 7365bcfa
      Paul Durrant authored
      There is no mechanism to insist that a guest always generates a packet
      with a good checksum (at least for IPv4), so we must handle checksum
      offloading from the guest and hence should set NETIF_F_RXCSUM.
      Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-netback: add support for IPv6 checksum offload from guest · 2eba61d5
      Paul Durrant authored
      For the performance of VM-to-VM traffic on a single host, it is better to
      avoid calculating the TCP/UDP checksum in the sending frontend. To allow
      this, this patch adds the code necessary to set up a partial checksum for
      IPv6 packets and the xenstore flag feature-ipv6-csum-offload to advertise
      that fact to frontends.
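Setting up a partial checksum generally means seeding the transport checksum field with the folded IPv6 pseudo-header sum and deferring the rest of the summation. The pseudo-header covers source address, destination address, upper-layer length and next header (RFC 2460, section 8.1). The sketch below computes that sum in plain C. The function names are hypothetical; the kernel uses its own helpers such as `csum_ipv6_magic()` rather than this code.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a buffer, read as big-endian 16-bit words, to a 32-bit accumulator. */
static uint32_t sum_be16(uint32_t acc, const uint8_t *p, size_t len)
{
    for (size_t i = 0; i + 1 < len; i += 2)
        acc += (uint32_t)((p[i] << 8) | p[i + 1]);
    return acc;
}

/* One's-complement sum of the IPv6 pseudo-header, folded to 16 bits and
 * NOT complemented: the value a partial checksum would start from. */
uint16_t ipv6_pseudo_sum(const uint8_t saddr[16], const uint8_t daddr[16],
                         uint32_t upper_len, uint8_t nexthdr)
{
    uint32_t acc = 0;

    acc = sum_be16(acc, saddr, 16);
    acc = sum_be16(acc, daddr, 16);
    acc += upper_len >> 16;     /* 32-bit upper-layer packet length */
    acc += upper_len & 0xffff;
    acc += nexthdr;             /* three zero bytes, then next header */

    while (acc >> 16)           /* fold carries back into 16 bits */
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)acc;
}
```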
      Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-netback: add support for IPv6 checksum offload to guest · 146c8a77
      Paul Durrant authored
      Check xenstore flag feature-ipv6-csum-offload to determine if a
      guest is happy to accept IPv6 packets with only partial checksum.
      Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bonding_rcu' · c0f4ace7
      David S. Miller authored
      bonding: patchset for rcu use in bonding
      
      ====================
      This patch set converts the xmit paths of the 3ad and alb modes to use RCU
      protection, adds the rtnl lock and removes the read lock for bond sysfs.
      
      v2: because calling bond_for_each_slave_rcu without rcu_read_lock() triggers
      a warning, add a new function for the alb xmit path to avoid the warning.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: add rtnl lock and remove read lock for bond sysfs · 4d1ae5fb
      dingtianhong authored
      bond_for_each_slave() is not protected by read_lock(), only by
      rtnl_lock(), so read_lock() needs to be replaced with
      rtnl_lock().
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: use RCU protection for alb xmit path · 28c71926
      dingtianhong authored
      Commit 278b2083
      (bonding: initial RCU conversion) converted the round-robin,
      active-backup, broadcast and xor xmit paths to RCU protection,
      which improves performance for those modes. This patch
      converts the xmit path for alb mode.
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Cc: Nikolay Aleksandrov <nikolay@redhat.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: use RCU protection for 3ad xmit path · 47e91f56
      dingtianhong authored
      Commit 278b2083
      (bonding: initial RCU conversion) converted the round-robin,
      active-backup, broadcast and xor xmit paths to RCU protection,
      which improves performance for those modes. This patch
      converts the xmit path for 3ad mode.
      Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Suggested-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: Wang Yufen <wangyufen@huawei.com>
      Cc: Nikolay Aleksandrov <nikolay@redhat.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables · da33edcc
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      netfilter updates: nf_tables pull request
      
      The following patchset contains the current original nf_tables tree
      condensed into 17 patches. I have organized them in chronological order
      since the original nf_tables code was released in 2009 and by
      dependencies between the different patches.
      
      The patches are:
      
      1) Adapt all existing hooks in the tree to pass hook ops to the
         hook callback function, required by nf_tables, from Patrick McHardy.
      
      2) Move alloc_null_binding to nf_nat_core, as it is now also needed by
         nf_tables and ip_tables. Original patch from Patrick McHardy, but it
         required major changes, which I made, to adapt it to the current tree.
      
      3) Add nf_tables core, including the netlink API, the packet filtering
         engine, expressions and built-in tables, from Patrick McHardy. This
         patch includes accumulated fixes since 2009 and minor enhancements.
         The patch description contains a list of references to the original
         patches for the record. For those who are not familiar with the
         original work, see [1], [2] and [3].
      
      4) Add netlink set API, this replaces the original set infrastructure
         to introduce a netlink API to add/delete sets and to add/delete
         set elements. This includes two set types: the hash and the rb-tree
         sets (used for interval based matching). The main difference with
         ipset is that this infrastructure is data type agnostic. Patch from
         Patrick McHardy.
      
      5) Allow expression operation overloading. This API change allows us to
         define expression subtypes depending on the configuration
         that is received from user-space via Netlink. It is used by follow
         up patches to provide optimized versions of the payload and cmp
         expressions and the x_tables compatibility layer, from Patrick
         McHardy.
      
      6) Add optimized data comparison operation, it requires the previous
         patch, from Patrick McHardy.
      
      7) Add optimized payload implementation, it requires patch 5, from
         Patrick McHardy.
      
      8) Convert built-in tables to chain types. Each chain type has special
         semantics (filter, route and nat) that are used by userspace to
         configure the chain behaviour. The main change with regard to iptables
         is that tables become containers of chains, with no specific semantics.
         However, you may still configure your tables and chains to retain
         iptables-like semantics, patch from me.
      
      9) Add compatibility layer for x_tables. This patch adds support to
         use all existing x_tables extensions from nf_tables, this is used
         to provide a userspace utility that accepts iptables syntax but
         uses the nf_tables kernel core internally. This patch includes
         missing features in the nf_tables core such as the per-chain
         stats, default chain policy and number of chain references, which
         are required by the iptables compatibility userspace tool. Patch
         from me.
      
      10) Fix transport protocol matching, this fix is a side effect of the
          x_tables compatibility layer, which now provides a pointer to the
          transport header, from me.
      
      11) Add support for dormant tables, this feature allows you to disable
          all chains and rules that are contained in one table, from me.
      
      12) Add IPv6 NAT support. At the time nf_tables was made, there was no
          IPv6 NAT support yet, from Tomasz Bursztyka.
      
      13) Complete net namespace support. This patch registers the protocol
          family per net namespace, so tables (thus, other objects contained
          in tables such as sets, chains and rules) are only visible from the
          corresponding net namespace, from me.
      
      14) Add the insert operation to the nf_tables netlink API, this requires
          adding a new position attribute that allows us to locate where in the
          ruleset a rule needs to be inserted, from Eric Leblond.
      
      15) Add rule batching support, including atomic rule-set updates by
          using rule-set generations. This patch includes a change to nfnetlink
          to include two new control messages to indicate the beginning and
          the end of a batch. The end message is interpreted as the commit
          message; if it is missing, the rule-set updates contained in the
          batch are aborted, from me.
      
      16) Add trace support to the nf_tables packet filtering core, from me.
      
      17) Add ARP filtering support, original patch from Patrick McHardy, but
          adapted to fit into the chain type infrastructure. This was recovered
          to be used by nft userspace tool and our compatibility arptables
          userspace tool.
      
      There is still work to do to fully replace x_tables [4] [5] but that can
      be done incrementally by extending our netlink API. Moreover, looking at
      netfilter-devel and the amount of contributions to nf_tables we've been
      getting, I think it would be good to have it mainstream to avoid accumulating
      large patchsets and continuous rebases.
      
      I tried to provide a reasonable patchset. We have more than 100 accumulated
      patches in the original nf_tables tree, so I collapsed many of the small
      fixes into the main patch we have had since 2009 and am providing a small
      batch for review to netdev, while trying to retain part of the history.
      
      For those who haven't tried nf_tables yet, there's a quick howto
      available from Eric Leblond that describes how to get things working [6].
      
      Comments/reviews welcome.
      
      Thanks!
      
      [1] http://lwn.net/Articles/324251/
      [2] http://workshop.netfilter.org/2013/wiki/images/e/ee/Nftables-osd-2013-developer.pdf
      [3] http://lwn.net/Articles/564095/
      [4] http://people.netfilter.org/pablo/map-pending-work.txt
      [5] http://people.netfilter.org/pablo/nftables-todo.txt
      [6] https://home.regit.org/netfilter-en/nftables-quick-howto/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • irda: update comment mentioning IRQF_DISABLED · 78dea8cc
      Michael Opdenacker authored
      This patch removes a comment mentioning IRQF_DISABLED,
      which is deprecated.
      Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • isdn: remove deprecated IRQF_DISABLED · 33235ca4
      Michael Opdenacker authored
      This patch proposes to remove the use of the IRQF_DISABLED flag.
      
      It has been a no-op since 2.6.35 and will be removed one day.
      Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlx4' · 3a14aede
      David S. Miller authored
      Amir Vadai says:
      
      ====================
      net/mlx4: Mellanox driver update 15-10-2013
      
      This patchset contains small code cleanup patches, and a patch to make
      mlx4_core use request_module() in order to load the relevant link layer module
      (mlx4_en or mlx4_ib) according to the port type.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Load higher level modules according to ports type · b046ffe5
      Eyal Perry authored
      The Mellanox ConnectX architecture is as follows: mlx4_core is the lower
      level PCI driver, which registers on the PCI ID, and the protocol specific
      drivers depend on it: mlx4_en for Ethernet and mlx4_ib for InfiniBand.
      A NIC can have multiple ports, which can change their type dynamically.
      We use the request_module() call to load the relevant protocol driver
      when needed: at load time or on a port type change event.
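Port-type-driven loading reduces to mapping each port's type to a protocol driver. This is a minimal sketch under assumptions: the enum, the `NEED_*` bits and both functions are hypothetical, and only the module names mlx4_en and mlx4_ib come from the text. In the kernel, the chosen name would be passed to `request_module()`.

```c
#include <stddef.h>

/* Hypothetical port types. */
enum port_type { PORT_TYPE_NONE, PORT_TYPE_IB, PORT_TYPE_ETH };

#define NEED_MLX4_EN (1u << 0)
#define NEED_MLX4_IB (1u << 1)

/* Name of the protocol driver serving a port of the given type. */
const char *protocol_module_for(enum port_type t)
{
    switch (t) {
    case PORT_TYPE_ETH:
        return "mlx4_en";
    case PORT_TYPE_IB:
        return "mlx4_ib";
    default:
        return NULL;
    }
}

/* Which protocol drivers the current port types require; this would be
 * re-evaluated at load time and after every port type change event. */
unsigned int modules_needed(const enum port_type *ports, size_t nports)
{
    unsigned int need = 0;

    for (size_t i = 0; i < nports; i++) {
        if (ports[i] == PORT_TYPE_ETH)
            need |= NEED_MLX4_EN;
        else if (ports[i] == PORT_TYPE_IB)
            need |= NEED_MLX4_IB;
    }
    return need;
}
```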
      Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4: Unused local variable in mlx4_opreq_action · 39e210fd
      Amir Vadai authored
      Clean up warning added by commit fe6f700d "net/mlx4_core: Respond to
      operation request by firmware".
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4: Fix typo, move similar defs to same location · 5930e8d0
      Or Gerlitz authored
      Small code cleanup:
      
      1. change MLX4_DEV_CAP_FLAGS2_REASSIGN_MAC_EN to MLX4_DEV_CAP_FLAG2_REASSIGN_MAC_EN
      
      2. put MLX4_SET_PORT_PRIO2TC and MLX4_SET_PORT_SCHEDULER in the same union with the
         other MLX4_SET_PORT_yyy
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4: Clean the code to eliminate trivial build warnings · fe66bb2d
      Or Gerlitz authored
      Remove code that triggers trivial build warnings.
      
      drivers/net/ethernet/mellanox/mlx4/cmd.c: In function ‘mlx4_set_vf_vlan’:
      drivers/net/ethernet/mellanox/mlx4/cmd.c:2256: warning: variable ‘vf_oper’ set but not used
      drivers/net/ethernet/mellanox/mlx4/mcg.c: In function ‘mlx4_map_sw_to_hw_steering_mode’:
      drivers/net/ethernet/mellanox/mlx4/mcg.c:648: warning: comparison of unsigned expression < 0 is always false
      drivers/net/ethernet/mellanox/mlx4/mcg.c: In function ‘mlx4_map_sw_to_hw_steering_id’:
      drivers/net/ethernet/mellanox/mlx4/mcg.c:685: warning: comparison of unsigned expression < 0 is always false
      drivers/net/ethernet/mellanox/mlx4/mcg.c: In function ‘mlx4_hw_rule_sz’:
      drivers/net/ethernet/mellanox/mlx4/mcg.c:712: warning: comparison of unsigned expression < 0 is always false
      drivers/net/ethernet/mellanox/mlx4/fw.c: In function ‘mlx4_opreq_action’:
      drivers/net/ethernet/mellanox/mlx4/fw.c:1732: warning: variable ‘type_m’ set but not used
      drivers/net/ethernet/mellanox/mlx4/srq.c:302: warning: no previous prototype for ‘mlx4_srq_lookup’
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet_diag: use sock_gen_put() · c1d607cc
      Eric Dumazet authored
      TCP listener refactoring, part 6 :
      
      Use sock_gen_put() from inet_diag_dump_one_icsk() for future
      SYN_RECV support.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge · d7a20c86
      David S. Miller authored
      Included changes:
      - ensure RecordRoute information is added to BAT_ICMP echo_request/reply only
      - use VLAN_ETH_HLEN when possible
      - use htons when possible
      - substitute old fragmentation code with a new improved implementation by
        Martin Hundebøll
      - create common header for BAT_ICMP packets to improve extensibility
      - consider the network coding overhead when computing the overall room needed by
        batman headers
      - add dummy soft-interface rx mode handler
      - minor code refactoring and cleanups
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 14 Oct, 2013 17 commits
    • netfilter: nf_tables: add ARP filtering support · ed683f13
      Pablo Neira Ayuso authored
      This patch registers the ARP family and the filter chain type
      for this family.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add trace support · b5bc89bf
      Pablo Neira Ayuso authored
      This patch adds support for tracing packets as they travel through
      the ruleset, in a similar fashion to x_tables.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nfnetlink: add batch support and use it from nf_tables · 0628b123
      Pablo Neira Ayuso authored
      This patch adds batch support to nfnetlink. Basically, it adds
      two new control messages:
      
      * NFNL_MSG_BATCH_BEGIN, which indicates the beginning of a batch;
        nfgenmsg->res_id indicates the nfnetlink subsystem ID.
      
      * NFNL_MSG_BATCH_END, which results in the invocation of the
        ss->commit callback function. If it is not specified or an error
        occurred in the batch, the ss->abort function is invoked
        instead.
      
      The end message represents the commit operation in nftables; the
      lack of an end message results in an abort. This patch also adds the
      .call_batch function, which is only called from the batch reception
      path.
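The begin/end protocol above can be modeled as a small state machine. This is a hypothetical sketch, not nfnetlink's API: the message enum, `struct msg` and `process_batch()` are illustrative. Treating a nested BEGIN as an error is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message kinds modeled on the description above. */
enum msg_type { MSG_BATCH_BEGIN, MSG_UPDATE, MSG_BATCH_END };

struct msg {
    enum msg_type type;
    bool ok; /* for MSG_UPDATE: whether the update validates */
};

enum batch_result { BATCH_COMMIT, BATCH_ABORT };

/* Walk one batch. Updates are only staged; the END message triggers the
 * commit step, while an error or a missing END triggers the abort step. */
enum batch_result process_batch(const struct msg *m, size_t n)
{
    bool error = false;

    if (n == 0 || m[0].type != MSG_BATCH_BEGIN)
        return BATCH_ABORT;

    for (size_t i = 1; i < n; i++) {
        switch (m[i].type) {
        case MSG_UPDATE:
            if (!m[i].ok)
                error = true; /* keep reading, but remember to abort */
            break;
        case MSG_BATCH_END:
            return error ? BATCH_ABORT : BATCH_COMMIT;
        case MSG_BATCH_BEGIN:
            error = true;     /* nested BEGIN: treated as an error here */
            break;
        }
    }
    return BATCH_ABORT;       /* ran out of messages without an END */
}
```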
      
      This patch adds atomic rule updates and dumps based on
      bitmask generations. This allows a set of rule-set updates to be
      committed atomically and incrementally, without altering the internal
      state of existing nf_tables expressions/matches/targets.
      
      The idea consists of using a generation cursor of 1 bit and
      a bitmask of 2 bits per rule. Assuming the gencursor is 0,
      then the genmask (expressed as a bitmask) can be interpreted
      as:
      
      00 active in the present, will be active in the next generation.
      01 inactive in the present, will be active in the next generation.
      10 active in the present, will be deleted in the next generation.
       ^
       gencursor
      
      Once you invoke the transition to the next generation, the global
      gencursor is updated:
      
      00 active in the present, will be active in the next generation.
      01 active in the present, needs to zero its future, it becomes 00.
      10 inactive in the present, delete now.
      ^
      gencursor
      
      If a dump is in progress and nf_tables enters a new generation,
      the dump will stop and return -EBUSY to let userspace know that
      it has to retry. In order to invalidate dumps, a global
      genctr counter is increased every time nf_tables enters a new
      generation.
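The two genmask tables above can be checked with a small model. The sketch is hypothetical: `struct rule`, the global `gencursor`/`genctr` variables and the helper names are illustrative stand-ins, not the nf_tables code. Bit g of `genmask` set means "inactive in generation g". This matches the tables, where the rightmost digit belongs to generation 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit g set in genmask means "inactive in generation g". */
struct rule {
    uint8_t genmask;
};

static unsigned int gencursor; /* current generation: 0 or 1 */
static unsigned int genctr;    /* bumped on every generation change */

static unsigned int gen_next(void)
{
    return gencursor ^ 1u;
}

bool rule_active_now(const struct rule *r)
{
    return (r->genmask & (1u << gencursor)) == 0;
}

bool rule_active_next(const struct rule *r)
{
    return (r->genmask & (1u << gen_next())) == 0;
}

/* Added rule: inactive now, active in the next generation ("01"). */
void rule_add_pending(struct rule *r)
{
    r->genmask = (uint8_t)(1u << gencursor);
}

/* Deleted rule: active now, inactive in the next generation ("10"). */
void rule_del_pending(struct rule *r)
{
    r->genmask = (uint8_t)(1u << gen_next());
}

/* Commit: move the cursor and invalidate dumps in progress. */
void commit_generation(void)
{
    gencursor = gen_next();
    genctr++;
}

/* After a commit: zero the stale future bit of active rules, and report
 * rules that are inactive in the new present so they can be freed. */
bool rule_settle(struct rule *r)
{
    if (!rule_active_now(r))
        return true;  /* delete now */
    r->genmask = 0;   /* becomes "00" */
    return false;
}
```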
      
      This new operation can be used from the user-space utility
      that controls the firewall, e.g.:
      
      nft -f file
      
      The rule updates contained in `file' will be applied atomically.
      
      cat file
      -----
      add filter INPUT ip saddr 1.1.1.1 counter accept #1
      del filter INPUT ip daddr 2.2.2.2 counter drop   #2
      -EOF-
      
      Note that rule 1 will be inactive until the transition to the
      next generation, and rule 2 will be evicted in the next generation.
      
      There is a penalty during the rule update due to the branch
      misprediction in the packet matching framework. But that should be
      quickly resolved once the iteration over the commit list that
      contains the rules requiring updates is finished.
      
      Event notification happens once the rule-set update has been
      committed, so we skip notifications in case the rule-set update
      is aborted, which can happen when the rule-set is only tested
      to check that it applies correctly.
      
      This patch squashed the following patches from Pablo:
      
      * nf_tables: atomic rule updates and dumps
      * nf_tables: get rid of per rule list_head for commits
      * nf_tables: use per netns commit list
      * nfnetlink: add batch support and use it from nf_tables
      * nf_tables: all rule updates are transactional
      * nf_tables: attach replacement rule after stale one
      * nf_tables: do not allow deletion/replacement of stale rules
      * nf_tables: remove unused NFTA_RULE_FLAGS
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add insert operation · 5e948466
      Eric Leblond authored
      This patch adds a new rule attribute NFTA_RULE_POSITION, which is
      used to store the position of a rule relative to the others.
      By providing the create command and specifying the position, the
      rule is inserted after the rule whose handle equals the
      provided position.
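Positional insertion can be sketched as "find the rule whose handle equals the position, then link after it". This is a simplified singly linked model with hypothetical names. nf_tables uses the kernel's list helpers and its own error codes; here -1 stands in for "no such rule".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule node; 'handle' uniquely identifies a rule. */
struct rule {
    uint64_t handle;
    struct rule *next;
};

/* Insert r after the rule whose handle equals 'position'.
 * Returns 0 on success, -1 if no rule has that handle. */
int rule_insert_after(struct rule *head, uint64_t position, struct rule *r)
{
    for (struct rule *cur = head; cur != NULL; cur = cur->next) {
        if (cur->handle == position) {
            r->next = cur->next;
            cur->next = r;
            return 0;
        }
    }
    return -1;
}
```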
      
      Regarding notification, the position attribute specifies the
      handle of the previous rule to make sure we don't point to any
      stale rule in notifications coming from the commit path.
      
      This patch includes the following fix from Pablo:
      
      * nf_tables: fix rule deletion event reporting
      Signed-off-by: Eric Leblond <eric@regit.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: complete net namespace support · 99633ab2
      Pablo Neira Ayuso authored
      Register the family per net namespace to ensure that sets are
      only visible in their appropriate namespace.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: Add support for IPv6 NAT · eb31628e
      Tomasz Bursztyka authored
      This patch generalizes the NAT expression to support both IPv4 and IPv6
      using the existing IPv4/IPv6 NAT infrastructure. This also adds the
      NAT chain type for IPv6.
      
      This patch collapses the following patches that were posted to the
      netfilter-devel mailing list, from Tomasz:
      
      * nf_tables: Change NFTA_NAT_ attributes to better semantic significance
      * nf_tables: Split IPv4 NAT into NAT expression and IPv4 NAT chain
      * nf_tables: Add support for IPv6 NAT expression
      * nf_tables: Add support for IPv6 NAT chain
      * nf_tables: Fix up build issue on IPv6 NAT support
      
      And, from Pablo Neira Ayuso:
      
      * fix missing dependencies in nft_chain_nat
      Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add support for dormant tables · 9ddf6323
      Pablo Neira Ayuso authored
      This patch allows you to temporarily disable an entire table.
      You can change the state of a dormant table via NFT_MSG_NEWTABLE
      messages. Using this operation you can wake up a table, so that its
      chains are registered.
      
      This provides atomicity at chain level. Thus, the rule-set of one
      chain is applied at once, avoiding any possible intermediate state
      in every chain. Still, the chains that belong to a table are
      registered consecutively. This also allows you to have inactive
      tables in the kernel.
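The per-chain (not per-table) atomicity described above follows from registering chains one after another. This is a hypothetical model: `struct chain`, `struct table` and `table_set_dormant()` are illustrative. The sketch assumes registration never fails, which the kernel cannot assume.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a table owns chains that are hooked or not. */
struct chain {
    bool registered;
};

struct table {
    bool dormant;
    struct chain *chains;
    size_t nchains;
};

/* Apply the dormant state requested by a NEWTABLE-style message.
 * Chains are (un)registered one after another, so each chain switches
 * atomically, but the table as a whole does not. */
void table_set_dormant(struct table *t, bool dormant)
{
    if (t->dormant == dormant)
        return; /* nothing to do */
    for (size_t i = 0; i < t->nchains; i++)
        t->chains[i].registered = !dormant;
    t->dormant = dormant;
}
```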
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: nft_payload: fix transport header base · c54032e0
      Pablo Neira Ayuso authored
      We cannot use skb->transport_header since it's unset; use
      pkt->xt.thoff instead.
      
      This is now possible using information made available through the
      x_tables compatibility layer.
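The fix comes down to where the transport base offset comes from. In this hypothetical sketch, `struct pktinfo` and its fields are illustrative; `thoff` models the `pkt->xt.thoff` value named in the commit. Offsets are relative to the start of the data the hook sees, so the link-layer offset is negative here.

```c
/* Hypothetical packet metadata, offsets relative to the data the hook
 * sees (the network header in this model). */
struct pktinfo {
    int mac_off;        /* link-layer header offset */
    int nh_off;         /* network header offset */
    unsigned int thoff; /* transport header offset, as x_tables computes it */
};

enum payload_base { BASE_LL, BASE_NETWORK, BASE_TRANSPORT };

/* Offset of a payload load: header base plus the offset from the rule.
 * The transport base comes from thoff rather than from a transport
 * header field that may be unset at this point. */
int payload_offset(const struct pktinfo *pkt, enum payload_base base, int off)
{
    switch (base) {
    case BASE_LL:
        return pkt->mac_off + off;
    case BASE_NETWORK:
        return pkt->nh_off + off;
    case BASE_TRANSPORT:
    default:
        return (int)pkt->thoff + off;
    }
}
```

For example, with a 20-byte IPv4 header, a rule loading the TCP destination port (offset 2 from the transport base) reads from offset 22.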
      Reported-by: Eric Leblond <eric@regit.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add compatibility layer for x_tables · 0ca743a5
      Pablo Neira Ayuso authored
      This patch adds the x_tables compatibility layer. This allows you
      to use existing x_tables matches and targets from nf_tables.
      
      This compatibility layer allows us to use existing matches/targets
      for features that are still missing in nf_tables. We can progressively
      replace them with native nf_tables extensions. It also provides the
      userspace compatibility software that allows you to express the
      rule-set using the iptables syntax but using the nf_tables kernel
      components.
      
      In order to get this compatibility layer working, I've done the
      following things:
      
      * add NFNL_SUBSYS_NFT_COMPAT: this new nfnetlink subsystem is used
      to query the x_tables match/target revision, so we don't need to
      use the native x_table getsockopt interface.
      
      * emulate xt structures: this required extending the struct nft_pktinfo
      to include the fragment offset, which is already obtained from
      ip[6]_tables and that is used by some matches/targets.
      
      * add support for default policy to base chains, required to emulate
        x_tables.
      
      * add NFTA_CHAIN_USE attribute to obtain the number of references to
        chains, required by x_tables emulation.
      
      * add chain packet/byte counters using per-cpu.
      
      * support 32-64 bits compat.
      
      For historical reasons, this patch includes the following patches
      that were posted in the netfilter-devel mailing list.
      
      From Pablo Neira Ayuso:
      * nf_tables: add default policy to base chains
      * netfilter: nf_tables: add NFTA_CHAIN_USE attribute
      * nf_tables: nft_compat: private data of target and matches in contiguous area
      * nf_tables: validate hooks for compat match/target
      * nf_tables: nft_compat: release cached matches/targets
      * nf_tables: x_tables support as a compile time option
      * nf_tables: fix alias for xtables over nftables module
      * nf_tables: add packet and byte counters per chain
      * nf_tables: fix per-chain counter stats if no counters are passed
      * nf_tables: don't bump chain stats
      * nf_tables: add protocol and flags for xtables over nf_tables
      * nf_tables: add ip[6]t_entry emulation
      * nf_tables: move specific layer 3 compat code to nf_tables_ipv[4|6]
      * nf_tables: support 32bits-64bits x_tables compat
      * nf_tables: fix compilation if CONFIG_COMPAT is disabled
      
      From Patrick McHardy:
      * nf_tables: move policy to struct nft_base_chain
      * nf_tables: send notifications for base chain policy changes
      
      From Alexander Primak:
      * nf_tables: remove the duplicate NF_INET_LOCAL_OUT
      
      From Nicolas Dichtel:
      * nf_tables: fix compilation when nf-netlink is a module
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: convert built-in tables/chains to chain types · 9370761c
      Pablo Neira Ayuso authored
      This patch converts built-in tables/chains to chain types that
      allow you to deploy customized table and chain configurations from
      userspace.
      
      After this patch, you have to specify the chain type when
      creating a new chain:
      
       add chain ip filter output { type filter hook input priority 0; }
                                    ^^^^ ------
      
      The existing chain types after this patch are: filter, route and
      nat. Note that tables are just containers of chains with no specific
      semantics, which is a significant change with regard to iptables.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_payload: add optimized payload implementation for small loads · c29b72e0
      Patrick McHardy authored
      Add an optimized payload expression implementation for small (up to 4 bytes)
      aligned data loads from the linear packet area.
      
      This patch also includes Patrick McHardy's original patch entitled
      (nf_tables: inline nft_payload_fast_eval() into main evaluation loop).
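A fast payload path needs a configuration-time eligibility test and a cheap evaluation step with a fallback. The criteria below (power-of-two length up to 4 bytes, offset aligned to the length) are one reading of "small aligned loads" and are an assumption. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Configuration time: may this load use the fast path at all? */
bool payload_fast_eligible(unsigned int offset, unsigned int len)
{
    bool pow2 = len != 0 && (len & (len - 1)) == 0;

    return len <= 4 && pow2 && offset % len == 0;
}

/* Evaluation time: copy the raw bytes into a zeroed 32-bit register if
 * they lie in the linear area; otherwise return false so the caller can
 * fall back to the generic path. */
bool payload_fast_eval(const uint8_t *linear, size_t linear_len,
                       unsigned int offset, unsigned int len, uint32_t *reg)
{
    if ((size_t)offset + len > linear_len)
        return false;
    *reg = 0;
    memcpy(reg, linear + offset, len);
    return true;
}
```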
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add optimized data comparison for small values · cb7dbfd0
      Patrick McHardy authored
      Add an optimized version of nft_data_cmp() that only handles values of up
      to 4 bytes in length.
      
      This patch includes Patrick McHardy's original patch entitled (nf_tables:
      inline nft_cmp_fast_eval() into main evaluation loop).
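A comparison of at most 4 bytes fits in one masked 32-bit compare. In this hypothetical sketch, `struct cmp_fast` and both functions are illustrative, not the kernel's. The mask is built byte by byte so that only the first `len` bytes count on any host byte order, which is a design choice of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reference value and mask, both holding raw packet-order bytes. */
struct cmp_fast {
    uint32_t data; /* reference value, bytes past len are zero */
    uint32_t mask; /* 0xff for each compared byte */
};

/* Prepare a compare against 'len' (1..4) reference bytes. */
struct cmp_fast cmp_fast_init(const uint8_t *ref, unsigned int len)
{
    struct cmp_fast c = { 0, 0 };
    uint8_t ones[4] = { 0 };

    memcpy(&c.data, ref, len);
    memset(ones, 0xff, len);
    memcpy(&c.mask, ones, sizeof(c.mask));
    return c;
}

/* Equality test: one AND and one compare, no byte loop. */
bool cmp_fast_eq(const struct cmp_fast *c, uint32_t reg)
{
    return (reg & c->mask) == c->data;
}
```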
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: expression ops overloading · ef1f7df9
      Patrick McHardy authored
      Split the expression ops into two parts and support overloading of
      the runtime expression ops based on the requested function through
      a ->select_ops() callback.
      
      This can be used to provide optimized implementations, for instance
      for loading small aligned amounts of data from the packet or inlining
      frequently used operations into the main evaluation loop.
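      The split might be modelled as follows: a type, found by name, carries
      the ->select_ops() hook, and the ops it returns are what the evaluation
      loop calls. Apart from the select_ops name, everything here is a
      hypothetical simplification; in the kernel the callback works from the
      expression's netlink attributes rather than a pre-parsed struct, and
      the fast-path condition shown is an assumption.

```c
/* Runtime part: what the evaluation loop calls for each packet. */
struct expr_ops {
	const char *name;
	/* per-packet callbacks omitted for brevity */
};

/* Hypothetical pre-parsed attributes of a payload expression. */
struct payload_attrs {
	unsigned int offset;
	unsigned int len;
};

/* Type part: looked up by name, carries the ->select_ops() hook. */
struct expr_type {
	const char *name;
	const struct expr_ops *(*select_ops)(const struct payload_attrs *attrs);
};

static const struct expr_ops payload_ops = { .name = "payload" };
static const struct expr_ops payload_fast_ops = { .name = "payload_fast" };

/* Pick the optimized ops for small, naturally aligned loads. */
static const struct expr_ops *payload_select_ops(const struct payload_attrs *a)
{
	if ((a->len == 1 || a->len == 2 || a->len == 4) && a->offset % a->len == 0)
		return &payload_fast_ops;
	return &payload_ops;
}

static const struct expr_type payload_type = {
	.name = "payload",
	.select_ops = payload_select_ops,
};
```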
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Patrick McHardy's avatar
      netfilter: nf_tables: add netlink set API · 20a69341
      Patrick McHardy authored
      This patch adds the new netlink API for maintaining nf_tables sets
      independently of the ruleset. The API supports the following operations:
      
      - creation of sets
      - deletion of sets
      - querying of specific sets
      - dumping of all sets
      
      - addition of set elements
      - removal of set elements
      - dumping of all set elements
      
      Sets are identified by name; each table defines its own namespace.
      The name of a set may be allocated automatically. This is mostly useful
      in combination with the NFT_SET_ANONYMOUS flag, which destroys a set
      automatically once the last reference has been released.
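      A minimal model of per-table namespaces, automatic naming and
      reference-counted destruction of anonymous sets. The structures,
      MAX_SETS, the "__set" prefix, SET_ANONYMOUS (a stand-in for
      NFT_SET_ANONYMOUS) and all helpers are hypothetical; only the behaviour
      is taken from the text above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SET_ANONYMOUS 0x1u  /* stand-in for NFT_SET_ANONYMOUS */
#define MAX_SETS 8

struct set {
	char name[32];
	unsigned int flags;
	unsigned int use;       /* rules currently bound to this set */
};

/* Each table is its own namespace for set names. */
struct table {
	struct set *sets[MAX_SETS];
};

/* Create a set named prefix + lowest free slot index, e.g. "__set0". */
static struct set *set_create_auto(struct table *t, const char *prefix,
				   unsigned int flags)
{
	for (int i = 0; i < MAX_SETS; i++) {
		if (t->sets[i] == NULL) {
			struct set *s = calloc(1, sizeof(*s));

			if (s == NULL)
				return NULL;
			snprintf(s->name, sizeof(s->name), "%s%d", prefix, i);
			s->flags = flags;
			t->sets[i] = s;
			return s;
		}
	}
	return NULL;
}

static void set_bind(struct set *s)
{
	s->use++;
}

/* Drop one reference (callers keep bind/unbind balanced); an anonymous
 * set is destroyed together with its last reference. */
static void set_unbind(struct table *t, struct set *s)
{
	if (--s->use == 0 && (s->flags & SET_ANONYMOUS)) {
		for (int i = 0; i < MAX_SETS; i++)
			if (t->sets[i] == s)
				t->sets[i] = NULL;
		free(s);
	}
}
```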
      
      Sets can be marked constant, meaning they're not allowed to change while
      linked to a rule. This allows lockless operation for set types that
      would otherwise require locking.
      
      Additionally, if the implementation supports it, sets can (as before) be
      used as maps, associating a data value with each key (or range), by
      specifying the NFT_SET_MAP flag, and they can be used for interval
      queries by specifying the NFT_SET_INTERVAL flag.
      
      Set elements are added and removed incrementally. All element operations
      support batching, reducing netlink message and set lookup overhead.
      
      The old "set" and "hash" expressions are replaced by a generic "lookup"
      expression, which binds to the specified set. Userspace is no longer
      aware of the actual set implementation used by the kernel; all
      configuration options are generic.
      
      Currently the implementation selection logic is largely missing and the
      kernel will simply use the first registered implementation supporting the
      requested operation. Eventually, the plan is to have userspace supply a
      description of the data characteristics and select the implementation
      based on expected performance and memory use.
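      The interim selection rule, "first registered implementation
      supporting the requested operation", combined with the generic
      interface a lookup expression would call, could be sketched like this.
      The backend names, the feature bits (named after NFT_SET_MAP and
      NFT_SET_INTERVAL, values illustrative only) and the linear-scan lookup
      are assumptions made to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative feature bits named after NFT_SET_INTERVAL and NFT_SET_MAP. */
#define SET_F_INTERVAL 0x4u
#define SET_F_MAP      0x8u

/* Generic backend interface: a lookup expression only ever sees this. */
struct set_ops {
	const char *name;
	unsigned int features;                      /* flags the backend supports */
	bool (*lookup)(const void *priv, uint32_t key);
};

/* Trivial storage shared by both hypothetical backends. */
struct key_array {
	const uint32_t *keys;
	size_t n;
};

static bool array_lookup(const void *priv, uint32_t key)
{
	const struct key_array *a = priv;

	for (size_t i = 0; i < a->n; i++)
		if (a->keys[i] == key)
			return true;
	return false;
}

static const struct set_ops backend_hash = { "hash", SET_F_MAP, array_lookup };
static const struct set_ops backend_tree = { "tree", SET_F_MAP | SET_F_INTERVAL, array_lookup };

/* Registration order matters: the first backend that fits wins. */
static const struct set_ops *const registered[] = { &backend_hash, &backend_tree };

static const struct set_ops *set_select_ops(unsigned int requested)
{
	for (size_t i = 0; i < sizeof(registered) / sizeof(registered[0]); i++)
		if ((registered[i]->features & requested) == requested)
			return registered[i];
	return NULL;  /* no backend supports the requested flags */
}
```

      Because the caller only holds a struct set_ops pointer, a smarter
      selection policy could later replace set_select_ops() without changing
      the lookup side.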
      
      This patch includes the new 'lookup' expression, which looks up a
      matching element in the set.
      
      This patch also includes kernel-doc descriptions for the set API and
      the following fixes:
      
      From Patrick McHardy:
      * netfilter: nf_tables: fix set element data type in dumps
      * netfilter: nf_tables: fix indentation of struct nft_set_elem comments
      * netfilter: nf_tables: fix oops in nft_validate_data_load()
      * netfilter: nf_tables: fix oops while listing sets of built-in tables
      * netfilter: nf_tables: destroy anonymous sets immediately if binding fails
      * netfilter: nf_tables: propagate context to set iter callback
      * netfilter: nf_tables: add loop detection
      
      From Pablo Neira Ayuso:
      * netfilter: nf_tables: allow to dump all existing sets
      * netfilter: nf_tables: fix wrong type for flags variable in newelem
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Patrick McHardy's avatar
      netfilter: add nftables · 96518518
      Patrick McHardy authored
      This patch adds nftables which is the intended successor of iptables.
      This packet filtering framework reuses the existing netfilter hooks,
      the connection tracking system, the NAT subsystem, the transparent
      proxying engine, the logging infrastructure and the userspace packet
      queueing facilities.
      
      In a nutshell, nftables provides a pseudo-state machine with 4
      general-purpose registers of 128 bits and 1 special-purpose register to
      store verdicts. This pseudo-machine comes with an extensible instruction
      set, a.k.a. "expressions" in nftables jargon. The expressions included
      in this patch provide the basic functionality:
      
      * bitwise: to perform bitwise operations.
      * byteorder: to convert between host and network byte order.
      * cmp: to compare data with the content of the registers.
      * counter: to enable counters on rules.
      * ct: to store conntrack keys into registers.
      * exthdr: to match IPv6 extension headers.
      * immediate: to load data into registers.
      * limit: to limit matching based on packet rate.
      * log: to log packets.
      * meta: to match metainformation that usually comes with the skbuff.
      * nat: to perform Network Address Translation.
      * payload: to fetch data from the packet payload and store it into
        registers.
      * reject (IPv4 only): to explicitly close a connection, e.g. with a TCP RST.
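      A toy interpreter can make the register model concrete: four 128-bit
      data registers, a verdict register, and a rule evaluated as a sequence
      of expressions. Only three expressions are modelled (payload, cmp with
      equality only, and immediate restricted to setting a verdict), and all
      types and names are hypothetical; the real evaluation loop lives in
      net/netfilter/nf_tables_core.c and differs in detail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical verdicts; V_BREAK means "this rule does not match". */
enum verdict { V_CONTINUE, V_BREAK, V_ACCEPT, V_DROP };

/* 1 verdict register plus 4 general-purpose registers of 128 bits. */
struct regs {
	enum verdict verdict;
	uint8_t data[4][16];
};

enum op { OP_PAYLOAD, OP_CMP_EQ, OP_IMMEDIATE };

struct expr {
	enum op op;
	int reg;            /* data register 0..3 (assumed valid) */
	size_t offset;      /* OP_PAYLOAD: byte offset in the packet */
	size_t len;         /* OP_PAYLOAD/OP_CMP_EQ: 1..16 bytes */
	uint8_t value[16];  /* OP_CMP_EQ: reference value */
	enum verdict v;     /* OP_IMMEDIATE: verdict to load */
};

/* Run one rule's expressions in order; a failed step ends the rule. */
static enum verdict rule_eval(const struct expr *e, size_t n,
			      const uint8_t *pkt, size_t pktlen)
{
	struct regs regs = { .verdict = V_CONTINUE };

	for (size_t i = 0; i < n; i++, e++) {
		switch (e->op) {
		case OP_PAYLOAD:
			if (e->offset + e->len > pktlen)
				return V_BREAK;
			memcpy(regs.data[e->reg], pkt + e->offset, e->len);
			break;
		case OP_CMP_EQ:
			if (memcmp(regs.data[e->reg], e->value, e->len) != 0)
				return V_BREAK;
			break;
		case OP_IMMEDIATE:
			regs.verdict = e->v;
			break;
		}
	}
	return regs.verdict;
}
```

      For example, a rule such as "ip protocol tcp accept" roughly maps to a
      payload load of the IPv4 protocol byte (offset 9 of the IPv4 header),
      a cmp against 6, and an immediate accept verdict.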
      
      Using this instruction-set, the userspace utility 'nft' can transform
      the rules expressed in human-readable text representation (using a
      new syntax, inspired by tcpdump) to nftables bytecode.
      
      nftables also inherits the table, chain and rule objects from
      iptables, but in a more configurable way, and it also includes the
      original datatype-agnostic set infrastructure with mapping support.
      This set infrastructure is enhanced in the follow-up patch (netfilter:
      nf_tables: add netlink set API).
      
      This patch includes the following components:
      
      * the netlink API: net/netfilter/nf_tables_api.c and
        include/uapi/netfilter/nf_tables.h
      * the packet filter core: net/netfilter/nf_tables_core.c
      * the expressions (described above): net/netfilter/nft_*.c
      * the filter tables: arp, IPv4, IPv6 and bridge:
        net/ipv4/netfilter/nf_tables_ipv4.c
        net/ipv6/netfilter/nf_tables_ipv6.c
        net/ipv4/netfilter/nf_tables_arp.c
        net/bridge/netfilter/nf_tables_bridge.c
      * the NAT table (IPv4 only):
        net/ipv4/netfilter/nf_table_nat_ipv4.c
      * the route table (similar to mangle):
        net/ipv4/netfilter/nf_table_route_ipv4.c
        net/ipv6/netfilter/nf_table_route_ipv6.c
      * internal definitions under:
        include/net/netfilter/nf_tables.h
        include/net/netfilter/nf_tables_core.h
      * It also includes a skeleton expression:
        net/netfilter/nft_expr_template.c
        and the preliminary implementation of the meta target
        net/netfilter/nft_meta_target.c
      
      It also changes struct nf_hook_ops to add a new pointer for storing
      private hook data, which is used to store the rule list of each chain.
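      A sketch of what the private pointer buys: the hook registration can
      carry the chain, so the code running at the hook reaches the chain's
      rule list directly. The structures and run_chain() are hypothetical
      simplifications (the priv member name is used here for illustration),
      with a plain flag standing in for a rule's expressions.

```c
#include <stddef.h>

/* A rule reduced to an id and a precomputed "does it match" flag. */
struct rule {
	int id;
	int matches;
	struct rule *next;
};

struct chain {
	struct rule *rules;   /* the per-chain rule list */
};

/* Hook registration extended with a private-data pointer. */
struct hook_ops {
	unsigned int hooknum;
	void *priv;           /* points at the chain registered on this hook */
};

/* Walk the rule list reached through ops->priv; return the id of the
 * first matching rule, or -1 if none matches. */
static int run_chain(const struct hook_ops *ops)
{
	const struct chain *c = ops->priv;

	for (const struct rule *r = c->rules; r != NULL; r = r->next)
		if (r->matches)
			return r->id;
	return -1;
}
```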
      
      This patch is based on Patrick McHardy's original patch, plus the
      accumulated cleanups, fixes and small enhancements to the nftables
      code that have been made since 2009, which are:
      
      From Patrick McHardy:
      * nf_tables: adjust netlink handler function signatures
      * nf_tables: only retry table lookup after successful table module load
      * nf_tables: fix event notification echo and avoid unnecessary messages
      * nft_ct: add l3proto support
      * nf_tables: pass expression context to nft_validate_data_load()
      * nf_tables: remove redundant definition
      * nft_ct: fix maxattr initialization
      * nf_tables: fix invalid event type in nf_tables_getrule()
      * nf_tables: simplify nft_data_init() usage
      * nf_tables: build in more core modules
      * nf_tables: fix double lookup expression unregistration
      * nf_tables: move expression initialization to nf_tables_core.c
      * nf_tables: build in payload module
      * nf_tables: use NFPROTO constants
      * nf_tables: rename pid variables to portid
      * nf_tables: save 48 bits per rule
      * nf_tables: introduce chain rename
      * nf_tables: check for duplicate names on chain rename
      * nf_tables: remove ability to specify handles for new rules
      * nf_tables: return error for rule change request
      * nf_tables: return error for NLM_F_REPLACE without rule handle
      * nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification
      * nf_tables: fix NLM_F_MULTI usage in netlink notifications
      * nf_tables: include NLM_F_APPEND in rule dumps
      
      From Pablo Neira Ayuso:
      * nf_tables: fix stack overflow in nf_tables_newrule
      * nf_tables: nft_ct: fix compilation warning
      * nf_tables: nft_ct: fix crash with invalid packets
      * nft_log: group and qthreshold are 2^16
      * nf_tables: nft_meta: fix socket uid,gid handling
      * nft_counter: allow to restore counters
      * nf_tables: fix module autoload
      * nf_tables: allow to remove all rules placed in one chain
      * nf_tables: use 64-bits rule handle instead of 16-bits
      * nf_tables: fix chain after rule deletion
      * nf_tables: improve deletion performance
      * nf_tables: add missing code in route chain type
      * nf_tables: raise maximum number of expressions from 12 to 128
      * nf_tables: don't delete table if in use
      * nf_tables: fix basechain release
      
      From Tomasz Bursztyka:
      * nf_tables: Add support for changing users chain's name
      * nf_tables: Change chain's name to be fixed sized
      * nf_tables: Add support for replacing a rule by another one
      * nf_tables: Update uapi nftables netlink header documentation
      
      From Florian Westphal:
      * nft_log: group is u16, snaplen u32
      
      From Phil Oester:
      * nf_tables: operational limit match
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Pablo Neira Ayuso's avatar
      netfilter: nf_nat: move alloc_null_binding to nf_nat_core.c · f59cb045
      Pablo Neira Ayuso authored
      Similar to nat_decode_session, alloc_null_binding is needed for both
      ip_tables and nf_tables, so move it to nf_nat_core.c. This change
      is required by nf_tables.
      
      This is an adapted version of the original patch from Patrick McHardy.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Patrick McHardy's avatar
      netfilter: pass hook ops to hookfn · 795aa6ef
      Patrick McHardy authored
      Pass the hook ops to the hookfn to allow for generic hook
      functions. This change is required by nf_tables.
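      One way a generic hook function can use the ops pointer is to recover
      per-registration state from an enclosing structure, as sketched below
      with a locally defined container_of() mirroring the kernel macro. All
      types are hypothetical simplifications, and the real nf_hookfn
      prototype takes further arguments besides the ops.

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct pkt;        /* opaque packet type for this sketch */
struct hook_ops;

/* The hook function now receives the ops it was registered with. */
typedef unsigned int hookfn(const struct hook_ops *ops, struct pkt *p);

struct hook_ops {
	hookfn *hook;
	unsigned int hooknum;
};

/* Per-registration state with the ops embedded in it. */
struct counting_hook {
	struct hook_ops ops;
	unsigned long packets;
};

/* One generic function shared by every registration. */
static unsigned int count_hook(const struct hook_ops *ops, struct pkt *p)
{
	/* The enclosing object itself is mutable, so dropping const is safe here. */
	struct counting_hook *h =
		container_of((struct hook_ops *)ops, struct counting_hook, ops);

	(void)p;
	h->packets++;
	return 1;  /* illustrative "accept" */
}
```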
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  3. 12 Oct, 2013 4 commits