1. 13 May, 2024 1 commit
    • Merge tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next · c85e41bf
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for net-next:
      
      Patch #1 skips the transaction if the object type provides no .update interface.

      Patch #2 skips NETDEV_CHANGENAME, which is unused.
      
      Patch #3 enables conntrack to handle Multicast Router Advertisements and
      	 Multicast Router Solicitations from the Multicast Router Discovery
      	 protocol (RFC4286) as untracked rather than invalid packets.
      	 From Linus Luessing.
      
      Patch #4 updates the DCCP conntracker to mark invalid packets as invalid
      	 instead of dropping them, from Jason Xing.
      
      Patch #5 uses NF_DROP instead of -NF_DROP since NF_DROP is 0,
      	 also from Jason.
      
      Patch #6 removes a reference in netfilter's sysctl documentation to pickup
      	 entries, which had already been removed, from Florian Westphal.

      Patch #7 removes the IPS_OFFLOAD flag check that disabled early drop,
      	 which allows offloaded entries to be evicted from the conntrack table,
      	 also from Florian.
      
      Patches #8 to #16 update the nf_tables pipapo set backend to allocate
      	 the data structure copy on demand from the preparation phase,
      	 to better deal with OOM situations where the .commit step is too
      	 late to fail. Series from Florian Westphal.
      
      Patch #17 adds a selftest with packetdrill to cover conntrack TCP state
      	 transitions, also from Florian.
      
      Patch #18 uses GFP_KERNEL to clone elements from the control plane to
      	 avoid quickly exhausting atomic reserves with large sets; the reporter
      	 refers to sets on the order of a million entries.
      
      * tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
        netfilter: nf_tables: allow clone callbacks to sleep
        selftests: netfilter: add packetdrill based conntrack tests
        netfilter: nft_set_pipapo: remove dirty flag
        netfilter: nft_set_pipapo: move cloning of match info to insert/removal path
        netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone
        netfilter: nft_set_pipapo: merge deactivate helper into caller
        netfilter: nft_set_pipapo: prepare walk function for on-demand clone
        netfilter: nft_set_pipapo: prepare destroy function for on-demand clone
        netfilter: nft_set_pipapo: make pipapo_clone helper return NULL
        netfilter: nft_set_pipapo: move prove_locking helper around
        netfilter: conntrack: remove flowtable early-drop test
        netfilter: conntrack: documentation: remove reference to non-existent sysctl
        netfilter: use NF_DROP instead of -NF_DROP
        netfilter: conntrack: dccp: try not to drop skb in conntrack
        netfilter: conntrack: fix ct-state for ICMPv6 Multicast Router Discovery
        netfilter: nf_tables: remove NETDEV_CHANGENAME from netdev chain event handler
        netfilter: nf_tables: skip transaction if update object is not implemented
      ====================
      
      Link: https://lore.kernel.org/r/20240512161436.168973-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
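Patch #3 in the pull request above changes how two RFC 4286 message types are classified. RFC 4286 assigns ICMPv6 type 151 to Multicast Router Advertisement and type 152 to Multicast Router Solicitation. The C sketch below only illustrates the classification idea: the macro names, the `ct_class` enum and `classify_icmpv6()` are hypothetical and are not the conntrack implementation, which handles many more ICMPv6 types than this toy does.

```c
#include <assert.h>

/* ICMPv6 type numbers assigned by RFC 4286. The macro names are
 * hypothetical and not taken from kernel headers. */
#define MRD_ADVERTISEMENT 151 /* Multicast Router Advertisement */
#define MRD_SOLICITATION  152 /* Multicast Router Solicitation */

/* Hypothetical classification result, loosely modelled on conntrack states. */
enum ct_class { CT_CLASS_TRACK, CT_CLASS_UNTRACKED, CT_CLASS_INVALID };

/* Toy classifier: echo request/reply are tracked, the two MRD messages
 * named in the cover letter are left untracked, and every other type is
 * reported as invalid only to keep the sketch small. */
static enum ct_class classify_icmpv6(unsigned char type)
{
	switch (type) {
	case 128: /* Echo Request */
	case 129: /* Echo Reply */
		return CT_CLASS_TRACK;
	case MRD_ADVERTISEMENT:
	case MRD_SOLICITATION:
		return CT_CLASS_UNTRACKED;
	default:
		return CT_CLASS_INVALID;
	}
}
```

Before the fix, the commit describes these MRD packets ending up as invalid; in the sketch that corresponds to them falling through to the `default` branch.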
  2. 11 May, 2024 28 commits
  3. 10 May, 2024 9 commits
    • Merge tag 'gtp-24-05-07' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/gtp · f8beae07
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      gtp pull request 24-05-07
      
      This v3 includes:
      - fix for clang uninitialized variable per Jakub.
      - address Smatch and Coccinelle reports per Simon
      - remove inline in new IPv6 support per Simon
      - fix memleaks in netlink control plane per Simon
      -o-
      
      The following patchset contains IPv6 GTP driver support for net-next;
      this also includes IPv6 over IPv4 and vice-versa:
      
      Patch #1 removes an unnecessary stack variable initialization in the
               socket routine.
      
      Patch #2 deals with GTP extension headers. It parses this variable
               length extension header to decapsulate packets accordingly.
               Otherwise, packets are dropped when these extension headers are
               present, which breaks interoperation with other non-Linux based
               GTP implementations.
      
      Patch #3 prepares for IPv6 support by moving IPv4 specific fields in PDP
               context objects to a union.
      
      Patch #4 adds IPv6 support while retaining backward compatibility.
               Three new attributes, GTPA_FAMILY, GTPA_PEER_ADDR6 and
               GTPA_MS_ADDR6, allow declaring an IPv6 GTP tunnel, and
               IFLA_GTP_LOCAL6 declares the IPv6 GTP UDP socket. Up to this
               patch, only IPv6 outer in IPv6 inner is supported.
      
      Patch #5 uses the IPv6 address /64 prefix for the UE/MS in the inner
               headers. Unlike IPv4, which provides a 1:1 mapping to the UE/MS
               address, the IPv6 tunnel encapsulates traffic for a /64 prefix
               as specified by 3GPP TS. This patch has been split from
               Patch #4 to highlight this behaviour.
      
      Patch #6 passes IPv6 link-local traffic, such as IPv6 SLAAC, up to
               userspace so that it is handled as control packets.
      
      Patch #7 prepares to allow for GTP IPv4 over IPv6 and vice-versa by
               moving IP specific debugging out of the function to build
               IPv4 and IPv6 GTP packets.
      
      Patch #8 generalizes TOS/DSCP handling following a similar approach to
               the existing iptunnel infrastructure.
      
      Patch #9 adds a helper function to build an IPv4 GTP packet in the outer
               header.
      
      Patch #10 adds a helper function to build an IPv6 GTP packet in the outer
                header.
      
      Patch #11 adds support for GTP IPv4-over-IPv6 and vice-versa.
      
      Patch #12 allows using the same TID/TEID (tunnel identifier) for inner
                IPv4 and IPv6 packets for better UE/MS dual stack integration.
      
      This series integrates with the osmocom.org project CI and TTCN-3 test
      infrastructure (Oliver Smith) as well as the userspace libgtpnl library.
      
      Thanks to Harald Welte, Oliver Smith and Pau Espin for reviewing and
      providing feedback through the osmocom.org redmine platform to make this
      happen.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
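Patch #5 of the GTP series matches the UE/MS by its /64 prefix rather than by the full address. A minimal userspace sketch of that comparison, assuming a plain 16-byte address type; `struct ipv6_addr`, `ms_prefix64_match()` and `ms_addr4_match()` are hypothetical names, not the gtp driver's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bare 128-bit IPv6 address, a stand-in for struct in6_addr so the sketch
 * compiles in userspace without networking headers. */
struct ipv6_addr {
	unsigned char bytes[16];
};

/* True when 'pkt' lies inside the /64 prefix of 'ms_addr': the upper 64
 * bits (first 8 bytes) must be equal, the interface identifier in the lower
 * 64 bits is ignored. */
static bool ms_prefix64_match(const struct ipv6_addr *ms_addr,
			      const struct ipv6_addr *pkt)
{
	return memcmp(ms_addr->bytes, pkt->bytes, 8) == 0;
}

/* IPv4 comparison for contrast: the whole 32-bit address must match. */
static bool ms_addr4_match(uint32_t ms_addr, uint32_t pkt)
{
	return ms_addr == pkt;
}
```

For example, 2001:db8:0:1::1 and 2001:db8:0:1:aabb::2 share a /64 prefix and would therefore map to the same UE/MS, whereas an IPv4 UE/MS only matches its exact address.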
    • netfilter: nf_tables: allow clone callbacks to sleep · fa23e0d4
      Florian Westphal authored
      Sven Auhagen reports transaction failures with the following error:
        ./main.nft:13:1-26: Error: Could not process rule: Cannot allocate memory
        percpu: allocation failed, size=16 align=8 atomic=1, atomic alloc failed, no space left
      
      This points to a failing pcpu allocation with the GFP_ATOMIC flag.
      However, transactions happen from user context and are allowed to sleep.
      
      One case where we can call into the percpu allocator with GFP_ATOMIC is
      the nft_counter expression.
      
      Normally this happens from the control plane, so this could use
      GFP_KERNEL instead.  But one use case, element insertion from the packet
      path, needs to use GFP_ATOMIC allocations (nft_dynset expression).
      
      At this time, .clone callbacks always use GFP_ATOMIC for this reason.
      
      Add a gfp_t argument to the .clone function and pass the GFP_KERNEL or
      GFP_ATOMIC flag depending on context. This allows all clone memory
      allocations to sleep in the normal (transaction) case.
      
      Cc: Sven Auhagen <sven.auhagen@voleatech.de>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
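The change amounts to threading the allocation flags through the callback instead of hardcoding them. In this userspace sketch, `gfp_t`, `GFP_KERNEL` and `GFP_ATOMIC` are local stand-ins with arbitrary values, and `struct counter_expr`, `counter_clone()`, `alloc_with()` and the two callers are hypothetical; the real nf_tables `.clone` signature and allocators are not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins; in the kernel these come from <linux/gfp.h>.
 * The numeric values here are arbitrary. */
typedef unsigned int gfp_t;
#define GFP_KERNEL 0x1u
#define GFP_ATOMIC 0x2u

/* Hypothetical counter expression state. */
struct counter_expr {
	unsigned long long packets;
	unsigned long long bytes;
};

/* Flags passed to the most recent allocation, kept for inspection. */
static gfp_t last_clone_gfp;

/* Hypothetical allocator wrapper: a kernel allocator may sleep with
 * GFP_KERNEL and must not with GFP_ATOMIC; here the flag is only recorded. */
static void *alloc_with(size_t size, gfp_t gfp)
{
	last_clone_gfp = gfp;
	return calloc(1, size);
}

/* .clone-style callback that takes gfp from its caller instead of
 * hardcoding GFP_ATOMIC. Returns 0 on success, -1 on allocation failure. */
static int counter_clone(struct counter_expr **dst,
			 const struct counter_expr *src, gfp_t gfp)
{
	struct counter_expr *c = alloc_with(sizeof(*c), gfp);

	if (!c)
		return -1;
	memcpy(c, src, sizeof(*c));
	*dst = c;
	return 0;
}

/* Control plane (transaction, user context): allowed to sleep. */
static int clone_from_transaction(struct counter_expr **dst,
				  const struct counter_expr *src)
{
	return counter_clone(dst, src, GFP_KERNEL);
}

/* Packet path (dynset-style element insertion): must not sleep. */
static int clone_from_packet_path(struct counter_expr **dst,
				  const struct counter_expr *src)
{
	return counter_clone(dst, src, GFP_ATOMIC);
}
```

The callback itself stays agnostic about its calling context; only the two entry points decide which flag is appropriate.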
    • selftests: netfilter: add packetdrill based conntrack tests · a8a388c2
      Florian Westphal authored
      Add a new test script that uses the packetdrill tool to exercise the
      conntrack state machine.

      The script needs ip/ip6tables and the conntrack tool (to check whether
      an entry exists in the expected state).

      Test cases added here cover the following scenarios:
      1. already-acked (retransmitted) packets are not tagged as INVALID
      2. an RST packet arriving when conntrack is already closing
         (FIN/CLOSE_WAIT) transitions conntrack to CLOSE even if the RST is
         not an exact match
      3. RST packets with out-of-window sequence numbers are marked as INVALID
      4. SYN+Challenge ACK: check that the challenge ack is allowed to pass
      5. Old SYN/ACK: check that conntrack handles the case where a SYN is
         answered with a SYN/ACK for an old, previous connection attempt
      6. check that SYN reception while in ESTABLISHED state generates a
         challenge ack, that the RST response clears the 'outdated' state,
         and that the next SYN retransmit gets us into 'SYN_RECV' conntrack
         state.

      Tests are run twice, once with IPv4 and once with IPv6.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_set_pipapo: remove dirty flag · 532aec7e
      Florian Westphal authored
      After the previous change:
       ->clone exists: ->dirty is always true
       ->clone == NULL: ->dirty is always false
      
      So remove this flag.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_set_pipapo: move cloning of match info to insert/removal path · 3f1d886c
      Florian Westphal authored
      This set type keeps two copies of the set's content:
         priv->match (live version, used to match from the packet path)
         priv->clone (work-in-progress version of the 'future' priv->match).

      All additions and removals are done on priv->clone.  When the
      transaction completes, priv->clone becomes priv->match and a new clone
      is allocated for use by the next transaction.

      The problem is that the cloning requires GFP_KERNEL allocations, but we
      cannot fail at either commit or abort time.
      
      This patch defers the clone until we get an insertion or removal
      request.  This allows us to handle OOM situations correctly.
      
      This also allows removing ->dirty in a follow-up change:
      
      If ->clone exists, ->dirty is always true
      If ->clone is NULL, ->dirty is always false, no elements were added
      or removed (except catchall elements which are external to the specific
      set backend).
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
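A simplified model of deferring the clone until the first insertion or removal, assuming a fixed-size array in place of the real pipapo lookup tables. `struct match`, `struct set_priv`, `get_clone()`, `set_insert()`, `set_commit()` and `set_abort()` are hypothetical names, and the sketch ignores RCU: a real backend cannot free the old live copy until packet-path readers are done with it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ELEMS 16

/* Hypothetical, heavily simplified stand-in for the pipapo match data. */
struct match {
	int nelems;
	int elems[MAX_ELEMS];
};

/* 'match' is the live copy; 'clone' is the work-in-progress copy, or NULL
 * when this transaction has not touched the set. clone != NULL carries the
 * same information as the old ->dirty flag. */
struct set_priv {
	struct match *match;
	struct match *clone;
};

/* Allocate the clone on first use. This runs from the insert/removal path,
 * where an allocation failure can still be reported to the caller. */
static struct match *get_clone(struct set_priv *priv)
{
	if (!priv->clone) {
		priv->clone = malloc(sizeof(*priv->clone));
		if (!priv->clone)
			return NULL;
		memcpy(priv->clone, priv->match, sizeof(*priv->clone));
	}
	return priv->clone;
}

static int set_insert(struct set_priv *priv, int elem)
{
	struct match *m = get_clone(priv);

	if (!m)
		return -ENOMEM;
	if (m->nelems == MAX_ELEMS)
		return -ENOSPC;
	m->elems[m->nelems++] = elem;
	return 0;
}

/* Commit cannot fail: it only swaps pointers, and does nothing when no
 * insertion or removal happened. (No RCU grace period in this sketch.) */
static void set_commit(struct set_priv *priv)
{
	if (!priv->clone)
		return;
	free(priv->match);
	priv->match = priv->clone;
	priv->clone = NULL;
}

/* Abort cannot fail either: it just drops the work-in-progress copy. */
static void set_abort(struct set_priv *priv)
{
	free(priv->clone);
	priv->clone = NULL;
}
```

The only allocation happens in `set_insert()`, where `-ENOMEM` can still be returned; commit and abort only swap or free pointers, which is the property the commit message is after.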
    • netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone · a2381067
      Florian Westphal authored
      The helper uses priv->clone unconditionally, which will fail once the
      clone is done conditionally on the first insert or removal.

      'nft get element' from userspace needs to use priv->match since this
      runs from an RCU read-side critical section.

      Prepare for this by passing the match backend data as an argument.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
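The refactor moves the choice of which copy to search out of the lookup helper and into its caller. `set_get()`, `get_element()` and the structures below are hypothetical simplifications, not the pipapo code; the `assert()` marks the precondition that would break if a caller handed in a clone that has not been allocated yet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified match data; not the real pipapo layout. */
struct match {
	int nelems;
	const int *elems;
};

struct set_priv {
	const struct match *match; /* live copy, safe for read-side lookups */
	const struct match *clone; /* may be NULL until the first insert/removal */
};

/* The lookup helper no longer reads priv->clone itself: the caller decides
 * which copy to search. Returns the index of 'key', or -1 if absent. */
static int set_get(const struct match *m, int key)
{
	assert(m != NULL);
	for (int i = 0; i < m->nelems; i++)
		if (m->elems[i] == key)
			return i;
	return -1;
}

/* 'get element' from userspace: search the live copy. */
static int get_element(const struct set_priv *priv, int key)
{
	return set_get(priv->match, key);
}
```

Because `get_element()` always passes `priv->match`, the lookup keeps working even while `priv->clone` is still NULL.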
    • net: ipv6: fix wrong start position when receive hop-by-hop fragment · 1cd354fe
      gaoxingwang authored
      In IPv6, ipv6_rcv_core will parse the hop-by-hop extension header and
      increase skb->transport_header by one extension header length. But if
      other extension headers, such as a fragment header, follow it,
      skb->transport_header points to the second extension header, not to the
      transport layer header or the first extension header.

      This results in the start and nexthdrp variables not pointing to the
      same position in ipv6frag_thdr_trunced, and in ipv6_skip_exthdr
      returning an incorrect offset and frag_off. Sometimes the length of the
      last fragment is smaller than the calculated incorrect offset, resulting
      in packet loss. We can use the network header to calculate the correct
      offset and solve this problem.
      
      Fixes: 9d9e937b (ipv6/netfilter: Discard first fragment not including all headers)
      Signed-off-by: Gao Xingwang <gaoxingwang1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
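The mismatch between the start position and the next-header value can be reproduced with a toy extension header walk. `skip_exthdr()` below is a hypothetical, simplified stand-in for `ipv6_skip_exthdr()`: it only understands hop-by-hop, fragment and destination options headers and reports the raw fragment offset/flags field.

```c
#include <assert.h>

/* IPv6 next-header values (IANA assignments). */
#define NH_HOPOPTS   0
#define NH_TCP       6
#define NH_FRAGMENT  44
#define NH_DSTOPTS   60
#define IPV6_HDR_LEN 40

/* Toy extension header walk (hypothetical, not the kernel's
 * ipv6_skip_exthdr()). 'start' must be the offset of the header that
 * '*nexthdr' describes. Returns the offset of the first non-extension
 * header, or -1 if the buffer is too short. '*frag_off' receives the raw
 * 16-bit offset/flags field of a fragment header, or 0 if none was seen. */
static int skip_exthdr(const unsigned char *buf, int len, int start,
		       unsigned char *nexthdr, unsigned int *frag_off)
{
	*frag_off = 0;
	for (;;) {
		int hdrlen;

		if (start > len)
			return -1;
		if (*nexthdr != NH_HOPOPTS && *nexthdr != NH_FRAGMENT &&
		    *nexthdr != NH_DSTOPTS)
			return start;
		if (start + 8 > len)
			return -1;
		if (*nexthdr == NH_FRAGMENT) {
			/* Fragment header: fixed 8 bytes, field in bytes 2-3. */
			*frag_off = ((unsigned int)buf[start + 2] << 8) |
				    buf[start + 3];
			hdrlen = 8;
		} else {
			/* Hop-by-hop / destination options: (len + 1) * 8. */
			hdrlen = (buf[start + 1] + 1) * 8;
		}
		*nexthdr = buf[start];
		start += hdrlen;
	}
}
```

Take a packet laid out as an IPv6 header (40 bytes, next header 0), an 8-byte hop-by-hop header and a fragment header followed by TCP. Starting the walk at offset 40 with next header 0 returns 56 and picks up the fragment field. Starting at the already-advanced transport offset 48 with the same next header 0 misreads the fragment header as a hop-by-hop header, so `frag_off` stays 0. In this particular toy packet the returned offset happens to coincide; the commit describes cases where the kernel's offset is wrong as well.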
    • tcp: get rid of twsk_unique() · 383eed2d
      Eric Dumazet authored
      DCCP is going away soon, and had no twsk_unique() method.
      
      We can directly call tcp_twsk_unique() for TCP sockets.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240507164140.940547-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/sched: adjust device watchdog timer to detect stopped queue at right time · 33fb988b
      Praveen Kumar Kannoju authored
      Applications are sensitive to long network latency, particularly
      heartbeat monitoring ones. The longer the tx timeout recovery, the
      higher the risk for such applications on production machines. This
      patch remedies that while still honoring the tx timeout set by the
      device.

      Modify the watchdog's next timeout to be shorter than the one the
      device specified. Compute the next timeout as the device watchdog
      timeout minus how long ago the queue was stopped. At the next watchdog
      timeout, the tx timeout handler is called if the queue is still in the
      stopped state. Whether or not it is called, restore the watchdog
      timeout to the device-specified value.
      Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@oracle.com>
      Link: https://lore.kernel.org/r/20240508133617.4424-1-praveen.kannoju@oracle.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
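The timer arithmetic described in the watchdog commit can be sketched as follows. `next_watchdog_expiry()` is a hypothetical helper, not the actual net/sched code; it works on jiffies-like unsigned tick counts, where unsigned subtraction stays correct across counter wraparound.

```c
#include <assert.h>

/* Hypothetical helper: when should the watchdog fire next for a queue that
 * was stopped at 'stopped_at', given the device's tx timeout
 * 'watchdog_timeo'? Instead of waiting a full period, only the remaining
 * watchdog_timeo - elapsed ticks are waited. */
static unsigned long next_watchdog_expiry(unsigned long now,
					  unsigned long stopped_at,
					  unsigned long watchdog_timeo)
{
	/* Unsigned subtraction yields the elapsed ticks even if the tick
	 * counter wrapped between stopped_at and now. */
	unsigned long elapsed = now - stopped_at;

	assert(watchdog_timeo > 0);
	if (elapsed >= watchdog_timeo)
		return now; /* already overdue: check right away */
	return now + (watchdog_timeo - elapsed);
}
```

For example, with a 500-tick timeout and a queue stopped 300 ticks ago, the next check is 200 ticks away rather than a full 500. After that check, the timer would go back to the full device-specified period, as the commit describes.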
  4. 09 May, 2024 2 commits