1. 27 Aug, 2024 8 commits
    • Daniel Borkmann's avatar
      netkit: Disable netpoll support · d9660879
      Daniel Borkmann authored
      Follow-up to 45160ceb ("net: veth: Disable netpoll support") to
      also disable netpoll for netkit interfaces. Same conditions apply
      here as well.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Breno Leitao <leitao@debian.org>
      Cc: Nikolay Aleksandrov <razor@blackwall.org>
      Acked-by: default avatarNikolay Aleksandrov <razor@blackwall.org>
      Reviewed-by: default avatarBreno Leitao <leitao@debian.org>
      Link: https://lore.kernel.org/r/eab2d69ba2f4c260aef62e4ff0d803e9f60c2c5d.1724414250.git.daniel@iogearbox.netSigned-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      d9660879
    • Paolo Abeni's avatar
      Merge branch 'tc-adjust-network-header-after-2nd-vlan-push' · f8fdda9e
      Paolo Abeni authored
      Boris Sukholitko says:
      
      ====================
      tc: adjust network header after 2nd vlan push
      
      <tldr>
      skb network header of the single-tagged vlan packet continues to point the
      vlan payload (e.g. IP) after second vlan tag is pushed by tc act_vlan. This
      causes problem at the dissector which expects double-tagged packet network
      header to point to the inner vlan.
      
      The fix is to adjust network header in tcf_act_vlan.c but requires
      refactoring of skb_vlan_push function.
      </tldr>
      
      Consider the following shell script snippet configuring TC rules on the
      veth interface:
      
      ip link add veth0 type veth peer veth1
      ip link set veth0 up
      ip link set veth1 up
      
      tc qdisc add dev veth0 clsact
      
      tc filter add dev veth0 ingress pref 10 chain 0 flower \
      	num_of_vlans 2 cvlan_ethtype 0x800 action goto chain 5
      tc filter add dev veth0 ingress pref 20 chain 0 flower \
      	num_of_vlans 1 action vlan push id 100 \
      	protocol 0x8100 action goto chain 5
      tc filter add dev veth0 ingress pref 30 chain 5 flower \
      	num_of_vlans 2 cvlan_ethtype 0x800 action simple sdata "success"
      
      Sending double-tagged vlan packet with the IP payload inside:
      
      cat <<ENDS | text2pcap - - | tcpreplay -i veth1 -
      0000  00 00 00 00 00 11 00 00 00 00 00 22 81 00 00 64   ..........."...d
      0010  81 00 00 14 08 00 45 04 00 26 04 d2 00 00 7f 11   ......E..&......
      0020  18 ef 0a 00 00 01 14 00 00 02 00 00 00 00 00 12   ................
      0030  e1 c7 00 00 00 00 00 00 00 00 00 00               ............
      ENDS
      
      will match rule 10, goto rule 30 in chain 5 and correctly emit "success" to
      the dmesg.
      
      OTOH, sending single-tagged vlan packet:
      
      cat <<ENDS | text2pcap - - | tcpreplay -i veth1 -
      0000  00 00 00 00 00 11 00 00 00 00 00 22 81 00 00 14   ..........."....
      0010  08 00 45 04 00 2a 04 d2 00 00 7f 11 18 eb 0a 00   ..E..*..........
      0020  00 01 14 00 00 02 00 00 00 00 00 16 e1 bf 00 00   ................
      0030  00 00 00 00 00 00 00 00 00 00 00 00               ............
      ENDS
      
      will match rule 20, will push the second vlan tag but will *not* match
      rule 30. IOW, the match at rule 30 fails if the second vlan was freshly
      pushed by the kernel.
      
      Lets look at  __skb_flow_dissect working on the double-tagged vlan packet.
      Here is the relevant code from around net/core/flow_dissector.c:1277
      copy-pasted here for convenience:
      
      	if (dissector_vlan == FLOW_DISSECTOR_KEY_MAX &&
      	    skb && skb_vlan_tag_present(skb)) {
      		proto = skb->protocol;
      	} else {
      		vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
      					    data, hlen, &_vlan);
      		if (!vlan) {
      			fdret = FLOW_DISSECT_RET_OUT_BAD;
      			break;
      		}
      
      		proto = vlan->h_vlan_encapsulated_proto;
      		nhoff += sizeof(*vlan);
      	}
      
      The "else" clause above gets the protocol of the encapsulated packet from
      the skb data at the network header location. printk debugging has showed
      that in the good double-tagged packet case proto is
      htons(0x800 == ETH_P_IP) as expected. However in the single-tagged packet
      case proto is garbage leading to the failure to match tc filter 30.
      
      proto is being set from the skb header pointed by nhoff parameter which is
      defined at the beginning of __skb_flow_dissect
      (net/core/flow_dissector.c:1055 in the current version):
      
      		nhoff = skb_network_offset(skb);
      
      Therefore the culprit seems to be that the skb network offset is different
      between double-tagged packet received from the interface and single-tagged
      packet having its vlan tag pushed by TC.
      
      Lets look at the interesting points of the lifetime of the single/double
      tagged packets as they traverse our packet flow.
      
      Both of them will start at __netif_receive_skb_core where the first vlan
      tag will be stripped:
      
      	if (eth_type_vlan(skb->protocol)) {
      		skb = skb_vlan_untag(skb);
      		if (unlikely(!skb))
      			goto out;
      	}
      
      At this stage in double-tagged case skb->data points to the second vlan tag
      while in single-tagged case skb->data points to the network (eg. IP)
      header.
      
      Looking at TC vlan push action (net/sched/act_vlan.c) we have the following
      code at tcf_vlan_act (interesting points are in square brackets):
      
      	if (skb_at_tc_ingress(skb))
      [1]		skb_push_rcsum(skb, skb->mac_len);
      
      	....
      
      	case TCA_VLAN_ACT_PUSH:
      		err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
      				    (p->tcfv_push_prio << VLAN_PRIO_SHIFT),
      				    0);
      		if (err)
      			goto drop;
      		break;
      
      	....
      
      out:
      	if (skb_at_tc_ingress(skb))
      [3]		skb_pull_rcsum(skb, skb->mac_len);
      
      And skb_vlan_push (net/core/skbuff.c:6204) function does:
      
      		err = __vlan_insert_tag(skb, skb->vlan_proto,
      					skb_vlan_tag_get(skb));
      		if (err)
      			return err;
      
      		skb->protocol = skb->vlan_proto;
      [2]		skb->mac_len += VLAN_HLEN;
      
      in the case of pushing the second tag. Lets look at what happens with
      skb->data of the single-tagged packet at each of the above points:
      
      1. As a result of the skb_push_rcsum, skb->data is moved back to the start
         of the packet.
      
      2. First VLAN tag is moved from the skb into packet buffer, skb->mac_len is
         incremented, skb->data still points to the start of the packet.
      
      3. As a result of the skb_pull_rcsum, skb->data is moved forward by the
         modified skb->mac_len, thus pointing to the network header again.
      
      Then __skb_flow_dissect will get confused by having double-tagged vlan
      packet with the skb->data at the network header.
      
      The solution for the bug is to preserve "skb->data at second vlan header"
      semantics in the skb_vlan_push function. We do this by manipulating
      skb->network_header rather than skb->mac_len. skb_vlan_push callers are
      updated to do skb_reset_mac_len.
      
      More about the patch series:
      
      * patch 1 fixes skb_vlan_push and the callers
      * patch 2 adds ingress tc_actions test
      * patch 3 adds egress tc_actions test
      ====================
      
      Link: https://patch.msgid.link/20240822103510.468293-1-boris.sukholitko@broadcom.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      f8fdda9e
    • Boris Sukholitko's avatar
      selftests: tc_actions: test egress 2nd vlan push · 2da44703
      Boris Sukholitko authored
      Add new test checking the correctness of inner vlan flushing to the skb
      data when outer vlan tag is added through act_vlan on egress.
      Signed-off-by: default avatarBoris Sukholitko <boris.sukholitko@broadcom.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      2da44703
    • Boris Sukholitko's avatar
      selftests: tc_actions: test ingress 2nd vlan push · 59c330ec
      Boris Sukholitko authored
      Add new test checking the correctness of inner vlan flushing to the skb
      data when outer vlan tag is added through act_vlan on ingress.
      Signed-off-by: default avatarBoris Sukholitko <boris.sukholitko@broadcom.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      59c330ec
    • Boris Sukholitko's avatar
      tc: adjust network header after 2nd vlan push · 93886372
      Boris Sukholitko authored
      <tldr>
      skb network header of the single-tagged vlan packet continues to point the
      vlan payload (e.g. IP) after second vlan tag is pushed by tc act_vlan. This
      causes problem at the dissector which expects double-tagged packet network
      header to point to the inner vlan.
      
      The fix is to adjust network header in tcf_act_vlan.c but requires
      refactoring of skb_vlan_push function.
      </tldr>
      
      Consider the following shell script snippet configuring TC rules on the
      veth interface:
      
      ip link add veth0 type veth peer veth1
      ip link set veth0 up
      ip link set veth1 up
      
      tc qdisc add dev veth0 clsact
      
      tc filter add dev veth0 ingress pref 10 chain 0 flower \
      	num_of_vlans 2 cvlan_ethtype 0x800 action goto chain 5
      tc filter add dev veth0 ingress pref 20 chain 0 flower \
      	num_of_vlans 1 action vlan push id 100 \
      	protocol 0x8100 action goto chain 5
      tc filter add dev veth0 ingress pref 30 chain 5 flower \
      	num_of_vlans 2 cvlan_ethtype 0x800 action simple sdata "success"
      
      Sending double-tagged vlan packet with the IP payload inside:
      
      cat <<ENDS | text2pcap - - | tcpreplay -i veth1 -
      0000  00 00 00 00 00 11 00 00 00 00 00 22 81 00 00 64   ..........."...d
      0010  81 00 00 14 08 00 45 04 00 26 04 d2 00 00 7f 11   ......E..&......
      0020  18 ef 0a 00 00 01 14 00 00 02 00 00 00 00 00 12   ................
      0030  e1 c7 00 00 00 00 00 00 00 00 00 00               ............
      ENDS
      
      will match rule 10, goto rule 30 in chain 5 and correctly emit "success" to
      the dmesg.
      
      OTOH, sending single-tagged vlan packet:
      
      cat <<ENDS | text2pcap - - | tcpreplay -i veth1 -
      0000  00 00 00 00 00 11 00 00 00 00 00 22 81 00 00 14   ..........."....
      0010  08 00 45 04 00 2a 04 d2 00 00 7f 11 18 eb 0a 00   ..E..*..........
      0020  00 01 14 00 00 02 00 00 00 00 00 16 e1 bf 00 00   ................
      0030  00 00 00 00 00 00 00 00 00 00 00 00               ............
      ENDS
      
      will match rule 20, will push the second vlan tag but will *not* match
      rule 30. IOW, the match at rule 30 fails if the second vlan was freshly
      pushed by the kernel.
      
      Lets look at  __skb_flow_dissect working on the double-tagged vlan packet.
      Here is the relevant code from around net/core/flow_dissector.c:1277
      copy-pasted here for convenience:
      
      	if (dissector_vlan == FLOW_DISSECTOR_KEY_MAX &&
      	    skb && skb_vlan_tag_present(skb)) {
      		proto = skb->protocol;
      	} else {
      		vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
      					    data, hlen, &_vlan);
      		if (!vlan) {
      			fdret = FLOW_DISSECT_RET_OUT_BAD;
      			break;
      		}
      
      		proto = vlan->h_vlan_encapsulated_proto;
      		nhoff += sizeof(*vlan);
      	}
      
      The "else" clause above gets the protocol of the encapsulated packet from
      the skb data at the network header location. printk debugging has showed
      that in the good double-tagged packet case proto is
      htons(0x800 == ETH_P_IP) as expected. However in the single-tagged packet
      case proto is garbage leading to the failure to match tc filter 30.
      
      proto is being set from the skb header pointed by nhoff parameter which is
      defined at the beginning of __skb_flow_dissect
      (net/core/flow_dissector.c:1055 in the current version):
      
      		nhoff = skb_network_offset(skb);
      
      Therefore the culprit seems to be that the skb network offset is different
      between double-tagged packet received from the interface and single-tagged
      packet having its vlan tag pushed by TC.
      
      Lets look at the interesting points of the lifetime of the single/double
      tagged packets as they traverse our packet flow.
      
      Both of them will start at __netif_receive_skb_core where the first vlan
      tag will be stripped:
      
      	if (eth_type_vlan(skb->protocol)) {
      		skb = skb_vlan_untag(skb);
      		if (unlikely(!skb))
      			goto out;
      	}
      
      At this stage in double-tagged case skb->data points to the second vlan tag
      while in single-tagged case skb->data points to the network (eg. IP)
      header.
      
      Looking at TC vlan push action (net/sched/act_vlan.c) we have the following
      code at tcf_vlan_act (interesting points are in square brackets):
      
      	if (skb_at_tc_ingress(skb))
      [1]		skb_push_rcsum(skb, skb->mac_len);
      
      	....
      
      	case TCA_VLAN_ACT_PUSH:
      		err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
      				    (p->tcfv_push_prio << VLAN_PRIO_SHIFT),
      				    0);
      		if (err)
      			goto drop;
      		break;
      
      	....
      
      out:
      	if (skb_at_tc_ingress(skb))
      [3]		skb_pull_rcsum(skb, skb->mac_len);
      
      And skb_vlan_push (net/core/skbuff.c:6204) function does:
      
      		err = __vlan_insert_tag(skb, skb->vlan_proto,
      					skb_vlan_tag_get(skb));
      		if (err)
      			return err;
      
      		skb->protocol = skb->vlan_proto;
      [2]		skb->mac_len += VLAN_HLEN;
      
      in the case of pushing the second tag. Lets look at what happens with
      skb->data of the single-tagged packet at each of the above points:
      
      1. As a result of the skb_push_rcsum, skb->data is moved back to the start
         of the packet.
      
      2. First VLAN tag is moved from the skb into packet buffer, skb->mac_len is
         incremented, skb->data still points to the start of the packet.
      
      3. As a result of the skb_pull_rcsum, skb->data is moved forward by the
         modified skb->mac_len, thus pointing to the network header again.
      
      Then __skb_flow_dissect will get confused by having double-tagged vlan
      packet with the skb->data at the network header.
      
      The solution for the bug is to preserve "skb->data at second vlan header"
      semantics in the skb_vlan_push function. We do this by manipulating
      skb->network_header rather than skb->mac_len. skb_vlan_push callers are
      updated to do skb_reset_mac_len.
      Signed-off-by: default avatarBoris Sukholitko <boris.sukholitko@broadcom.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      93886372
    • Jakub Kicinski's avatar
      Merge branch 'add-embedded-sync-feature-for-a-dpll-s-pin' · d0cb324c
      Jakub Kicinski authored
      Arkadiusz Kubalewski says:
      
      ====================
      Add Embedded SYNC feature for a dpll's pin
      
      Introduce and allow DPLL subsystem users to get/set capabilities of
      Embedded SYNC on a dpll's pin.
      Signed-off-by: default avatarArkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      ====================
      
      Link: https://patch.msgid.link/20240822222513.255179-1-arkadiusz.kubalewski@intel.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d0cb324c
    • Arkadiusz Kubalewski's avatar
      ice: add callbacks for Embedded SYNC enablement on dpll pins · 87abc566
      Arkadiusz Kubalewski authored
      Allow the user to get and set configuration of Embedded SYNC feature
      on the ice driver dpll pins.
      Reviewed-by: default avatarAleksandr Loktionov <aleksandr.loktionov@intel.com>
      Signed-off-by: default avatarArkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Reviewed-by: default avatarJiri Pirko <jiri@nvidia.com>
      Link: https://patch.msgid.link/20240822222513.255179-3-arkadiusz.kubalewski@intel.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      87abc566
    • Arkadiusz Kubalewski's avatar
      dpll: add Embedded SYNC feature for a pin · cda1fba1
      Arkadiusz Kubalewski authored
      Implement and document new pin attributes for providing Embedded SYNC
      capabilities to the DPLL subsystem users through a netlink pin-get
      do/dump messages. Allow the user to set Embedded SYNC frequency with
      pin-set do netlink message.
      Reviewed-by: default avatarAleksandr Loktionov <aleksandr.loktionov@intel.com>
      Signed-off-by: default avatarArkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Reviewed-by: default avatarJiri Pirko <jiri@nvidia.com>
      Link: https://patch.msgid.link/20240822222513.255179-2-arkadiusz.kubalewski@intel.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      cda1fba1
  2. 26 Aug, 2024 32 commits