1. 22 Apr, 2022 16 commits
  2. 21 Apr, 2022 4 commits
    • Merge tag 'net-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 59f0c244
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from xfrm and can.
      
        Current release - regressions:
      
         - rxrpc: restore removed timer deletion
      
        Current release - new code bugs:
      
         - gre: fix device lookup for l3mdev use-case
      
         - xfrm: fix egress device lookup for l3mdev use-case
      
        Previous releases - regressions:
      
         - sched: cls_u32: fix netns refcount changes in u32_change()
      
         - smc: fix sock leak when release after smc_shutdown()
      
         - xfrm: limit skb_page_frag_refill use to a single page
      
         - eth: atlantic: invert deep par in pm functions, preventing null
           derefs
      
         - eth: stmmac: use readl_poll_timeout_atomic() in atomic state
      
        Previous releases - always broken:
      
         - gre: fix skb_under_panic on xmit
      
         - openvswitch: fix OOB access in reserve_sfa_size()
      
         - dsa: hellcreek: calculate checksums in tagger
      
         - eth: ice: fix crash in switchdev mode
      
         - eth: igc:
            - fix infinite loop in release_swfw_sync
            - fix scheduling while atomic"
      
      * tag 'net-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits)
        drivers: net: hippi: Fix deadlock in rr_close()
        selftests: mlxsw: vxlan_flooding_ipv6: Prevent flooding of unwanted packets
        selftests: mlxsw: vxlan_flooding: Prevent flooding of unwanted packets
        nfc: MAINTAINERS: add Bug entry
        net: stmmac: Use readl_poll_timeout_atomic() in atomic state
        doc/ip-sysctl: add bc_forwarding
        netlink: reset network and mac headers in netlink_dump()
        net: mscc: ocelot: fix broken IP multicast flooding
        net: dsa: hellcreek: Calculate checksums in tagger
        net: atlantic: invert deep par in pm functions, preventing null derefs
        can: isotp: stop timeout monitoring when no first frame was sent
        bonding: do not discard lowest hash bit for non layer3+4 hashing
        net: lan966x: Make sure to release ptp interrupt
        ipv6: make ip6_rt_gc_expire an atomic_t
        net: Handle l3mdev in ip_tunnel_init_flow
        l3mdev: l3mdev_master_upper_ifindex_by_index_rcu should be using netdev_master_upper_dev_get_rcu
        net/sched: cls_u32: fix possible leak in u32_init_knode()
        net/sched: cls_u32: fix netns refcount changes in u32_change()
        powerpc: Update MAINTAINERS for ibmvnic and VAS
        net: restore alpha order to Ethernet devices in config
        ...
    • net: eql: Use kzalloc instead of kmalloc/memset · 9c8774e6
      Haowen Bai authored
      Use kzalloc() rather than open-coding kmalloc() followed by memset(),
      which makes the code simpler and easier to understand.
      Signed-off-by: Haowen Bai <baihaowen@meizu.com>
      Link: https://lore.kernel.org/r/1650277333-31090-1-git-send-email-baihaowen@meizu.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Minghao Chi's avatar
      4facbe3d
    • drivers: net: hippi: Fix deadlock in rr_close() · bc6de287
      Duoming Zhou authored
      There is a deadlock in rr_close(), which is shown below:
      
         (Thread 1)                |      (Thread 2)
                                   | rr_open()
      rr_close()                   |  add_timer()
       spin_lock_irqsave() //(1)   |  (wait a time)
       ...                         | rr_timer()
       del_timer_sync()            |  spin_lock_irqsave() //(2)
       (wait timer to stop)        |  ...
      
      Thread 1 holds rrpriv->lock at position (1) and calls
      del_timer_sync() to wait for the timer to stop, but the timer
      handler also needs rrpriv->lock at position (2) in thread 2.
      As a result, rr_close() blocks forever.

      This patch moves del_timer_sync() out of the region protected by
      spin_lock_irqsave(), so that the timer handler can obtain the
      lock it needs.
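
The lock ordering described in this commit message can be illustrated with a user-space analogue. This is only a sketch under stated assumptions, not the driver code: a pthread mutex stands in for rrpriv->lock, a worker thread stands in for the kernel timer, and pthread_join() plays the role of del_timer_sync(). The names fake_dev, fake_timer, fake_open and fake_close are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's private state. */
struct fake_dev {
	pthread_mutex_t lock;   /* plays the role of rrpriv->lock */
	bool stopping;          /* tells the handler loop to exit */
	unsigned long ticks;    /* work done by the handler under the lock */
	pthread_t timer;        /* plays the role of the kernel timer */
};

/* Handler: takes the lock on every iteration, like rr_timer(). */
static void *fake_timer(void *arg)
{
	struct fake_dev *dev = arg;

	for (;;) {
		pthread_mutex_lock(&dev->lock);
		bool stop = dev->stopping;
		if (!stop)
			dev->ticks++;
		pthread_mutex_unlock(&dev->lock);
		if (stop)
			return NULL;
		sched_yield();
	}
}

/* Initialise the state and start the handler thread. */
int fake_open(struct fake_dev *dev)
{
	assert(dev != NULL);
	pthread_mutex_init(&dev->lock, NULL);
	dev->stopping = false;
	dev->ticks = 0;
	return pthread_create(&dev->timer, NULL, fake_timer, dev);
}

/* Stop the handler: update state under the lock, wait outside it. */
int fake_close(struct fake_dev *dev)
{
	assert(dev != NULL);
	pthread_mutex_lock(&dev->lock);
	dev->stopping = true;
	pthread_mutex_unlock(&dev->lock);

	/* Wait only after dropping the lock. Joining while still holding
	 * it could block forever if the handler is waiting for the lock. */
	int err = pthread_join(dev->timer, NULL);

	pthread_mutex_destroy(&dev->lock);
	return err;
}
```

Calling fake_open() followed by fake_close() on the same struct fake_dev returns once the handler has observed the stop flag. Some toolchains need -pthread to build it.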
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Link: https://lore.kernel.org/r/20220417125519.82618-1-duoming@zju.edu.cn
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  3. 20 Apr, 2022 20 commits
    • Merge tag 'xtensa-20220416' of https://github.com/jcmvbkbc/linux-xtensa · b2534357
      Linus Torvalds authored
      Pull xtensa fixes from Max Filippov:
      
       - fix patching CPU selection in patch_text
      
       - fix potential deadlock in ISS platform serial driver
      
       - fix potential register clobbering in coprocessor exception handler
      
      * tag 'xtensa-20220416' of https://github.com/jcmvbkbc/linux-xtensa:
        xtensa: fix a7 clobbering in coprocessor context load/store
        arch: xtensa: platforms: Fix deadlock in rs_close()
        xtensa: patch_text: Fixup last cpu should be master
    • Merge tag 'erofs-for-5.18-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · 10c5f102
      Linus Torvalds authored
      Pull erofs fixes from Gao Xiang:
       "One patch to fix a use-after-free race related to the on-stack
        z_erofs_decompressqueue, which happens very rarely but needs to be
        fixed properly soon.
      
        The other patch fixes some sysfs Sphinx warnings"
      
      * tag 'erofs-for-5.18-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        Documentation/ABI: sysfs-fs-erofs: Fix Sphinx errors
        erofs: fix use-after-free of on-stack io[]
    • Revert "fs/pipe: use kvcalloc to allocate a pipe_buffer array" · 906f9040
      Linus Torvalds authored
      This reverts commit 5a519c8f.
      
      It turns out that making the pipe almost arbitrarily large has some
      rather unexpected downsides.  The kernel test robot reports a kernel
      warning that is due to pipe->max_usage now growing to the point where
      the iter_file_splice_write() buffer allocation can no longer be
      satisfied as a slab allocation, and the
      
              int nbufs = pipe->max_usage;
              struct bio_vec *array = kcalloc(nbufs, sizeof(struct bio_vec),
                                              GFP_KERNEL);
      
      code sequence there will now always fail as a result.
      
      That code could be modified to use kvcalloc() too, but I feel very
      uncomfortable making those kinds of changes for a very niche use case
      that really should have other options than make these kinds of
      fundamental changes to pipe behavior.
      
      Maybe the CRIU process dumping should be multi-threaded, and use
      multiple pipes and multiple cores, rather than try to use one larger
      pipe to minimize splice() calls.
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Link: https://lore.kernel.org/all/20220420073717.GD16310@xsang-OptiPlex-9020/
      Cc: Andrei Vagin <avagin@gmail.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: __memcpy_flushcache: fix wrong alignment if size > 2^32 · a6823e4e
      Mikulas Patocka authored
      The first "if" condition in __memcpy_flushcache is supposed to align the
      "dest" variable to 8 bytes and copy data up to this alignment.  However,
      this condition may misbehave if "size" is greater than 4GiB.
      
      The statement min_t(unsigned, size, ALIGN(dest, 8) - dest); casts both
      arguments to unsigned int and selects the smaller one.  However, the
      cast truncates high bits in "size" and it results in misbehavior.
      
      For example:
      
      	suppose that size == 0x100000001, dest == 0x200000002
      	min_t(unsigned, size, ALIGN(dest, 8) - dest) == min_t(0x1, 0x6) == 0x1;
      	...
      	dest += 0x1;

      so we copy just one byte and dest remains unaligned.
      
      This patch fixes the bug by replacing unsigned with size_t.
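
The truncation can be reproduced in user space. The following is a minimal sketch, not the kernel code: align_up, head_len_truncated and head_len_full are hypothetical helpers, and uint64_t stands in for a 64-bit size_t so the example behaves the same on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Mimics min_t(unsigned, ...): both operands are cast to 32 bits
 * before the comparison, so the high bits of size are lost. */
uint64_t head_len_truncated(uint64_t size, uint64_t dest)
{
	unsigned int a = (unsigned int)size;
	unsigned int b = (unsigned int)(align_up(dest, 8) - dest);

	return a < b ? a : b;
}

/* Mimics min_t(size_t, ...) on a 64-bit system: full-width compare. */
uint64_t head_len_full(uint64_t size, uint64_t dest)
{
	uint64_t a = size;
	uint64_t b = align_up(dest, 8) - dest;

	return a < b ? a : b;
}
```

With size == 0x100000001 and dest == 0x200000002, the truncated variant yields 1 and the full-width variant yields 6, which is the distance to the next 8-byte boundary.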
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • selftests: mlxsw: vxlan_flooding_ipv6: Prevent flooding of unwanted packets · 5e624215
      Ido Schimmel authored
      The test verifies that packets are correctly flooded by the bridge and
      the VXLAN device by matching on the encapsulated packets at the other
      end. However, if packets other than those generated by the test also
      ingress the bridge (e.g., MLD packets), they will be flooded as well and
      interfere with the expected count.
      
      Make the test more robust by making sure that only the packets generated
      by the test can ingress the bridge. Drop all the rest using tc filters
      on the egress of 'br0' and 'h1'.
      
      In the software data path, the problem can be solved by matching on the
      inner destination MAC or dropping unwanted packets at the egress of the
      VXLAN device, but this is not currently supported by mlxsw.
      
      Fixes: d01724dd ("selftests: mlxsw: spectrum-2: Add a test for VxLAN flooding with IPv6")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mlxsw: vxlan_flooding: Prevent flooding of unwanted packets · 044011fd
      Ido Schimmel authored
      The test verifies that packets are correctly flooded by the bridge and
      the VXLAN device by matching on the encapsulated packets at the other
      end. However, if packets other than those generated by the test also
      ingress the bridge (e.g., MLD packets), they will be flooded as well and
      interfere with the expected count.
      
      Make the test more robust by making sure that only the packets generated
      by the test can ingress the bridge. Drop all the rest using tc filters
      on the egress of 'br0' and 'h1'.
      
      In the software data path, the problem can be solved by matching on the
      inner destination MAC or dropping unwanted packets at the egress of the
      VXLAN device, but this is not currently supported by mlxsw.
      
      Fixes: 94d302de ("selftests: mlxsw: Add a test for VxLAN flooding")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-line-card-status-tracking' · 365014f5
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Line cards status tracking
      
      When a line card is provisioned, netdevs corresponding to the ports
      found on the line card are registered. User space can then perform
      various logical configurations (e.g., splitting, setting MTU) on these
      netdevs.
      
      However, since the line card is not present / powered on (i.e., it is
      not in 'active' state), user space cannot access the various components
      found on the line card. For example, user space cannot read the
      temperature of gearboxes or transceiver modules found on the line card
      via hwmon / thermal. Similarly, it cannot dump the EEPROM contents of
      these transceiver modules. The above is only possible when the line card
      becomes active.
      
      This patchset solves the problem by tracking the status of each line
      card and invoking callbacks from interested parties when a line card
      becomes active / inactive.
      
      Patchset overview:
      
      Patch #1 adds the infrastructure in the line cards core that allows
      users to register a set of callbacks that are invoked when a line card
      becomes active / inactive. To avoid races, if a line card is already
      active during registration, the got_active() callback is invoked.
      
      Patches #2-#3 are preparations.
      
      Patch #4 changes the port module core to register a set of callbacks
      with the line cards core. See detailed description with examples in the
      commit message.
      
      Patches #5-#6 do the same with regards to thermal / hwmon support, so
      that user space will be able to monitor the temperature of various
      components on the line card when it becomes active.
      ====================
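
The registration semantics described for patch #1 can be sketched in plain C. This is a single-threaded illustration under stated assumptions, not the mlxsw implementation: only the got_active() / got_inactive() callback names come from the text above, while struct linecard, linecard_event_ops_register(), linecard_set_active() and the counting callbacks are hypothetical. A real driver would also need locking around the state and the listener list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LISTENERS 4

/* Callbacks an interested party registers with the line card core. */
struct linecard_event_ops {
	void (*got_active)(int slot, void *priv);
	void (*got_inactive)(int slot, void *priv);
};

struct linecard {
	int slot;
	bool active;
	size_t n_listeners;
	struct {
		const struct linecard_event_ops *ops;
		void *priv;
	} listeners[MAX_LISTENERS];
};

/* Register callbacks. If the card is already active, got_active() is
 * invoked right away so the listener does not miss the transition. */
int linecard_event_ops_register(struct linecard *lc,
				const struct linecard_event_ops *ops,
				void *priv)
{
	assert(lc != NULL && ops != NULL);
	if (lc->n_listeners == MAX_LISTENERS)
		return -1;
	lc->listeners[lc->n_listeners].ops = ops;
	lc->listeners[lc->n_listeners].priv = priv;
	lc->n_listeners++;
	if (lc->active && ops->got_active)
		ops->got_active(lc->slot, priv);
	return 0;
}

/* Change the card state and notify every listener of the transition. */
void linecard_set_active(struct linecard *lc, bool active)
{
	assert(lc != NULL);
	if (lc->active == active)
		return;
	lc->active = active;
	for (size_t i = 0; i < lc->n_listeners; i++) {
		const struct linecard_event_ops *ops = lc->listeners[i].ops;
		void (*cb)(int, void *) =
			active ? ops->got_active : ops->got_inactive;

		if (cb)
			cb(lc->slot, lc->listeners[i].priv);
	}
}

/* Example listener that only counts the notifications it receives. */
struct event_counter {
	int active;
	int inactive;
};

static void count_active(int slot, void *priv)
{
	(void)slot;
	((struct event_counter *)priv)->active++;
}

static void count_inactive(int slot, void *priv)
{
	(void)slot;
	((struct event_counter *)priv)->inactive++;
}

const struct linecard_event_ops counting_ops = {
	.got_active = count_active,
	.got_inactive = count_inactive,
};
```

Registering counting_ops on a card that is already active bumps the active counter immediately, which mirrors the race avoidance described for patch #1.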
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core_hwmon: Add interfaces for line card initialization and de-initialization · 99a03b31
      Vadim Pasternak authored
      Add callback functions for line card 'hwmon' initialization and
      de-initialization. Each line card is associated with the relevant
      'hwmon' device, which may contain thermal attributes for the cages
      and gearboxes found on this line card.
      
      The line card 'hwmon' initialization / de-initialization APIs are to be
      called when the line card is set to the active / inactive state by the
      got_active() / got_inactive() callbacks from the line card state machine.

      For example, the cage temperature for module #9 located at line card #7
      will be exposed by the 'sensors' utility like:
      linecard#07
      front panel 009:	+32.0C  (crit = +70.0C, emerg = +80.0C)
      The temperature for gearbox #3 located at line card #5 will be exposed
      like:
      linecard#05
      gearbox 003:		+41.0C  (highest = +41.0C)
      Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core_thermal: Add interfaces for line card initialization and de-initialization · f11a323d
      Vadim Pasternak authored
      Add callback functions for line card thermal area initialization and
      de-initialization. Each line card is associated with the relevant
      thermal area, which may contain thermal zones for cages and gearboxes
      found on this line card.
      
      The line card thermal initialization / de-initialization APIs are to be
      called when the line card is set to the active / inactive state by the
      got_active() / got_inactive() callbacks from the line card state machine.

      For example, the thermal zone for module #9 located at line card #7 will
      have type:
      mlxsw-lc7-module9.
      The thermal zone for gearbox #2 located at line card #5 will have type:
      mlxsw-lc5-gearbox2.
      Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core_env: Add interfaces for line card initialization and de-initialization · 06a0fc43
      Vadim Pasternak authored
      Netdevs for ports found on line cards are registered upon provisioning.
      However, user space is not allowed to access the transceiver modules
      found on a line card until the line card becomes active.
      
      Therefore, register event operations with the line card core to get
      notifications whenever a line card becomes active or inactive.
      
      When user space tries to dump the EEPROM of a transceiver module or reset
      it and the corresponding line card is inactive, emit an error
      message:
      ethtool -m enp1s0nl7p9
      netlink error: mlxsw_core: Cannot read EEPROM of module on an inactive line card
      netlink error: Input/output error
      
      When user space tries to set the power mode policy of such a transceiver,
      cache the configuration and apply it when the line card becomes active. This
      is consistent with other port configuration (e.g., MTU setting) that user space
      is able to perform while the line card is provisioned, but inactive.
      Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core_env: Split module power mode setting to a separate function · a11e1ec1
      Vadim Pasternak authored
      Move the code that applies the module power mode to the device to a
      separate function. This function will be invoked by the next patch to
      set the power mode on transceiver modules found on a line card when the
      line card becomes active.
      Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core: Add bus argument to environment init API · 7b261af9
      Vadim Pasternak authored
      Pass the bus argument to mlxsw_env_init(). The purpose is to get access
      to the device handle, which is needed for the error message emitted in
      case of line card activation failure.
      Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core_linecards: Introduce ops for linecards status change tracking · de28976d
      Jiri Pirko authored
      Introduce an infrastructure allowing users to register a set
      of operations which are to be called whenever a line card becomes
      active or inactive.
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: MAINTAINERS: add Bug entry · c5d0fc54
      Krzysztof Kozlowski authored
      Add a Bug section, indicating preferred mailing method for bug reports,
      to NFC Subsystem entry.
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'linux-can-next-for-5.19-20220419' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next · 85ef87ba
      David S. Miller authored
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can-next 2022-04-19
      
      This is a pull request of 17 patches for net-next/master.

      The first 2 patches are by me and target the CAN driver
      infrastructure. One patch renames a function in the rx_offload
      helper; the other updates the CAN bitrate calculation to prefer
      small bit rate pre-scalers over larger ones, which is encouraged
      by CAN in Automation.
      
      Kris Bahnsen contributes a patch to fix the links to Technologic
      Systems web resources in the sja1000 driver.
      
      Christophe Leroy's patch prepares the mpc5xxx_can driver for upcoming
      powerpc header cleanup.
      
      Minghao Chi's patch converts the flexcan driver to use
      pm_runtime_resume_and_get().
      
      The next 2 patches target the Xilinx CAN driver. Lukas Bulwahn's patch
      fixes an entry in the MAINTAINERS file. A patch by me marks the bit
      timing constants as const.
      
      Wolfram Sang's patch documents r8a77961 support on the
      renesas,rcar-canfd bindings document.
      
      The next 2 patches are by me and add support for the mcp251863 chip to
      the mcp251xfd driver.
      
      The last 7 patches are by Pavel Pisa, Martin Jerabek et al. and add
      the ctucanfd driver for the CTU CAN FD IP Core.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Use readl_poll_timeout_atomic() in atomic state · 234901de
      Kevin Hao authored
      init_systime() may be invoked in atomic state. We have observed the
      following call trace when running "phc_ctl /dev/ptp0 set" on an Intel
      Agilex board.
        BUG: sleeping function called from invalid context at drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c:74
        in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 381, name: phc_ctl
        preempt_count: 1, expected: 0
        RCU nest depth: 0, expected: 0
        Preemption disabled at:
        [<ffff80000892ef78>] stmmac_set_time+0x34/0x8c
        CPU: 2 PID: 381 Comm: phc_ctl Not tainted 5.18.0-rc2-next-20220414-yocto-standard+ #567
        Hardware name: SoCFPGA Agilex SoCDK (DT)
        Call trace:
         dump_backtrace.part.0+0xc4/0xd0
         show_stack+0x24/0x40
         dump_stack_lvl+0x7c/0xa0
         dump_stack+0x18/0x34
         __might_resched+0x154/0x1c0
         __might_sleep+0x58/0x90
         init_systime+0x78/0x120
         stmmac_set_time+0x64/0x8c
         ptp_clock_settime+0x60/0x9c
         pc_clock_settime+0x6c/0xc0
         __arm64_sys_clock_settime+0x88/0xf0
         invoke_syscall+0x5c/0x130
         el0_svc_common.constprop.0+0x4c/0x100
         do_el0_svc+0x7c/0xa0
         el0_svc+0x58/0xcc
         el0t_64_sync_handler+0xa4/0x130
         el0t_64_sync+0x18c/0x190
      
      So we should use readl_poll_timeout_atomic() here instead of
      readl_poll_timeout().
      
      Also adjust the delay time to 10us to fix a "__bad_udelay" build error
      reported by "kernel test robot <lkp@intel.com>". I have tested this on
      Intel Agilex and NXP S32G boards, and no delay is needed at all there,
      so the 10us delay should be long enough for most cases.
      
      Fixes: ff8ed737 ("net: stmmac: use readl_poll_timeout() function in init_systime()")
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-sched-flower-num-vlan-tags' · c1f6f1e6
      David S. Miller authored
      Boris Sukholitko says:
      
      ====================
      net/sched: flower: match on the number of vlan tags
      
      Our customers in the fiber telecom world have network configurations
      where they would like to control their traffic according to the number
      of tags appearing in the packet.
      
      For example, TR247 GPON conformance test suite specification mostly
      talks about untagged, single, double tagged packets and gives lax
      guidelines on the vlan protocol vs. number of vlan tags.
      
      This is different from the common IT networks, where the 802.1Q and
      802.1ad protocols usually describe single and double tagged packets.
      The GPON configurations that we work with have an arbitrary mix of the
      above protocols and number of vlan tags in the packet.
      
      The following patch series implements a number-of-vlans flower filter. It
      adds the num_of_vlans flower filter as an alternative to vlan ethtype protocol
      matching. The end result is that the following command becomes possible:
      
      tc filter add dev eth1 ingress flower \
        num_of_vlans 1 vlan_prio 5 action drop
      
      Also, from our logs, we have redirect rules such that:
      
      tc filter add dev $GPON ingress flower num_of_vlans $N \
           action mirred egress redirect dev $DEV
      
      where N can range from 0 to 3 and $DEV is the function of $N.
      
      Also there are rules setting skb mark based on the number of vlans:
      
      tc filter add dev $GPON ingress flower num_of_vlans $N vlan_prio \
          $P action skbedit mark $M
      
      More about the patch series:
        - patches 1-2 remove duplicate code by introducing the is_vlan_key
          helper.
        - patches 3-4 implement num_of_vlans in the dissector and in
          flower.
        - patch 5 uses the num_of_vlans filter to allow further matching on
          vlan attributes.
      
      Complementary iproute2 patches are being sent separately.
      
      Thanks,
      Boris.
      
      - v4: rebased to the latest net-next
      - v3:
          - more example commands in patch 3 description (request by Jamal)
          - patch 5 description made clearer (thanks to Jiri)
      - v2:
          - add suitable subject prefixes
          - more evolved patch 5 description
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: flower: Consider the number of tags for vlan filters · 99fdb22b
      Boris Sukholitko authored
      Before this patch the existence of vlan filters was conditional on the vlan
      protocol being matched in the tc rule. For example, the following rule:
      
      tc filter add dev eth1 ingress flower vlan_prio 5
      
      was illegal because vlan protocol (e.g. 802.1q) does not appear in the rule.
      
      Remove the above restriction by looking at the num_of_vlans filter to
      allow further matching on vlan attributes. The following rule becomes
      legal as a result of this commit:
      
      tc filter add dev eth1 ingress flower num_of_vlans 1 vlan_prio 5
      
      because having num_of_vlans==1 implies that the packet is single tagged.
      
      Change the is_vlan_key helper to look at the number of vlans in addition to
      the vlan ethertype. The outcome of this change is that outer (e.g. vlan_prio)
      and inner (e.g. cvlan_prio) tag vlan filters require the number of vlan
      tags to be greater than 0 and 1, respectively.
      
      As a result of is_vlan_key change, the ethertype may be set to 0 when
      matching on the number of vlans. Update fl_set_key_vlan to avoid setting
      key, mask vlan_tpid for the 0 ethertype.
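
The threshold rule can be written as a small predicate. This is a simplified sketch, not the kernel's is_vlan_key(): the function name vlan_filter_allowed is hypothetical, a num_of_vlans of 0 is assumed to mean the filter was not given, and it is assumed that either a VLAN ethertype or a large enough tag count is sufficient on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TPID_8021Q  0x8100  /* 802.1Q */
#define TPID_8021AD 0x88A8  /* 802.1ad */

/* vthresh is 0 for outer-tag filters (e.g. vlan_prio) and 1 for
 * inner-tag filters (e.g. cvlan_prio). An ethertype of 0 means the
 * rule did not specify a protocol. */
bool vlan_filter_allowed(uint16_t ethertype, int num_of_vlans, int vthresh)
{
	assert(num_of_vlans >= 0 && vthresh >= 0);

	bool vlan_ethertype = ethertype == TPID_8021Q ||
			      ethertype == TPID_8021AD;

	return num_of_vlans > vthresh || vlan_ethertype;
}
```

Under these assumptions, a rule with only vlan_prio is rejected, a rule with num_of_vlans 1 may match outer-tag attributes, and inner-tag attributes need num_of_vlans of at least 2.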
      Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: flower: Add number of vlan tags filter · b4000312
      Boris Sukholitko authored
      These are bookkeeping parts of the new num_of_vlans filter.
      Defines, dump, load and set are being done here.
      Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_dissector: Add number of vlan tags dissector · 34951fcf
      Boris Sukholitko authored
      Our customers in the fiber telecom world have network configurations
      where they would like to control their traffic according to the number
      of tags appearing in the packet.
      
      For example, TR247 GPON conformance test suite specification mostly
      talks about untagged, single, double tagged packets and gives lax
      guidelines on the vlan protocol vs. number of vlan tags.
      
      This is different from the common IT networks, where the 802.1Q and
      802.1ad protocols usually describe single and double tagged packets.
      The GPON configurations that we work with have an arbitrary mix of the
      above protocols and number of vlan tags in the packet.
      
      The goal is to make the following TC commands possible:
      
      tc filter add dev eth1 ingress flower \
        num_of_vlans 1 vlan_prio 5 action drop
      
      From our logs, we have redirect rules such that:
      
      tc filter add dev $GPON ingress flower num_of_vlans $N \
           action mirred egress redirect dev $DEV
      
      where N can range from 0 to 3 and $DEV is the function of $N.
      
      Also there are rules setting skb mark based on the number of vlans:
      
      tc filter add dev $GPON ingress flower num_of_vlans $N vlan_prio \
          $P action skbedit mark $M
      
      This new dissector allows extracting the number of vlan tags existing in
      the packet.
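
Counting tags amounts to walking the chain of VLAN TPIDs that follows the MAC addresses. The following is a user-space sketch of that idea on a raw Ethernet frame, not the kernel's flow dissector: count_vlan_tags and read_be16 are hypothetical names, and only the 802.1Q (0x8100) and 802.1ad (0x88A8) TPIDs are recognised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ADDRS_LEN 12      /* destination MAC (6) + source MAC (6) */
#define VLAN_TAG_LEN  4       /* TPID (2) + TCI (2) */
#define TPID_8021Q    0x8100
#define TPID_8021AD   0x88A8

/* Read a big-endian 16-bit value from a byte buffer. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Count consecutive VLAN tags at the start of an Ethernet frame. */
int count_vlan_tags(const uint8_t *frame, size_t len)
{
	assert(frame != NULL);

	size_t off = ETH_ADDRS_LEN;   /* first EtherType/TPID field */
	int n = 0;

	while (off + 2 <= len) {
		uint16_t proto = read_be16(frame + off);

		if (proto != TPID_8021Q && proto != TPID_8021AD)
			break;                  /* reached the payload type */
		if (off + VLAN_TAG_LEN > len)
			break;                  /* tag is truncated */
		n++;
		off += VLAN_TAG_LEN;
	}
	return n;
}
```

An untagged IPv4 frame yields 0, and a frame carrying an 802.1ad tag followed by an 802.1Q tag yields 2. That count is the kind of value a num_of_vlans match would compare against.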
      Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>