1. 13 Oct, 2021 9 commits
    • Vladimir Oltean's avatar
      net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driver · deab6b1c
      Vladimir Oltean authored
      As explained here:
      https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
      DSA tagging protocol drivers cannot depend on symbols exported by switch
      drivers, because this creates a circular dependency that breaks module
      autoloading.
      
      The tag_ocelot.c file depends on the ocelot_ptp_rew_op() function
      exported by the common ocelot switch lib. This function looks at
      OCELOT_SKB_CB(skb) and computes how to populate the REW_OP field of the
      DSA tag, for PTP timestamping (the command: one-step/two-step, and the
      TX timestamp identifier).
      
      None of that requires deep insight into the driver, it is quite
      stateless, as it only depends upon the skb->cb. So let's make it a
      static inline function and put it in include/linux/dsa/ocelot.h, a
      file that despite its name is used by the ocelot switch driver for
      populating the injection header too - since commit 40d3f295 ("net:
      mscc: ocelot: use common tag parsing code with DSA").
      
      With that function declared as static inline, its body is expanded
      inside each call site, so the dependency is broken and the DSA tagger
      can be built without the switch library, upon which the felix driver
      depends.
      
      Fixes: 39e5308b ("net: mscc: ocelot: support PTP Sync one-step timestamping")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      deab6b1c
    • Vladimir Oltean's avatar
      net: mscc: ocelot: cross-check the sequence id from the timestamp FIFO with the skb PTP header · ebb4c6a9
      Vladimir Oltean authored
      The sad reality is that when a PTP frame with a TX timestamping request
      is transmitted, it isn't guaranteed that it will make it all the way to
      the wire (due to congestion inside the switch), and that a timestamp
      will be taken by the hardware and placed in the timestamp FIFO where an
      IRQ will be raised for it.
      
      The implication is that if enough PTP frames are silently dropped by the
      hardware such that the timestamp ID has rolled over, it is possible to
      match a timestamp to an old skb.
      
      Furthermore, nobody will match on the real skb corresponding to this
      timestamp, since we stupidly matched on a previous one that was stale in
      the queue, and stopped there.
      
      So PTP timestamping will be broken and there will be no way to recover.
      
      It looks like the hardware parses the sequenceID from the PTP header,
      and also provides that metadata for each timestamp. The driver currently
      ignores this, but it shouldn't.
      
      As an extra resiliency measure, do the following:
      
      - check whether the PTP sequenceID also matches between the skb and the
        timestamp, treat the skb as stale otherwise and free it
      
      - if we see a stale skb, don't stop there and try to match an skb one
        more time, chances are there's one more skb in the queue with the same
        timestamp ID, otherwise we wouldn't have ever found the stale one (it
        is by timestamp ID that we matched it).
      
      While this does not prevent PTP packet drops, it at least prevents
      the catastrophic consequences of incorrect timestamp matching.
      
      Since we already call ptp_classify_raw in the TX path, save the result
      in the skb->cb of the clone, and just use that result in the interrupt
      code path.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ebb4c6a9
    • Vladimir Oltean's avatar
      net: mscc: ocelot: deny TX timestamping of non-PTP packets · fba01283
      Vladimir Oltean authored
      It appears that Ocelot switches cannot timestamp non-PTP frames,
      I tested this using the isochron program at:
      https://github.com/vladimiroltean/tsn-scripts
      
      with the result that the driver increments the ocelot_port->ts_id
      counter as expected, puts it in the REW_OP, but the hardware seems to
      not timestamp these packets at all, since no IRQ is emitted.
      
      Therefore check whether we are sending PTP frames, and refuse to
      populate REW_OP otherwise.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      fba01283
    • Vladimir Oltean's avatar
      net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skb · 9fde506e
      Vladimir Oltean authored
      When skb_match is NULL, it means we received a PTP IRQ for a timestamp
      ID that the kernel has no idea about, since there is no skb in the
      timestamping queue with that timestamp ID.
      
      This is a grave error and not something to just "continue" over.
      So print a big warning in case this happens.
      
      Also, move the check above ocelot_get_hwtimestamp(), there is no point
      in reading the full 64-bit current PTP time if we're not going to do
      anything with it anyway for this skb.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      9fde506e
    • Vladimir Oltean's avatar
      net: mscc: ocelot: avoid overflowing the PTP timestamp FIFO · 52849bcf
      Vladimir Oltean authored
      PTP packets with 2-step TX timestamp requests are matched to packets
      based on the egress port number and a 6-bit timestamp identifier.
      All PTP timestamps are held in a common FIFO that is 128 entry deep.
      
      This patch ensures that back-to-back timestamping requests cannot exceed
      the hardware FIFO capacity. If that happens, simply send the packets
      without requesting a TX timestamp to be taken (in the case of felix,
      since the DSA API has a void return code in ds->ops->port_txtstamp) or
      drop them (in the case of ocelot).
      
      I've moved the ts_id_lock from a per-port basis to a per-switch basis,
      because we need separate accounting for both numbers of PTP frames in
      flight. And since we need locking to inc/dec the per-switch counter,
      that also offers protection for the per-port counter and hence there is
      no reason to have a per-port counter anymore.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      52849bcf
    • Vladimir Oltean's avatar
      net: mscc: ocelot: make use of all 63 PTP timestamp identifiers · c57fe003
      Vladimir Oltean authored
      At present, there is a problem when user space bombards a port with PTP
      event frames which have TX timestamping requests (or when a tc-taprio
      offload is installed on a port, which delays the TX timestamps by a
      significant amount of time). The driver will happily roll over the 2-bit
      timestamp ID and this will cause incorrect matches between an skb and
      the TX timestamp collected from the FIFO.
      
      The Ocelot switches have a 6-bit PTP timestamp identifier, and the value
      63 is reserved, so that leaves identifiers 0-62 to be used.
      
      The timestamp identifiers are selected by the REW_OP packet field, and
      are actually shared between CPU-injected frames and frames which match a
      VCAP IS2 rule that modifies the REW_OP. The hardware supports
      partitioning between the two uses of the REW_OP field through the
      PTP_ID_LOW and PTP_ID_HIGH registers, and by default reserves the PTP
      IDs 0-3 for CPU-injected traffic and the rest for VCAP IS2.
      
      The driver does not use VCAP IS2 to set REW_OP for 2-step timestamping,
      and it also writes 0xffffffff to both PTP_ID_HIGH and PTP_ID_LOW in
      ocelot_init_timestamp() which makes all timestamp identifiers available
      to CPU injection.
      
      Therefore, we can make use of all 63 timestamp identifiers, which should
      allow more timestampable packets to be in flight on each port. This is
      only part of the solution, more issues will be addressed in future changes.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      c57fe003
    • Jakub Kicinski's avatar
      Merge branch 'fix-circular-dependency-between-sja1105-and-tag_sja1105' · 3af760e4
      Jakub Kicinski authored
      Vladimir Oltean says:
      
      ====================
      Fix circular dependency between sja1105 and tag_sja1105
      
      As discussed here:
      https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
      DSA tagging protocols cannot use symbols exported by switch drivers.
      
      Eliminate the two instances of that from tag_sja1105, and that allows us
      to have a working setup with modules again.
      ====================
      
      Re-applying to net, this was mistakenly applied to net-next,
      see first Link.
      
      Link: https://lore.kernel.org/r/20211012114044.2526146-1-vladimir.oltean@nxp.com/
      Link: https://lore.kernel.org/r/20210922143726.2431036-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      3af760e4
    • Vladimir Oltean's avatar
      net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver · 4ac0567e
      Vladimir Oltean authored
      It's nice to be able to test a tagging protocol with dsa_loop, but not
      at the cost of losing the ability of building the tagging protocol and
      switch driver as modules, because as things stand, there is a circular
      dependency between the two. Tagging protocol drivers cannot depend on
      switch drivers, that is a hard fact.
      
      The reasoning behind the blamed patch was that accessing dp->priv should
      first make sure that the structure behind that pointer is what we really
      think it is.
      
      Currently the "sja1105" and "sja1110" tagging protocols only operate
      with the sja1105 switch driver, just like any other tagging protocol and
      switch combination. The only way to mix and match them is by modifying
      the code, and this applies to dsa_loop as well (by default that uses
      DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
      practice there isn't one.
      
      Until we extend dsa_loop to allow user space configuration, treat the
      problem as a non-issue and just say that DSA ports found by tag_sja1105
      are always sja1105 ports, which is in fact true. But keep the
      dsa_port_is_sja1105 function so that it's easy to patch it during
      testing, and rely on dead code elimination.
      
      Fixes: 994d2cbb ("net: dsa: tag_sja1105: be dsa_loop-safe")
      Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      4ac0567e
    • Vladimir Oltean's avatar
      net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver · 28da0555
      Vladimir Oltean authored
      The problem is that DSA tagging protocols really must not depend on the
      switch driver, because this creates a circular dependency at insmod
      time, and the switch driver will effectively not load when the tagging
      protocol driver is missing.
      
      The code was structured in the way it was for a reason, though. The DSA
      driver-facing API for PTP timestamping relies on the assumption that
      two-step TX timestamps are provided by the hardware in an out-of-band
      manner, typically by raising an interrupt and making that timestamp
      available inside some sort of FIFO which is to be accessed over
      SPI/MDIO/etc.
      
      So the API puts .port_txtstamp into dsa_switch_ops, because it is
      expected that the switch driver needs to save some state (like put the
      skb into a queue until its TX timestamp arrives).
      
      On SJA1110, TX timestamps are provided by the switch as Ethernet
      packets, so this makes them be received and processed by the tagging
      protocol driver. This in itself is great, because the timestamps are
      full 64-bit and do not require reconstruction, and since Ethernet is the
      fastest I/O method available to/from the switch, PTP timestamps arrive
      very quickly, no matter how bottlenecked the SPI connection is, because
      SPI interaction is not needed at all.
      
      DSA's code structure and strict isolation between the tagging protocol
      driver and the switch driver break the natural code organization.
      
      When the tagging protocol driver receives a packet which is classified
      as a metadata packet containing timestamps, it passes those timestamps
      one by one to the switch driver, which then proceeds to compare them
      based on the recorded timestamp ID that was generated in .port_txtstamp.
      
      The communication between the tagging protocol and the switch driver is
      done through a method exported by the switch driver, sja1110_process_meta_tstamp.
      To satisfy build requirements, we force a dependency to build the
      tagging protocol driver as a module when the switch driver is a module.
      However, as explained in the first paragraph, that causes the circular
      dependency.
      
      To solve this, move the skb queue from struct sja1105_private :: struct
      sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data.
      The latter is a data structure for which hacks have already been put
      into place to be able to create persistent storage per switch that is
      accessible from the tagging protocol driver (see sja1105_setup_ports).
      
      With the skb queue directly accessible from the tagging protocol driver,
      we can now move sja1110_process_meta_tstamp into the tagging driver
      itself, and avoid exporting a symbol.
      
      Fixes: 566b18c8 ("net: dsa: sja1105: implement TX timestamping for SJA1110")
      Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      28da0555
  2. 12 Oct, 2021 11 commits
    • Alvin Šipraga's avatar
      net: dsa: fix spurious error message when unoffloaded port leaves bridge · 43a4b4db
      Alvin Šipraga authored
      Flip the sign of a return value check, thereby suppressing the following
      spurious error:
      
        port 2 failed to notify DSA_NOTIFIER_BRIDGE_LEAVE: -EOPNOTSUPP
      
      ... which is emitted when removing an unoffloaded DSA switch port from a
      bridge.
      
      Fixes: d371b7c9 ("net: dsa: Unset vlan_filtering when ports leave the bridge")
      Signed-off-by: default avatarAlvin Šipraga <alsi@bang-olufsen.dk>
      Reviewed-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20211012112730.3429157-1-alvin@pqrs.dkSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      43a4b4db
    • Baowen Zheng's avatar
      nfp: flow_offload: move flow_indr_dev_register from app init to app start · 60d950f4
      Baowen Zheng authored
      In commit 74fc4f82 ("net: Fix offloading indirect devices dependency
      on qdisc order creation"), it adds a process to trigger the callback to
      setup the bo callback when the driver regists a callback.
      
      In our current implement, we are not ready to run the callback when nfp
      call the function flow_indr_dev_register, then there will be error
      message as:
      
      kernel: Oops: 0000 [#1] SMP PTI
      kernel: CPU: 0 PID: 14119 Comm: kworker/0:0 Tainted: G
      kernel: Workqueue: events work_for_cpu_fn
      kernel: RIP: 0010:nfp_flower_indr_setup_tc_cb+0x258/0x410
      kernel: RSP: 0018:ffffbc1e02c57bf8 EFLAGS: 00010286
      kernel: RAX: 0000000000000000 RBX: ffff9c761fabc000 RCX: 0000000000000001
      kernel: RDX: 0000000000000001 RSI: fffffffffffffff0 RDI: ffffffffc0be9ef1
      kernel: RBP: ffffbc1e02c57c58 R08: ffffffffc08f33aa R09: ffff9c6db7478800
      kernel: R10: 0000009c003f6e00 R11: ffffbc1e02800000 R12: ffffbc1e000d9000
      kernel: R13: ffffbc1e000db428 R14: ffff9c6db7478800 R15: ffff9c761e884e80
      kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      kernel: CR2: fffffffffffffff0 CR3: 00000009e260a004 CR4: 00000000007706f0
      kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      kernel: PKRU: 55555554
      kernel: Call Trace:
      kernel: ? flow_indr_dev_register+0xab/0x210
      kernel: ? __cond_resched+0x15/0x30
      kernel: ? kmem_cache_alloc_trace+0x44/0x4b0
      kernel: ? nfp_flower_setup_tc+0x1d0/0x1d0 [nfp]
      kernel: flow_indr_dev_register+0x158/0x210
      kernel: ? tcf_block_unbind+0xe0/0xe0
      kernel: nfp_flower_init+0x40b/0x650 [nfp]
      kernel: nfp_net_pci_probe+0x25f/0x960 [nfp]
      kernel: ? nfp_rtsym_read_le+0x76/0x130 [nfp]
      kernel: nfp_pci_probe+0x6a9/0x820 [nfp]
      kernel: local_pci_probe+0x45/0x80
      
      So we need to call flow_indr_dev_register in app start process instead of
      init stage.
      
      Fixes: 74fc4f82 ("net: Fix offloading indirect devices dependency on qdisc order creation")
      Signed-off-by: default avatarBaowen Zheng <baowen.zheng@corigine.com>
      Signed-off-by: default avatarSimon Horman <simon.horman@corigine.com>
      Signed-off-by: default avatarLouis Peens <louis.peens@corigine.com>
      Link: https://lore.kernel.org/r/20211012124850.13025-1-louis.peens@corigine.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      60d950f4
    • Jacob Keller's avatar
      ice: fix locking for Tx timestamp tracking flush · 4d4a223a
      Jacob Keller authored
      Commit 4dd0d5c3 ("ice: add lock around Tx timestamp tracker flush")
      added a lock around the Tx timestamp tracker flow which is used to
      cleanup any left over SKBs and prepare for device removal.
      
      This lock is problematic because it is being held around a call to
      ice_clear_phy_tstamp. The clear function takes a mutex to send a PHY
      write command to firmware. This could lead to a deadlock if the mutex
      actually sleeps, and causes the following warning on a kernel with
      preemption debugging enabled:
      
      [  715.419426] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:573
      [  715.427900] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3100, name: rmmod
      [  715.435652] INFO: lockdep is turned off.
      [  715.439591] Preemption disabled at:
      [  715.439594] [<0000000000000000>] 0x0
      [  715.446678] CPU: 52 PID: 3100 Comm: rmmod Tainted: G        W  OE     5.15.0-rc4+ #42 bdd7ec3018e725f159ca0d372ce8c2c0e784891c
      [  715.458058] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
      [  715.468483] Call Trace:
      [  715.470940]  dump_stack_lvl+0x6a/0x9a
      [  715.474613]  ___might_sleep.cold+0x224/0x26a
      [  715.478895]  __mutex_lock+0xb3/0x1440
      [  715.482569]  ? stack_depot_save+0x378/0x500
      [  715.486763]  ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.494979]  ? kfree+0xc1/0x520
      [  715.498128]  ? mutex_lock_io_nested+0x12a0/0x12a0
      [  715.502837]  ? kasan_set_free_info+0x20/0x30
      [  715.507110]  ? __kasan_slab_free+0x10b/0x140
      [  715.511385]  ? slab_free_freelist_hook+0xc7/0x220
      [  715.516092]  ? kfree+0xc1/0x520
      [  715.519235]  ? ice_deinit_lag+0x16c/0x220 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.527359]  ? ice_remove+0x1cf/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.535133]  ? pci_device_remove+0xab/0x1d0
      [  715.539318]  ? __device_release_driver+0x35b/0x690
      [  715.544110]  ? driver_detach+0x214/0x2f0
      [  715.548035]  ? bus_remove_driver+0x11d/0x2f0
      [  715.552309]  ? pci_unregister_driver+0x26/0x250
      [  715.556840]  ? ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.564799]  ? __do_sys_delete_module.constprop.0+0x2d8/0x4e0
      [  715.570554]  ? do_syscall_64+0x3b/0x90
      [  715.574303]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  715.579529]  ? start_flush_work+0x542/0x8f0
      [  715.583719]  ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.591923]  ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.599960]  ? wait_for_completion_io+0x250/0x250
      [  715.604662]  ? lock_acquire+0x196/0x200
      [  715.608504]  ? do_raw_spin_trylock+0xa5/0x160
      [  715.612864]  ice_sbq_rw_reg+0x1e6/0x2f0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.620813]  ? ice_reset+0x130/0x130 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.628497]  ? __debug_check_no_obj_freed+0x1e8/0x3c0
      [  715.633550]  ? trace_hardirqs_on+0x1c/0x130
      [  715.637748]  ice_write_phy_reg_e810+0x70/0xf0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.646220]  ? do_raw_spin_trylock+0xa5/0x160
      [  715.650581]  ? ice_ptp_release+0x910/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.658797]  ? ice_ptp_release+0x255/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.667013]  ice_clear_phy_tstamp+0x2c/0x110 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.675403]  ice_ptp_release+0x408/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.683440]  ice_remove+0x560/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.691037]  ? _raw_spin_unlock_irqrestore+0x46/0x73
      [  715.696005]  pci_device_remove+0xab/0x1d0
      [  715.700018]  __device_release_driver+0x35b/0x690
      [  715.704637]  driver_detach+0x214/0x2f0
      [  715.708389]  bus_remove_driver+0x11d/0x2f0
      [  715.712489]  pci_unregister_driver+0x26/0x250
      [  715.716857]  ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.724637]  __do_sys_delete_module.constprop.0+0x2d8/0x4e0
      [  715.730210]  ? free_module+0x6d0/0x6d0
      [  715.733963]  ? task_work_run+0xe1/0x170
      [  715.737803]  ? exit_to_user_mode_loop+0x17f/0x1d0
      [  715.742509]  ? rcu_read_lock_sched_held+0x12/0x80
      [  715.747215]  ? trace_hardirqs_on+0x1c/0x130
      [  715.751401]  do_syscall_64+0x3b/0x90
      [  715.754981]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  715.760033] RIP: 0033:0x7f4dfe59000b
      [  715.763612] Code: 73 01 c3 48 8b 0d 6d 1e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 1e 0c 00 f7 d8 64 89 01 48
      [  715.782357] RSP: 002b:00007ffe8c891708 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
      [  715.789923] RAX: ffffffffffffffda RBX: 00005558a20468b0 RCX: 00007f4dfe59000b
      [  715.797054] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005558a2046918
      [  715.804189] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
      [  715.811319] R10: 00007f4dfe603ac0 R11: 0000000000000206 R12: 00007ffe8c891940
      [  715.818455] R13: 00007ffe8c8920a3 R14: 00005558a20462a0 R15: 00005558a20468b0
      
      Notice that this is the only case where we use the lock in this way. In
      the cleanup kthread and work kthread the lock is only taken around the
      bit accesses. This was done intentionally to avoid this kind of issue.
      The way the lock is used, we only protect ordering of bit sets vs bit
      clears. The Tx writers in the hot path don't need to be protected
      against the entire kthread loop. The Tx queues threads only need to
      ensure that they do not re-use an index that is currently in use. The
      cleanup loop does not need to block all new set bits, since it will
      re-queue itself if new timestamps are present.
      
      Fix the tracker flow so that it uses the same flow as the standard
      cleanup thread. In addition, ensure the in_use bitmap actually gets
      cleared properly.
      
      This fixes the warning and also avoids the potential deadlock that might
      have occurred otherwise.
      
      Fixes: 4dd0d5c3 ("ice: add lock around Tx timestamp tracker flush")
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4d4a223a
    • David S. Miller's avatar
      Merge branch 'ioam-fixes' · 7389074c
      David S. Miller authored
      Justin Iurman says:
      
      ====================
      Correct the IOAM behavior for undefined trace type bits
      
      (@Jakub @David: there will be a conflict for #2 when merging net->net-next, due
      to commit [1]. The conflict is only 5-10 lines for #2 (#1 should be fine) inside
      the file tools/testing/selftests/net/ioam6.sh, so quite short though possibly
      ugly. Sorry for that, I didn't expect to post this one... Had I known, I'd have
      made the opposite.)
      
      Modify both the input and output behaviors regarding the trace type when one of
      the undefined bits is set. The goal is to keep the interoperability when new
      fields (aka new bits inside the range 12-21) will be defined.
      
      The draft [2] says the following:
      ---------------------------------------------------------------
      "Bit 12-21  Undefined.  These values are available for future
             assignment in the IOAM Trace-Type Registry (Section 8.2).
             Every future node data field corresponding to one of
             these bits MUST be 4-octets long.  An IOAM encapsulating
             node MUST set the value of each undefined bit to 0.  If
             an IOAM transit node receives a packet with one or more
             of these bits set to 1, it MUST either:
      
             1.  Add corresponding node data filled with the reserved
                 value 0xFFFFFFFF, after the node data fields for the
                 IOAM-Trace-Type bits defined above, such that the
                 total node data added by this node in units of
                 4-octets is equal to NodeLen, or
      
             2.  Not add any node data fields to the packet, even for
                 the IOAM-Trace-Type bits defined above."
      ---------------------------------------------------------------
      
      The output behavior has been modified to respect the fact that "an IOAM encap
      node MUST set the value of each undefined bit to 0" (i.e., undefined bits can't
      be set anymore).
      
      As for the input behavior, current implementation is based on the second choice
      (i.e., "not add any data fields to the packet [...]"). With this solution, any
      interoperability is lost (i.e., if a new bit is defined, then an "old" kernel
      implementation wouldn't fill IOAM data when such new bit is set inside the trace
      type).
      
      The input behavior is therefore relaxed and these undefined bits are now allowed
      to be set. It is only possible thanks to the sentence "every future node data
      field corresponding to one of these bits MUST be 4-octets long". Indeed, the
      default empty value (the one for 4-octet fields) is inserted whenever an
      undefined bit is set.
      
        [1] cfbe9b00
        [2] https://datatracker.ietf.org/doc/html/draft-ietf-ippm-ioam-data#section-5.4.1
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7389074c
    • Justin Iurman's avatar
      selftests: net: modify IOAM tests for undef bits · 7b1700e0
      Justin Iurman authored
      The output behavior for undefined bits is now directly tested inside the bash
      script. Trying to set an undefined bit should be refused.
      
      The input behavior for undefined bits has been removed due to the fact that we
      would need another sender allowed to set undefined bits.
      Signed-off-by: default avatarJustin Iurman <justin.iurman@uliege.be>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7b1700e0
    • Justin Iurman's avatar
      ipv6: ioam: move the check for undefined bits · 2bbc977c
      Justin Iurman authored
      The check for undefined bits in the trace type is moved from the input side to
      the output side, while the input side is relaxed and now inserts default empty
      values when an undefined bit is set.
      Signed-off-by: default avatarJustin Iurman <justin.iurman@uliege.be>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2bbc977c
    • Arun Ramadoss's avatar
      net: dsa: microchip: Added the condition for scheduling ksz_mib_read_work · ef1100ef
      Arun Ramadoss authored
      When the ksz module is installed and removed using rmmod, kernel crashes
      with null pointer dereferrence error. During rmmod, ksz_switch_remove
      function tries to cancel the mib_read_workqueue using
      cancel_delayed_work_sync routine and unregister switch from dsa.
      
      During dsa_unregister_switch it calls ksz_mac_link_down, which in turn
      reschedules the workqueue since mib_interval is non-zero.
      Due to which queue executed after mib_interval and it tries to access
      dp->slave. But the slave is unregistered in the ksz_switch_remove
      function. Hence kernel crashes.
      
      To avoid this crash, before canceling the workqueue, resetted the
      mib_interval to 0.
      
      v1 -> v2:
      -Removed the if condition in ksz_mib_read_work
      
      Fixes: 469b390e ("net: dsa: microchip: use delayed_work instead of timer + work")
      Signed-off-by: default avatarArun Ramadoss <arun.ramadoss@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ef1100ef
    • Vegard Nossum's avatar
      r8152: select CRC32 and CRYPTO/CRYPTO_HASH/CRYPTO_SHA256 · 9973a430
      Vegard Nossum authored
      Fix the following build/link errors by adding a dependency on
      CRYPTO, CRYPTO_HASH, CRYPTO_SHA256 and CRC32:
      
        ld: drivers/net/usb/r8152.o: in function `rtl8152_fw_verify_checksum':
        r8152.c:(.text+0x2b2a): undefined reference to `crypto_alloc_shash'
        ld: r8152.c:(.text+0x2bed): undefined reference to `crypto_shash_digest'
        ld: r8152.c:(.text+0x2c50): undefined reference to `crypto_destroy_tfm'
        ld: drivers/net/usb/r8152.o: in function `_rtl8152_set_rx_mode':
        r8152.c:(.text+0xdcb0): undefined reference to `crc32_le'
      
      Fixes: 9370f2d0 ("r8152: support request_firmware for RTL8153")
      Fixes: ac718b69 ("net/usb: new driver for RTL8152")
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9973a430
    • Maarten Zanders's avatar
      net: dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's · 4a3e0aed
      Maarten Zanders authored
      mv88e6xxx_port_ppu_updates() interpretes data in the PORT_STS
      register incorrectly for internal ports (ie no PPU). In these
      cases, the PHY_DETECT bit indicates link status. This results
      in forcing the MAC state whenever the PHY link goes down which
      is not intended. As a side effect, LED's configured to show
      link status stay lit even though the physical link is down.
      
      Add a check in mac_link_down and mac_link_up to see if it
      concerns an external port and only then, look at PPU status.
      
      Fixes: 5d5b231d (net: dsa: mv88e6xxx: use PHY_DETECT in mac_link_up/mac_link_down)
      Reported-by: default avatarMaarten Zanders <m.zanders@televic.com>
      Reviewed-by: default avatarMaxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: default avatarMaarten Zanders <maarten.zanders@mind.be>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4a3e0aed
    • Wan Jiabing's avatar
      net: mscc: ocelot: Fix dumplicated argument in ocelot · 74a3bc42
      Wan Jiabing authored
      Fix the following coccicheck warning:
      drivers/net/ethernet/mscc/ocelot.c:474:duplicated argument to & or |
      drivers/net/ethernet/mscc/ocelot.c:476:duplicated argument to & or |
      drivers/net/ethernet/mscc/ocelot_net.c:1627:duplicated argument
      to & or |
      
      These DEV_CLOCK_CFG_MAC_TX_RST are duplicate here.
      Here should be DEV_CLOCK_CFG_MAC_RX_RST.
      
      Fixes: e6e12df6 ("net: mscc: ocelot: convert to phylink")
      Signed-off-by: default avatarWan Jiabing <wanjiabing@vivo.com>
      Reviewed-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      74a3bc42
    • Stephen Boyd's avatar
      af_unix: Rename UNIX-DGRAM to UNIX to maintain backwards compatability · 0edf0824
      Stephen Boyd authored
      Then name of this protocol changed in commit 94531cfc ("af_unix: Add
      unix_stream_proto for sockmap") because that commit added stream support
      to the af_unix protocol. Renaming the existing protocol makes a ChromeOS
      protocol test[1] fail now that the name has changed in
      /proc/net/protocols from "UNIX" to "UNIX-DGRAM".
      
      Let's put the name back to how it was while keeping the stream protocol
      as "UNIX-STREAM" so that the procfs interface doesn't change. This fixes
      the test and maintains backwards compatibility in proc.
      
      Cc: Jiang Wang <jiang.wang@bytedance.com>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Cong Wang <cong.wang@bytedance.com>
      Cc: Jakub Sitnicki <jakub@cloudflare.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Dmitry Osipenko <digetx@gmail.com>
      Link: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/supported_protocols.go;l=50;drc=e8b1c3f94cb40a054f4aa1ef1aff61e75dc38f18 [1]
      Fixes: 94531cfc ("af_unix: Add unix_stream_proto for sockmap")
      Signed-off-by: default avatarStephen Boyd <swboyd@chromium.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0edf0824
  3. 09 Oct, 2021 7 commits
    • Xuan Zhuo's avatar
      virtio-net: fix for skb_over_panic inside big mode · 732b74d6
      Xuan Zhuo authored
      commit 12628565 ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net")
      accidentally reverted the effect of
      commit 1a802423 ("virtio-net: fix for skb_over_panic inside big mode")
      on drivers/net/virtio_net.c
      
      As a result, users of crosvm (which is using large packet mode)
      are experiencing crashes with 5.14-rc1 and above that do not
      occur with 5.13.
      
      Crash trace:
      
      [   61.346677] skbuff: skb_over_panic: text:ffffffff881ae2c7 len:3762 put:3762 head:ffff8a5ec8c22000 data:ffff8a5ec8c22010 tail:0xec2 end:0xec0 dev:<NULL>
      [   61.369192] kernel BUG at net/core/skbuff.c:111!
      [   61.372840] invalid opcode: 0000 [#1] SMP PTI
      [   61.374892] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.14.0-rc1 linux-v5.14-rc1-for-mesa-ci.tar.bz2 #1
      [   61.376450] Hardware name: ChromiumOS crosvm, BIOS 0
      
      ..
      
      [   61.393635] Call Trace:
      [   61.394127]  <IRQ>
      [   61.394488]  skb_put.cold+0x10/0x10
      [   61.395095]  page_to_skb+0xf7/0x410
      [   61.395689]  receive_buf+0x81/0x1660
      [   61.396228]  ? netif_receive_skb_list_internal+0x1ad/0x2b0
      [   61.397180]  ? napi_gro_flush+0x97/0xe0
      [   61.397896]  ? detach_buf_split+0x67/0x120
      [   61.398573]  virtnet_poll+0x2cf/0x420
      [   61.399197]  __napi_poll+0x25/0x150
      [   61.399764]  net_rx_action+0x22f/0x280
      [   61.400394]  __do_softirq+0xba/0x257
      [   61.401012]  irq_exit_rcu+0x8e/0xb0
      [   61.401618]  common_interrupt+0x7b/0xa0
      [   61.402270]  </IRQ>
      
      See
      https://lore.kernel.org/r/5edaa2b7c2fe4abd0347b8454b2ac032b6694e2c.camel%40collabora.com
      for the report.
      
      Apply the original 1a802423 ("virtio-net: fix for skb_over_panic inside big mode")
      again, the original logic still holds:
      
      In virtio-net's large packet mode, there is a hole in the space behind
      buf.
      
          hdr_padded_len - hdr_len
      
      We must take this into account when calculating tailroom.
      
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
      Fixes: 12628565 ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net")
      Signed-off-by: default avatarXuan Zhuo <xuanzhuo@linux.alibaba.com>
      Reported-by: default avatarCorentin Noël <corentin.noel@collabora.com>
      Tested-by: default avatarCorentin Noël <corentin.noel@collabora.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      732b74d6
    • Florian Fainelli's avatar
      net: phy: Do not shutdown PHYs in READY state · f4982393
      Florian Fainelli authored
      In case a PHY device was probed thus in the PHY_READY state, but not
      configured and with no network device attached yet, we should not be
      trying to shut it down because it has been brought back into reset by
      phy_device_reset() towards the end of phy_probe() and anyway we have not
      configured the PHY yet.
      
      Fixes: e2f016cf ("net: phy: add a shutdown procedure")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f4982393
    • chongjiapeng's avatar
      qed: Fix missing error code in qed_slowpath_start() · a5a14ea7
      chongjiapeng authored
      The error code is missing in this code scenario, add the error code
      '-EINVAL' to the return value 'rc'.
      
      Eliminate the follow smatch warning:
      
      drivers/net/ethernet/qlogic/qed/qed_main.c:1298 qed_slowpath_start()
      warn: missing error code 'rc'.
      Reported-by: default avatarAbaci Robot <abaci@linux.alibaba.com>
      Fixes: d51e4af5 ("qed: aRFS infrastructure support")
      Signed-off-by: default avatarchongjiapeng <jiapeng.chong@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a5a14ea7
    • Vladimir Oltean's avatar
      net: dsa: hold rtnl_lock in dsa_switch_setup_tag_protocol · 1951b3f1
      Vladimir Oltean authored
      It was a documented fact that ds->ops->change_tag_protocol() offered
      rtnetlink mutex protection to the switch driver, since there was an
      ASSERT_RTNL right before the call in dsa_switch_change_tag_proto()
      (initiated from sysfs).
      
      The blamed commit introduced another call path for
      ds->ops->change_tag_protocol() which does not hold the rtnl_mutex.
      This is:
      
      dsa_tree_setup
      -> dsa_tree_setup_switches
         -> dsa_switch_setup
            -> dsa_switch_setup_tag_protocol
               -> ds->ops->change_tag_protocol()
         -> dsa_port_setup
            -> dsa_slave_create
               -> register_netdevice(slave_dev)
      -> dsa_tree_setup_master
         -> dsa_master_setup
            -> dev->dsa_ptr = cpu_dp
      
      The reason why the rtnl_mutex is held in the sysfs call path is to
      ensure that, once the master and all the DSA interfaces are down (which
      is required so that no packets flow), they remain down during the
      tagging protocol change.
      
      The above calling order illustrates the fact that it should not be risky
      to change the initial tagging protocol to the one specified in the
      device tree at the given time:
      
      - packets cannot enter the dsa_switch_rcv() packet type handler since
        netdev_uses_dsa() for the master will not yet return true, since
        dev->dsa_ptr has not yet been populated
      
      - packets cannot enter the dsa_slave_xmit() function because no DSA
        interface has yet been registered
      
      So from the DSA core's perspective, holding the rtnl_mutex is indeed not
      necessary.
      
      Yet, drivers may need to do things which need rtnl_mutex protection. For
      example:
      
      felix_set_tag_protocol
      -> felix_setup_tag_8021q
         -> dsa_tag_8021q_register
            -> dsa_tag_8021q_setup
               -> dsa_tag_8021q_port_setup
                  -> vlan_vid_add
                     -> ASSERT_RTNL
      
      These drivers do not really have a choice to take the rtnl_mutex
      themselves, since in the sysfs case, the rtnl_mutex is already held.
      
      Fixes: deff7107 ("net: dsa: Allow default tag protocol to be overridden from DT")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1951b3f1
    • Zheyu Ma's avatar
      isdn: mISDN: Fix sleeping function called from invalid context · 6510e80a
      Zheyu Ma authored
      The driver can call card->isac.release() function from an atomic
      context.
      
      Fix this by calling this function after releasing the lock.
      
      The following log reveals it:
      
      [   44.168226 ] BUG: sleeping function called from invalid context at kernel/workqueue.c:3018
      [   44.168941 ] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 5475, name: modprobe
      [   44.169574 ] INFO: lockdep is turned off.
      [   44.169899 ] irq event stamp: 0
      [   44.170160 ] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
      [   44.170627 ] hardirqs last disabled at (0): [<ffffffff814209ed>] copy_process+0x132d/0x3e00
      [   44.171240 ] softirqs last  enabled at (0): [<ffffffff81420a1a>] copy_process+0x135a/0x3e00
      [   44.171852 ] softirqs last disabled at (0): [<0000000000000000>] 0x0
      [   44.172318 ] Preemption disabled at:
      [   44.172320 ] [<ffffffffa009b0a9>] nj_release+0x69/0x500 [netjet]
      [   44.174441 ] Call Trace:
      [   44.174630 ]  dump_stack_lvl+0xa8/0xd1
      [   44.174912 ]  dump_stack+0x15/0x17
      [   44.175166 ]  ___might_sleep+0x3a2/0x510
      [   44.175459 ]  ? nj_release+0x69/0x500 [netjet]
      [   44.175791 ]  __might_sleep+0x82/0xe0
      [   44.176063 ]  ? start_flush_work+0x20/0x7b0
      [   44.176375 ]  start_flush_work+0x33/0x7b0
      [   44.176672 ]  ? trace_irq_enable_rcuidle+0x85/0x170
      [   44.177034 ]  ? kasan_quarantine_put+0xaa/0x1f0
      [   44.177372 ]  ? kasan_quarantine_put+0xaa/0x1f0
      [   44.177711 ]  __flush_work+0x11a/0x1a0
      [   44.177991 ]  ? flush_work+0x20/0x20
      [   44.178257 ]  ? lock_release+0x13c/0x8f0
      [   44.178550 ]  ? __kasan_check_write+0x14/0x20
      [   44.178872 ]  ? do_raw_spin_lock+0x148/0x360
      [   44.179187 ]  ? read_lock_is_recursive+0x20/0x20
      [   44.179530 ]  ? __kasan_check_read+0x11/0x20
      [   44.179846 ]  ? do_raw_spin_unlock+0x55/0x900
      [   44.180168 ]  ? ____kasan_slab_free+0x116/0x140
      [   44.180505 ]  ? _raw_spin_unlock_irqrestore+0x41/0x60
      [   44.180878 ]  ? skb_queue_purge+0x1a3/0x1c0
      [   44.181189 ]  ? kfree+0x13e/0x290
      [   44.181438 ]  flush_work+0x17/0x20
      [   44.181695 ]  mISDN_freedchannel+0xe8/0x100
      [   44.182006 ]  isac_release+0x210/0x260 [mISDNipac]
      [   44.182366 ]  nj_release+0xf6/0x500 [netjet]
      [   44.182685 ]  nj_remove+0x48/0x70 [netjet]
      [   44.182989 ]  pci_device_remove+0xa9/0x250
      Signed-off-by: default avatarZheyu Ma <zheyuma97@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6510e80a
    • Shannon Nelson's avatar
      ionic: don't remove netdev->dev_addr when syncing uc list · 5c976a56
      Shannon Nelson authored
      Bridging, and possibly other upper stack gizmos, adds the
      lower device's netdev->dev_addr to its own uc list, and
      then requests it be deleted when the upper bridge device is
      removed.  This delete request also happens with the bridging
      vlan_filtering is enabled and then disabled.
      
      Bonding has a similar behavior with the uc list, but since it
      also uses set_mac to manage netdev->dev_addr, it doesn't have
      the same the failure case.
      
      Because we store our netdev->dev_addr in our uc list, we need
      to ignore the delete request from dev_uc_sync so as to not
      lose the address and all hope of communicating.  Note that
      ndo_set_mac_address is expressly changing netdev->dev_addr,
      so no limitation is set there.
      
      Fixes: 2a654540 ("ionic: Add Rx filter and rx_mode ndo support")
      Signed-off-by: default avatarShannon Nelson <snelson@pensando.io>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5c976a56
    • Haiyang Zhang's avatar
      net: mana: Fix error handling in mana_create_rxq() · be049936
      Haiyang Zhang authored
      Fix error handling in mana_create_rxq() when
      cq->gdma_id >= gc->max_num_cqs.
      
      Fixes: ca9c54d2 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
      Signed-off-by: default avatarHaiyang Zhang <haiyangz@microsoft.com>
      Link: https://lore.kernel.org/r/1633698691-31721-1-git-send-email-haiyangz@microsoft.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      be049936
  4. 08 Oct, 2021 13 commits
    • Xiaolong Huang's avatar
      isdn: cpai: check ctr->cnr to avoid array index out of bound · 1f3e2e97
      Xiaolong Huang authored
      The cmtp_add_connection() would add a cmtp session to a controller
      and run a kernel thread to process cmtp.
      
      	__module_get(THIS_MODULE);
      	session->task = kthread_run(cmtp_session, session, "kcmtpd_ctr_%d",
      								session->num);
      
      During this process, the kernel thread would call detach_capi_ctr()
      to detach a register controller. if the controller
      was not attached yet, detach_capi_ctr() would
      trigger an array-index-out-bounds bug.
      
      [   46.866069][ T6479] UBSAN: array-index-out-of-bounds in
      drivers/isdn/capi/kcapi.c:483:21
      [   46.867196][ T6479] index -1 is out of range for type 'capi_ctr *[32]'
      [   46.867982][ T6479] CPU: 1 PID: 6479 Comm: kcmtpd_ctr_0 Not tainted
      5.15.0-rc2+ #8
      [   46.869002][ T6479] Hardware name: QEMU Standard PC (i440FX + PIIX,
      1996), BIOS 1.14.0-2 04/01/2014
      [   46.870107][ T6479] Call Trace:
      [   46.870473][ T6479]  dump_stack_lvl+0x57/0x7d
      [   46.870974][ T6479]  ubsan_epilogue+0x5/0x40
      [   46.871458][ T6479]  __ubsan_handle_out_of_bounds.cold+0x43/0x48
      [   46.872135][ T6479]  detach_capi_ctr+0x64/0xc0
      [   46.872639][ T6479]  cmtp_session+0x5c8/0x5d0
      [   46.873131][ T6479]  ? __init_waitqueue_head+0x60/0x60
      [   46.873712][ T6479]  ? cmtp_add_msgpart+0x120/0x120
      [   46.874256][ T6479]  kthread+0x147/0x170
      [   46.874709][ T6479]  ? set_kthread_struct+0x40/0x40
      [   46.875248][ T6479]  ret_from_fork+0x1f/0x30
      [   46.875773][ T6479]
      Signed-off-by: default avatarXiaolong Huang <butterflyhuangxx@gmail.com>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Link: https://lore.kernel.org/r/20211008065830.305057-1-butterflyhuangxx@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      1f3e2e97
    • Sebastian Andrzej Siewior's avatar
      mqprio: Correct stats in mqprio_dump_class_stats(). · 14132690
      Sebastian Andrzej Siewior authored
      Introduction of lockless subqueues broke the class statistics.
      Before the change stats were accumulated in `bstats' and `qstats'
      on the stack which was then copied to struct gnet_dump.
      
      After the change the `bstats' and `qstats' are initialized to 0
      and never updated, yet still fed to gnet_dump. The code updates
      the global qdisc->cpu_bstats and qdisc->cpu_qstats instead,
      clobbering them. Most likely a copy-paste error from the code in
      mqprio_dump().
      
      __gnet_stats_copy_basic() and __gnet_stats_copy_queue() accumulate
      the values for per-CPU case but for global stats they overwrite
      the value, so only stats from the last loop iteration / tc end up
      in sch->[bq]stats.
      
      Use the on-stack [bq]stats variables again and add the stats manually
      in the global case.
      
      Fixes: ce679e8d ("net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio")
      Cc: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      https://lore.kernel.org/all/20211007175000.2334713-2-bigeasy@linutronix.de/Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      14132690
    • Jakub Kicinski's avatar
      Merge branch 'dsa-bridge-tx-forwarding-offload-fixes-part-1' · bccf56c4
      Jakub Kicinski authored
      Vladimir Oltean says:
      
      ====================
      DSA bridge TX forwarding offload fixes - part 1
      
      This is part 1 of a series of fixes to the bridge TX forwarding offload
      feature introduced for v5.15. Sadly, the other fixes are so intrusive
      that they cannot be reasonably be sent to the "net" tree, as they also
      include API changes. So they are left as part 2 for net-next.
      ====================
      
      Link: https://lore.kernel.org/r/20211007164711.2897238-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      bccf56c4
    • Vladimir Oltean's avatar
      net: dsa: mv88e6xxx: isolate the ATU databases of standalone and bridged ports · 5bded825
      Vladimir Oltean authored
      Similar to commit 6087175b ("net: dsa: mt7530: use independent VLAN
      learning on VLAN-unaware bridges"), software forwarding between an
      unoffloaded LAG port (a bonding interface with an unsupported policy)
      and a mv88e6xxx user port directly under a bridge is broken.
      
      We adopt the same strategy, which is to make the standalone ports not
      find any ATU entry learned on a bridge port.
      
      Theory: the mv88e6xxx ATU is looked up by FID and MAC address. There are
      as many FIDs as VIDs (4096). The FID is derived from the VID when
      possible (the VTU maps a VID to a FID), with a fallback to the port
      based default FID value when not (802.1Q Mode is disabled on the port,
      or the classified VID isn't present in the VTU).
      
      The mv88e6xxx driver makes the following use of FIDs and VIDs:
      
      - the port's DefaultVID (to which untagged & pvid-tagged packets get
        classified) is 0 and is absent from the VTU, so this kind of packets is
        processed in FID 0, the default FID assigned by mv88e6xxx_setup_port.
      
      - every time a bridge VLAN is created, mv88e6xxx_port_vlan_join() ->
        mv88e6xxx_atu_new() associates a FID with that VID which increases
        linearly starting from 1. Like this:
      
        bridge vlan add dev lan0 vid 100 # FID 1
        bridge vlan add dev lan1 vid 100 # still FID 1
        bridge vlan add dev lan2 vid 1024 # FID 2
      
      The FID allocation made by the driver is sub-optimal for the following
      reasons:
      
      (a) A standalone port has a DefaultPVID of 0 and a default FID of 0 too.
          A VLAN-unaware bridged port has a DefaultPVID of 0 and a default FID
          of 0 too. The difference is that the bridged ports may learn ATU
          entries, while the standalone port has the requirement that it must
          not, and must not find them either. Standalone ports must not use
          the same FID as ports belonging to a bridge. All standalone ports
          can use the same FID, since the ATU will never have an entry in
          that FID.
      
      (b) Multiple VLAN-unaware bridges will all use a DefaultPVID of 0 and a
          default FID of 0 on all their ports. The FDBs will not be isolated
          between these bridges. Every VLAN-unaware bridge must use the same
          FID on all its ports, different from the FID of other bridge ports.
      
      (c) Each bridge VLAN uses a unique FID which is useful for Independent
          VLAN Learning, but the same VLAN ID on multiple VLAN-aware bridges
          will result in the same FID being used by mv88e6xxx_atu_new().
          The correct behavior is for VLAN 1 in br0 to have a different FID
          compared to VLAN 1 in br1.
      
      This patch cannot fix all the above. Traditionally the DSA framework did
      not care about this, and the reality is that DSA core involvement is
      needed for the aforementioned issues to be solved. The only thing we can
      solve here is an issue which does not require API changes, and that is
      issue (a), aka use a different FID for standalone ports vs ports under
      VLAN-unaware bridges.
      
      The first step is deciding what VID and FID to use for standalone ports,
      and what VID and FID for bridged ports. The 0/0 pair for standalone
      ports is what they used up till now, let's keep using that. For bridged
      ports, there are 2 cases:
      
      - VLAN-aware ports will never end up using the port default FID, because
        packets will always be classified to a VID in the VTU or dropped
        otherwise. The FID is the one associated with the VID in the VTU.
      
      - On VLAN-unaware ports, we _could_ leave their DefaultVID (pvid) at
        zero (just as in the case of standalone ports), and just change the
        port's default FID from 0 to a different number (say 1).
      
      However, Tobias points out that there is one more requirement to cater to:
      cross-chip bridging. The Marvell DSA header does not carry the FID in
      it, only the VID. So once a packet crosses a DSA link, if it has a VID
      of zero it will get classified to the default FID of that cascade port.
      Relying on a port default FID for upstream cascade ports results in
      contradictions: a default FID of 0 breaks ATU isolation of bridged ports
      on the downstream switch, a default FID of 1 breaks standalone ports on
      the downstream switch.
      
      So not only must standalone ports have different FIDs compared to
      bridged ports, they must also have different DefaultVID values.
      IEEE 802.1Q defines two reserved VID values: 0 and 4095. So we simply
      choose 4095 as the DefaultVID of ports belonging to VLAN-unaware
      bridges, and VID 4095 maps to FID 1.
      
      For the xmit operation to look up the same ATU database, we need to put
      VID 4095 in DSA tags sent to ports belonging to VLAN-unaware bridges
      too. All shared ports are configured to map this VID to the bridging
      FID, because they are members of that VLAN in the VTU. Shared ports
      don't need to have 802.1QMode enabled in any way, they always parse the
      VID from the DSA header, they don't need to look at the 802.1Q header.
      
      We install VID 4095 to the VTU in mv88e6xxx_setup_port(), with the
      mention that mv88e6xxx_vtu_setup() which was located right below that
      call was flushing the VTU so those entries wouldn't be preserved.
      So we need to relocate the VTU flushing prior to the port initialization
      during ->setup(). Also note that this is why it is safe to assume that
      VID 4095 will get associated with FID 1: the user ports haven't been
      created, so there is no avenue for the user to create a bridge VLAN
      which could otherwise race with the creation of another FID which would
      otherwise use up the non-reserved FID value of 1.
      
      [ Currently mv88e6xxx_port_vlan_join() doesn't have the option of
        specifying a preferred FID, it always calls mv88e6xxx_atu_new(). ]
      
      mv88e6xxx_port_db_load_purge() is the function to access the ATU for
      FDB/MDB entries, and it used to determine the FID to use for
      VLAN-unaware FDB entries (VID=0) using mv88e6xxx_port_get_fid().
      But the driver only called mv88e6xxx_port_set_fid() once, during probe,
      so no surprises, the port FID was always 0, the call to get_fid() was
      redundant. As much as I would have wanted to not touch that code, the
      logic is broken when we add a new FID which is not the port-based
      default. Now the port-based default FID only corresponds to standalone
      ports, and FDB/MDB entries belong to the bridging service. So while in
      the future, when the DSA API will support FDB isolation, we will have to
      figure out the FID based on the bridge number, for now there's a single
      bridging FID, so hardcode that.
      
      Lastly, the tagger needs to check, when it is transmitting a VLAN
      untagged skb, whether it is sending it towards a bridged or a standalone
      port. When we see it is bridged we assume the bridge is VLAN-unaware.
      Not because it cannot be VLAN-aware but:
      
      - if we are transmitting from a VLAN-aware bridge we are likely doing so
        using TX forwarding offload. That code path guarantees that skbs have
        a vlan hwaccel tag in them, so we would not enter the "else" branch
        of the "if (skb->protocol == htons(ETH_P_8021Q))" condition.
      
      - if we are transmitting on behalf of a VLAN-aware bridge but with no TX
        forwarding offload (no PVT support, out of space in the PVT, whatever),
        we would indeed be transmitting with VLAN 4095 instead of the bridge
        device's pvid. However we would be injecting a "From CPU" frame, and
        the switch won't learn from that - it only learns from "Forward" frames.
        So it is inconsequential for address learning. And VLAN 4095 is
        absolutely enough for the frame to exit the switch, since we never
        remove that VLAN from any port.
      
      Fixes: 57e661aa ("net: dsa: mv88e6xxx: Link aggregation support")
      Reported-by: default avatarTobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      5bded825
    • Vladimir Oltean's avatar
      net: dsa: mv88e6xxx: keep the pvid at 0 when VLAN-unaware · 8b6836d8
      Vladimir Oltean authored
      The VLAN support in mv88e6xxx has a loaded history. Commit 2ea7a679
      ("net: dsa: Don't add vlans when vlan filtering is disabled") noticed
      some issues with VLAN and decided the best way to deal with them was to
      make the DSA core ignore VLANs added by the bridge while VLAN awareness
      is turned off. Those issues were never explained, just presented as
      "at least one corner case".
      
      That approach had problems of its own, presented by
      commit 54a0ed0d ("net: dsa: provide an option for drivers to always
      receive bridge VLANs") for the DSA core, followed by
      commit 1fb74191 ("net: dsa: mv88e6xxx: fix vlan setup") which
      applied ds->configure_vlan_while_not_filtering = true for mv88e6xxx in
      particular.
      
      We still don't know what corner case Andrew saw when he wrote
      commit 2ea7a679 ("net: dsa: Don't add vlans when vlan filtering is
      disabled"), but Tobias now reports that when we use TX forwarding
      offload, pinging an external station from the bridge device is broken if
      the front-facing DSA user port has flooding turned off. The full
      description is in the link below, but for short, when a mv88e6xxx port
      is under a VLAN-unaware bridge, it inherits that bridge's pvid.
      So packets ingressing a user port will be classified to e.g. VID 1
      (assuming that value for the bridge_default_pvid), whereas when
      tag_dsa.c xmits towards a user port, it always sends packets using a VID
      of 0 if that port is standalone or under a VLAN-unaware bridge - or at
      least it did so prior to commit d82f8ab0 ("net: dsa: tag_dsa:
      offload the bridge forwarding process").
      
      In any case, when there is a conversation between the CPU and a station
      connected to a user port, the station's MAC address is learned in VID 1
      but the CPU tries to transmit through VID 0. The packets reach the
      intended station, but via flooding and not by virtue of matching the
      existing ATU entry.
      
      DSA has established (and enforced in other drivers: sja1105, felix,
      mt7530) that a VLAN-unaware port should use a private pvid, and not
      inherit the one from the bridge. The bridge's pvid should only be
      inherited when that bridge is VLAN-aware, so all state transitions need
      to be handled. On the other hand, all bridge VLANs should sit in the VTU
      starting with the moment when the bridge offloads them via switchdev,
      they are just not used.
      
      This solves the problem that Tobias sees because packets ingressing on
      VLAN-unaware user ports now get classified to VID 0, which is also the
      VID used by tag_dsa.c on xmit.
      
      Fixes: d82f8ab0 ("net: dsa: tag_dsa: offload the bridge forwarding process")
      Link: https://patchwork.kernel.org/project/netdevbpf/patch/20211003222312.284175-2-vladimir.oltean@nxp.com/#24491503Reported-by: default avatarTobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      8b6836d8
    • Vladimir Oltean's avatar
      net: dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware bridges using VID 0 · c7709a02
      Vladimir Oltean authored
      The present code is structured this way due to an incomplete thought
      process. In Documentation/networking/switchdev.rst we document that if a
      bridge is VLAN-unaware, then the presence or lack of a pvid on a bridge
      port (or on the bridge itself, for that matter) should not affect the
      ability to receive and transmit tagged or untagged packets.
      
      If the bridge on behalf of which we are sending this packet is
      VLAN-aware, then the TX forwarding offload API ensures that the skb will
      be VLAN-tagged (if the packet was sent by user space as untagged, it
      will get transmitted town to the driver as tagged with the bridge
      device's pvid). But if the bridge is VLAN-unaware, it may or may not be
      VLAN-tagged. In fact the logic to insert the bridge's PVID came from the
      idea that we should emulate what is being done in the VLAN-aware case.
      But we shouldn't.
      
      It appears that injecting packets using a VLAN ID of 0 serves the
      purpose of forwarding the packets to the egress port with no VLAN tag
      added or stripped by the hardware, and no filtering being performed.
      So we can simply remove the superfluous logic.
      
      One reason why this logic is broken is that when CONFIG_BRIDGE_VLAN_FILTERING=n,
      we call br_vlan_get_pvid_rcu() but that returns an error and we do error
      out, dropping all packets on xmit. Not really smart. This is also an
      issue when the user deletes the bridge pvid:
      
      $ bridge vlan del dev br0 vid 1 self
      
      As mentioned, in both cases, packets should still flow freely, and they
      do just that on any net device where the bridge is not offloaded, but on
      mv88e6xxx they don't.
      
      Fixes: d82f8ab0 ("net: dsa: tag_dsa: offload the bridge forwarding process")
      Reported-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Link: https://patchwork.kernel.org/project/netdevbpf/patch/20211003155141.2241314-1-andrew@lunn.ch/
      Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210928233708.1246774-1-vladimir.oltean@nxp.com/Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      c7709a02
    • Vladimir Oltean's avatar
      net: dsa: fix bridge_num not getting cleared after ports leaving the bridge · 1bec0f05
      Vladimir Oltean authored
      The dp->bridge_num is zero-based, with -1 being the encoding for an
      invalid value. But dsa_bridge_num_put used to check for an invalid value
      by comparing bridge_num with 0, which is of course incorrect.
      
      The result is that the bridge_num will never get cleared by
      dsa_bridge_num_put, and further port joins to other bridges will get a
      bridge_num larger than the previous one, and once all the available
      bridges with TX forwarding offload supported by the hardware get
      exhausted, the TX forwarding offload feature is simply disabled.
      
      In the case of sja1105, 7 iterations of the loop below are enough to
      exhaust the TX forwarding offload bits, and further bridge joins operate
      without that feature.
      
      ip link add br0 type bridge vlan_filtering 1
      
      while :; do
              ip link set sw0p2 master br0 && sleep 1
              ip link set sw0p2 nomaster && sleep 1
      done
      
      This issue is enough of an indication that having the dp->bridge_num
      invalid encoding be a negative number is prone to bugs, so this will be
      changed to a one-based value, with the dp->bridge_num of zero being the
      indication of no bridge. However, that is material for net-next.
      
      Fixes: f5e165e7 ("net: dsa: track unique bridge numbers across all DSA switch trees")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      1bec0f05
    • Lin Ma's avatar
      nfc: nci: fix the UAF of rf_conn_info object · 1b1499a8
      Lin Ma authored
      The nci_core_conn_close_rsp_packet() function will release the conn_info
      with given conn_id. However, it needs to set the rf_conn_info to NULL to
      prevent other routines like nci_rf_intf_activated_ntf_packet() to trigger
      the UAF.
      Reviewed-by: default avatarKrzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: default avatarLin Ma <linma@zju.edu.cn>
      Signed-off-by: default avatarKrzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1b1499a8
    • Karsten Graul's avatar
      net/smc: improved fix wait on already cleared link · 95f7f3e7
      Karsten Graul authored
      Commit 8f3d65c1 ("net/smc: fix wait on already cleared link")
      introduced link refcounting to avoid waits on already cleared links.
      This patch extents and improves the refcounting to cover all
      remaining possible cases for this kind of error situation.
      
      Fixes: 15e1b99a ("net/smc: no WR buffer wait for terminating link group")
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      95f7f3e7
    • David S. Miller's avatar
      Merge branch 'stmmac-regression-fix' · 097657c9
      David S. Miller authored
      Merge branch 'stmmac-regression-fix'
      
      Herve Codina says:
      
      ====================
      net: stmmac: fix regression on SPEAr3xx SOC
      
      The ethernet driver used on old SPEAr3xx soc was previously supported on old
      kernel. Some regressions were introduced during the different updates leading
      to a broken driver for this soc.
      
      This series fixes these regressions and brings back ethernet on SPEAr3xx.
      Tested on a SPEAr320 board.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      097657c9
    • Herve Codina's avatar
      ARM: dts: spear3xx: Fix gmac node · 6636fec2
      Herve Codina authored
      On SPEAr3xx, ethernet driver is not compatible with the SPEAr600
      one.
      Indeed, SPEAr3xx uses an earlier version of this IP (v3.40) and
      needs some driver tuning compare to SPEAr600.
      
      The v3.40 IP support was added to stmmac driver and this patch
      fixes this issue and use the correct compatible string for
      SPEAr3xx
      Signed-off-by: default avatarHerve Codina <herve.codina@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6636fec2
    • Herve Codina's avatar
      net: stmmac: add support for dwmac 3.40a · 9cb1d19f
      Herve Codina authored
      dwmac 3.40a is an old ip version that can be found on SPEAr3xx soc.
      Signed-off-by: default avatarHerve Codina <herve.codina@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9cb1d19f
    • Herve Codina's avatar
      dt-bindings: net: snps,dwmac: add dwmac 3.40a IP version · 3781b6ad
      Herve Codina authored
      dwmac 3.40a is an old ip version that can be found on SPEAr3xx soc.
      Signed-off-by: default avatarHerve Codina <herve.codina@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3781b6ad