1. 17 Sep, 2021 5 commits
  2. 16 Sep, 2021 29 commits
  3. 15 Sep, 2021 6 commits
    • Vladimir Oltean's avatar
      net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports · a57d8c21
      Vladimir Oltean authored
      Sometimes when unbinding the mv88e6xxx driver on Turris MOX, these error
      messages appear:
      
      mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 1 from fdb: -2
      mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 0 from fdb: -2
      mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 100 from fdb: -2
      mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 1 from fdb: -2
      mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 0 from fdb: -2
      
      (and similarly for other ports)
      
      What happens is that DSA has a policy "even if there are bugs, let's at
      least not leak memory" and dsa_port_teardown() clears the dp->fdbs and
      dp->mdbs lists, which are supposed to be empty.
      
      But deleting that cleanup code, the warnings go away.
      
      => the FDB and MDB lists (used for refcounting on shared ports, aka CPU
      and DSA ports) will eventually be empty, but are not empty by the time
      we tear down those ports. Aka we are deleting them too soon.
      
      The addresses that DSA complains about are host-trapped addresses: the
      local addresses of the ports, and the MAC address of the bridge device.
      
      The problem is that offloading those entries happens from a deferred
      work item scheduled by the SWITCHDEV_FDB_DEL_TO_DEVICE handler, and this
      races with the teardown of the CPU and DSA ports where the refcounting
      is kept.
      
      In fact, not only it races, but fundamentally speaking, if we iterate
      through the port list linearly, we might end up tearing down the shared
      ports even before we delete a DSA user port which has a bridge upper.
      
      So as it turns out, we need to first tear down the user ports (and the
      unused ones, for no better place of doing that), then the shared ports
      (the CPU and DSA ports). In between, we need to ensure that all work
      items scheduled by our switchdev handlers (which only run for user
      ports, hence the reason why we tear them down first) have finished.
      
      Fixes: 161ca59d ("net: dsa: reference count the MDB entries at the cross-chip notifier level")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20210914134726.2305133-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      a57d8c21
    • Vladimir Oltean's avatar
      Revert "net: phy: Uniform PHY driver access" · 301de697
      Vladimir Oltean authored
      This reverts commit 3ac8eed6, which did
      more than it said on the box, and not only it replaced to_phy_driver
      with phydev->drv, but it also removed the "!drv" check, without actually
      explaining why that is fine.
      
      That patch in fact breaks suspend/resume on any system which has PHY
      devices with no drivers bound.
      
      The stack trace is:
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000000000000e8
      pc : mdio_bus_phy_suspend+0xd8/0xec
      lr : dpm_run_callback+0x38/0x90
      Call trace:
       mdio_bus_phy_suspend+0xd8/0xec
       dpm_run_callback+0x38/0x90
       __device_suspend+0x108/0x3cc
       dpm_suspend+0x140/0x210
       dpm_suspend_start+0x7c/0xa0
       suspend_devices_and_enter+0x13c/0x540
       pm_suspend+0x2a4/0x330
      
      Examples why that assumption is not fine:
      
      - There is an MDIO bus with a PHY device that doesn't have a specific
        PHY driver loaded, because mdiobus_register() automatically creates a
        PHY device for it but there is no specific PHY driver in the system.
        Normally under those circumstances, the generic PHY driver will be
        bound lazily to it (at phy_attach_direct time). But some Ethernet
        drivers attach to their PHY at .ndo_open time. Until then it, the
        to-be-driven-by-genphy PHY device will not have a driver. The blamed
        patch amounts to saying "you need to open all net devices before the
        system can suspend, to avoid the NULL pointer dereference".
      
      - There is any raw MDIO device which has 'plausible' values in the PHY
        ID registers 2 and 3, which is located on an MDIO bus whose driver
        does not set bus->phy_mask = ~0 (which prevents auto-scanning of PHY
        devices). An example could be a MAC's internal MDIO bus with PCS
        devices on it, for serial links such as SGMII. PHY devices will get
        created for those PCSes too, due to that MDIO bus auto-scanning, and
        although those PHY devices are not used, they do not bother anybody
        either. PCS devices are usually managed in Linux as raw MDIO devices.
        Nonetheless, they do not have a PHY driver, nor does anybody attempt
        to connect to them (because they are not a PHY), and therefore this
        patch breaks that.
      
      The goal itself of the patch is questionable, so I am going for a
      straight revert. to_phy_driver does not seem to have a need to be
      replaced by phydev->drv, in fact that might even trigger code paths
      which were not given too deep of a thought.
      
      For instance:
      
      phy_probe populates phydev->drv at the beginning, but does not clean it
      up on any error (including EPROBE_DEFER). So if the phydev driver
      requests probe deferral, phydev->drv will remain populated despite there
      being no driver bound.
      
      If a system suspend starts in between the initial probe deferral request
      and the subsequent probe retry, we will be calling the phydev->drv->suspend
      method, but _before_ any phydev->drv->probe call has succeeded.
      
      That is to say, if the phydev->drv is allocating any driver-private data
      structure in ->probe, it pretty much expects that data structure to be
      available in ->suspend. But it may not. That is a pretty insane
      environment to present to PHY drivers.
      
      In the code structure before the blamed patch, mdio_bus_phy_may_suspend
      would just say "no, don't suspend" to any PHY device which does not have
      a driver pointer _in_the_device_structure_ (not the phydev->drv). That
      would essentially ensure that ->suspend will never get called for a
      device that has not yet successfully completed probe. This is the code
      structure the patch is returning to, via the revert.
      
      Fixes: 3ac8eed6 ("net: phy: Uniform PHY driver access")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20210914140515.2311548-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      301de697
    • Vladimir Oltean's avatar
      net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup · 6a52e733
      Vladimir Oltean authored
      DSA supports connecting to a phy-handle, and has a fallback to a non-OF
      based method of connecting to an internal PHY on the switch's own MDIO
      bus, if no phy-handle and no fixed-link nodes were present.
      
      The -ENODEV error code from the first attempt (phylink_of_phy_connect)
      is what triggers the second attempt (phylink_connect_phy).
      
      However, when the first attempt returns a different error code than
      -ENODEV, this results in an unbalance of calls to phylink_create and
      phylink_destroy by the time we exit the function. The phylink instance
      has leaked.
      
      There are many other error codes that can be returned by
      phylink_of_phy_connect. For example, phylink_validate returns -EINVAL.
      So this is a practical issue too.
      
      Fixes: aab9c406 ("net: dsa: Plug in PHYLINK support")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/20210914134331.2303380-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      6a52e733
    • Linus Torvalds's avatar
      qnx4: avoid stringop-overread errors · b7213ffa
      Linus Torvalds authored
      The qnx4 directory entries are 64-byte blocks that have different
      contents depending on the a status byte that is in the last byte of the
      block.
      
      In particular, a directory entry can be either a "link info" entry with
      a 48-byte name and pointers to the real inode information, or an "inode
      entry" with a smaller 16-byte name and the full inode information.
      
      But the code was written to always just treat the directory name as if
      it was part of that "inode entry", and just extend the name to the
      longer case if the status byte said it was a link entry.
      
      That work just fine and gives the right results, but now that gcc is
      tracking data structure accesses much more, the code can trigger a
      compiler error about using up to 48 bytes (the long name) in a structure
      that only has that shorter name in it:
      
         fs/qnx4/dir.c: In function ‘qnx4_readdir’:
         fs/qnx4/dir.c:51:32: error: ‘strnlen’ specified bound 48 exceeds source size 16 [-Werror=stringop-overread]
            51 |                         size = strnlen(de->di_fname, size);
               |                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~
         In file included from fs/qnx4/qnx4.h:3,
                          from fs/qnx4/dir.c:16:
         include/uapi/linux/qnx4_fs.h:45:25: note: source object declared here
            45 |         char            di_fname[QNX4_SHORT_NAME_MAX];
               |                         ^~~~~~~~
      
      which is because the source code doesn't really make this whole "one of
      two different types" explicit.
      
      Fix this by introducing a very explicit union of the two types, and
      basically explaining to the compiler what is really going on.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b7213ffa
    • Linus Torvalds's avatar
      sparc: avoid stringop-overread errors · fc7c028d
      Linus Torvalds authored
      The sparc mdesc code does pointer games with 'struct mdesc_hdr', but
      didn't describe to the compiler how that header is then followed by the
      data that the header describes.
      
      As a result, gcc is now unhappy since it does stricter pointer range
      tracking, and doesn't understand about how these things work.  This
      results in various errors like:
      
          arch/sparc/kernel/mdesc.c: In function ‘mdesc_node_by_name’:
          arch/sparc/kernel/mdesc.c:647:22: error: ‘strcmp’ reading 1 or more bytes from a region of size 0 [-Werror=stringop-overread]
            647 |                 if (!strcmp(names + ep[ret].name_offset, name))
                |                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      which are easily avoided by just describing 'struct mdesc_hdr' better,
      and making the node_block() helper function look into that unsized
      data[] that follows the header.
      
      This makes the sparc64 build happy again at least for my cross-compiler
      version (gcc version 11.2.1).
      
      Link: https://lore.kernel.org/lkml/CAHk-=wi4NW3NC0xWykkw=6LnjQD6D_rtRtxY9g8gQAJXtQMi8A@mail.gmail.com/
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fc7c028d
    • Linus Torvalds's avatar
      Merge branch 'absolute-pointer' (patches from Guenter) · d6efd3f1
      Linus Torvalds authored
      Merge absolute_pointer macro series from Guenter Roeck:
       "Kernel test builds currently fail for several architectures with error
        messages such as the following.
      
        drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
        arch/m68k/include/asm/string.h:72:25: error:
              '__builtin_memcpy' reading 6 bytes from a region of size 0
                      [-Werror=stringop-overread]
      
        Such warnings may be reported by gcc 11.x for string and memory
        operations on fixed addresses if gcc's builtin functions are used for
        those operations.
      
        This series introduces absolute_pointer() to fix the problem.
        absolute_pointer() disassociates a pointer from its originating symbol
        type and context, and thus prevents gcc from making assumptions about
        pointers passed to memory operations"
      
      * emailed patches from Guenter Roeck <linux@roeck-us.net>:
        alpha: Use absolute_pointer to define COMMAND_LINE
        alpha: Move setup.h out of uapi
        net: i825xx: Use absolute_pointer for memcpy from fixed memory location
        compiler.h: Introduce absolute_pointer macro
      d6efd3f1