1. 07 Jan, 2020 6 commits
    • Merge branch 'Unique-mv88e6xxx-IRQ-names' · 1b935183
      David S. Miller authored
      Andrew Lunn says:
      
      ====================
      Unique mv88e6xxx IRQ names
      
      There are a few boards which have multiple mv88e6xxx switches. With
      such boards, it can be hard to determine which interrupts belong to
      which switches. Make the interrupt names unique by including the
      device name in the interrupt name. For the SERDES interrupt, also
      include the port number. As a result of these patches ZII devel C
      looks like:
      
       50:          0  gpio-vf610  27 Level     mv88e6xxx-0.1:00
       54:          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-0.1:00-g1-atu-prob
       56:          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-0.1:00-g1-vtu-prob
       58:          0  mv88e6xxx-g1   7 Edge      mv88e6xxx-0.1:00-g2
       61:          0  mv88e6xxx-g2   1 Edge      !mdio-mux!mdio@1!switch@0!mdio:01
       62:          0  mv88e6xxx-g2   2 Edge      !mdio-mux!mdio@1!switch@0!mdio:02
       63:          0  mv88e6xxx-g2   3 Edge      !mdio-mux!mdio@1!switch@0!mdio:03
       64:          0  mv88e6xxx-g2   4 Edge      !mdio-mux!mdio@1!switch@0!mdio:04
       70:          0  mv88e6xxx-g2  10 Edge      mv88e6xxx-0.1:00-serdes-10
       75:          0  mv88e6xxx-g2  15 Edge      mv88e6xxx-0.1:00-watchdog
       76:          5  gpio-vf610  26 Level     mv88e6xxx-0.2:00
       80:          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-0.2:00-g1-atu-prob
       82:          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-0.2:00-g1-vtu-prob
       84:          4  mv88e6xxx-g1   7 Edge      mv88e6xxx-0.2:00-g2
       87:          2  mv88e6xxx-g2   1 Edge      !mdio-mux!mdio@2!switch@0!mdio:01
       88:          0  mv88e6xxx-g2   2 Edge      !mdio-mux!mdio@2!switch@0!mdio:02
       89:          0  mv88e6xxx-g2   3 Edge      !mdio-mux!mdio@2!switch@0!mdio:03
       90:          0  mv88e6xxx-g2   4 Edge      !mdio-mux!mdio@2!switch@0!mdio:04
       95:          3  mv88e6xxx-g2   9 Edge      mv88e6xxx-0.2:00-serdes-9
       96:          0  mv88e6xxx-g2  10 Edge      mv88e6xxx-0.2:00-serdes-10
      101:          0  mv88e6xxx-g2  15 Edge      mv88e6xxx-0.2:00-watchdog
      
      Interrupt names like !mdio-mux!mdio@2!switch@0!mdio:01 are created by
      phylib for the integrated PHYs. The mv88e6xxx driver does not
      determine these names.
      ====================
      Tested-by: Chris Healy <cphealy@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Unique ATU and VTU IRQ names · 8ddf0b56
      Andrew Lunn authored
      Dynamically generate a unique interrupt name for the VTU and ATU,
      based on the device name.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Unique g2 IRQ name · 06acd114
      Andrew Lunn authored
      Dynamically generate a unique g2 interrupt name, based on the
      device name.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Unique watchdog IRQ name · 8b4db289
      Andrew Lunn authored
      Dynamically generate a unique watchdog interrupt name, based on the
      device name.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Unique SERDES interrupt names · e6f2f6b8
      Andrew Lunn authored
      Dynamically generate a unique SERDES interrupt name, based on the
      device name and the port the SERDES is for. For example:
      
       95:          3  mv88e6xxx-g2   9 Edge      mv88e6xxx-0.2:00-serdes-9
       96:          0  mv88e6xxx-g2  10 Edge      mv88e6xxx-0.2:00-serdes-10
      
      The 0.2:00 indicates the switch and -9 indicates port 9.
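
The naming scheme described above can be sketched in plain userspace C. This is only an illustration of the string format, not the driver's code: the helper name `serdes_irq_name` and the truncation handling are assumptions made for the example.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not taken from the mv88e6xxx driver): build an
 * interrupt name of the form "<device name>-serdes-<port>".
 * Returns 0 on success and -1 if the name does not fit in buf.
 */
static int serdes_irq_name(char *buf, size_t len, const char *dev_name,
			   int port)
{
	int n = snprintf(buf, len, "%s-serdes-%d", dev_name, port);

	/* snprintf() returns the length it needed; treat truncation as failure. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with the device name "mv88e6xxx-0.2:00" and port 9, the helper produces the name shown in the listing above.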
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Unique IRQ name · 3095383a
      Andrew Lunn authored
      Dynamically generate a unique switch interrupt name, based on the
      device name.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 06 Jan, 2020 34 commits
    • Merge branch 'ethtool-allow-nesting-of-begin-and-complete-callbacks' · 50d31037
      David S. Miller authored
      Michal Kubecek says:
      
      ====================
      ethtool: allow nesting of begin() and complete() callbacks
      
      The ethtool ioctl interface used to guarantee that ethtool_ops callbacks
      were always called in a block between calls to ->begin() and ->complete()
      (if these are defined) and that this whole block was executed with RTNL
      lock held:
      
      	rtnl_lock();
      	ops->begin();
      	/* other ethtool_ops calls */
      	ops->complete();
      	rtnl_unlock();
      
      This prevented any nesting or crossing of the begin-complete blocks.
      However, this is no longer guaranteed even for ioctl interface as at least
      ethtool_phys_id() releases RTNL lock while waiting for a timer. With the
      introduction of netlink ethtool interface, the begin-complete pairs are
      naturally nested e.g. when a request triggers a netlink notification.
      
      Fortunately, only a minority of networking drivers implement begin() and
      complete() callbacks, and most of those that do fall into three groups:
      
        - wrappers for pm_runtime_get_sync() and pm_runtime_put()
        - wrappers for clk_prepare_enable() and clk_disable_unprepare()
        - begin() checks netif_running() (fails if false), no complete()
      
      First two have their own refcounting, third is safe w.r.t. nesting of the
      blocks.
      
      Only three in-tree networking drivers need an update to deal with nesting
      of begin() and complete() calls: via-velocity and epic100 perform resume
      and suspend on their own and wil6210 completely serializes the calls using
      its own mutex (which would lead to a deadlock if a request
      triggered a netlink notification). The series addresses these problems.
      
      changes between v1 and v2:
        - fix inverted condition in epic100 ethtool_begin() (thanks to Andrew
          Lunn)
      ====================
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • epic100: allow nesting of ethtool_ops begin() and complete() · 4ac0ac84
      Michal Kubecek authored
      Unlike most networking drivers using begin() and complete() ethtool_ops
      callbacks to resume a device which is down and suspend it again when done,
      epic100 does not use standard refcounted infrastructure but sets device
      sleep state directly.
      
      With the introduction of netlink ethtool interface, we may have nested
      begin-complete blocks so that inner complete() would put the device back to
      sleep for the rest of the outer block.
      
      To avoid rewriting an old and not very actively developed driver, just add
      a nesting counter and only perform resume and suspend on the outermost
      level.
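
A minimal sketch of the nesting-counter idea, assuming a hypothetical `struct demo_dev` whose `awake` flag stands in for the device sleep state. The names and the overflow check are illustrative and not copied from the epic100 patch.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical device state; field names are illustrative only. */
struct demo_dev {
	unsigned int nesting;	/* number of currently open begin() blocks */
	bool awake;		/* stands in for the device sleep state */
};

/* Wake the device only when entering the outermost block. */
static int demo_begin(struct demo_dev *d)
{
	if (d->nesting == UINT_MAX)
		return -1;	/* refuse to overflow the counter */
	if (d->nesting++ == 0)
		d->awake = true;
	return 0;
}

/* Put the device back to sleep only when leaving the outermost block. */
static void demo_complete(struct demo_dev *d)
{
	if (d->nesting == 0)
		return;		/* unbalanced call, nothing to undo */
	if (--d->nesting == 0)
		d->awake = false;
}
```

With this counter, an inner complete() leaves the device awake for the rest of the outer block, which is the behaviour the nesting requires.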
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • via-velocity: allow nesting of ethtool_ops begin() and complete() · 71f711a4
      Michal Kubecek authored
      Unlike most networking drivers using begin() and complete() ethtool_ops
      callbacks to resume a device which is down and suspend it again when done,
      via-velocity does not use standard refcounted infrastructure but sets
      device sleep state directly.
      
      With the introduction of netlink ethtool interface, we may have nested
      begin-complete blocks so that inner complete() would put the device back to
      sleep for the rest of the outer block.
      
      To avoid rewriting an old and not very actively developed driver, just add
      a nesting counter and only perform resume and suspend on the outermost
      level.
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • wil6210: get rid of begin() and complete() ethtool_ops · a69faa09
      Michal Kubecek authored
      The wil6210 driver locks a mutex in begin() ethtool_ops callback and
      unlocks it in complete() so that all ethtool requests are serialized. This
      is not going to work correctly with netlink interface; e.g. when ioctl
      triggers a netlink notification, netlink code would call begin() again
      while the mutex taken by ioctl code is still held by the same task.
      
      Let's get rid of the begin() and complete() callbacks and move the mutex
      locking into the remaining ethtool_ops handlers except get_drvinfo which
      only copies strings that are not changing so that there is no need for
      serialization.
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fcnal-test: Fix vrf argument in local tcp tests · 17aa23ee
      David Ahern authored
      The recent MD5 tests added duplicate configuration in the default VRF.
      This change exposed a bug in existing tests designed to verify no
      connection when client and server are not in the same domain. The
      server should be running bound to the vrf device with the client run
      in the default VRF (the -2 option is meant for validating connection
      data). Fix the option for both tests.
      
      While technically this is a bug in previous releases, the tests are
      properly failing since the default VRF does not have any routing
      configuration so there really is no need to backport to prior releases.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gtp: simplify error handling code in 'gtp_encap_enable()' · b289ba5e
      Christophe JAILLET authored
      'gtp_encap_disable_sock(sk)' handles the case where sk is NULL, so there
      is no need to test it before calling the function.
      
      This saves a few lines of code.
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-Disable-checks-in-hardware-pipeline' · f233789d
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Disable checks in hardware pipeline
      
      Amit says:
      
      The hardware pipeline contains some checks that, by default, are
      configured to drop packets. Since the software data path does not drop
      packets due to these reasons and since we are interested in offloading
      the software data path to hardware, then these checks should be disabled
      in the hardware pipeline as well.
      
      This patch set changes mlxsw to disable four of these checks and adds
      corresponding selftests. The tests pass both when the software data path
      is exercised (using veth pair) and when the hardware data path is
      exercised (using mlxsw ports in loopback).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: router: Add test case for destination IP link-local · ef11ffa2
      Amit Cohen authored
      Add test case to check that packets are not dropped when they need to be
      routed and their destination is link-local, i.e., 169.254.0.0/16.
      Signed-off-by: Amit Cohen <amitc@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Disable DIP_LINK_LOCAL check in hardware pipeline · ca360db4
      Amit Cohen authored
      The check drops packets if they need to be routed and their destination
      IP is link-local, i.e., belongs to 169.254.0.0/16 address range.
      
      Disable the check since the kernel forwards such packets and does not
      drop them.
      Signed-off-by: Amit Cohen <amitc@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: router: Add test case for source IP equals destination IP · 6e734f86
      Amit Cohen authored
      Add test case to check that packets are not dropped when they need to be
      routed and their source IP equals their destination IP.
      Signed-off-by: Amit Cohen <amitc@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Disable SIP_DIP check in hardware pipeline · e317b0f7
      Amit Cohen authored
      The check drops packets if they need to be routed and their source IP
      equals their destination IP.
      
      Disable the check since the kernel forwards such packets and does not
      drop them.
      Signed-off-by: Amit Cohen <amitc@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: router: Add test case for multicast destination MAC mismatch · 59b3a4f3
      Amit Cohen authored
      Add test case to check that packets are not dropped when they need to be
      routed and their multicast MAC does not match their multicast destination
      IP.
      
      i.e., destination IP is multicast and
      	* for IPV4: DMAC !=  {01-00-5E-0 (25 bits), DIP[22:0]}
      	* for IPV6: DMAC !=  {33-33-0 (16 bits), DIP[31:0]}
      Signed-off-by: Amit Cohen <amitc@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Disable MC_DMAC check in hardware pipeline · 359ec566
      Amit Cohen authored
      The check drops packets if they need to be routed and their multicast
      MAC does not match their multicast destination IP.
      
      For IPV4:
      DMAC is mismatched if it is different from {01-00-5E-0 (25 bits),
      DIP[22:0]}
      
      For IPV6:
      DMAC is mismatched if it is different from {33-33-0 (16 bits),
      DIP[31:0]}
      
      Disable the check since the kernel forwards such packets and does not
      drop them.
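
The two mappings above can be written out as a short C sketch. It is only an illustration of the bit layout described in this message, not mlxsw code; the function names are hypothetical, the IPv4 address is assumed to be in host byte order, and the IPv6 address is assumed to be 16 bytes in network order.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Expected DMAC for an IPv4 multicast DIP: 01-00-5E, a 0 bit, then DIP[22:0]. */
static void ipv4_mc_expected_dmac(uint32_t dip, uint8_t mac[6])
{
	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5E;
	mac[3] = (dip >> 16) & 0x7F;	/* bit 23 of the DIP is not carried over */
	mac[4] = (dip >> 8) & 0xFF;
	mac[5] = dip & 0xFF;
}

/* Expected DMAC for an IPv6 multicast DIP: 33-33 followed by DIP[31:0]. */
static void ipv6_mc_expected_dmac(const uint8_t dip[16], uint8_t mac[6])
{
	mac[0] = 0x33;
	mac[1] = 0x33;
	memcpy(&mac[2], &dip[12], 4);	/* last four bytes of the address */
}

/* True when the frame's DMAC differs from the one derived from the DIP. */
static bool dmac_mismatch(const uint8_t dmac[6], const uint8_t expected[6])
{
	return memcmp(dmac, expected, 6) != 0;
}
```

For example, 239.255.255.250 maps to 01-00-5E-7F-FF-FA and ff02::1 maps to 33-33-00-00-00-01.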
      Signed-off-by: Amit Cohen <amitc@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: router: Add test case for source IP in class E · 383dbf70
      Amit Cohen authored
      Add test case to check that packets are not dropped when they need to be
      routed and their source IP is in class E (i.e., 240.0.0.0 –
      255.255.255.254).
      Signed-off-by: Amit Cohen <amitc@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Disable SIP_CLASS_E check in hardware pipeline · 62b0fb09
      Amit Cohen authored
      The check drops packets if they need to be routed and their source IP is
      from class E, i.e., belongs to 240.0.0.0/4 address range, but different
      from 255.255.255.255.
      
      Disable the check since the kernel forwards such packets and does not
      drop them.
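
The condition that the check matches on can be expressed as a small predicate. This is a sketch of the address test as described above, not the hardware or driver logic; the function name is hypothetical and the address is assumed to be in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate: true when sip lies in 240.0.0.0/4 (class E)
 * but is not the limited broadcast address 255.255.255.255.
 */
static bool sip_is_class_e(uint32_t sip)
{
	return (sip & 0xF0000000u) == 0xF0000000u && sip != 0xFFFFFFFFu;
}
```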
      Signed-off-by: Amit Cohen <amitc@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'hns3-next' · 02b0442c
      David S. Miller authored
      Huazhong Tan says:
      
      ====================
      net: hns3: misc updates for -net-next
      
      This series includes some misc updates for the HNS3 ethernet driver.
      
      [patch 1] adds trace events support.
      [patch 2] re-organizes TQP's vector handling.
      [patch 3] changes the IRQ name of the TQP vector.
      [patch 4] rewrites a log in the hclge_map_ring_to_vector().
      [patch 5] modifies the name of misc IRQ vector.
      [patch 6] handles the unexpected speed 0 return from HW.
      [patch 7] replaces an unsuitable variable type.
      [patch 8] modifies an unsuitable reset level for HW error.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: modify an unsuitable reset level for hardware error · 7f39febf
      Huazhong Tan authored
      According to the hardware user manual, when the hardware reports the
      error 'roc_pkt_without_key_port', the driver should assert a function
      reset to recover.
      
      So this patch uses HNAE3_FUNC_RESET to replace HNAE3_GLOBAL_RESET.
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: replace an unsuitable variable type in hclge_inform_reset_assert_to_vf() · 7061867b
      Huazhong Tan authored
      In hclge_inform_reset_assert_to_vf(), variable reset_type (enum type)
      will be copied into msg_data whose size is 2 bytes. Currently, hip08
      is a little-endian machine, so the lower two bytes of reset_type will
      be copied to msg_data. But when running on a big-endian machine,
      msg_data will have a wrong value (the higher two bytes of reset_type).
      
      So this patch modifies the type of reset_type to u16, and adds a
      build check in case enum hnae3_reset_type has value larger than
      U16_MAX.
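
A userspace sketch of the same idea, assuming a hypothetical `enum demo_reset_type` in place of enum hnae3_reset_type and `_Static_assert` in place of the kernel's build check. Narrowing the value to 16 bits before copying means the two bytes copied are the whole value on either endianness.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the driver's reset type enum. */
enum demo_reset_type {
	DEMO_NONE_RESET,
	DEMO_FUNC_RESET,
	DEMO_GLOBAL_RESET,
	DEMO_MAX_RESET,
};

/* Build-time guard: every enumerator must fit in 16 bits. */
_Static_assert(DEMO_MAX_RESET <= UINT16_MAX, "reset type must fit in u16");

/* Narrow to 16 bits first, then copy exactly those two bytes. */
static void fill_msg_data(uint8_t msg_data[2], enum demo_reset_type type)
{
	uint16_t reset_type = (uint16_t)type;

	memcpy(msg_data, &reset_type, sizeof(reset_type));
}

/* Read the value back the same way the receiver on this machine would. */
static uint16_t read_msg_data(const uint8_t msg_data[2])
{
	uint16_t value;

	memcpy(&value, msg_data, sizeof(value));
	return value;
}
```

Copying two bytes straight from a wider enum object would instead pick up the high-order half on a big-endian machine, which is the bug described above.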
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: add protection when get SFP speed as 0 · 2af8cb61
      Guojia Liao authored
      In some cases, the MAC speed read from hardware may be 0; it should
      not be set to mac->speed.
      Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: modify the IRQ name of misc vectors · f97c4d82
      Yonglong Liu authored
      The misc IRQs of all the devices have the same name, so it's
      hard to find the right misc IRQ for a given device.

      This patch changes the misc IRQ names to "hclge/hclgevf"-misc-
      "pci name". Since the IRQ name is no longer related to the net device
      name, HNAE3_INT_NAME_LEN is changed to 32 bytes, which is
      enough.
      Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: modify an unsuitable log in hclge_map_ring_to_vector() · 7ab2b53e
      Yonglong Liu authored
      When the returned vector_id is less than 0, the message should print
      the vector for which getting the vector index failed.

      So this patch replaces vector_id with vector, and reformats the
      message.
      Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: modify the IRQ name of TQP vector · 5bffde62
      Yonglong Liu authored
      After the net devices are renamed, the IRQ number cannot be
      looked up by the net device name, because the driver requests
      the IRQ resources only when the vector resources change, and
      the rename operation does not change the vector resources,
      so the IRQ name keeps the previous net device name.
      So this patch changes the name of the TQP IRQ to
      "pci driver name"-"pci name"-"TxRx"-"index".
      Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: re-organize vector handle · 08a10068
      Yonglong Liu authored
      To avoid losing the user's IRQ affinity configuration when the net
      device goes DOWN, this patch moves the release/request operations of
      the vector handle out of net DOWN/UP and performs them only when the
      vector resources change.
      Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: add trace event support for HNS3 driver · 698a8954
      Yunsheng Lin authored
      This adds trace support for HNS3 driver. It also declares
      some events which could be used to trace the events when a
      TX/RX BD is processed, and other events which are related to
      the processing of sk_buff, such as TSO, GRO.
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Convert-Felix-DSA-switch-to-PHYLINK' · df2c2ba8
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Convert Felix DSA switch to PHYLINK
      
      Unlike most other conversions, this one is far from trivial, and should
      be seen as "Layerscape PCS meets PHYLINK". Actually, the PCS doesn't
      need a lot of hand-holding and most of our other devices 'just work'
      (this one included) without any sort of operating system awareness, just
      an initialization procedure done typically in the bootloader.
      Our issues start when the PCS stops "just working", and that is
      where PHYLINK comes in handy.
      
      The PCS is not specific to the Vitesse / Microsemi / Microchip switching core
      at all. Variations of this SerDes/PCS design can also be found on DPAA1 and
      DPAA2 hardware.
      
      The main idea of the abstraction provided is that the PCS looks so much like a
      PHY device, that we model it as an actual PHY device and run the generic PHY
      functions on it, where appropriate.
      
      The 4xSGMII, QSGMII and QSXGMII modes are fairly straightforward.
      
      The SerDes protocol which the driver calls 2500Base-X mode (a misnomer) is more
      interesting. There is a description of how it works and what can be done with
      it in patch 9/9 (in a comment above vsc9959_pcs_init_2500basex).
      In short, it is a fixed speed protocol with no auto-negotiation whatsoever.
      From my research of the SGMII-2500 patent [1], it has nothing to do with
      SGMII-2500. That one:
      * does not define any change to the AN base page compared to plain 10/100/1000
        SGMII. This implies that the 2500 speed is not negotiable, but the other
        speeds are. In our case, when the SerDes is configured for this protocol it's
        configured for good, there's no going back to SGMII.
      * runs at a higher base frequency than regular SGMII. So SGMII-2500 operating
        at 1000 Mbps wouldn't interoperate with plain SGMII at 1000 Mbps. Strange,
        but ok..
      * Emulates lower link speeds than 2500 by duplicating the codewords twice, then
        thrice, then twice again etc (2.5/25/250 times on average). The Layerscape
        PCS doesn't do that (it is fixed at 2500 Mbaud).
      
      But on the other hand it isn't completely compatible with Base-X either,
      since it doesn't do 802.3z / clause 37 auto negotiation (flow control,
      local/remote fault etc). It is compatible with 2500Base-X without
      in-band AN, and that is exactly how we decided to expose it (this is
      actually similar to what others do).
      
      For SGMII and USXGMII, the driver is using the PHYLINK 'managed =
      "in-band-status"' DTS binding to figure out whether in-band AN is
      expected to be enabled in the PCS or not. It is expected that the
      attached PHY follows suit, but there is a gap here: the PHY driver does
      not react to this setting, so only one of "AN on" and "AN off" works on
      any particular PHY, even though that PHY might support bypassing the
      SGMII AN process, as is the case on the VSC8514 PHY present on the
      LS1028A-RDB board. A separate series will be sent to propose a way to
      deal with that.
      
      I dropped the Ocelot PHYLINK conversion because:
      * I don't have VSC7514 hardware anyway
      * The hardware is so different in this regard that there's almost nothing to
        share anyway.
      
      Changes in v5:
      
      - Added the register write to DEV_CLOCK_CFG back in
        felix_phylink_mac_config in patch 9/9.
      
      Changes in v4:
      
      - This is mostly a resend of v3, with the only notable change that I've
        dropped the PHY core patches for in_band_autoneg and I'll propose them
        independently.
      
      v1 series:
      https://www.spinics.net/lists/netdev/msg613869.html
      
      RFC v2 series:
      https://www.spinics.net/lists/netdev/msg620128.html
      
      v3 series:
      https://www.spinics.net/lists/netdev/msg622060.html
      
      v4 series:
      https://www.spinics.net/lists/netdev/msg622606.html
      
      [0]: https://www.spinics.net/lists/netdev/msg613869.html
      [1]: https://patents.google.com/patent/US7356047B1/en
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: felix: Add PCS operations for PHYLINK · bdeced75
      Vladimir Oltean authored
      Layerscape SoCs traditionally expose the SerDes configuration/status for
      Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register
      format that is compatible with clause 22 or clause 45 (depending on
      SerDes protocol). Each MAC has its own internal MDIO bus on which there
      is one or more of these PCS's, responding to commands at a configurable
      PHY address. The per-port internal MDIO bus (which is just for PCSs) is
      totally separate and has nothing to do with the dedicated external MDIO
      controller (which is just for PHYs), but the register map for the MDIO
      controller is the same.
      
      The VSC9959 (Felix) switch instantiated in the LS1028A is integrated
      in hardware with the ENETC PCS of its DSA master, and reuses its MDIO
      controller driver, so Felix has been made to depend on it in Kconfig.
      
       +------------------------------------------------------------------------+
       |                   +--------+ GMII (typically disabled via RCW)         |
       | ENETC PCI         |  ENETC |--------------------------+                |
       | Root Complex      | port 3 |-----------------------+  |                |
       | Integrated        +--------+                       |  |                |
       | Endpoint                                           |  |                |
       |                   +--------+ 2.5G GMII             |  |                |
       |                   |  ENETC |--------------+        |  |                |
       |                   | port 2 |-----------+  |        |  |                |
       |                   +--------+           |  |        |  |                |
       |                                     +--------+  +--------+             |
       |                                     |  Felix |  |  Felix |             |
       |                                     | port 4 |  | port 5 |             |
       |                                     +--------+  +--------+             |
       |                                                                        |
       | +--------+  +--------+  +--------+  +--------+  +--------+  +--------+ |
       | |  ENETC |  |  ENETC |  |  Felix |  |  Felix |  |  Felix |  |  Felix | |
       | | port 0 |  | port 1 |  | port 0 |  | port 1 |  | port 2 |  | port 3 | |
       +------------------------------------------------------------------------+
       |    ||||  SerDes |          ||||        ||||        ||||        ||||    |
       | +--------+block |       +--------------------------------------------+ |
       | |  ENETC |      |       |       ENETC port 2 internal MDIO bus       | |
       | | port 0 |      |       |  PCS         PCS          PCS        PCS   | |
       | |   PCS  |      |       |   0           1            2          3    | |
       +-----------------|------------------------------------------------------+
              v          v           v           v            v          v
           SGMII/      RGMII    QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X
          USXGMII/   (bypasses
        1000Base-X/   SerDes)
        2500Base-X
      
      In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of
      the ENETC root complex, and has 2 BARs:
      - BAR 4: the switch's effective registers
      - BAR 0: the MDIO controller register map borrowed from ENETC port 2
               (PF2), for accessing its associated PCS's.
      
      This explanation is necessary because the patch renames "pci_bar" to
      "switch_pci_bar" for clarity, a change which would otherwise appear a
      bit obscure.
      
      The fact that the internal MDIO bus is "borrowed" is relevant because
      the register map is found in PF5 (the switch) but it triggers an access
      fault if PF2 (the ENETC DSA master) is not enabled. This case is not
      handled in any way (and I don't think it can be handled).
      
      All of this is so SoC-specific that it is contained as much as
      possible in the platform-integration file felix_vsc9959.c.
      
      We need to parse and pre-validate the device tree for 2 reasons:
      - The PHY mode (SerDes protocol) cannot change at runtime due to SoC
        design.
      - There is a circular dependency in that we need to know what clause the
        PCS speaks in order to find it on the internal MDIO bus. But the
        clause of the PCS depends on what phy-mode it is configured for.
      
      The goal of this patch is to make steps towards removing the bootloader
      dependency for SGMII PCS pre-configuration, as well as to add support
      for monitoring the in-band SGMII AN between the PCS and the system-side
      link partner (PHY or other MAC).
      
      In practice the bootloader dependency is not completely removed. U-Boot
      pre-programs the PHY address at which each PCS can be found on the
      internal MDIO bus (MDEV_PORT). This is needed because the PCS of each
      port has the same out-of-reset PHY address of zero. The SerDes register
      for changing MDEV_PORT is pretty deep in the SoC (outside the addresses
      of the ENETC PCI BARs) and therefore inaccessible to us from here.
      
      Felix VSC9959 and Ocelot VSC7514 are integrated very differently in
      their respective SoCs, and for that reason Felix does not use the Ocelot
      core library for PHYLINK. On one hand we don't want to impose the
      fixed phy-mode limitation on Ocelot. On the other hand Felix doesn't
      need to force the MAC link speed the way Ocelot does, since the MAC is
      connected to the PCS through a fixed GMII. The PCS performs the rate
      adaptation at lower link speeds, so the MAC does not even need to know
      about them. In fact, changing the GMII speed for Felix irrecoverably
      breaks transmission through that port until a reset.
      
      The pair with ENETC port 3 and Felix port 5 is optional and doesn't
      support tagging. When we enable it, swp5 is a regular slave port, albeit
      an internal one. The trouble is that it doesn't work, because the DSA
      PHYLIB adaptation layer doesn't handle fixed-link slave ports. So that
      is yet another reason for wanting to convert Felix to the native
      PHYLINK API.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bdeced75
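      The circular dependency described in the commit above can be broken by
      fixing the phy-mode first (from the device tree) and deriving the MDIO
      clause from it. The following is a minimal userspace sketch of that
      idea, not the driver code: the enum, the function names, and the
      mapping itself (USXGMII on clause 45, the other modes on clause 22)
      are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of PHY modes, named after device tree strings. */
enum sketch_phy_mode {
	SKETCH_MODE_INVALID,
	SKETCH_MODE_SGMII,
	SKETCH_MODE_QSGMII,
	SKETCH_MODE_USXGMII,
};

/* Parse the "phy-mode" string once, at probe time. */
enum sketch_phy_mode sketch_parse_phy_mode(const char *s)
{
	if (s == NULL)
		return SKETCH_MODE_INVALID;
	if (strcmp(s, "sgmii") == 0)
		return SKETCH_MODE_SGMII;
	if (strcmp(s, "qsgmii") == 0)
		return SKETCH_MODE_QSGMII;
	if (strcmp(s, "usxgmii") == 0)
		return SKETCH_MODE_USXGMII;
	return SKETCH_MODE_INVALID;
}

/*
 * Assumption for this sketch: only the USXGMII PCS answers clause 45
 * accesses; every other supported mode uses clause 22.
 */
bool sketch_pcs_is_c45(enum sketch_phy_mode mode)
{
	return mode == SKETCH_MODE_USXGMII;
}
```

      Because the mode is known before the bus is scanned, the PCS lookup
      can pick the right clause on the first attempt.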
    • net: mscc: ocelot: export ANA, DEV and QSYS registers to include/soc/mscc · 964ee5c8
      Vladimir Oltean authored
      Since the Felix DSA driver is implementing its own PHYLINK instance due
      to SoC differences, it needs access to the few registers that are
      common, mainly for flow control.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      964ee5c8
    • net: mscc: ocelot: make phy_mode a member of the common struct ocelot_port · ee50d07c
      Vladimir Oltean authored
      The Ocelot switchdev driver and the Felix DSA one need it for different
      reasons. Felix (or at least the VSC9959 instantiation in NXP LS1028A) is
      integrated with the traditional NXP Layerscape PCS design which does not
      support runtime configuration of SerDes protocol. So it needs to
      pre-validate the phy-mode from the device tree and prevent PHYLINK from
      attempting to change it. For this, it needs to cache it in a private
      variable.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ee50d07c
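      Caching the phy-mode and refusing any other one, as the commit above
      describes, could look roughly like the sketch below. It is a
      userspace illustration under stated assumptions: the types and the
      function name are hypothetical, and SKETCH_IFACE_NA stands in for a
      "no preference" query.

```c
#include <stdbool.h>

/* Hypothetical interface identifiers for this sketch. */
enum sketch_iface {
	SKETCH_IFACE_NA,	/* caller has no specific mode in mind */
	SKETCH_IFACE_SGMII,
	SKETCH_IFACE_QSGMII,
	SKETCH_IFACE_USXGMII,
};

struct sketch_port {
	enum sketch_iface phy_mode;	/* cached from the device tree */
};

/*
 * Accept a request only if it matches the cached mode, since the SerDes
 * protocol cannot be reconfigured at runtime.
 */
bool sketch_validate_iface(const struct sketch_port *port,
			   enum sketch_iface requested)
{
	if (requested == SKETCH_IFACE_NA)
		return true;
	return requested == port->phy_mode;
}
```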
    • enetc: Set MDIO_CFG_HOLD to the recommended value of 2 · d79d3032
      Vladimir Oltean authored
      This increases the MDIO hold time to 5 enet_clk cycles from the previous
      value of 0. This is actually the out-of-reset value, which the driver was
      previously overwriting with 0. Zero worked for the external MDIO, but
      breaks communication with the internal MDIO buses on which the PCSes of
      the ENETC SIs and of the Felix switch are found.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d79d3032
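      As a rough illustration of the field update in the commit above, the
      sketch below programs a hold value into a configuration word. Both
      the field position (bits 4:2) and the relation "cycles = 2 * HOLD + 1"
      are assumptions of this sketch; the latter is merely consistent with
      the statement that a value of 2 gives 5 enet_clk cycles.

```c
#include <stdint.h>

/* Assumed field layout: HOLD occupies bits 4:2 of the config word. */
#define SKETCH_MDIO_CFG_HOLD_SHIFT	2
#define SKETCH_MDIO_CFG_HOLD_MASK	(0x7u << SKETCH_MDIO_CFG_HOLD_SHIFT)

/* Replace the HOLD field and leave all other bits untouched. */
uint32_t sketch_mdio_cfg_set_hold(uint32_t cfg, uint32_t hold)
{
	cfg &= ~SKETCH_MDIO_CFG_HOLD_MASK;
	cfg |= (hold << SKETCH_MDIO_CFG_HOLD_SHIFT) & SKETCH_MDIO_CFG_HOLD_MASK;
	return cfg;
}

/* Assumed relation between the field value and the hold time. */
unsigned int sketch_mdio_hold_cycles(uint32_t cfg)
{
	uint32_t hold = (cfg & SKETCH_MDIO_CFG_HOLD_MASK) >>
			SKETCH_MDIO_CFG_HOLD_SHIFT;

	return 2 * hold + 1;
}
```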
    • enetc: Make MDIO accessors more generic and export to include/linux/fsl · 6517798d
      Claudiu Manoil authored
      Within the LS1028A SoC, the register map for the ENETC MDIO controller
      is instantiated a few times: for the central (external) MDIO controller,
      for the internal bus of each standalone ENETC port, and for the internal
      bus of the Felix switch.
      
      Refactoring is needed to support multiple MDIO buses from multiple
      drivers. The enetc_hw structure is made an opaque type and a smaller
      enetc_mdio_priv is created.
      
      'mdio_base' (the MDIO registers base address) is now a parameter, so
      that the accessors can work with different MDIO register bases.
      
      The ENETC MDIO bus operations are exported from the fsl-enetc-mdio
      kernel object, the same one that registers the central MDIO controller
      (the dedicated PF). The ENETC main driver has been changed to select
      it, and to use its exported helpers to register its private MDIO bus.
      The DSA Felix driver will do the same.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6517798d
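      The idea of a small private structure carrying a parameterized
      register base, as in the commit above, can be sketched in userspace
      as follows. The structure layout, the register offset and the
      accessor names are hypothetical; a byte array stands in for the
      mapped register space.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register offset, relative to mdio_base. */
#define SKETCH_MDIO_CFG	0x0

struct sketch_mdio_priv {
	uint8_t *hw;		/* opaque handle to the register space */
	size_t mdio_base;	/* where this controller's registers start */
};

/* Read a 32-bit register of the controller selected by mdio_base. */
uint32_t sketch_mdio_rd(const struct sketch_mdio_priv *priv, size_t off)
{
	uint32_t val;

	memcpy(&val, priv->hw + priv->mdio_base + off, sizeof(val));
	return val;
}

/* Write a 32-bit register of the controller selected by mdio_base. */
void sketch_mdio_wr(struct sketch_mdio_priv *priv, size_t off, uint32_t val)
{
	memcpy(priv->hw + priv->mdio_base + off, &val, sizeof(val));
}
```

      Two instances with different mdio_base values can then share the
      same accessors without touching each other's registers.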
    • net: dsa: Pass pcs_poll flag from driver to PHYLINK · 787cac3f
      Vladimir Oltean authored
      The DSA drivers that implement .phylink_mac_link_state should normally
      register an interrupt for the PCS, from which they should call
      phylink_mac_change(). However not all switches implement this, and those
      that don't should set this flag in dsa_switch in the .setup callback, so
      that PHYLINK will poll for a few ms until the in-band AN link timer
      expires and the PCS state settles.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      787cac3f
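      The flag hand-off described in the commit above amounts to a driver
      setting a boolean in its setup path and the core copying it into the
      per-port PHYLINK configuration. The sketch below mocks both sides in
      userspace; the struct and function names are stand-ins invented for
      the example, not the kernel definitions.

```c
#include <stdbool.h>

/* Mock of the per-switch state owned by the DSA core. */
struct sketch_dsa_switch {
	bool pcs_poll;
};

/* Mock of the per-port configuration handed to PHYLINK. */
struct sketch_phylink_config {
	bool pcs_poll;
};

/* Driver side: a switch without a PCS interrupt asks for polling. */
int sketch_switch_setup(struct sketch_dsa_switch *ds)
{
	ds->pcs_poll = true;
	return 0;
}

/* Core side: forward the driver's choice when configuring a port. */
void sketch_port_phylink_config(const struct sketch_dsa_switch *ds,
				struct sketch_phylink_config *cfg)
{
	cfg->pcs_poll = ds->pcs_poll;
}
```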
    • net: phylink: add support for polling MAC PCS · 1511ed0a
      Vladimir Oltean authored
      Some MAC PCS blocks are unable to provide interrupts when their status
      changes. As we already have support in phylink for polling status, use
      this to provide a hook for MACs to enable polling mode.
      
      The patch idea was picked up from Russell King's suggestion on the macb
      phylink patch thread here [0] but the implementation was changed.
      Instead of introducing a new phylink_start_poll() function, which would
      make the implementation cumbersome for common PHYLINK implementations
      for multiple types of devices, like DSA, just add a boolean property to
      the phylink_config structure, which is just as backwards-compatible.
      
      [0] https://lkml.org/lkml/2019/12/16/603

      Suggested-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1511ed0a
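      What a boolean polling property buys, per the commit above, is that a
      periodic timer can sample the PCS instead of waiting for an interrupt
      that never comes. Below is a minimal userspace sketch of one such
      timer tick, assuming a hypothetical read_link callback; it does not
      reproduce the actual phylink implementation.

```c
#include <stdbool.h>

struct sketch_link {
	bool (*read_link)(void *ctx);	/* samples the PCS link status */
	void *ctx;
	bool pcs_poll;			/* set by MACs without a PCS IRQ */
	bool last_up;			/* last state reported upstream */
};

/* Example callback: the "hardware" is just a bool owned by the caller. */
bool sketch_read_flag(void *ctx)
{
	return *(bool *)ctx;
}

/*
 * One timer tick. Returns true when the link state changed and the
 * upper layer should re-resolve the link.
 */
bool sketch_poll_tick(struct sketch_link *link)
{
	bool up;

	if (!link->pcs_poll)
		return false;	/* IRQ-driven MACs report changes themselves */

	up = link->read_link(link->ctx);
	if (up == link->last_up)
		return false;

	link->last_up = up;
	return true;
}
```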
    • net: phylink: make QSGMII a valid PHY mode for in-band AN · 3a68ba6f
      Vladimir Oltean authored
      QSGMII is a SerDes protocol clocked at 5 Gbaud (4 times higher than
      SGMII which is clocked at 1.25 Gbaud), with the same 8b/10b encoding and
      some extra symbols for synchronization. Logically it offers 4 SGMII
      interfaces multiplexed onto the same physical lanes. Each MAC PCS has
      its own in-band AN process with the system side of the QSGMII PHY, which
      is identical to the regular SGMII AN process.
      
      So allow QSGMII as a valid in-band AN mode, since it is no different
      from regular SGMII from a software perspective.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3a68ba6f
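      The arithmetic in the commit above can be checked with a few lines
      of C: 4 x 1.25 Gbaud gives 5 Gbaud, and 8b/10b encoding leaves 8
      payload bits per 10 line bits. The enum and function names are
      hypothetical, and the in-band AN check only mirrors the statement
      that QSGMII is treated like SGMII.

```c
#include <stdbool.h>

/* Hypothetical mode identifiers for this sketch. */
enum sketch_serdes_mode {
	SKETCH_SERDES_OTHER,
	SKETCH_SERDES_SGMII,
	SKETCH_SERDES_QSGMII,
};

/* Line rate in Mbaud: QSGMII multiplexes 4 SGMII interfaces. */
unsigned int sketch_line_rate_mbaud(enum sketch_serdes_mode mode)
{
	switch (mode) {
	case SKETCH_SERDES_SGMII:
		return 1250;
	case SKETCH_SERDES_QSGMII:
		return 4 * 1250;
	default:
		return 0;
	}
}

/* Payload rate in Mbit/s after removing the 8b/10b overhead. */
unsigned int sketch_payload_mbps(enum sketch_serdes_mode mode)
{
	return sketch_line_rate_mbaud(mode) * 8 / 10;
}

/* QSGMII is accepted for in-band AN exactly like SGMII. */
bool sketch_supports_inband_an(enum sketch_serdes_mode mode)
{
	return mode == SKETCH_SERDES_SGMII || mode == SKETCH_SERDES_QSGMII;
}
```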
    • mii: Add helpers for parsing SGMII auto-negotiation · 6c930994
      Vladimir Oltean authored
      Typically a MAC PCS auto-configures itself after it receives the
      negotiated copper-side link settings from the PHY, but some MAC devices
      are more special and need manual interpretation of the SGMII AN result.
      
      In other cases, the PCS exposes the entire tx_config_reg base page as it
      is transmitted on the wire during auto-negotiation, so it makes sense to
      be able to decode the equivalent lp_advertised bit mask from the raw u16
      (of course, "lp" considering the PCS to be the local PHY).
      
      Therefore, add the bit definitions for the SGMII registers 4 and 5
      (local device ability, link partner ability), as well as a link_mode
      conversion helper that can be used to feed the AN results into
      phy_resolve_aneg_linkmode.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6c930994
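      Decoding the raw tx_config_reg word mentioned in the commit above
      could be sketched as below. The bit positions follow the usual SGMII
      layout (link in bit 15, duplex in bit 12, speed in bits 11:10), but
      the macro names, the result structure and the function are defined
      locally for this example and are not the helpers the patch adds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Locally defined bit masks for the SGMII link partner ability word. */
#define SKETCH_SGMII_SPD_MASK		0x0c00
#define SKETCH_SGMII_SPD_10		0x0000
#define SKETCH_SGMII_SPD_100		0x0400
#define SKETCH_SGMII_SPD_1000		0x0800
#define SKETCH_SGMII_FULL_DUPLEX	0x1000
#define SKETCH_SGMII_LINK		0x8000

struct sketch_sgmii_result {
	bool link;
	bool full_duplex;
	int speed;	/* in Mbit/s, or -1 for the reserved encoding */
};

/* Translate the raw 16-bit word into link, duplex and speed. */
struct sketch_sgmii_result sketch_decode_sgmii(uint16_t lpa)
{
	struct sketch_sgmii_result res;

	res.link = (lpa & SKETCH_SGMII_LINK) != 0;
	res.full_duplex = (lpa & SKETCH_SGMII_FULL_DUPLEX) != 0;

	switch (lpa & SKETCH_SGMII_SPD_MASK) {
	case SKETCH_SGMII_SPD_10:
		res.speed = 10;
		break;
	case SKETCH_SGMII_SPD_100:
		res.speed = 100;
		break;
	case SKETCH_SGMII_SPD_1000:
		res.speed = 1000;
		break;
	default:
		res.speed = -1;
		break;
	}

	return res;
}
```

      A caller would typically ignore speed and duplex while link is
      false, since the other fields are only meaningful for an
      established copper-side link.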