1. 06 Jun, 2017 13 commits
    • Merge tag 'rxrpc-rewrite-20170606' of... · bb363140
      David S. Miller authored
      Merge tag 'rxrpc-rewrite-20170606' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
      
      David Howells says:
      
      ====================
      rxrpc: Support service upgrade
      
      Here's a set of patches that allow AF_RXRPC to support the AuriStor service
      upgrade facility.  This allows the server to change the service ID
      requested to an upgraded service if the client requests it upon the
      initiation of a connection.
      
      This is used by the AuriStor AFS-compatible servers to implement IPv6
      handling and improved facilities by providing improved volume location,
      volume, protection, file and cache management services.  Note that certain
      parts of the AFS protocol carry hard-coded IPv4 addresses.
      
      The reason AuriStor does it this way is that probing the improved service
      ID first will not incur an ABORT or any other response on some servers if
      the server is not listening on it - and so one has to employ a timeout.
      
      This is implemented in the server by allowing an AF_RXRPC server to call
      bind() twice on a socket to allow it to listen on two service IDs and then
      call setsockopt() to instruct the server to upgrade one into the other if
      the client requests it (by setting userStatus to 1 on the first DATA packet
      on a connection).  If the upgrade occurs, all further operations on that
      connection are done with the new service ID.  AF_RXRPC has to handle this
      automatically as connections are not exposed to userspace.
      
      Clients can request this facility by setting an RXRPC_UPGRADE_SERVICE
      command in the sendmsg() control buffer and then observing the resultant
      service ID in the msg_name returned by recvmsg().  This should only be used
      to probe the service.  Clients should then use the returned service ID in
      all subsequent communications with that server.  Note that the kernel will
      not retain this information should the connection expire from its cache.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 25f41150
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      1GbE Intel Wired LAN Driver Updates 2017-06-06
      
      This series contains updates and fixes to e1000e and igb.
      
      Matwey V Kornilov fixes an issue where igb_get_phy_id_82575() relies on
      the fact that page 0 is already selected, but this is not the case after
      igb_read_phy_reg_gs40g()/igb_write_phy_reg_gs40g() were removed in a
      previous commit.  This leads to initialization failure and some devices
      not working.  To fix the issue, explicitly select page 0 before first
      access to PHY registers.
      
      Arnd Bergmann modifies the driver to avoid a "defined but not used"
      warning by removing #ifdefs and using __maybe_unused annotation instead
      for new power management functions.
      
      Jake provides most of the changes in the series, all around PTP and
      timestamp fixes/updates.  He resolves several race conditions that stem
      from the hardware only being able to handle one transmit timestamp at a
      time by fixing the locking logic, and adds a statistic for "skipped"
      timestamps to help administrators identify issues.
      
      Benjamin Poirier provides 2 changes.  The first, to igb, removes the
      second argument to igb_update_stats() since all callers pass the
      same two arguments; instead of taking the second argument, the
      function now gets the necessary information from the adapter
      structure.  The second replaces the call to e1000e_get_stats64() with
      a call to dev_get_stats() to avoid ethtool garbage being reported.
      
      Konstantin Khlebnikov modifies e1000e to use disable_hardirq(), instead
      of disable_irq() for MSIx vectors in e1000_netpoll().
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • e1000e: use disable_hardirq() also for MSIX vectors in e1000_netpoll() · fd8e597b
      Konstantin Khlebnikov authored
      Replace disable_irq() which waits for threaded irq handlers with
      disable_hardirq() which waits only for hardirq part.
      
      Fixes: 31119129 ("e1000: use disable_hardirq() for e1000_netpoll()")
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • e1000e: Don't return uninitialized stats · 24ad2a92
      Benjamin Poirier authored
      Some statistics passed to ethtool are garbage because e1000e_get_stats64()
      doesn't write them, for example: tx_heartbeat_errors. This leaks kernel
      memory to userspace and confuses users.
      
      Do like ixgbe and use dev_get_stats() which first zeroes out
      rtnl_link_stats64.
      
      Fixes: 5944701d ("net: remove useless memset's in drivers get_stats64")
      Reported-by: Stefan Priebe <s.priebe@profihost.ag>
      Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
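The zero-before-fill idea described above can be illustrated in userspace. This is only a sketch: struct link_stats, driver_get_stats() and get_stats_zeroed() are hypothetical stand-ins, not the e1000e or core networking code, and stale memory is simulated by the caller pre-filling the buffer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct rtnl_link_stats64. */
struct link_stats {
    uint64_t rx_packets;
    uint64_t tx_packets;
    uint64_t tx_heartbeat_errors;
};

/* Driver-style callback that fills in only the counters it tracks. */
void driver_get_stats(struct link_stats *s)
{
    s->rx_packets = 10;
    s->tx_packets = 20;
    /* tx_heartbeat_errors is deliberately left untouched. */
}

/* Wrapper in the spirit of dev_get_stats(): zero first, then delegate. */
void get_stats_zeroed(struct link_stats *s)
{
    memset(s, 0, sizeof(*s));
    driver_get_stats(s);
}
```

Calling driver_get_stats() directly on a buffer that still holds old bytes leaves those bytes in tx_heartbeat_errors, whereas going through the zeroing wrapper reports 0 for every field the driver does not write.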
    • igb: Remove useless argument · 81e3f64a
      Benjamin Poirier authored
      Given that all callers of igb_update_stats() pass the same two arguments:
      (adapter, &adapter->stats64), the second argument can be removed.
      Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • igb: check for Tx timestamp timeouts during watchdog · e5f36ad1
      Jacob Keller authored
      The igb driver has logic to handle only one Tx timestamp at a time,
      using a state bit lock to avoid multiple requests at once.
      
      It may be possible, if incredibly unlikely, that a Tx timestamp event is
      requested but never completes. Since we use an interrupt scheme to
      determine when the Tx timestamp occurred we would never clear the state
      bit in this case.
      
      Add an igb_ptp_tx_hang() function similar to the already existing
      igb_ptp_rx_hang() function. This function runs in the watchdog routine
      and makes sure we eventually recover from this case instead of
      permanently disabling Tx timestamps.
      
      Note: there is no currently known way to cause this without hacking the
      driver code to force it.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
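The watchdog recovery described above might look roughly like the following userspace sketch. The structure, the tick-based timeout and the names fake_adapter, ptp_tx_hang() and TX_TSTAMP_TIMEOUT_TICKS are all assumptions made for illustration; the real igb_ptp_tx_hang() works on kernel types and jiffies.

```c
#include <stdbool.h>
#include <stddef.h>

#define TX_TSTAMP_TIMEOUT_TICKS 15UL   /* hypothetical timeout */

/* Hypothetical stand-in for the driver's per-adapter state. */
struct fake_adapter {
    bool tx_in_progress;               /* the "state bit lock" */
    unsigned long tx_start;            /* tick at which the request began */
    const void *tx_skb;                /* packet waiting for its timestamp */
    unsigned int tx_hwtstamp_timeouts; /* how often recovery was needed */
};

/* Called periodically from a watchdog; returns true if it had to recover. */
bool ptp_tx_hang(struct fake_adapter *a, unsigned long now)
{
    if (!a->tx_in_progress)
        return false;

    /* Unsigned subtraction keeps the comparison correct across wraparound. */
    if (now - a->tx_start <= TX_TSTAMP_TIMEOUT_TICKS)
        return false;

    a->tx_skb = NULL;              /* a real driver would free the skb here */
    a->tx_in_progress = false;     /* release the slot for the next request */
    a->tx_hwtstamp_timeouts++;
    return true;
}
```

The point of the design is that the interrupt path remains the normal way to release the slot; the periodic check only acts once a request has been outstanding for longer than the timeout.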
    • igb: add statistic indicating number of skipped Tx timestamps · c3b8f85e
      Jacob Keller authored
      The igb driver can only handle one Tx timestamp request at a time.
      This means it is possible for an application timestamp request to be
      ignored.
      
      There is no easy way for an administrator to determine if this occurred.
      Add a new statistic which tracks this, tx_hwtstamp_skipped.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • e1000e: add statistic indicating number of skipped Tx timestamps · cff57141
      Jacob Keller authored
      The e1000e driver can only handle one Tx timestamp request at a time.
      This means it is possible for an application timestamp request to be
      ignored.
      
      There is no easy way for an administrator to determine if this occurred.
      Add a new statistic which tracks this, tx_hwtstamp_skipped.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • igb: avoid permanent lock of *_PTP_TX_IN_PROGRESS · 74344e32
      Jacob Keller authored
      The igb driver uses a state bit lock to avoid handling more than one Tx
      timestamp request at once. This is required because hardware is limited
      to a single set of registers for Tx timestamps.
      
      The state bit lock is not properly cleaned up during
      igb_xmit_frame_ring() if the transmit fails such as due to DMA or TSO
      failure. In some hardware this results in blocking timestamps until the
      service task times out. In other hardware this results in a permanent
      lock of the timestamp bit because we never receive an interrupt
      indicating the timestamp occurred, since indeed the packet was never
      transmitted.
      
      Fix this by checking for DMA and TSO errors in igb_xmit_frame_ring() and
      properly cleaning up after ourselves when these occur.
      Reported-by: David Mirabito <davidm@metamako.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
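The cleanup-on-failure pattern described above can be sketched as follows. Everything here is a hypothetical model: fake_adapter, xmit_frame() and the mapping_fails flag stand in for the driver state, igb_xmit_frame_ring() and a DMA or TSO error respectively.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-adapter state. */
struct fake_adapter {
    bool tx_in_progress;   /* the single Tx timestamp slot */
    const void *tx_skb;    /* packet that owns the slot */
};

/*
 * Model of a transmit path. mapping_fails simulates a DMA or TSO error
 * that happens after the timestamp slot has already been claimed.
 */
int xmit_frame(struct fake_adapter *a, const void *skb,
               bool want_tstamp, bool mapping_fails)
{
    bool claimed = false;

    if (want_tstamp && !a->tx_in_progress) {
        a->tx_in_progress = true;
        a->tx_skb = skb;
        claimed = true;
    }

    if (mapping_fails) {
        /* The packet never reaches the wire, so no timestamp interrupt
         * will ever arrive: release the slot here or it stays locked. */
        if (claimed) {
            a->tx_skb = NULL;
            a->tx_in_progress = false;
        }
        return -1;
    }
    return 0;
}
```

Without the release in the error branch, a single failed transmit would leave tx_in_progress set and every later timestamp request would be refused.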
    • igb: fix race condition with PTP_TX_IN_PROGRESS bits · 4ccdc013
      Jacob Keller authored
      Hardware related to the igb driver has a limitation of only handling one
      Tx timestamp at a time. Thus, the driver uses a state bit lock to
      enforce that only one timestamp request is honored at a time.
      
      Unfortunately this suffers from a simple race condition. The bit lock is
      not cleared until after skb_tstamp_tx() is called notifying the stack of
      a new Tx timestamp. Even a well behaved application which sends only one
      timestamp request at once and waits for a response might wake up and
      send a new packet before the bit lock is cleared. This results in
      needlessly dropping some Tx timestamp requests.
      
      We can fix this by unlocking the state bit as soon as we read the
      Timestamp register, as this is the first point at which it is safe to
      unlock.
      
      To avoid issues with the skb pointer, we'll use a copy of the pointer
      and set the global variable in the driver structure to NULL first. This
      ensures that the next timestamp request does not modify our local copy
      of the skb pointer.
      
      This ensures that well behaved applications do not accidentally race
      with the unlock bit. Obviously an application which sends multiple Tx
      timestamp requests at once will still only timestamp one packet at
      a time. Unfortunately there is nothing we can do about this.
      Reported-by: David Mirabito <davidm@metamako.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
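The ordering problem described above can be reproduced deterministically in a single-threaded model, where the "application" is a callback that immediately submits its next request when notified. This is a sketch under assumptions: the struct, the function names and the use of a C11 atomic_flag in place of the driver's state bit are all illustrative, not the igb code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-adapter state. */
struct fake_adapter {
    atomic_flag tx_in_progress;        /* models the PTP_TX_IN_PROGRESS bit */
    const char *tx_skb;                /* models the pending skb pointer */
    unsigned int tx_hwtstamp_skipped;  /* requests refused while busy */
};

typedef void (*notify_fn)(struct fake_adapter *a, const char *skb);

void adapter_init(struct fake_adapter *a)
{
    atomic_flag_clear(&a->tx_in_progress);
    a->tx_skb = NULL;
    a->tx_hwtstamp_skipped = 0;
}

/* Try to claim the single timestamp slot; count a skip if it is busy. */
bool request_tx_timestamp(struct fake_adapter *a, const char *skb)
{
    if (atomic_flag_test_and_set(&a->tx_in_progress)) {
        a->tx_hwtstamp_skipped++;
        return false;
    }
    a->tx_skb = skb;
    return true;
}

/* Old ordering: notify the stack first, release the slot afterwards. */
void complete_notify_then_unlock(struct fake_adapter *a, notify_fn notify)
{
    notify(a, a->tx_skb);
    a->tx_skb = NULL;
    atomic_flag_clear(&a->tx_in_progress);
}

/* New ordering: keep a local copy, clear the shared pointer, release the
 * slot, and only then notify. */
void complete_unlock_then_notify(struct fake_adapter *a, notify_fn notify)
{
    const char *skb = a->tx_skb;

    a->tx_skb = NULL;
    atomic_flag_clear(&a->tx_in_progress);
    notify(a, skb);
}

/* A well behaved application: it sends its next request only after being
 * told that the previous one finished. */
void eager_app(struct fake_adapter *a, const char *done_skb)
{
    (void)done_skb;
    request_tx_timestamp(a, "next");
}
```

With complete_notify_then_unlock() the follow-up request from eager_app() is counted as skipped even though the application did nothing wrong; with complete_unlock_then_notify() it is accepted.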
    • e1000e: fix race condition around skb_tstamp_tx() · 5012863b
      Jacob Keller authored
      The e1000e driver and related hardware has a limitation on Tx PTP
      packets which requires we limit to timestamping a single packet at once.
      We do this by verifying that we never request a new Tx timestamp while
      we still have a tx_hwtstamp_skb pointer.
      
      Unfortunately the driver suffers from a race condition around this. The
      tx_hwtstamp_skb pointer is not set to NULL until after skb_tstamp_tx()
      is called. This function notifies the stack and applications of a new
      timestamp. Even a well behaved application that only sends a new request
      when the first one is finished might be woken up and possibly send
      a packet before we can free the timestamp in the driver again. The
      result is that we needlessly ignore some Tx timestamp requests in this
      corner case.
      
      Fix this by clearing the tx_hwtstamp_skb pointer prior to calling
      skb_tstamp_tx(), using a temporary pointer to hold the timestamped skb
      until that function finishes. This ensures that the application is not
      woken up until the driver is ready to begin timestamping a new packet.
      
      This ensures that well behaved applications do not accidentally hit the
      race and have their Tx timestamps skipped. Obviously an application which
      sends multiple Tx timestamp requests at once will still only timestamp
      one packet at a time. Unfortunately there is nothing we can do about
      this.
      Reported-by: David Mirabito <davidm@metamako.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • igb: mark PM functions as __maybe_unused · 000ba1f2
      Arnd Bergmann authored
      The new wake function is only used by the suspend/resume handlers that
      are defined inside of an #ifdef, which can cause this harmless
      warning:
      
      drivers/net/ethernet/intel/igb/igb_main.c:7988:13: warning: 'igb_deliver_wake_packet' defined but not used [-Wunused-function]
      
      Removing the #ifdef and using a __maybe_unused annotation instead
      simplifies the code and avoids the warning.
      
      Fixes: b90fa876 ("igb: Enable reading of wake up packet")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
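A small standalone illustration of the annotation approach, assuming GCC or Clang. The function names, the CONFIG_PM switch as used here and the resume_handler pointer are hypothetical; only the __maybe_unused idea is taken from the commit message.

```c
#include <stddef.h>

/* The kernel provides this macro; defined here so the sketch stands alone. */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((__unused__))
#endif

static int wake_events;

/* Helper used only by the suspend/resume path. */
static void __maybe_unused deliver_wake_packet(void)
{
    wake_events++;
}

/* No #ifdef around the handler: if nothing references it, the compiler
 * drops it without emitting -Wunused-function. */
static int __maybe_unused fake_resume(void)
{
    deliver_wake_packet();
    return 0;
}

/* Only the place that wires the handler in depends on the config switch. */
#ifdef CONFIG_PM
int (*const resume_handler)(void) = fake_resume;
#else
int (*const resume_handler)(void) = NULL;
#endif
```

Compared with wrapping each function in #ifdef, the annotated functions are always parsed and type-checked, so build breakage in rarely enabled configurations tends to be noticed earlier.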
    • igb: Explicitly select page 0 at initialization · 440aeca4
      Matwey V Kornilov authored
      The functions igb_read_phy_reg_gs40g/igb_write_phy_reg_gs40g (which were
      removed in 2a3cdead) explicitly selected the required page at every phy_reg
      access. Currently, igb_get_phy_id_82575 relies on the fact that page 0 is
      already selected. The assumption is not fulfilled for my Lex 3I380CW
      motherboard with integrated dual i211 based gigabit ethernet. This leads to igb
      initialization failure and network interfaces are not working:
      
          igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
          igb: Copyright (c) 2007-2014 Intel Corporation.
          igb: probe of 0000:01:00.0 failed with error -2
          igb: probe of 0000:02:00.0 failed with error -2
      
      In order to fix it, we explicitly select page 0 before first access to phy
      registers.
      
      See also: https://bugzilla.suse.com/show_bug.cgi?id=1009911
      See also: http://www.lex.com.tw/products/pdf/3I380A&3I380CW.pdf
      
      Fixes: 2a3cdead ("igb: Remove GS40G specific defines/functions")
      Cc: <stable@vger.kernel.org> # 4.5+
      Signed-off-by: Matwey V Kornilov <matwey@sai.msu.ru>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  2. 05 Jun, 2017 27 commits
    • mdio: mux: fix an incorrect less than zero error check using a u32 · 9d15e5cc
      Colin Ian King authored
      The u32 variable v is being checked to see if an error return is
      less than zero and this check has no effect because it is unsigned.
      Fix this by making v an int (this also matches the type of
      cb->bus_number which is assigned to the value in v).
      
      Detected by CoverityScan, CID#1440454 ("Unsigned compared against zero")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
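The class of bug described above is easy to demonstrate outside the kernel. In this sketch read_bus_number() is a hypothetical function that can fail with a negative errno; it is not the mdio-mux code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical lookup that returns a bus number or a negative errno. */
int read_bus_number(int fail)
{
    return fail ? -EINVAL : 3;
}

/* Buggy shape: the result is stored in an unsigned variable, so the
 * "less than zero" test can never be true and the error is missed. */
int error_seen_unsigned(int fail)
{
    uint32_t v = (uint32_t)read_bus_number(fail);

    return v < 0;   /* always 0; GCC flags this with -Wtype-limits */
}

/* Fixed shape: a signed variable preserves the negative error code. */
int error_seen_signed(int fail)
{
    int v = read_bus_number(fail);

    return v < 0;
}
```

In the unsigned version a failure is carried on as a very large bus number, which is the kind of behaviour a static analyser reports as "Unsigned compared against zero".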
    • net-next: stmmac: dwmac-sun8i: ensure the EPHY is properly reseted · 2f878491
      Icenowy Zheng authored
      The EPHY may be already enabled by bootloaders which have Ethernet
      capability (e.g. current U-Boot). Thus it should be reset properly
      before doing the enabling sequence in the dwmac-sun8i driver, otherwise
      the EMAC reset process may fail if no cable is plugged, and then fail
      the dwmac-sun8i probing.
      
      Tested on Orange Pi PC, One and Zero. Previously all of these boards
      failed to probe dwmac-sun8i, reporting "EMAC reset timeout", when no
      cable was plugged in. With this fix they can all probe the EMAC
      successfully without a cable and then use the connection after a
      cable is hot-plugged.
      
      Fixes: 9f93ac8d ("net-next: stmmac: Add dwmac-sun8i")
      Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
      Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
      Acked-by: Corentin Labbe <clabbe.montjoie@gmail.com>
      Reviewed-by: Corentin Labbe <clabbe.montjoie@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/3com: Make el3_netdev_get_ecmd return void · 697dae1e
      yuval.shaia@oracle.com authored
      Make the return value void since the function never returns a meaningful value.
      Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/{mii, smsc}: Make mii_ethtool_get_link_ksettings and smc_netdev_get_ecmd return void · 82c01a84
      yuval.shaia@oracle.com authored
      Make the return value void since the functions never return a meaningful value.
      Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/dec: Make __de_get_link_ksettings return void · c7c6b871
      yuval.shaia@oracle.com authored
      Make the return value void since the function never returns a meaningful value.
      Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: select cls when cls_act is enabled · 8ec1507d
      Jiri Pirko authored
      It really makes no sense to have cls_act enabled without cls. In that
      case, the cls_act code is dead. So select it.
      
      This also fixes an issue recently reported by kbuild robot:
      [linux-next:master 1326/4151] net/sched/act_api.c:37:18: error: implicit declaration of function 'tcf_chain_get'
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Fixes: db50514f ("net: sched: add termination action to allow goto chain")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • genetlink: remove ops_list from genetlink header. · 4e2ec436
      Rosen, Rami authored
      Commit d91824c0 ("genetlink: register family ops as array") removed the
      ops_list member from both genl_family and genl_ops; while the
      documentation of genl_family was updated accordingly by that commit,
      ops_list remained in the documentation of the genl_ops object.
      This patch fixes it by removing ops_list from genl_ops documentation.
      Signed-off-by: Rami Rosen <rami.rosen@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Add service upgrade support for client connections · 4e255721
      David Howells authored
      Make it possible for a client to use AuriStor's service upgrade facility.
      
      The client does this by adding an RXRPC_UPGRADE_SERVICE control message to
      the first sendmsg() of a call.  This takes no parameters.
      
      When recvmsg() starts returning data from the call, the service ID field in
      the returned msg_name will reflect the result of the upgrade attempt.  If
      the upgrade was ignored, srx_service will match what was set in the
      sendmsg(); if the upgrade happened the srx_service will be altered to
      indicate the service the server upgraded to.
      
      Note that:
      
       (1) The choice of upgrade service is up to the server
      
       (2) Further client calls to the same server that would share a connection
           are blocked if an upgrade probe is in progress.
      
       (3) This should only be used to probe the service.  Clients should then
           use the returned service ID in all subsequent communications with that
           server (and not set the upgrade).  Note that the kernel will not
           retain this information should the connection expire from its cache.
      
       (4) If a server that supports upgrading is replaced by one that doesn't,
           whilst a connection is live, and if the replacement is running, say,
           OpenAFS 1.6.4 or older or an older IBM AFS, then the replacement
           server will not respond to packets sent to the upgraded connection.
      
           At this point, calls will time out and the server must be reprobed.
      Signed-off-by: David Howells <dhowells@redhat.com>
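As a rough sketch of the client side described above, the control buffer for the first sendmsg() of a probe call could be assembled as shown here. Assumptions: the kernel UAPI header <linux/rxrpc.h> is recent enough to provide RXRPC_UPGRADE_SERVICE, the caller passes a buffer suitably aligned for struct cmsghdr, and build_probe_control() is a hypothetical helper name. The user call ID message is included because AF_RXRPC uses it to identify the call.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/rxrpc.h>

#ifndef SOL_RXRPC
#define SOL_RXRPC 272   /* fallback for C libraries that do not define it */
#endif

/*
 * Fill buf with two control messages: the user call ID and the
 * parameterless RXRPC_UPGRADE_SERVICE request. Returns the number of
 * bytes to put in msg_controllen, or 0 if buf is too small.
 */
size_t build_probe_control(void *buf, size_t buflen, unsigned long call_id)
{
    struct msghdr msg;
    struct cmsghdr *cmsg;
    size_t need = CMSG_SPACE(sizeof(call_id)) + CMSG_SPACE(0);

    if (buflen < need)
        return 0;

    /* Zero the buffer so CMSG_NXTHDR() sees a sane length field. */
    memset(buf, 0, need);
    memset(&msg, 0, sizeof(msg));
    msg.msg_control = buf;
    msg.msg_controllen = need;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_RXRPC;
    cmsg->cmsg_type = RXRPC_USER_CALL_ID;
    cmsg->cmsg_len = CMSG_LEN(sizeof(call_id));
    memcpy(CMSG_DATA(cmsg), &call_id, sizeof(call_id));

    cmsg = CMSG_NXTHDR(&msg, cmsg);
    cmsg->cmsg_level = SOL_RXRPC;
    cmsg->cmsg_type = RXRPC_UPGRADE_SERVICE;
    cmsg->cmsg_len = CMSG_LEN(0);   /* this command takes no parameters */

    return need;
}
```

After the call completes, the msg_name filled in by recvmsg() can be read as a struct sockaddr_rxrpc, and its srx_service field compared with the service ID that was originally requested to see whether the upgrade took place.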
    • rxrpc: Implement service upgrade · 4722974d
      David Howells authored
      Implement AuriStor's service upgrade facility.  There are three problems
      that this is meant to deal with:
      
       (1) Various of the standard AFS RPC calls have IPv4 addresses in their
           requests and/or replies - but there's no room for including IPv6
           addresses.
      
       (2) Definition of IPv6-specific RPC operations in the standard operation
           sets has not yet been achieved.
      
       (3) One could envision the creation of a new service on the same port as
           the original service.  The new service could implement improved
           operations - and the client could try this first, falling back to the
           original service if it's not there.
      
           Unfortunately, certain servers ignore packets addressed to a service
           they don't implement and don't respond in any way - not even with an
           ABORT.  This means that the client must then wait for the call timeout
           to occur.
      
      What service upgrade does is to see if the connection is marked as being
      'upgradeable' and if so, change the service ID in the server and thus the
      request and reply formats.  Note that the upgrade isn't mandatory - a
      server that supports only the original call set will ignore the upgrade
      request.
      
      In the protocol, the procedure is then as follows:
      
       (1) To request an upgrade, the first DATA packet in a new connection must
           have the userStatus set to 1 (this is normally 0).  The userStatus
           value is normally ignored by the server.
      
       (2) If the server doesn't support upgrading, the reply packets will
           contain the same service ID as for the first request packet.
      
       (3) If the server does support upgrading, all future reply packets on that
           connection will contain the new service ID and the new service ID will
           be applied to *all* further calls on that connection as well.
      
       (4) The RPC op used to probe the upgrade must take the same request data
           as the shadow call in the upgrade set (but may return a different
           reply).  GetCapability RPC ops were added to all standard sets for
           just this purpose.  Ops where the request formats differ cannot be
           used for probing.
      
       (5) The client must wait for completion of the probe before sending any
           further RPC ops to the same destination.  It should then use the
           service ID that recvmsg() reported back in all future calls.
      
       (6) The shadow service must have call definitions for all the operation
           IDs defined by the original service.
      
      
      To support service upgrading, a server should:
      
       (1) Call bind() twice on its AF_RXRPC socket before calling listen().
           Each bind() should supply a different service ID, but the transport
           addresses must be the same.  This allows the server to receive
           requests with either service ID.
      
       (2) Enable automatic upgrading by calling setsockopt(), specifying
           RXRPC_UPGRADEABLE_SERVICE and passing in a two-member array of
           unsigned shorts as the argument:
      
      	unsigned short optval[2];
      
           This specifies a pair of service IDs.  They must be different and must
           match the service IDs bound to the socket.  Member 0 is the service ID
           to upgrade from and member 1 is the service ID to upgrade to.
      Signed-off-by: David Howells <dhowells@redhat.com>
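The server-side steps listed above might be put together roughly as follows. This is a sketch, not tested against a live AuriStor or kAFS setup: it assumes an IPv4 transport, a kernel with AF_RXRPC available and UAPI headers that define RXRPC_UPGRADEABLE_SERVICE; open_upgradeable_server() and bind_service() are hypothetical helper names, and security keys are left out entirely.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/rxrpc.h>

#ifndef AF_RXRPC
#define AF_RXRPC 33     /* fallback for C libraries that do not define it */
#endif
#ifndef SOL_RXRPC
#define SOL_RXRPC 272   /* fallback for C libraries that do not define it */
#endif

/* Bind one service ID on the given UDP port (any local IPv4 address). */
static int bind_service(int fd, unsigned short port, unsigned short service)
{
    struct sockaddr_rxrpc srx;

    memset(&srx, 0, sizeof(srx));
    srx.srx_family = AF_RXRPC;
    srx.srx_service = service;
    srx.transport_type = SOCK_DGRAM;
    srx.transport_len = sizeof(srx.transport.sin);
    srx.transport.sin.sin_family = AF_INET;
    srx.transport.sin.sin_port = htons(port);
    return bind(fd, (struct sockaddr *)&srx, sizeof(srx));
}

/* Returns a listening socket, or -1 with errno set. */
int open_upgradeable_server(unsigned short port,
                            unsigned short from, unsigned short to)
{
    unsigned short optval[2] = { from, to };  /* [0] = from, [1] = to */
    int fd, saved;

    /* Both IDs must be non-zero and different from each other. */
    if (from == 0 || to == 0 || from == to) {
        errno = EINVAL;
        return -1;
    }

    fd = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);
    if (fd < 0)
        return -1;

    /* Same transport address, two different service IDs, before listen(). */
    if (bind_service(fd, port, from) < 0 ||
        bind_service(fd, port, to) < 0 ||
        setsockopt(fd, SOL_RXRPC, RXRPC_UPGRADEABLE_SERVICE,
                   optval, sizeof(optval)) < 0 ||
        listen(fd, 100) < 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

On a machine where the rxrpc module is not available, socket() is expected to fail and the helper simply returns -1; the early argument check mirrors the constraint that the two service IDs must differ.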
    • rxrpc: Permit multiple service binding · 28036f44
      David Howells authored
      Permit bind() to be called on an AF_RXRPC socket more than once (currently
      maximum twice) to bind multiple listening services to it.  There are some
      restrictions:
      
       (1) All bind() calls involved must have a non-zero service ID.
      
       (2) The service IDs must all be different.
      
       (3) The rest of the address (notably the transport part) must be the same
           in all (a single UDP socket is shared).
      
       (4) This must be done before listen() or sendmsg() is called.
      
      This allows someone to connect to the service socket with different service
      IDs and lays the foundation for service upgrading.
      
      The service ID used by an incoming call can be extracted from the msg_name
      returned by recvmsg().
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Separate the connection's protocol service ID from the lookup ID · 68d6d1ae
      David Howells authored
      Keep the rxrpc_connection struct's idea of the service ID that is exposed
      in the protocol separate from the service ID that's used as a lookup key.
      
      This allows the protocol service ID on a client connection to get upgraded
      without making the connection unfindable for other client calls that also
      would like to use the upgraded connection.
      
      The connection's actual service ID is then returned through recvmsg() by
      way of msg_name.
      
      Whilst we're at it, we get rid of the last_service_id field from each
      channel.  The service ID is per-connection, not per-call and an entire
      connection is upgraded in one go.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • Merge branch 'mlxsw-Minor-cleanup' · aae1a2ce
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: Minor cleanup
      
      Fix small issues I noticed during the refactoring.
      
      First patch adds file name comments in the header file to make it clear
      what goes where. Second patch fixes a typo and third patch simply aligns
      RIF index allocation with similar allocations in the driver.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Align RIF index allocation with existing code · de5ed99e
      Ido Schimmel authored
      The way we usually allocate an index is by letting the allocation
      function return an error instead of an invalid index.
      
      Do the same for RIF index.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
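The allocation convention described above, returning an error code rather than a sentinel "invalid" index, can be sketched like this. The table layout, MAX_RIFS and rif_index_alloc() are hypothetical and only illustrate the calling convention, not the mlxsw implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_RIFS 4   /* hypothetical table size */

struct rif_table {
    bool used[MAX_RIFS];
};

/*
 * Returns 0 and stores the index through p_index on success, or a
 * negative errno when the table is full. The caller never has to
 * compare the index against a magic "invalid" value.
 */
int rif_index_alloc(struct rif_table *t, uint16_t *p_index)
{
    for (uint16_t i = 0; i < MAX_RIFS; i++) {
        if (!t->used[i]) {
            t->used[i] = true;
            *p_index = i;
            return 0;
        }
    }
    return -ENOBUFS;
}
```

A caller then follows the usual pattern of checking the return value and propagating it, leaving the output parameter untouched on failure.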
    • Ido Schimmel's avatar
      da0abcf9
    • mlxsw: spectrum: Tidy up header file · cb4cc0e0
      Ido Schimmel authored
      Make it clear where functions are defined and move misplaced declarations
      to their correct place.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Rename the firmware file · a4e1ce24
      Yotam Gigi authored
      Change the firmware file name so that it is in the "mellanox" directory.
      
      This commit is a followup to the linux-firmware commit a4c72696f5f4
      ("Mellanox: Add firmware for mlxsw_spectrum")
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'qed-vf-xdp' · fc85c910
      David S. Miller authored
      Yuval Mintz says:
      
      qed*: Support VF XDP attachment
      
      ====================
      Each driver queue [Rx, Tx, XDP-forwarding] requires an allocated HW/FW
      connection + configured queue-zone.
      
      VF handling by the PF has several limitations that prevented adding the
      capability to perform XDP at driver-level:
      
       - The VF assumes there's a 1-to-1 correspondence between the VF queue and
         the used connection, meaning q<x> is always going to use cid<x>,
         whereas for its own queues the PF is acquiring a new cid per each new
         queue.
      
       - There's a 1-to-1 correspondence between the VF-queues and the HW queue
         zones. While this is necessary for Rx-queues [as the queue-zone
         contains the producer], transmission queues can share the underlying
         queue-zone [only shared configuration is coalescing].
         But all VF<->PF communication mechanisms assume there's a single
         identifier that identifies a queue [as queue-zone == queue], while
         sharing queue-zones requires passing additional information.
      
       - VFs currently don't try mapping a doorbell bar - there's a small
         doorbell window in the regview allowing VFs to doorbell up to 16
         connections; but this window isn't wide enough for the added XDP
         forwarding queues.
      
      This series is going to add the necessary infrastructure to finally let
      our VFs support XDP, assuming both the PF and VF drivers are sufficiently
      new [Legacy support would be retained both for older VFs and older PFs,
      but both the PF and the VF must be new for this support to work].
      Basically, the various databases the driver maintains for its queue-cids
      would be revised, and queue-cids would be identified using the
      (queue-zone, unique index) pair. The TLV mechanism would then be
      extended to allow VFs to communicate that unique-index as well as the
      already provided queue-zone. Finally, the VFs would try to map their
      doorbell bar and inform their PF that they're using it.
      
      Almost all the changes are in qed, with the exception of #3 [which does some
      cleanup in qede as well] and #11, which actually enables the feature.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fc85c910
    • qede: VF XDP support · e7b80dec
      Mintz, Yuval authored
      This introduces 2 changes needed for XDP to be supported for VFs:
      
       a. On VF-side, publish the NDO based on qed outputs
      
       b. On PF-side, request qed to allocate sufficient CIDs per-VF
          to allow the child VFs to support it
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e7b80dec
    • qed: VF XDP support · cbb8a12c
      Mintz, Yuval authored
      The final addition on the qed front -
       - VFs would now require their PFs to provide multiple CIDs
       - Based on the availability of connections from PF, determine whether
         XDP is feasible and share it with qede via dev_info.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cbb8a12c
    • qed: VFs to try utilizing the doorbell bar · 1a850bfc
      Mintz, Yuval authored
      VFs are currently not mapping their doorbell bar, instead relying
      on the small doorbell window they have in their limited regview bar.
      
      In order to increase the number of possible Tx connections [queues]
      employed by a VF past 16, we need to start using the doorbell bar if
      one is exposed. The VF would communicate this fact to the PF, which would
      return the bar size internally configured into the chip; based on
      that size, the VF would decide whether to actually utilize the doorbell
      bar.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a850bfc
    • qed: Multiple qzone queues for VFs · 08bc8f15
      Mintz, Yuval authored
      This adds the infrastructure for supporting VFs that want to open
      multiple transmission queues on the same queue-zone.
      At this point, there are no VFs that actually request this functionality,
      but later patches would remedy that.
      
       a. VF and PF would communicate the capability during ACQUIRE;
          Legacy VFs would continue on behaving as they do today
      
       b. PF would communicate number of supported CIDs to the VF
          and would enforce said limitation
      
       c. Whenever VF passes a request for a given queue configuration
          it would also pass an associated index within said queue-zone
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      08bc8f15
    • qed: IOV db support multiple queues per qzone · 007bc371
      Mintz, Yuval authored
      Allow the infrastructure a PF maintains for each one of its VFs
      to support multiple queue-cids on a single queue-zone.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      007bc371
    • qed: Make VF legacy a bitfield · 3b19f478
      Mintz, Yuval authored
      Until now we used to have a single VF legacy compatibility mode,
      one that affected the place of the Rx producers of those VFs [mostly].
      
      As PF would soon support allocating CIDs for VFs instead of having
      a static CID<->queue configuration for them, we'll need to have
      an additional legacy mode since existing VFs would need to continue
      on using the older mode of operation.
      
      Change the infrastructure so that the legacy field can indicate
      which of the legacy behaviors is needed for a given VF.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3b19f478
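The move from a single legacy mode to a bitfield described in this commit can be sketched roughly as follows. This is an illustration only: the flag names, the `vf_info` struct and the `vf_has_legacy()` helper are hypothetical and are not taken from the qed sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names; each bit selects one legacy behavior. */
#define VF_LEGACY_RX_PRODUCER (1u << 0) /* old placement of the Rx producers */
#define VF_LEGACY_STATIC_CID  (1u << 1) /* static CID<->queue configuration */

/* Hypothetical per-VF record kept by the PF. */
struct vf_info {
    uint8_t legacy; /* OR of the VF_LEGACY_* bits that apply to this VF */
};

/* True if the given legacy behavior is required for this VF. */
static bool vf_has_legacy(const struct vf_info *vf, uint8_t flag)
{
    return (vf->legacy & flag) != 0;
}
```

With a bitfield, a VF can need either legacy behavior, both, or neither, which a single boolean mode could not express.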
    • qed: Assign a unique per-queue index to queue-cid · bbe3f233
      Mintz, Yuval authored
      When a queue-cid is allocated, assign an index inside that
      CID's queue-zone.
      
      For PFs and VFs, this number is going to be unique and derived
      from a per-queue-zone bitmap, while for a PF's VF queues the
      number is currently going to be constant. Later, we'd add the
      capability for a VF to communicate such an index to its PF.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bbe3f233
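Deriving a unique index from a per-queue-zone bitmap, as this commit describes, could look something like the sketch below. The limit of 8 indices per zone, the `qzone_state` struct and both function names are assumptions made for illustration; the driver's actual data structures may differ.

```c
#include <stdint.h>

#define MAX_CIDS_PER_ZONE 8 /* hypothetical limit, chosen to fit a uint8_t */

/* Hypothetical per-queue-zone state: bit i set means index i is taken. */
struct qzone_state {
    uint8_t used;
};

/* Claim the lowest free index in the zone, or return -1 if the zone is full. */
static int qzone_index_alloc(struct qzone_state *qz)
{
    for (int i = 0; i < MAX_CIDS_PER_ZONE; i++) {
        if (!(qz->used & (1u << i))) {
            qz->used |= (uint8_t)(1u << i);
            return i;
        }
    }
    return -1;
}

/* Release a previously claimed index; out-of-range values are ignored. */
static void qzone_index_free(struct qzone_state *qz, int idx)
{
    if (idx >= 0 && idx < MAX_CIDS_PER_ZONE)
        qz->used &= (uint8_t)~(1u << idx);
}
```

Because the lowest free bit is always chosen, a released index is reused before any higher one is handed out.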
    • qed: Pass vf_params when creating a queue-cid · 3946497a
      Mintz, Yuval authored
      We're going to need additional information for queue-cids
      that a PF creates for its VFs, so start by refactoring existing
      logic used for initializing said struct into receiving a structure
      encapsulating the VF-specific information that needs to be provided.
      
      This also introduces QED_QUEUE_CID_SELF - each queue-cid would hold
      an indication of whether it belongs to the hw-function holding it
      [whether that's a PF or a VF], or else the id of the VF it belongs
      to.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3946497a
    • qed*: L2 interface to use the SB structures directly · f604b17d
      Mintz, Yuval authored
      As part of an effort toward a cleaner separation between qed and the
      protocol drivers, the L2 interface is to use the SB structure opaquely
      for initialization purposes.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f604b17d
    • qed: Create L2 queue database · 0db711bb
      Mintz, Yuval authored
      The first step in allowing a single PF/VF to open multiple queues on
      the same queue zone is to add a per-hwfn database of queue-cids
      as a two-dimensional array whose entries are indexed by
      [queue zone][internal index].
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0db711bb
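A two-dimensional queue-cid database indexed by [queue zone][internal index], as this commit describes, might be sketched like this. All names and array sizes here are hypothetical and chosen for illustration; they are not the driver's actual definitions.

```c
#include <stddef.h>

/* Hypothetical sizes; a real driver would derive these from HW resources. */
#define MAX_QUEUE_ZONES   4
#define MAX_CIDS_PER_ZONE 3

/* Hypothetical queue-cid record. */
struct queue_cid {
    unsigned int cid;   /* connection id backing this queue */
    unsigned int qzone; /* queue-zone the queue is opened on */
    unsigned int index; /* internal index inside that queue-zone */
};

/* Per-hwfn database: one slot per (queue zone, internal index) pair. */
struct l2_queue_db {
    struct queue_cid *cids[MAX_QUEUE_ZONES][MAX_CIDS_PER_ZONE];
};

/* Register a queue-cid; returns 0, or -1 if out of range or slot is taken. */
static int l2_db_add(struct l2_queue_db *db, struct queue_cid *q)
{
    if (q->qzone >= MAX_QUEUE_ZONES || q->index >= MAX_CIDS_PER_ZONE)
        return -1;
    if (db->cids[q->qzone][q->index] != NULL)
        return -1;
    db->cids[q->qzone][q->index] = q;
    return 0;
}

/* Look up a queue-cid by its pair; returns NULL if absent or out of range. */
static struct queue_cid *l2_db_lookup(const struct l2_queue_db *db,
                                      unsigned int qzone, unsigned int index)
{
    if (qzone >= MAX_QUEUE_ZONES || index >= MAX_CIDS_PER_ZONE)
        return NULL;
    return db->cids[qzone][index];
}
```

Keying the table on the pair rather than on the queue-zone alone is what lets several queues share one zone without colliding.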