1. 18 Nov, 2016 27 commits
  2. 17 Nov, 2016 13 commits
    • David S. Miller
      Merge branch 'rds-ha-failover-fixes' · fcd2b0da
      David S. Miller authored
      Sowmini Varadhan says:
      
      ====================
      RDS: TCP: HA/Failover fixes
      
      This series contains a set of fixes for bugs exposed when
      we ran the following in a loop between a test machine pair:
      
       while true; do
         # modprobe rds-tcp on test nodes
         # run rds-stress in bi-dir mode between test machine pair
         # modprobe -r rds-tcp on test nodes
       done
      
      rds-stress in bi-dir mode will cause both nodes to initiate
      RDS-TCP connections at almost the same instant, exposing the
      bugs fixed in this series.
      
      Without the fixes, rds-stress reports sporadic packet drops,
      and packets arriving out of sequence. After the fixes, we have
      been able to run the test overnight without any issues.
      
      Each patch has a detailed description of the root-cause fixed
      by the patch.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Sowmini Varadhan
      RDS: TCP: Force every connection to be initiated by numerically smaller IP address · 1a0e100f
      Sowmini Varadhan authored
      When 2 RDS peers initiate an RDS-TCP connection simultaneously,
      there is a potential for "duelling syns" on either/both sides.
      See commit 241b2719 ("RDS-TCP: Reset tcp callbacks if re-using an
      outgoing socket in rds_tcp_accept_one()") for a description of this
      condition, and the arbitration logic which ensures that the
      numerically larger IP address in the TCP connection is bound to the
      RDS_TCP_PORT ("canonical ordering").
      
      The rds_connection should not be marked as RDS_CONN_UP until the
      arbitration logic has converged, for the following reason. The sender
      may start transmitting RDS datagrams as soon as RDS_CONN_UP is set,
      and the sender removes datagrams from the rds_connection's
      cp_retrans queue based on TCP acks. If the TCP ack was sent from
      a TCP socket that got reset as part of duel arbitration (but
      before the data was delivered to the receiver's RDS socket layer),
      the sender may end up prematurely freeing the datagram, and
      the datagram is no longer reliably deliverable.
      
      This patch remedies that condition by making sure that, upon
      receipt of the TCP_ESTABLISHED state change notification (three-way
      handshake completion) in rds_tcp_state_change(), we mark the rds_connection as RDS_CONN_UP
      if, and only if, the IP addresses and ports for the connection are
      canonically ordered. In all other cases, rds_tcp_state_change will
      force an rds_conn_path_drop(), and rds_queue_reconnect() on
      both peers will restart the connection to ensure canonical ordering.
      
      A side-effect of enforcing this condition in rds_tcp_state_change()
      is that rds_tcp_accept_one_path() can now be refactored for simplicity.
      It is also no longer possible to encounter an RDS_CONN_UP connection in
      the arbitration logic in rds_tcp_accept_one().
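      The arbitration rule above can be modelled in a few lines of userspace C.
      This is a sketch under stated assumptions, not the kernel code: addresses
      and ports are IPv4 values in host byte order, RDS_TCP_PORT is assumed to be
      16385, and struct rds_tcp_endpoints, enum conn_action and both functions
      are hypothetical names. Equal addresses (loopback) are simply treated as
      ordered in this model.

```c
#include <stdbool.h>
#include <stdint.h>

#define RDS_TCP_PORT 16385 /* assumed RDS-TCP listen port */

/* Hypothetical view of one side of an established TCP connection,
 * with addresses and ports in host byte order. */
struct rds_tcp_endpoints {
	uint32_t laddr, faddr; /* local and peer IPv4 addresses */
	uint16_t lport, fport; /* local and peer TCP ports */
};

enum conn_action { CONN_MARK_UP, CONN_DROP_AND_RECONNECT };

/* Canonical ordering: the numerically smaller address initiates, so the
 * numerically larger address must be the end bound to RDS_TCP_PORT. */
static bool rds_tcp_canonically_ordered(const struct rds_tcp_endpoints *ep)
{
	if (ep->laddr > ep->faddr)
		return ep->lport == RDS_TCP_PORT; /* we must be the acceptor */
	if (ep->laddr < ep->faddr)
		return ep->fport == RDS_TCP_PORT; /* we must be the initiator */
	return true; /* loopback: nothing to arbitrate in this model */
}

/* Decision taken on the TCP_ESTABLISHED notification: only a canonically
 * ordered connection may become RDS_CONN_UP; anything else is dropped so
 * that both peers reconnect. */
static enum conn_action rds_tcp_on_established(const struct rds_tcp_endpoints *ep)
{
	return rds_tcp_canonically_ordered(ep) ? CONN_MARK_UP
					       : CONN_DROP_AND_RECONNECT;
}
```

      For 10.0.0.1 talking to 10.0.0.2, only a socket on which 10.0.0.2 owns
      RDS_TCP_PORT may become RDS_CONN_UP; the crossed connection produced by
      the other duelling SYN is dropped.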
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Sowmini Varadhan
      RDS: TCP: Track peer's connection generation number · 905dd418
      Sowmini Varadhan authored
      The RDS transport has to be able to distinguish between
      two types of failure events:
      (a) when the transport fails (e.g., TCP connection reset)
          but the RDS socket/connection layer on both sides stays
          the same
      (b) when the peer's RDS layer itself resets (e.g., due to module
          reload or machine reboot at the peer)
      In case (a) both sides must reconnect and continue the RDS messaging
      without any message loss or disruption to the message sequence numbers,
      and this is achieved by rds_send_path_reset().
      
      In case (b) we should reset all rds_connection state to the
      new incarnation of the peer. Examples of state that needs to
      be reset are the next expected rx sequence number from, and the messages to be
      retransmitted to, the new incarnation of the peer.
      
      To achieve this, the RDS handshake probe added as part of
      commit 5916e2c1 ("RDS: TCP: Enable multipath RDS for TCP")
      is enhanced so that sender and receiver of the RDS ping-probe
      will add a generation number as part of the RDS_EXTHDR_GEN_NUM
      extension header. Each peer stores local and remote generation
      numbers as part of each rds_connection. Changes in generation
      number will be detected via incoming handshake probe ping
      request or response and will allow the receiver to reset rds_connection
      state.
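      A minimal model of the generation-number check, assuming a generation of
      0 means "not yet learned". struct rds_conn_model, its fields and
      rds_conn_check_peer_gen() are hypothetical, and the reset shown (clearing
      the expected rx sequence and the retransmit backlog) only stands in for
      whatever rds_connection state the kernel actually resets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state relevant to a peer restart. */
struct rds_conn_model {
	uint32_t local_gen;          /* our incarnation, sent in RDS_EXTHDR_GEN_NUM */
	uint32_t peer_gen;           /* last generation seen from the peer, 0 = unknown */
	uint64_t next_rx_seq;        /* next expected sequence number from the peer */
	unsigned int retrans_queued; /* messages waiting to be retransmitted */
};

/* Called with the generation number carried by an incoming handshake probe
 * (ping request or response). Returns true if the peer restarted and the
 * connection state was reset. */
static bool rds_conn_check_peer_gen(struct rds_conn_model *c, uint32_t gen)
{
	bool restarted = c->peer_gen != 0 && gen != 0 && gen != c->peer_gen;

	if (restarted) {
		/* Case (b): new peer incarnation, so the old state is stale. */
		c->next_rx_seq = 0;
		c->retrans_queued = 0;
	}
	if (gen != 0)
		c->peer_gen = gen; /* remember the current incarnation */
	return restarted;
}
```

      An unchanged generation corresponds to case (a): the connection is
      re-established and rds_send_path_reset() keeps the sequence state intact.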
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Sowmini Varadhan
      RDS: TCP: set RDS_FLAG_RETRANSMITTED in cp_retrans list · 315ca6d9
      Sowmini Varadhan authored
      As noted in rds_recv_incoming() sequence numbers on data packets
      can decrease in the failover case, and the Rx path is equipped
      to recover from this, if the RDS_FLAG_RETRANSMITTED is set
      on the rds header of an incoming message with a suspect sequence
      number.
      
      The RDS_FLAG_RETRANSMITTED header flag is predicated on the
      RDS_MSG_RETRANSMITTED flag in the rds_message, so make sure the flag
      is set on messages queued for retransmission.
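      The send and receive halves can be sketched in userspace C. The flag
      values (RDS_FLAG_RETRANSMITTED = 0x04, a message bit named
      RDS_MSG_RETRANSMITTED) and all struct and function names are assumptions
      for illustration. The receive rule follows our reading of
      rds_recv_incoming(): a sequence number below the expected one is dropped
      as a duplicate only if the header marks it as retransmitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define RDS_FLAG_RETRANSMITTED 0x04      /* assumed on-wire header flag */
#define RDS_MSG_RETRANSMITTED  (1u << 5) /* assumed per-message flag, as a mask */

struct rds_msg_model {
	uint64_t seq;         /* RDS sequence number */
	unsigned int m_flags; /* per-message state kept by the sender */
	uint8_t h_flags;      /* flags carried in the RDS header */
};

/* Failover: a message put back on the retransmit queue gets marked. */
static void mark_for_retransmit(struct rds_msg_model *m)
{
	m->m_flags |= RDS_MSG_RETRANSMITTED;
}

/* At (re)transmit time, derive the header flag from the message flag. */
static void build_header(struct rds_msg_model *m)
{
	if (m->m_flags & RDS_MSG_RETRANSMITTED)
		m->h_flags |= RDS_FLAG_RETRANSMITTED;
}

/* Receive path: returns true if the message is delivered, false if it is
 * dropped as an already-delivered retransmission. */
static bool rx_accept(uint64_t *next_rx_seq, const struct rds_msg_model *m)
{
	if (m->seq < *next_rx_seq && (m->h_flags & RDS_FLAG_RETRANSMITTED))
		return false;
	*next_rx_seq = m->seq + 1;
	return true;
}
```

      In this model, a message with a stale sequence number but no header flag
      is accepted and delivered again, which is the duplicate delivery that
      setting the flag on retransmitted messages prevents.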
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • LABBE Corentin
      net: stmmac: replace if (netif_msg_type) by their netif_xxx counterpart · b3e51069
      LABBE Corentin authored
      As suggested by Joe Perches, we can replace every
      if (netif_msg_type(priv)) dev_xxx(priv->device, ...)
      with the simpler macro netif_xxx(priv, hw, priv->dev, ...).
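      The two logging shapes can be compared with a userspace mock. It assumes,
      as the kernel's netif_xxx helpers do, that netif_err(priv, type, dev, fmt,
      ...) checks netif_msg_<type>(priv) before printing through netdev_err().
      The mock_* macros, struct fake_priv and the log buffer are hypothetical,
      and NETIF_MSG_HW is given the value 0x2000 purely for illustration.

```c
#include <stdio.h>
#include <string.h>

#define NETIF_MSG_HW 0x2000 /* illustrative value for the "hw" message class */

struct fake_priv {
	unsigned int msg_enable; /* bitmask of enabled message classes */
	const char *dev_name;    /* stands in for priv->dev */
};

static char last_log[128];
static int log_count;

#define netif_msg_hw(p) ((p)->msg_enable & NETIF_MSG_HW)

/* Stand-in for netdev_err(): prefix the message with the device name. */
#define mock_netdev_err(name, ...)                                            \
	do {                                                                  \
		int n_ = snprintf(last_log, sizeof(last_log), "%s: ", (name)); \
		snprintf(last_log + n_, sizeof(last_log) - n_, __VA_ARGS__);   \
		log_count++;                                                  \
	} while (0)

/* Old shape: explicit class check, then log. */
static void report_old(struct fake_priv *priv, int err)
{
	if (netif_msg_hw(priv))
		mock_netdev_err(priv->dev_name, "reset failed: %d", err);
}

/* New shape: one macro that folds in the netif_msg_<type>() check. */
#define mock_netif_err(priv, type, name, ...)               \
	do {                                                \
		if (netif_msg_##type(priv))                 \
			mock_netdev_err(name, __VA_ARGS__); \
	} while (0)

static void report_new(struct fake_priv *priv, int err)
{
	mock_netif_err(priv, hw, priv->dev_name, "reset failed: %d", err);
}

/* Helper for callers that want to inspect the last recorded line. */
static int last_log_is(const char *s)
{
	return strcmp(last_log, s) == 0;
}
```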
      Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • LABBE Corentin
      net: stmmac: replace hardcoded function name by __func__ · de9a2165
      LABBE Corentin authored
      Some print statements have the function name hardcoded.
      It is better to use __func__ instead.
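      A tiny illustration with a hypothetical function name: a hardcoded string
      goes stale when the function is renamed, while the C99 predefined
      identifier __func__ always expands to the name of the enclosing function.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stmmac-style error formatter. Before the change such a
 * message would spell out "stmmac_model_init_dma" by hand. */
static int stmmac_model_init_dma(char *buf, size_t len)
{
	/* __func__ is "stmmac_model_init_dma" here and follows any rename. */
	return snprintf(buf, len, "%s: DMA init failed", __func__);
}
```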
      Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • LABBE Corentin
      net: stmmac: replace all pr_xxx by their netdev_xxx counterpart · 38ddc59d
      LABBE Corentin authored
      The stmmac driver uses lots of pr_xxx functions to print information.
      This is bad since we cannot know which device logged the information
      (especially when two stmmac devices are present).
      
      Furthermore, it wrongly assumes that related log lines will always be
      printed consecutively, by using a dev_xxx followed by some indented pr_xxx, like this:
      kernel: sun7i-dwmac 1c50000.ethernet: no reset control found
      kernel:  Ring mode enabled
      kernel:  No HW DMA feature register supported
      kernel:  Normal descriptors
      kernel:  TX Checksum insertion supported
      
      So this patch replaces all pr_xxx with their netdev_xxx counterparts,
      except for some prints where netdev_xxx causes ugly output like:
      sun7i-dwmac 1c50000.ethernet (unnamed net_device) (uninitialized): no reset control found
      In those cases, I keep dev_xxx.
      
      At the same time, I remove some "stmmac:" prefixes, since they
      duplicate what dev_xxx already displays.
      Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
      Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Eric Dumazet
      net_sched: sch_fq: use hash_ptr() · 29c58472
      Eric Dumazet authored
      When I wrote sch_fq.c, hash_ptr() on 64-bit arches was awful,
      and I chose hash_32().
      
      Linus Torvalds and George Spelvin fixed this issue, so we can
      use hash_ptr() to get more entropy on 64-bit arches with terabytes
      of memory, and avoid the cast games.
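      The entropy argument can be checked with a userspace model of the
      multiplicative hashes, assuming the golden-ratio constants of the reworked
      include/linux/hash.h (0x61C88647 and 0x61C8864680B583EB). Addresses are
      modelled as uint64_t so the example is deterministic, and the helper names
      are hypothetical. Two addresses that differ only above bit 31 always
      collide once truncated to 32 bits; hashing all 64 bits separates them in
      this example.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplicative hashing constants (assumed to match include/linux/hash.h). */
#define GOLDEN_RATIO_32 0x61C88647u
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Keep the top 'bits' bits of val * GOLDEN_RATIO_32 (mod 2^32). */
static uint32_t model_hash_32(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (val * GOLDEN_RATIO_32) >> (32 - bits);
}

/* Keep the top 'bits' bits of val * GOLDEN_RATIO_64 (mod 2^64). */
static uint32_t model_hash_64(uint64_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (uint32_t)((val * GOLDEN_RATIO_64) >> (64 - bits));
}

/* Old style: cast the address down to 32 bits, then hash. */
static uint32_t bucket_truncated(uint64_t addr, unsigned int bits)
{
	return model_hash_32((uint32_t)addr, bits);
}

/* hash_ptr() style on a 64-bit arch: every address bit contributes. */
static uint32_t bucket_full(uint64_t addr, unsigned int bits)
{
	return model_hash_64(addr, bits);
}
```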
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Eric Dumazet
      net/mlx5e: remove napi_hash_del() calls · d30d9ccb
      Eric Dumazet authored
      Calling napi_hash_del() after netif_napi_del() is pointless.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Eric Dumazet
      net/mlx4_en: remove napi_hash_del() call · bb07fafa
      Eric Dumazet authored
      There is no need to call napi_hash_del() + synchronize_rcu() before
      calling netif_napi_del().
      
      netif_napi_del() does this already.
      
      Using napi_hash_del() in a driver is useful only when dealing with
      a batch of NAPI structures, so that a single synchronize_rcu() can
      be used. mlx4_en_deactivate_cq() deactivates only a single NAPI.
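      The cost argument can be made concrete by counting RCU grace periods in a
      userspace model. It assumes, as the text states, that netif_napi_del()
      itself unhashes the NAPI and waits for a grace period only if the NAPI was
      still hashed. All names below (struct model_napi, model_synchronize_rcu()
      and so on) are hypothetical mocks, not kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_napi {
	bool hashed; /* still visible to RCU readers via the NAPI hash */
	bool listed; /* still on the device's NAPI list */
};

static int grace_periods; /* number of model_synchronize_rcu() calls */

static void model_synchronize_rcu(void)
{
	grace_periods++;
}

/* Unhash; report whether there was anything to unhash. */
static bool model_napi_hash_del(struct model_napi *n)
{
	bool was_hashed = n->hashed;

	n->hashed = false;
	return was_hashed;
}

/* Like netif_napi_del(): unhash, wait only if needed, then unlink. */
static void model_netif_napi_del(struct model_napi *n)
{
	if (model_napi_hash_del(n))
		model_synchronize_rcu();
	n->listed = false;
}

/* Batch teardown: unhash everything, pay for one grace period, then
 * delete; the per-NAPI deletes find nothing left to unhash. */
static void model_del_batch(struct model_napi *v, size_t count)
{
	bool any = false;

	for (size_t i = 0; i < count; i++)
		any |= model_napi_hash_del(&v[i]);
	if (any)
		model_synchronize_rcu();
	for (size_t i = 0; i < count; i++)
		model_netif_napi_del(&v[i]);
}
```

      Deleting four NAPIs one at a time costs four grace periods in this model,
      the batch costs one, and for a single NAPI an explicit unhash beforehand
      saves nothing.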
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller
      Merge branch 'mlxsw-i2c' · 6a02f5eb
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: Introduce support for I2C bus
      
      Vadim says:
      
      This patchset adds I2C access support for SwitchX, SwitchX2, SwitchIB,
      SwitchIB2 and Spectrum silicon.
      
      It contains:
       - Small changes in mlxsw core code, needed for I2C bus support;
       - I2C driver, which obtains the I2C input/output mailbox settings and
         provides the command interface implementation;
       - Minimal driver, which works on top of the I2C driver and allows running
         the mlxsw command interface over the I2C bus.
      
      Use case:
      On systems that have no PCI connection to the ASIC (such as a BMC), hwmon
      functionality (sensors, PWM, tachometers) will be available through I2C.
      
      Usage (manual probing):
      echo mlxsw_minimal 0x48 > /sys/bus/i2c/devices/i2c-2/new_device
      
      Sysfs interface:
      /sys/bus/i2c/devices/2-0048/hwmon/hwmon5/pwm1
      /sys/bus/i2c/devices/2-0048/hwmon/hwmon5/temp1_input
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Vadim Pasternak
      mlxsw: minimal: Add I2C support for Mellanox ASICs · d556e929
      Vadim Pasternak authored
      Add I2C access support for Mellanox ASICs:
      - Virtual Protocol Interconnect switches SwitchX, SwitchX2,
        providing InfiniBand, Ethernet and Fibre Channel connectivity;
      - InfiniBand switches SwitchIB, SwitchIB2;
      - Ethernet switch Spectrum.
      
      Example of probing activation:
      echo mlxsw_minimal 0x48 > /sys/bus/i2c/devices/i2c-2/new_device
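      A small C sketch of the probing step above and of parsing the resulting
      hwmon attribute. It assumes the standard I2C new_device format
      ("<name> <address>") and the hwmon convention that temp*_input holds
      millidegrees Celsius. The bus (i2c-2), address (0x48) and hwmon index come
      from the example and will differ per system, writing new_device requires
      root, and the function names are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format the "<driver> <address>" line that new_device expects. */
static int i2c_new_device_line(char *buf, size_t len, const char *name,
			       unsigned int addr)
{
	return snprintf(buf, len, "%s 0x%02x\n", name, addr);
}

/* Instantiate a device, e.g. with
 * ("/sys/bus/i2c/devices/i2c-2/new_device", "mlxsw_minimal", 0x48).
 * Returns 0 on success or a negative errno value. */
static int i2c_instantiate(const char *new_device_path, const char *name,
			   unsigned int addr)
{
	char line[64];
	FILE *f = fopen(new_device_path, "w");

	if (!f)
		return -errno;
	i2c_new_device_line(line, sizeof(line), name, addr);
	if (fputs(line, f) == EOF) {
		fclose(f);
		return -EIO;
	}
	return fclose(f) == 0 ? 0 : -errno;
}

/* Parse the text of a temp*_input file (millidegrees Celsius). */
static int hwmon_parse_millicelsius(const char *text, long *out)
{
	char *end;

	errno = 0;
	long v = strtol(text, &end, 10);
	if (errno || end == text || (*end != '\0' && *end != '\n'))
		return -EINVAL;
	*out = v;
	return 0;
}
```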
      Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Vadim Pasternak
      mlxsw: Invoke driver's init/fini methods only if defined · 4ebd00bc
      Vadim Pasternak authored
      We are going to add a minimal driver on top of the mlxsw core
      infrastructure, which will be mainly used for hardware monitoring in
      Baseboard management controller (BMC) installations.
      
      Unlike the switch drivers (e.g., spectrum, switchx2), this driver does not
      initialize the ASIC and therefore doesn't need to implement the init() and
      fini() methods in its 'mlxsw_driver' struct.
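      A minimal sketch of the "call only if defined" pattern, using a
      hypothetical struct model_mlxsw_driver with optional init/fini callbacks.
      The real struct mlxsw_driver and the core's registration code differ in
      detail; this only shows why a NULL callback must be skipped, not called.

```c
#include <stddef.h>

struct model_core; /* defined below */

/* Hypothetical driver ops: a minimal (hwmon-only) driver leaves both NULL. */
struct model_mlxsw_driver {
	const char *kind;
	int (*init)(struct model_core *core);  /* optional */
	void (*fini)(struct model_core *core); /* optional */
};

struct model_core {
	const struct model_mlxsw_driver *driver;
	int asic_ready; /* set by a driver that initializes the ASIC */
};

static int model_core_register(struct model_core *core,
			       const struct model_mlxsw_driver *drv)
{
	core->driver = drv;
	if (drv->init) {
		int err = drv->init(core);

		if (err)
			return err;
	}
	/* Common setup (e.g. hwmon) would follow here for every driver. */
	return 0;
}

static void model_core_unregister(struct model_core *core)
{
	if (core->driver->fini)
		core->driver->fini(core);
}

/* Example callbacks for a switch driver that does initialize the ASIC. */
static int switch_init(struct model_core *core) { core->asic_ready = 1; return 0; }
static void switch_fini(struct model_core *core) { core->asic_ready = 0; }
```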
      Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>