Commits · 2684d1b75f215bdf521064bcbc0015dfca9156e7 · nexedi / linux

24 Apr, 2019 15 commits

net: sched: taprio: Fix taprio_peek() · 2684d1b7

Andre Guedes authored Apr 23, 2019

While traversing taprio's children qdisc list, if the gate is closed for
a given traffic class, we should continue traversing the list since the
remaining qdiscs may have skb ready for transmission.

This patch also takes this opportunity and changes the function to use
the TAPRIO_ALL_GATES_OPEN macro instead of the magic number '-1'.

Fixes: 5a781ccb (“tc: Add support for configuring the taprio scheduler”)
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2684d1b7

net: sched: taprio: Remove should_restart_cycle() · 5175aafe

Andre Guedes authored Apr 23, 2019

The 'entry' argument from should_restart_cycle() cannot be NULL since it
is already checked by the caller so the WARN_ON() within should_
restart_cycle() could be removed. By doing that, that function becomes
a dummy wrapper on list_is_last() so this patch simply gets rid of it
and call list_is_last() within advance_sched() instead.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5175aafe

net: sched: taprio: Refactor taprio_get_start_time() · 8599099f

Andre Guedes authored Apr 23, 2019

This patch does a code refactoring to taprio_get_start_time() function
to improve readability and report error properly.

If 'base' time is later than 'now', the start time is equal to 'base'
and taprio_get_start_time() is done. That's the natural case so we move
that code to the beginning of the function. Also, if 'cycle' calculation
is zero, something went really wrong with taprio and we should log that
internal error properly.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8599099f

net: sched: taprio: Remove pointless variable assigment · 59ab87f6

Andre Guedes authored Apr 23, 2019

This patch removes a pointless variable assigment in taprio_change().
The 'err' variable is not used from this assignment to the next one so
this patch removes it.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

59ab87f6

net: Change nhc_flags to unsigned char · ecc5663c

David Ahern authored Apr 23, 2019

nhc_flags holds the RTNH_F flags for a given nexthop (fib{6}_nh).
All of the RTNH_F_ flags fit in an unsigned char, and since the API to
userspace (rtnh_flags and lower byte of rtm_flags) is 1 byte it can not
grow. Make nhc_flags in fib_nh_common an unsigned char and shrink the
size of the struct by 8, from 56 to 48 bytes.

Update the flags arguments for up netdevice events and fib_nexthop_info
which determines the RTNH_F flags to return on a dump/event. The RTNH_F
flags are passed in the lower byte of rtm_flags which is an unsigned int
so use a temp variable for the flags to fib_nexthop_info and combine
with rtm_flags in the caller.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ecc5663c

lwtunnel: Pass encap and encap type attributes to lwtunnel_fill_encap · ffa8ce54

David Ahern authored Apr 23, 2019

Currently, lwtunnel_fill_encap hardcodes the encap and encap type
attributes as RTA_ENCAP and RTA_ENCAP_TYPE, respectively. The nexthop
objects want to re-use this code but the encap attributes passed to
userspace as NHA_ENCAP and NHA_ENCAP_TYPE. Since that is the only
difference, change lwtunnel_fill_encap to take the attribute type as
an input.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ffa8ce54

ravb: Avoid unsupported internal delay mode for R-Car E3/D3 · 0a5d329f

Simon Horman authored Apr 23, 2019

According to the R-Car Gen3 Hardware Manual Rev 1.50 of Nov 30, 2018, the
TX clock internal delay mode isn't supported on R-Car E3 (r8a77990) or D3
(r8a77995). And by extension it is also not supported by RZ/G2E (r9a774c0).

This matches all ES versions of the affected SoCs as it is
not clear if this problem will be resolved in newer chips.
This can be revisited, as necessary.

This patch does not error-out if PHY_INTERFACE_MODE_RGMII_ID or
PHY_INTERFACE_MODE_RGMII_TXID are used on SoCs where TX clock delay
mode is not supported as there is a risk of introducing a regression
when used in conjunction with older DT blobs present in the field.
Rather, a warning is logged in such cases.

Based on work by Kazuya Mizuguchi.
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0a5d329f

net: fix sparc64 compilation of sock_gettstamp · c98f4822

Stephen Rothwell authored Apr 23, 2019

net/core/sock.c: In function 'sock_gettstamp':
net/core/sock.c:3007:23: error: expected '}' before ';' token
    .tv_sec = ts.tv_sec;
                       ^
net/core/sock.c:3011:4: error: expected ')' before 'return'
    return -EFAULT;
    ^~~~~~
net/core/sock.c:3013:2: error: expected expression before '}' token
  }
  ^

Fixes: c7cbdbf2 ("net: rework SIOCGSTAMP ioctl handling")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

c98f4822

isdn:mISDN: fix misuse of %x in hfcpci.c · 0fa4122b

Fuqian Huang authored Apr 23, 2019

Pointers should be printed with %p or %px rather than
cast to (u_long) and printed with %lx.
Change %lx to %p to print the pointer.
Change %lx to %pad to print the dma_addr_t.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0fa4122b

isdn: hisax: Fix misuse of %x in config.c · 6f9fd97e

Fuqian Huang authored Apr 23, 2019

Pointers should be printed with %p or %px rather than
cast to (u_long) type and printed with %lX.
As the function seems to be for debug purpose.
Change %lX to %px to print the pointer value.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6f9fd97e

Merge branch 'ipv6-fib6_ref-conversion-to-refcount_t' · 6b18bdfd

David S. Miller authored Apr 23, 2019

Eric Dumazet says:

====================
ipv6: fib6_ref conversion to refcount_t

We are chasing use-after-free in IPv6 that could have their origin
in fib6_ref 0 -> 1 transitions.

This patch series should help finding the root causes if these
illegal transitions ever happen.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6b18bdfd

ipv6: convert fib6_ref to refcount_t · f05713e0

Eric Dumazet authored Apr 22, 2019

We suspect some issues involving fib6_ref 0 -> 1 transitions might
cause strange syzbot reports.

Lets convert fib6_ref to refcount_t to catch them earlier.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f05713e0

ipv6: broadly use fib6_info_hold() helper · 5ea71528

Eric Dumazet authored Apr 22, 2019

Instead of using atomic_inc(), prefer fib6_info_hold()
so that upcoming refcount_t conversion is simpler.

Only fib6_info_alloc() is using atomic_set() since we
just allocated a new object.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5ea71528

ipv6: fib6_info_destroy_rcu() cleanup · b0270550

Eric Dumazet authored Apr 22, 2019

We do not need to clear f6i->rt6i_exception_bucket right before
freeing f6i.

Note that f6i->rt6i_exception_bucket is properly protected by
f6i->exception_bucket_flushed being set to one in rt6_flush_exceptions()
under the protection of rt6_exception_lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b0270550

Merge tag 'mlx5-updates-2019-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 20eb08b2

David S. Miller authored Apr 23, 2019

Saeed Mahameed says:

====================
mlx5-updates-2019-04-22

This series includes updates to mlx5e driver RX data path and some
significant XDP RX/TX improvements to overcome/mitigate HW and PCIE
bottlenecks.

From Tariq:
1) Some Enhancements in rq->flags
2) Stabilize RX packet rate (on Striding RQ) with
multiple outstanding UMR posts
In this patch, we add support for multiple outstanding UMR posts,
 to allow faster gap closure between consuming MPWQEs and reposting
them back into the WQ.

Performance test:
As expected, huge improvement in large-scale (48 cores).

xdp_redirect_map, 64B UDP multi-stream.
Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz.

Before: Unstable, 7 to 30 Mpps
After:  Stable,   at 70.5 Mpps

From Shay:
3) XDP, Inline small packets into the TX MPWQE in XDP xmit flow

Upon high packet rate with multiple CPUs TX workloads, much of the HCA's
resources are spent on prefetching TX descriptors, thus affecting
transmission rates.
This patch comes to mitigate this problem by moving some workload to the
CPU and reducing the HW data prefetch overhead for small packets (<= 256B).

When forwarding packets with XDP, a packet that is smaller
than a certain size (set to ~256 bytes) would be sent inline within
its WQE TX descrptor (mem-copied), when the hardware tx queue is congested
beyond a pre-defined water-mark.

Performance:
    Tested packet rate for UDP 64Byte multi-stream
    over two dual port ConnectX-5 100Gbps NICs.
    CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

    * Tested with hyper-threading disabled

    XDP_TX:

    |          | before | after   |       |
    | 24 rings | 51Mpps | 116Mpps | +126% |
    | 1 ring   | 12Mpps | 12Mpps  | same  |

    XDP_REDIRECT:

    ** Below is the transmit rate, not the redirection rate
    which might be larger, and is not affected by this patch.

    |          | before  | after   |      |
    | 32 rings | 64Mpps  | 92Mpps  | +43% |
    | 1 ring   | 6.4Mpps | 6.4Mpps | same |

As we can see, feature significantly improves scaling, without
hurting single ring performance.

From Maxim:
4) Some trivial refactoring and code improvements prior to a larger series
to support AF_XDP.
====================
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20eb08b2

23 Apr, 2019 25 commits

net/mlx5e: Use #define for the WQE wait timeout constant · f8ebecf2

Maxim Mikityanskiy authored Mar 05, 2019

Create a #define for the timeout of mlx5e_wait_for_min_rx_wqes to
clarify the meaning of a magic number.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

f8ebecf2

net/mlx5e: Remove unused rx_page_reuse stat · 03ceda6f

Maxim Mikityanskiy authored Mar 21, 2019

Remove the no longer used page_reuse stat of RQs.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

03ceda6f

net/mlx5e: Take HW interrupt trigger into a function · 63d26b49

Maxim Mikityanskiy authored Mar 01, 2019

mlx5e_trigger_irq posts a NOP to the ICO SQ just to trigger an IRQ and
enter the NAPI poll on the right CPU according to the affinity. Use it
in mlx5e_activate_rq.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

63d26b49

net/mlx5e: Remove unused parameter · 10961c56

Maxim Mikityanskiy authored Mar 27, 2019

mdev is unused in mlx5e_rx_is_linear_skb.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

10961c56

net/mlx5e: Add an underflow warning comment · b1b187e1

Maxim Mikityanskiy authored Mar 27, 2019

mlx5e_mpwqe_get_log_rq_size calculates the number of WQEs (N) based on
the requested number of frames in the RQ (F) and the number of packets
per WQE (P). It ensures that N is not less than the minimum number of
WQEs in an RQ (N_min). Arithmetically, it means that F / P >= N_min
should be true. This function deals with logarithms, so it should check
that log(F) - log(P) >= log(N_min). However, if F < P, this expression
will cause an unsigned underflow. Check log(F) >= log(P) + log(N_min)
instead.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

b1b187e1

net/mlx5e: Move parameter calculation functions to en/params.c · 9a22d5d8

Maxim Mikityanskiy authored Mar 27, 2019

This commit moves the parameter calculation functions to a separate file
for better modularity and code sharing with future features.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

9a22d5d8

net/mlx5e: Report mlx5e_xdp_set errors · 74bbaebf

Maxim Mikityanskiy authored Mar 19, 2019

If the channels fail to reopen after setting an XDP program, return the
error code instead of 0. A proper fix is still needed, as now any error
while reopening the channels brings the interface down. This patch only
adds error reporting.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

74bbaebf

net/mlx5e: Remove unused parameter · 83b2fd64

Maxim Mikityanskiy authored Mar 07, 2019

params is unused in mlx5e_init_di_list.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

83b2fd64

net/mlx5e: XDP, Inline small packets into the TX MPWQE in XDP xmit flow · c2273219

Shay Agroskin authored Mar 14, 2019

Upon high packet rate with multiple CPUs TX workloads, much of the HCA's
resources are spent on prefetching TX descriptors, thus affecting
transmission rates.
This patch comes to mitigate this problem by moving some workload to the
CPU and reducing the HW data prefetch overhead for small packets (<= 256B).

When forwarding packets with XDP, a packet that is smaller
than a certain size (set to ~256 bytes) would be sent inline within
its WQE TX descrptor (mem-copied), when the hardware tx queue is congested
beyond a pre-defined water-mark.

This is added to better utilize the HW resources (which now makes
one less packet data prefetch) and allow better scalability, on the
account of CPU usage (which now 'memcpy's the packet into the WQE).

To load balance between HW and CPU and get max packet rate, we use
watermarks to detect how much the HW is congested and move the work
loads back and forth between HW and CPU.

Performance:
Tested packet rate for UDP 64Byte multi-stream
over two dual port ConnectX-5 100Gbps NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

* Tested with hyper-threading disabled

XDP_TX:

|          | before | after   |       |
| 24 rings | 51Mpps | 116Mpps | +126% |
| 1 ring   | 12Mpps | 12Mpps  | same  |

XDP_REDIRECT:

** Below is the transmit rate, not the redirection rate
which might be larger, and is not affected by this patch.

|          | before  | after   |      |
| 32 rings | 64Mpps  | 92Mpps  | +43% |
| 1 ring   | 6.4Mpps | 6.4Mpps | same |

As we can see, feature significantly improves scaling, without
hurting single ring performance.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

c2273219

net/mlx5e: XDP, Add TX MPWQE session counter · 73cab880

Shay Agroskin authored Feb 25, 2019

This counter tracks how many TX MPWQE sessions are started in XDP SQ
in XDP TX/REDIRECT flow. It counts per-channel and global stats.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

73cab880

net/mlx5e: XDP, Enhance RQ indication for XDP redirect flush · 15143bf5

Tariq Toukan authored Mar 10, 2019

The XDP redirect flush indication belongs to the receive queue,
not to its XDP send queue.

For this, use a new bit on rq->flags.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

15143bf5

net/mlx5e: XDP, Fix shifted flag index in RQ bitmap · f03590f7

Tariq Toukan authored Mar 10, 2019

Values in enum mlx5e_rq_flag are used as bit indixes.
Intention was to use them with no BIT(i) wrapping.

No functional bug fix here, as the same (shifted)flag bit
is used for all set, test, and clear operations.

Fixes: 121e8927 ("net/mlx5e: Refactor RQ XDP_TX indication")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

f03590f7

net/mlx5e: RX, Support multiple outstanding UMR posts · fd9b4be8

Tariq Toukan authored Feb 27, 2019

The buffers mapping of the Multi-Packet WQEs (of Striding RQ)
is done via UMR posts, one UMR WQE per an RX MPWQE.

A single MPWQE is capable of serving many incoming packets,
usually larger than the budget of a single napi cycle.
Hence, posting a single UMR WQE per napi cycle (and handling its
completion in the next cycle) works fine in many common cases,
but not always.

When an XDP program is loaded, every MPWQE is capable of serving less
packets, to satisfy the packet-per-page requirement.
Thus, for the same number of packets more MPWQEs (and UMR posts)
are needed (twice as much for the default MTU), giving less latency
room for the UMR completions.

In this patch, we add support for multiple outstanding UMR posts,
to allow faster gap closure between consuming MPWQEs and reposting
them back into the WQ.

For better SW and HW locality, we combine the UMR posts in bulks of
(at least) two.

This is expected to improve packet rate in high CPU scale.

Performance test:
As expected, huge improvement in large-scale (48 cores).

xdp_redirect_map, 64B UDP multi-stream.
Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz.

Before: Unstable, 7 to 30 Mpps
After:  Stable,   at 70.5 Mpps

No degradation in other tested scenarios.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

fd9b4be8

Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux · 3839f99d
Saeed Mahameed authored Apr 23, 2019

3839f99d

Merge branch 'net-phy-mscc-Improvements-to-VSC8514-PHY-driver' · 539b593d

David S. Miller authored Apr 23, 2019

Kavya Sree Kotagiri says:

====================
net: phy: mscc: Improvements to VSC8514 PHY driver.

    The VSC8514 PHY is a 4-ports PHY that is 10/100/1000BASE-T, 100BASE-FX,
    1000BASE-X, can communicate with the MAC via QSGMII.
    The MAC interface protocol for each port within QSGMII can
    be either 1000BASE-X or SGMII, if the QSGMII MAC that the VSC8514 is
    connecting to supports this functionality.
    VSC8514 also supports SGMII MAC-side autonegotiation on each individual
    port, downshifting, can set the blinking pattern of each of its 4 LEDs,
    SyncE, 1000BASE-T Ring Resiliency as well as HP Auto-MDIX detection.

    This patch series adds support for 10BASE-T, 100BASE-TX, and
    1000BASE-T, QSGMII link with the MAC, downshifting, HP Auto-MDIX
    detection and blinking pattern for its 4 LEDs.

    The GPIO register bank is a set of registers that are common to all
    PHYs in the package. So any modification in any register of this bank
    affects all PHYs of the package.

    If the PHYs haven't been reset before booting the Linux kernel and were
    configured to use interrupts for e.g. link status updates, it is
    required to clear the interrupts mask register of all PHYs before being
    able to use interrupts with any PHY. The first PHY of the package that
    will be init will take care of clearing all PHYs interrupts mask
    registers. Thus, we need to keep track of the init sequence in the
    package, if it's already been done or if it's to be done.

    Most of the init sequence of a PHY of the package is common to all PHYs
    in the package, thus we use the SMI broadcast feature which enables us
    to propagate a write in one register of one PHY to all PHYs in the same
    package.

    This patch series adds support for VSC8514 in Microsemi driver(mscc.c)
    and removes support from Vitesse driver(vitesse.c).

v8
- mscc: Added appropriate code using phy_modify() in vsc8514_config_init().

v7
- mscc: Handled return values in vsc8514_config_init().

v6
- mscc: Added proper return value in vsc85xx_csr_ctrl_phy_read().
- mscc: Replaced __mdiobus_write and__mdiobus_read with __phy_write and __phy_read resp.
- mscc: Replaced register addresses in 8514_config_init() with proper constants.

v5
- mscc: Added return error statements for few function calls.
- mscc: Added comments in vsc85xx_csr_ctrl_phy_read() and vsc85xx_csr_ctrl_phy_write()
v4
- mscc: Removed features settings
- mscc: Removed aneg_done settings.

v3
- mscc: Used BIT(x) for PHY_MCB_S6G_WRITE and PHY_MCB_S6G_READ
        instead of hex.
- mscc: Replaced magic numbers with proper constants.
- mscc: Handled delays and timeouts at appropriate points.
- mscc: Added comments/explanation where requested.

v2
- mscc: Sorted variable declarations in reverse christmas tree order.

v1
- Added 0/2 file.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

539b593d

net: phy: vitesse: Remove support for VSC8514. · edeb207b

Kavya Sree Kotagiri authored Apr 22, 2019

Add support for VSC8514 in Microsemi driver (mscc.c)
with more features.
Signed-off-by: Kavya Sree Kotagiri <kavyasree.kotagiri@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

edeb207b

net: phy: mscc: add support for VSC8514 PHY. · e4f9ba64

Kavya Sree Kotagiri authored Apr 22, 2019

The VSC8514 PHY is a 4-ports PHY that is 10/100/1000BASE-T, 100BASE-FX,
1000BASE-X, can communicate with the MAC via QSGMII.
The MAC interface protocol for each port within QSGMII can
be either 1000BASE-X or SGMII, if the QSGMII MAC that the VSC8514 is
connecting to supports this functionality.
VSC8514 also supports SGMII MAC-side autonegotiation on each individual
port, downshifting, can set the blinking pattern of each of its 4 LEDs,
SyncE, 1000BASE-T Ring Resiliency as well as HP Auto-MDIX detection.

This adds support for 10BASE-T, 100BASE-TX, and 1000BASE-T,
QSGMII link with the MAC, downshifting, HP Auto-MDIX detection
and blinking pattern for its 4 LEDs.

The GPIO register bank is a set of registers that are common to all PHYs
in the package. So any modification in any register of this bank affects
all PHYs of the package.

If the PHYs haven't been reset before booting the Linux kernel and were
configured to use interrupts for e.g. link status updates, it is
required to clear the interrupts mask register of all PHYs before being
able to use interrupts with any PHY. The first PHY of the package that
will be init will take care of clearing all PHYs interrupts mask
registers. Thus, we need to keep track of the init sequence in the
package, if it's already been done or if it's to be done.

Most of the init sequence of a PHY of the package is common to all PHYs
in the package, thus we use the SMI broadcast feature which enables us
to propagate a write in one register of one PHY to all PHYs in the same
package.
Signed-off-by: Kavya Sree Kotagiri <kavyasree.kotagiri@microchip.com>
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Co-developed-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4f9ba64

net: phy: marvell: add new default led configure for m88e151x · a93f7fe1

Jian Shen authored Apr 22, 2019

The default m88e151x LED configuration is 0x1177, used LED[0]
for 1000M link, LED[1] for 100M link, and LED[2] for active.
But for some boards, which use LED[0] for link, and LED[1] for
active, prefer to be 0x1040. To be compatible with this case,
this patch defines a new dev_flag, and set it before connect
phy in HNS3 driver. When phy initializing, using the new
LED configuration if this dev_flag is set.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a93f7fe1

net: systemport: Remove need for DMA descriptor · 7e6e185c

Florian Fainelli authored Apr 22, 2019

All we do is write the length/status and address bits to a DMA
descriptor only to write its contents into on-chip registers right
after, eliminate this unnecessary step.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e6e185c

bridge: Fix possible use-after-free when deleting bridge port · 697cd36c

Ido Schimmel authored Apr 22, 2019

When a bridge port is being deleted, do not dereference it later in
br_vlan_port_event() as it can result in a use-after-free [1] if the RCU
callback was executed before invoking the function.

[1]
[  129.638551] ==================================================================
[  129.646904] BUG: KASAN: use-after-free in br_vlan_port_event+0x53c/0x5fd
[  129.654406] Read of size 8 at addr ffff8881e4aa1ae8 by task ip/483
[  129.663008] CPU: 0 PID: 483 Comm: ip Not tainted 5.1.0-rc5-custom-02265-ga946bd73daac #1383
[  129.672359] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[  129.682484] Call Trace:
[  129.685242]  dump_stack+0xa9/0x10e
[  129.689068]  print_address_description.cold.2+0x9/0x25e
[  129.694930]  kasan_report.cold.3+0x78/0x9d
[  129.704420]  br_vlan_port_event+0x53c/0x5fd
[  129.728300]  br_device_event+0x2c7/0x7a0
[  129.741505]  notifier_call_chain+0xb5/0x1c0
[  129.746202]  rollback_registered_many+0x895/0xe90
[  129.793119]  unregister_netdevice_many+0x48/0x210
[  129.803384]  rtnl_delete_link+0xe1/0x140
[  129.815906]  rtnl_dellink+0x2a3/0x820
[  129.844166]  rtnetlink_rcv_msg+0x397/0x910
[  129.868517]  netlink_rcv_skb+0x137/0x3a0
[  129.882013]  netlink_unicast+0x49b/0x660
[  129.900019]  netlink_sendmsg+0x755/0xc90
[  129.915758]  ___sys_sendmsg+0x761/0x8e0
[  129.966315]  __sys_sendmsg+0xf0/0x1c0
[  129.988918]  do_syscall_64+0xa4/0x470
[  129.993032]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  129.998696] RIP: 0033:0x7ff578104b58
...
[  130.073811] Allocated by task 479:
[  130.077633]  __kasan_kmalloc.constprop.5+0xc1/0xd0
[  130.083008]  kmem_cache_alloc_trace+0x152/0x320
[  130.088090]  br_add_if+0x39c/0x1580
[  130.092005]  do_set_master+0x1aa/0x210
[  130.096211]  do_setlink+0x985/0x3100
[  130.100224]  __rtnl_newlink+0xc52/0x1380
[  130.104625]  rtnl_newlink+0x6b/0xa0
[  130.108541]  rtnetlink_rcv_msg+0x397/0x910
[  130.113136]  netlink_rcv_skb+0x137/0x3a0
[  130.117538]  netlink_unicast+0x49b/0x660
[  130.121939]  netlink_sendmsg+0x755/0xc90
[  130.126340]  ___sys_sendmsg+0x761/0x8e0
[  130.130645]  __sys_sendmsg+0xf0/0x1c0
[  130.134753]  do_syscall_64+0xa4/0x470
[  130.138864]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  130.146195] Freed by task 0:
[  130.149421]  __kasan_slab_free+0x125/0x170
[  130.154016]  kfree+0xf3/0x310
[  130.157349]  kobject_put+0x1a8/0x4c0
[  130.161363]  rcu_core+0x859/0x19b0
[  130.165175]  __do_softirq+0x250/0xa26
[  130.170956] The buggy address belongs to the object at ffff8881e4aa1ae8
                which belongs to the cache kmalloc-1k of size 1024
[  130.184972] The buggy address is located 0 bytes inside of
                1024-byte region [ffff8881e4aa1ae8, ffff8881e4aa1ee8)

Fixes: 9c0ec2e7 ("bridge: support binding vlan dev link state to vlan member bridge ports")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Cc: Mike Manning <mmanning@vyatta.att-mail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Mike Manning <mmanning@vyatta.att-mail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

697cd36c

r8152: sync sa_family with the media type of network device · a6cbcb77

Crag.Wang authored Apr 22, 2019

Without this patch the socket address family sporadically gets wrong
value ends up the dev_set_mac_address() fails to set the desired MAC
address.

Fixes: 25766271 ("r8152: Refresh MAC address during USBDEVFS_RESET")
Signed-off-by: Crag.Wang <crag.wang@dell.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-By: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a6cbcb77

Merge branch 'mlxsw-Shared-buffer-improvements' · 6f97955f

David S. Miller authored Apr 22, 2019

Ido Schimmel says:

====================
mlxsw: Shared buffer improvements

This patchset includes two improvements with regards to shared buffer
configuration in mlxsw.

The first part of this patchset forbids the user from performing illegal
shared buffer configuration that can result in unnecessary packet loss.
In order to better communicate these configuration failures to the user,
extack is propagated from devlink towards drivers. This is done in
patches #1-#8.

The second part of the patchset deals with the shared buffer
configuration of the CPU port. When a packet is trapped by the device,
it is sent across the PCI bus to the attached host CPU. From the
device's perspective, it is as if the packet is transmitted through the
CPU port.

While testing traffic directed at the CPU it became apparent that for
certain packet sizes and certain burst sizes, the current shared buffer
configuration of the CPU port is inadequate and results in packet drops.
The configuration is adjusted by patches #9-#14 that create two new pools
- ingress & egress - which are dedicated for CPU traffic.
====================
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6f97955f

mlxsw: spectrum_buffers: Adjust CPU port shared buffer egress quotas · 7a1ff9f4

Ido Schimmel authored Apr 22, 2019

Switch the CPU port to use the new dedicated egress pool instead the
previously used egress pool which was shared with normal front panel
ports.

Add per-port quotas for the amount of traffic that can be buffered for
the CPU port and also adjust the per-{port, TC} quotas.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a1ff9f4

mlxsw: spectrum_buffers: Allow skipping ingress port quota configuration · 6d28725c

Ido Schimmel authored Apr 22, 2019

The CPU port is used to transmit traffic that is trapped to the host
CPU. It is therefore irrelevant to define ingress quota for it.

Add a 'skip_ingress' argument to the function tasked with configuring
per-port quotas, so that ingress quotas could be skipped in case the
passed local port is the CPU port.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d28725c

mlxsw: spectrum_buffers: Split business logic from mlxsw_sp_port_sb_pms_init() · 24a7cc1e

Ido Schimmel authored Apr 22, 2019

The function is used to set the per-port shared buffer quotas.
Currently, these quotas are only set for front panel ports, but a
subsequent patch will configure these quotas for the CPU port as well.

The configuration required for the CPU port is a bit different than that
of the front panel ports, so split the business logic into a separate
function which will be called with different parameters for the CPU
port.

No functional changes intended.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24a7cc1e