Commits · 286556c0b63a91d48770d5979e6653656235dbb0 · nexedi / linux

06 Jun, 2017 40 commits

Merge branch 'bpf-prog-map-ID' · 286556c0

David S. Miller authored Jun 06, 2017

Martin KaFai Lau says:

====================
Introduce bpf ID

This patch series:
1) Introduce ID for both bpf_prog and bpf_map.
2) Add bpf commands to iterate the prog IDs and map
   IDs of the system.
3) Add bpf commands to get a prog/map fd from an ID
4) Add bpf command to get prog/map info from a fd.
   The prog/map info is a jump start in this patchset
   and it is not meant to be a complete list.  They can
   be extended in future patches.

v3:
- I suspect v2 may not have applied cleanly.
  In particular, patch 1 has conflict with a recent
  change in struct bpf_prog_aux introduced at a similar time frame:
  8726679a ("bpf: teach verifier to track stack depth")
  v3 should have fixed it.

v2:
Compiler warning fixes:
- Remove lockdep_is_held() usage.  Add comment
  to explain the lock situation instead.
- Make the idr-related variables static
- Add __user to the uattr param in bpf_prog_get_info_by_fd()
  and bpf_map_get_info_by_fd().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Test for bpf ID · 95b9afd3

Martin KaFai Lau authored Jun 05, 2017

Add test to exercise the bpf_prog/map id generation,
bpf_(prog|map)_get_next_id(), bpf_(prog|map)_get_fd_by_id() and
bpf_get_obj_info_by_fd().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Add BPF_OBJ_GET_INFO_BY_FD · 1e270976

Martin KaFai Lau authored Jun 05, 2017

A single BPF_OBJ_GET_INFO_BY_FD cmd is used to obtain the info
for both bpf_prog and bpf_map.  The kernel can figure out whether
the fd is associated with a bpf_prog or a bpf_map.

The suggested struct bpf_prog_info and struct bpf_map_info are
not meant to be a complete list, and completeness is not the goal
of this patch.  New fields can be added in future patches.

The focus of this patch is to create the interface,
BPF_OBJ_GET_INFO_BY_FD cmd for exposing the bpf_prog's and
bpf_map's info.

The obj's info, which will be extended (and get bigger) over time, is
separated from the bpf_attr to avoid bloating the bpf_attr.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
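
The extensibility argument above can be illustrated with a small userspace model of the "copy at most info_len bytes" pattern that such a split usually relies on. This is a sketch under stated assumptions, not the kernel code: struct toy_info and toy_get_info() are hypothetical, and whether the real command validates or zeroes any user-supplied tail beyond the kernel's struct is not modeled here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "current" info layout known to the kernel side of this
 * model; the real struct bpf_prog_info/bpf_map_info fields differ. */
struct toy_info {
	uint32_t type;
	uint32_t id;
};

/* Copy min(*info_len, sizeof(*kinfo)) bytes into the caller's buffer and
 * report the number of bytes written back through *info_len. A caller
 * built against a smaller struct gets a prefix; a caller with a larger
 * struct keeps its extra tail bytes untouched in this sketch. */
static void toy_get_info(const struct toy_info *kinfo, void *ubuf,
			 uint32_t *info_len)
{
	uint32_t n;

	assert(kinfo && ubuf && info_len);
	n = *info_len < sizeof(*kinfo) ? *info_len : (uint32_t)sizeof(*kinfo);
	memcpy(ubuf, kinfo, n);
	*info_len = n;
}
```

Because the length travels with the buffer, new fields can be appended to the kernel-side struct without breaking callers compiled against an older, shorter layout.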

bpf: Add jited_len to struct bpf_prog · 783d28dd

Martin KaFai Lau authored Jun 05, 2017

Add jited_len to struct bpf_prog.  It will be
useful for struct bpf_prog_info, which will
be added in a later patch.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Add BPF_MAP_GET_FD_BY_ID · bd5f5f4e

Martin KaFai Lau authored Jun 05, 2017

Add the BPF_MAP_GET_FD_BY_ID command to allow the user to get an fd
from a bpf_map's ID.

bpf_map_inc_not_zero() is added and is called with map_idr_lock
held.

__bpf_map_put() is also added; its 'bool do_idr_lock'
param decides whether map_idr_lock should be acquired when
freeing map->id.

In the error path of bpf_map_inc_not_zero(), it may have to
call __bpf_map_put(map, false), which does not take
map_idr_lock when freeing map->id, since the lock is already held.

It is currently limited to CAP_SYS_ADMIN; lifting that
restriction can be considered in follow-up patches.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Add BPF_PROG_GET_FD_BY_ID · b16d9aa4

Martin KaFai Lau authored Jun 05, 2017

Add the BPF_PROG_GET_FD_BY_ID command to allow the user to get an fd
from a bpf_prog's ID.

bpf_prog_inc_not_zero() is added and is called with prog_idr_lock
held.

__bpf_prog_put() is also added; its 'bool do_idr_lock'
param decides whether prog_idr_lock should be acquired when
freeing prog->id.

In the error path of bpf_prog_inc_not_zero(), it may have to
call __bpf_prog_put(prog, false), which does not take
prog_idr_lock when freeing prog->id, since the lock is already held.

It is currently limited to CAP_SYS_ADMIN; lifting that
restriction can be considered in follow-up patches.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
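
The reason for the 'bool do_idr_lock' parameter can be modeled with a lock that refuses to be taken twice. In this hypothetical sketch, toy_free_id() stands in for __bpf_prog_put() and toy_error_path_under_lock() for an error path that runs with prog_idr_lock already held; none of these names are kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-recursive lock: taking it twice would deadlock a
 * real spinlock, so this model trips an assertion instead. */
struct toy_spinlock {
	bool held;
};

static void toy_spin_lock(struct toy_spinlock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
	assert(l->held);
	l->held = false;
}

static struct toy_spinlock toy_idr_lock;
static int toy_prog_id = 7; /* 0 means "no ID registered" */

/* Models do_idr_lock: take the lock only when the caller does not
 * already hold it, then release the ID. */
static void toy_free_id(bool do_idr_lock)
{
	if (do_idr_lock)
		toy_spin_lock(&toy_idr_lock);
	toy_prog_id = 0;
	if (do_idr_lock)
		toy_spin_unlock(&toy_idr_lock);
}

/* A caller that already holds the lock, like the lookup error path. */
static void toy_error_path_under_lock(void)
{
	toy_spin_lock(&toy_idr_lock);
	toy_free_id(false); /* passing true here would trip the assert */
	toy_spin_unlock(&toy_idr_lock);
}
```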

bpf: Add BPF_(PROG|MAP)_GET_NEXT_ID command · 34ad5580

Martin KaFai Lau authored Jun 05, 2017

This patch adds BPF_PROG_GET_NEXT_ID and BPF_MAP_GET_NEXT_ID
to allow userspace to iterate all bpf_prog IDs and bpf_map IDs.

The API aims to be consistent with the existing
BPF_MAP_GET_NEXT_KEY.

It is currently limited to CAP_SYS_ADMIN; lifting that
restriction can be considered in follow-up patches.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
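
The iteration contract mirrors BPF_MAP_GET_NEXT_KEY: pass the last ID seen and get back the next one, until nothing is left. The sketch below models that loop over a plain array. toy_get_next_id() and toy_collect_ids() are hypothetical, and starting the walk at 0 and ending it on -ENOENT are assumptions about typical usage rather than details stated in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: store the smallest live ID strictly greater than
 * start_id in *next_id and return 0, or return -ENOENT when none is left. */
static int toy_get_next_id(const int *live_ids, size_t n, int start_id,
			   int *next_id)
{
	int best = -1;

	assert(next_id != NULL);
	for (size_t i = 0; i < n; i++)
		if (live_ids[i] > start_id && (best < 0 || live_ids[i] < best))
			best = live_ids[i];
	if (best < 0)
		return -ENOENT;
	*next_id = best;
	return 0;
}

/* Walk all IDs from 0, feeding each result back in as the next start. */
static size_t toy_collect_ids(const int *live_ids, size_t n, int *out,
			      size_t cap)
{
	size_t count = 0;
	int id = 0;

	while (count < cap && toy_get_next_id(live_ids, n, id, &id) == 0)
		out[count++] = id;
	return count;
}
```

Since each call only needs the previous ID, the walk keeps no state between calls, and an ID that disappears mid-walk is simply skipped.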

bpf: Introduce bpf_map ID · f3f1c054

Martin KaFai Lau authored Jun 05, 2017

This patch generates a unique ID for each created bpf_map.
The approach is similar to the earlier patch for bpf_prog ID.

It is worth noting that the bpf_map's ID and bpf_prog's ID
are in two independent ID spaces and both have the same valid range:
[1, INT_MAX).
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Introduce bpf_prog ID · dc4bb0e2

Martin KaFai Lau authored Jun 05, 2017

This patch generates a unique ID for each BPF_PROG_LOAD-ed prog.
It is worth noting that each BPF_PROG_LOAD-ed prog will have
a different ID even if they have the same bpf instructions.

The ID is generated by the existing idr_alloc_cyclic().
The ID ranges over [1, INT_MAX).  It is allocated in a cyclic manner,
so an ID will get reused roughly every 2 billion BPF_PROG_LOADs.

bpf_prog_alloc_id() is called after bpf_prog_select_runtime()
because the jit process may have allocated a new prog.  Hence,
we need to ensure the pointer 'prog' will not change
any more before storing the prog in prog_idr.

After bpf_prog_select_runtime(), the prog is read-only.  Hence,
the id is stored in 'struct bpf_prog_aux'.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
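
The cyclic allocation described above can be modeled with a tiny ID space so that the wraparound is visible. This is a hypothetical sketch of the semantics, not idr_alloc_cyclic() itself: struct toy_idr and its helpers are illustrative, and the upper bound is shrunk from INT_MAX to 5 purely for demonstration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of cyclic ID allocation in [ID_START, ID_END).
 * The patch uses idr_alloc_cyclic() over [1, INT_MAX); a tiny end bound
 * is used here only so the wraparound is easy to observe. */
#define ID_START 1
#define ID_END   5

struct toy_idr {
	bool used[ID_END];
	int next; /* where the next search starts */
};

static void toy_idr_init(struct toy_idr *idr)
{
	for (int i = 0; i < ID_END; i++)
		idr->used[i] = false;
	idr->next = ID_START;
}

/* Return the first free ID at or after idr->next, wrapping around once;
 * return -1 if every ID in the range is in use. */
static int toy_idr_alloc_cyclic(struct toy_idr *idr)
{
	int span = ID_END - ID_START;

	for (int n = 0; n < span; n++) {
		int id = ID_START + (idr->next - ID_START + n) % span;

		if (!idr->used[id]) {
			idr->used[id] = true;
			idr->next = (id + 1 < ID_END) ? id + 1 : ID_START;
			return id;
		}
	}
	return -1;
}

static void toy_idr_remove(struct toy_idr *idr, int id)
{
	assert(id >= ID_START && id < ID_END && idr->used[id]);
	idr->used[id] = false;
}
```

Allocating 1 and 2, freeing 1, and allocating again yields 3 rather than 1: a freed ID is only handed out again after the cursor wraps, which with the real [1, INT_MAX) range takes about 2.1 billion allocations.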

cxgb4: implement ndo_set_vf_rate() · 8ea4fae9

Ganesh Goudar authored Jun 05, 2017

Implement ndo_set_vf_rate() for the mgmt interface to support rate-limiting
of VF traffic using the 'ip' command.

Based on the original work of Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ppp: mppe: Use vsnprintf extension %phN · 4f5a9841

Joe Perches authored Jun 05, 2017

Using this extension reduces the object size.

$ size drivers/net/ppp/ppp_mppe.o*
   text	   data	    bss	    dec	    hex	filename
   5683	    216	      8	   5907	   1713	drivers/net/ppp/ppp_mppe.o.new
   5808	    216	      8	   6032	   1790	drivers/net/ppp/ppp_mppe.o.old
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
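
For readers unfamiliar with the extension: in the kernel, %*phN prints a buffer as hex bytes with no separator, so a single format call replaces a hand-rolled loop. printk formats are not available in userspace printf, so the sketch below only emulates the output shape; toy_hex_nosep() is a hypothetical helper, not the ppp_mppe code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace emulation of the "%*phN" output shape: each byte as two
 * lowercase hex digits with no separator. The caller's buffer must hold
 * 2 * len + 1 bytes. */
static void toy_hex_nosep(const uint8_t *buf, size_t len, char *out)
{
	assert(buf != NULL && out != NULL);
	for (size_t i = 0; i < len; i++)
		sprintf(out + 2 * i, "%02x", buf[i]);
	out[2 * len] = '\0';
}
```

In kernel code the equivalent is a single format argument pair (length and pointer), which is where the object-size saving comes from.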

net: phy: Delete unused function phy_ethtool_gset · f8fe9975

yuval.shaia@oracle.com authored Jun 05, 2017

It's unused, so remove it.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 's390-next-updates' · ea30afb4

David S. Miller authored Jun 06, 2017

Julian Wiedmann says:

====================
s390/net updates

Please apply the following qeth updates for net-next.

Aside from some janitorial changes, this adds early setup for virtualized
HiperSockets devices - building upon the code that landed via -net earlier.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: do early device setup for z/VM IQD NICs · c70eb09d

Julian Wiedmann authored Jun 06, 2017

qeth currently supports early setup for OSM and OSN devices.
This patch adds early setup support for z/VM HiperSockets,
since they can only be coupled to L3 networks.

Based on an initial version by Dmitriy Lakhvich.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: add support for early L3 device setup · 79a04e40

Ursula Braun authored Jun 06, 2017

Similar to how qeth currently does early L2 setup of OSM and OSN
devices, add support for early setup of L3-only devices.
This adds a qeth_l3_devtype that contains all core and l3-specific
sysfs attributes, so that they can be created in one go while probing.

This just adds the infrastructure; exploitation of the support happens
in a subsequent patch.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: silence qeth_fix_features() · cf536ffe

Julian Wiedmann authored Jun 06, 2017

Noting the lack of TSO support on every feature change is just silly,
in particular since the requested features might not even affect
NETIF_F_TSO.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: consolidate pack buffer flushing · 664e42ac

Julian Wiedmann authored Jun 06, 2017

qeth_switch_to_nonpacking_if_needed() contains an open-coded version
of qeth_flush_buffers_on_no_pci(). Extract a single helper instead.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: add missing strings for IPA return codes · 84616e86

Julian Wiedmann authored Jun 06, 2017

commit 76b11f8e ("qeth: HiperSockets Network Traffic Analyzer")
missed adding the human-readable translations when adding new RCs.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: log bridgeport capabilities · 521c10ea

Julian Wiedmann authored Jun 06, 2017

Bridgeport is an L2-specific feature, and we should write its
capabilities to a debug entry.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: query IPv6 IPA support on HiperSockets · 23274596

Julian Wiedmann authored Jun 06, 2017

HiperSockets devices don't need the full IPv6 initialization, but we
should still query the supported assists for logging purposes.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: remove skb_is_nonlinear() check on IQD · 94a9c981

Julian Wiedmann authored Jun 06, 2017

qeth doesn't advertise NETIF_F_SG for L3 IQDs. So trust the stack not
to hand us any nonlinear skbs, and remove an always-true condition.

Since data_offset < 0 is no longer possible on IQDs,
apply a small cleanup to the subsequent code.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: remove support for IPA_IP_FRAGMENTATION · 4845b93f

Julian Wiedmann authored Jun 06, 2017

This Assist was never actually implemented in any hardware, so just
remove the leftovers.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-trap-control-action' · 359fce1d

David S. Miller authored Jun 06, 2017

Jiri Pirko says:

====================
net: introduce trap control action to tc and offload it

This patchset introduces a control action dedicated to indicating
that the matched packet should be trapped to the CPU. This action is
specific to HW offloads. The patchset also offloads the action to the
mlxsw driver.

Example usage:
$ tc filter add dev enp3s0np19 parent ffff: protocol ip prio 20 flower skip_sw dst_ip 192.168.10.1 action trap

v1->v2:
- patch 1
  - fix the comment according to Andrew's note
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

spectrum_flower: Implement gact trap TC action offload · bd5ddba5

Jiri Pirko authored Jun 06, 2017

Just use the previously prepared infrastructure and offload the gact
trap action to ACL.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

acl: Introduce ACL trap action · df7eea96

Jiri Pirko authored Jun 06, 2017

Use trap/discard flex action to implement trap.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Introduce ACL trap · 0db7b386

Jiri Pirko authored Jun 06, 2017

Introduce an ACL trap and put it into ip2me trap group.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: pci: Fix size of trap_id field in CQE · be8408e1

Jiri Pirko authored Jun 06, 2017

The "trap_id" field is 9 bits long. So far, this was not a problem since
we only used traps with IDs that fit into 8 bits. But the ACL traps that
are going to be introduced use the 9th bit.

Fixes: eda6500a ("mlxsw: Add PCI bus implementation")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
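
Reading a 9-bit field through an 8-bit mask silently aliases IDs 256..511 onto 0..255, which is the failure this fix avoids. A minimal sketch, assuming a hypothetical layout in which trap_id occupies the low bits of a 16-bit word (the real CQE layout is defined by the mlxsw driver and is not reproduced here):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical masks for a trap_id held in the low bits of a word. */
#define TOY_TRAP_ID_MASK_8BIT 0x0FFu /* old, too narrow */
#define TOY_TRAP_ID_MASK_9BIT 0x1FFu /* 9 bits: IDs 0..511 */

static_assert(TOY_TRAP_ID_MASK_9BIT == (1u << 9) - 1, "9-bit mask");

static uint16_t toy_trap_id(uint16_t word, unsigned int mask)
{
	return (uint16_t)(word & mask);
}
```

With the narrow mask, a trap with ID 448 (0x1C0) would be reported as 192 (0xC0).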

net: sched: introduce helper to identify gact trap action · 5a4d1fee

Jiri Pirko authored Jun 06, 2017

Introduce a helper called is_tcf_gact_trap that can be used to
tell whether an action is a gact trap action.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: introduce a TRAP control action · e25ea21f

Jiri Pirko authored Jun 06, 2017

There is a need to instruct the HW offloaded path to push certain matched
packets to the CPU/kernel for further analysis. So this patch introduces a
new TRAP control action to TC.

For the kernel datapath, this action does not make much sense. So, following
the same logic as in HW, the new TRAP behaves similarly to STOLEN. The skb is
just dropped in the datapath (and virtually ejected to an upper level, which
does not exist in the case of the kernel).
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
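
The software-datapath semantics described above can be summarized as a verdict-to-outcome mapping. In this hypothetical sketch the TOY_ACT_* values are stand-ins for the TC_ACT_* constants, not their real numeric values, and only the distinction between "packet continues" and "packet leaves the datapath" is modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a few tc action verdicts; these are not the
 * numeric TC_ACT_* values from the kernel UAPI headers. */
enum toy_verdict {
	TOY_ACT_OK,
	TOY_ACT_SHOT,
	TOY_ACT_STOLEN,
	TOY_ACT_TRAP,
};

/* Does the packet keep going through the software datapath? TRAP has no
 * "upper level" to eject to in software, so it is treated like STOLEN. */
static bool toy_sw_datapath_continues(enum toy_verdict v)
{
	switch (v) {
	case TOY_ACT_OK:
		return true;
	case TOY_ACT_SHOT:
	case TOY_ACT_STOLEN:
	case TOY_ACT_TRAP:
		return false;
	}
	assert(!"unknown verdict");
	return false;
}
```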

net/mlxfw: remove redundant goto on error check · 928a7595

Colin Ian King authored Jun 06, 2017

The check to see if err is set and the subsequent goto are extraneous,
as the next statement is where the goto is jumping to. Remove this
redundant check and goto.

Detected by CoverityScan, CID#1437734 ("Identical code for
different branches")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
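
The pattern Coverity flagged looks like the hypothetical before/after pair below: when the goto target is the very next statement, both branches of the if run the same code, so the check can go. toy_step() is a stand-in for whatever call the real mlxfw code makes.

```c
#include <assert.h>

/* Hypothetical stand-in for the call whose error is checked. */
static int toy_step(int fail)
{
	return fail ? -5 : 0;
}

/* Before: the goto jumps to the very next statement, so both branches
 * of the if do the same thing. */
static int toy_before(int fail)
{
	int err;

	err = toy_step(fail);
	if (err)
		goto out;
out:
	return err;
}

/* After: same behavior without the redundant check and goto. */
static int toy_after(int fail)
{
	return toy_step(fail);
}
```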

Merge tag 'rxrpc-rewrite-20170606' of... · bb363140

David S. Miller authored Jun 06, 2017

Merge tag 'rxrpc-rewrite-20170606' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Support service upgrade

Here's a set of patches that allow AF_RXRPC to support the AuriStor service
upgrade facility.  This allows the server to change the service ID
requested to an upgraded service if the client requests it upon the
initiation of a connection.

This is used by the AuriStor AFS-compatible servers to implement IPv6
handling and improved facilities by providing improved volume location,
volume, protection, file and cache management services.  Note that certain
parts of the AFS protocol carry hard-coded IPv4 addresses.

The reason AuriStor does it this way is that, on some servers, probing the
improved service ID first will not incur an ABORT or any other response if
the server is not listening on it, so one has to employ a timeout.

This is implemented in the server by allowing an AF_RXRPC server to call
bind() twice on a socket to allow it to listen on two service IDs and then
call setsockopt() to instruct the server to upgrade one into the other if
the client requests it (by setting userStatus to 1 on the first DATA packet
on a connection).  If the upgrade occurs, all further operations on that
connection are done with the new service ID.  AF_RXRPC has to handle this
automatically as connections are not exposed to userspace.

Clients can request this facility by setting an RXRPC_UPGRADE_SERVICE
command in the sendmsg() control buffer and then observing the resultant
service ID in the msg_addr returned by recvmsg().  This should only be used
to probe the service.  Clients should then use the returned service ID in
all subsequent communications with that server.  Note that the kernel will
not retain this information should the connection expire from its cache.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 25f41150

David S. Miller authored Jun 06, 2017

Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2017-06-06

This series contains updates and fixes to e1000e and igb.

Matwey V Kornilov fixes an issue where igb_get_phy_id_82575() relies on
the fact that page 0 is already selected, but this is not the case after
igb_read_phy_reg_gs40g()/igb_write_phy_reg_gs40g() were removed in a
previous commit.  This leads to initialization failure and some devices
not working.  To fix the issue, explicitly select page 0 before first
access to PHY registers.

Arnd Bergmann modifies the driver to avoid a "defined but not used"
warning by removing #ifdefs and using the __maybe_unused annotation
for the new power management functions instead.

Jake provides most of the changes in the series, all related to PTP and
timestamp fixes/updates.  Because the hardware can only handle one
transmit timestamp at a time, he resolves several race conditions by
fixing the locking logic, and adds a statistic for "skipped"
timestamps to help administrators identify issues.

Benjamin Poirier provides two changes.  The first removes the second
argument of igb_update_stats(), since all callers pass the same two
arguments; the function now takes the necessary information from the
adapter structure instead.  The second makes e1000e call
dev_get_stats() instead of e1000e_get_stats64() to avoid reporting
garbage through ethtool.

Konstantin Khlebnikov modifies e1000e to use disable_hardirq() instead
of disable_irq() for MSI-X vectors in e1000_netpoll().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: use disable_hardirq() also for MSIX vectors in e1000_netpoll() · fd8e597b

Konstantin Khlebnikov authored May 19, 2017

Replace disable_irq() which waits for threaded irq handlers with
disable_hardirq() which waits only for hardirq part.

Fixes: 31119129 ("e1000: use disable_hardirq() for e1000_netpoll()")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Don't return uninitialized stats · 24ad2a92

Benjamin Poirier authored May 17, 2017

Some statistics passed to ethtool are garbage because e1000e_get_stats64()
doesn't write them, for example tx_heartbeat_errors. This leaks kernel
memory to userspace and confuses users.

Do as ixgbe does and use dev_get_stats(), which first zeroes out
rtnl_link_stats64.

Fixes: 5944701d ("net: remove useless memset's in drivers get_stats64")
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
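
The fix relies on a simple pattern: zero the whole output structure before letting the driver fill in the counters it knows about. A minimal sketch with a hypothetical, trimmed-down struct standing in for struct rtnl_link_stats64; the toy_* functions are illustrative, not the e1000e or core code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down stats struct. */
struct toy_stats {
	uint64_t rx_packets;
	uint64_t tx_heartbeat_errors; /* a counter this driver never fills */
};

/* Driver callback that, like the buggy path, writes only some fields. */
static void toy_driver_get_stats64(struct toy_stats *s)
{
	assert(s != NULL);
	s->rx_packets = 10;
}

/* Zero first, then let the driver fill what it knows: unwritten
 * counters read as 0 instead of leftover memory contents. */
static void toy_dev_get_stats(struct toy_stats *s)
{
	memset(s, 0, sizeof(*s));
	toy_driver_get_stats64(s);
}
```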

igb: Remove useless argument · 81e3f64a

Benjamin Poirier authored May 16, 2017

Given that all callers of igb_update_stats() pass the same two arguments:
(adapter, &adapter->stats64), the second argument can be removed.
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: check for Tx timestamp timeouts during watchdog · e5f36ad1

Jacob Keller authored May 03, 2017

The igb driver has logic to handle only one Tx timestamp at a time,
using a state bit lock to avoid multiple requests at once.

It may be possible, if incredibly unlikely, that a Tx timestamp event is
requested but never completes. Since we use an interrupt scheme to
determine when the Tx timestamp occurred, we would never clear the state
bit in this case.

Add an igb_ptp_tx_hang() function similar to the already existing
igb_ptp_rx_hang() function. This function runs in the watchdog routine
and makes sure we eventually recover from this case instead of
permanently disabling Tx timestamps.

Note: there is no currently known way to cause this without hacking the
driver code to force it.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: add statistic indicating number of skipped Tx timestamps · c3b8f85e

Jacob Keller authored May 03, 2017

The igb driver can only handle one Tx timestamp request at a time.
This means it is possible for an application timestamp request to be
ignored.

There is no easy way for an administrator to determine if this occurred.
Add a new statistic which tracks this, tx_hwtstamp_skipped.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: add statistic indicating number of skipped Tx timestamps · cff57141

Jacob Keller authored May 03, 2017

The e1000e driver can only handle one Tx timestamp request at a time.
This means it is possible for an application timestamp request to be
ignored.

There is no easy way for an administrator to determine if this occurred.
Add a new statistic which tracks this, tx_hwtstamp_skipped.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: avoid permanent lock of *_PTP_TX_IN_PROGRESS · 74344e32

Jacob Keller authored May 03, 2017

The igb driver uses a state bit lock to avoid handling more than one Tx
timestamp request at once. This is required because hardware is limited
to a single set of registers for Tx timestamps.

The state bit lock is not properly cleaned up during
igb_xmit_frame_ring() if the transmit fails, for example due to a DMA or
TSO failure. On some hardware this results in blocking timestamps until
the service task times out. On other hardware this results in a permanent
lock of the timestamp bit, because we never receive an interrupt
indicating the timestamp occurred, since the packet was in fact never
transmitted.

Fix this by checking for DMA and TSO errors in igb_xmit_frame_ring() and
properly cleaning up after ourselves when these occur.
Reported-by: David Mirabito <davidm@metamako.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: fix race condition with PTP_TX_IN_PROGRESS bits · 4ccdc013

Jacob Keller authored May 03, 2017

Hardware supported by the igb driver can only handle one Tx timestamp
at a time. Thus, the driver uses a state bit lock to enforce that only
one timestamp request is honored at a time.

Unfortunately this suffers from a simple race condition. The bit lock is
not cleared until after skb_tstamp_tx() is called notifying the stack of
a new Tx timestamp. Even a well-behaved application which sends only one
timestamp request at once and waits for a response might wake up and
send a new packet before the bit lock is cleared. This results in
needlessly dropping some Tx timestamp requests.

We can fix this by unlocking the state bit as soon as we read the
timestamp register, as this is the first point at which it is safe to
unlock.

To avoid issues with the skb pointer, we'll use a copy of the pointer
and set the global variable in the driver structure to NULL first. This
ensures that the next timestamp request does not modify our local copy
of the skb pointer.

This ensures that well-behaved applications do not accidentally race
with the bit unlock. Obviously an application which sends multiple Tx
timestamp requests at once will still only have one packet timestamped
at a time. Unfortunately there is nothing we can do about this.
Reported-by: David Mirabito <davidm@metamako.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
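
The reordering described above can be sketched single-threaded: take a local copy of the pending skb pointer, clear the shared pointer, unlock, and only then notify the stack. All names are hypothetical, toy_notify_stack() stands in for skb_tstamp_tx(), and the atomic bit operations and memory ordering the real driver needs are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_skb { int id; };

/* Hypothetical adapter state: the pending packet and the "state bit". */
struct toy_adapter {
	struct toy_skb *ptp_tx_skb;
	bool in_progress;
};

/* Records whether a new request could have claimed the slot at the moment
 * the stack was notified (a fast, well-behaved app may send right then). */
static bool toy_slot_free_at_notify;

static void toy_notify_stack(struct toy_adapter *a, struct toy_skb *skb)
{
	(void)skb;
	toy_slot_free_at_notify = !a->in_progress;
}

/* Completion handler with the fixed ordering. */
static struct toy_skb *toy_tx_hwtstamp_complete(struct toy_adapter *a)
{
	struct toy_skb *skb = a->ptp_tx_skb; /* keep a local copy */

	/* ...the timestamp register would be read here... */
	a->ptp_tx_skb = NULL;   /* a new request cannot clobber our copy */
	a->in_progress = false; /* unlock before notifying the stack */
	toy_notify_stack(a, skb);
	return skb;
}
```

With the old ordering (notify first, unlock second), toy_slot_free_at_notify would be false, which corresponds to the dropped request described above.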