Commits · e1c151a47942da66cd208dea2330547b34684fae · Kirill Smelkov / linux

17 Feb, 2017 25 commits

Merge branch 'rhashtable-allocation-failure-during-insertion' · e1c151a4

David S. Miller authored Feb 17, 2017

Herbert Xu says:

====================
rhashtable: Handle table allocation failure during insertion

v2 -

Added Ack to patch 2.
Fixed RCU annotation in code path executed by rehasher by using
rht_dereference_bucket.

v1 -

This series tackles the problem of table allocation failures during
insertion.  The issue is that we cannot vmalloc during insertion.
This series deals with this by introducing nested tables.

The first two patches removes manual hash table walks which cannot
work on a nested table.

The final patch introduces nested tables.

I've tested this with test_rhashtable and it appears to work.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e1c151a4

rhashtable: Add nested tables · da20420f

Herbert Xu authored Feb 11, 2017

This patch adds code that handles GFP_ATOMIC kmalloc failure on
insertion.  As we cannot use vmalloc, we solve it by making our
hash table nested.  That is, we allocate single pages at each level
and reach our desired table size by nesting them.

When a nested table is created, only a single page is allocated
at the top-level.  Lower levels are allocated on demand during
insertion.  Therefore for each insertion to succeed, only two
(non-consecutive) pages are needed.

After a nested table is created, a rehash will be scheduled in
order to switch to a vmalloced table as soon as possible.  Also,
the rehash code will never rehash into a nested table.  If we
detect a nested table during a rehash, the rehash will be aborted
and a new rehash will be scheduled.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

da20420f

tipc: Fix tipc_sk_reinit race conditions · 40f9f439

Herbert Xu authored Feb 11, 2017

There are two problems with the function tipc_sk_reinit.  Firstly
it's doing a manual walk over an rhashtable.  This is broken as
an rhashtable can be resized and if you manually walk over it
during a resize then you may miss entries.

Secondly it's missing memory barriers as previously the code used
spinlocks which provide the barriers implicitly.

This patch fixes both problems.

Fixes: 07f6c4bc ("tipc: convert tipc reference table to...")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

40f9f439

gfs2: Use rhashtable walk interface in glock_hash_walk · 98687f42

Herbert Xu authored Feb 11, 2017

The function glock_hash_walk walks the rhashtable by hand.  This
is broken because if it catches the hash table in the middle of
a rehash, then it will miss entries.

This patch replaces the manual walk by using the rhashtable walk
interface.

Fixes: 88ffbf3e ("GFS2: Use resizable hash table for glocks")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

98687f42

net: mvneta: make mvneta_eth_tool_ops static · 4581be42

Jisheng Zhang authored Feb 16, 2017

The mvneta_eth_tool_ops is only used internally in mvneta driver, so
make it static.
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4581be42

Merge branch 'net-sched-reflect-hw-offload-in-classifiers' · 4b5026ad

David S. Miller authored Feb 17, 2017

Or Gerlitz says:

====================
net/sched: Reflect HW offload status in classifiers

Currently there is no way of querying whether a filter is
offloaded to HW or not when using "both" policy (where none
of skip_sw or skip_hw flags are set by user-space).

Added two new flags, "in hw" and "not in hw" such that user space
can determine if a filter is actually offloaded to hw. The "in hw"
UAPI semantics was chosen so it's similar to the "skip hw" flag logic.

If none of these two flags are set, this signals running
over older kernel.

As an example, add one vlan push + fwd rule, one matchall rule and one u32 rule
without any flags, and another vlan + fwd skip_sw rule, such that the different TC
classifier attempt to offload all of them -- all over mlx5 SRIOV VF rep:

	flower skip_sw indev eth2_0 src_mac e4:11:22:33:44:50 dst_mac e4:1d:2d:a5:f3:9d
	action vlan push id 52 action mirred egress redirect dev eth2

	flower indev eth2_0 src_mac e4:11:22:33:44:50 dst_mac e4:11:22:33:44:51
	action vlan push id 53 action mirred egress redirect dev eth2

	u32 ht 800: flowid 800:1 match ip src 192.168.1.0/24 action drop

Since that VF rep doesn't offload matchall/u32 and can currently offload
only one vlan push rule we expect three of the rules not to be offloaded:

filter protocol ip pref 99 u32
filter protocol ip pref 99 u32 fh 800: ht divisor 1
filter protocol ip pref 99 u32 fh 800::1 order 1 key ht 800 bkt 0 flowid 800:1 not in_hw
  match c0a80100/ffffff00 at 12
	action order 1: gact action drop
	 random type none pass val 0
	 index 8 ref 1 bind 1

filter protocol all pref 49150 matchall
filter protocol all pref 49150 matchall handle 0x1
  not in_hw
	action order 1: mirred (Egress Mirror to device veth1) pipe
 	index 27 ref 1 bind 1

filter protocol ip pref 49151 flower
filter protocol ip pref 49151 flower handle 0x1
  indev eth2_0
  dst_mac e4:11:22:33:44:51
  src_mac e4:11:22:33:44:50
  eth_type ipv4
  not in_hw
	action order 1:  vlan push id 53 protocol 802.1Q priority 0 pipe
	 index 20 ref 1 bind 1

	action order 2: mirred (Egress Redirect to device eth2) stolen
 	index 26 ref 1 bind 1

filter protocol ip pref 49152 flower
filter protocol ip pref 49152 flower handle 0x1
  indev eth2_0
  dst_mac e4:1d:2d:a5:f3:9d
  src_mac e4:11:22:33:44:50
  eth_type ipv4
  skip_sw
  in_hw
	action order 1:  vlan push id 52 protocol 802.1Q priority 0 pipe
	 index 19 ref 1 bind 1

	action order 2: mirred (Egress Redirect to device eth2) stolen
 	index 25 ref 1 bind 1

v3 --> v4 changes:
 - removed extra parenthesis (Dave)

v2 --> v3 changes:
 - fixed the matchall dump flags patch to do proper checks (Jakub)
 - added the same proper checks to flower where they were missing
 - that flower patch was added as #1 and hence all the other patches are offed-by-one

v1 --> v2 changes:
 - applied feedback from Jakub and Dave -- where none of the skip flags were set,
   the suggested approach didn't allow user space to distringuish between old kernel
   to a case when offloading to HW worked fine.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4b5026ad

net/sched: cls_bpf: Reflect HW offload status · 5cecb6cc

Or Gerlitz authored Feb 16, 2017

BPF classifier support for the "in hw" offloading flags.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5cecb6cc

net/sched: cls_u32: Reflect HW offload status · 24d3dc6d

Or Gerlitz authored Feb 16, 2017

U32 support for the "in hw" offloading flags.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
Signed-off-by: David S. Miller <davem@davemloft.net>

24d3dc6d

net/sched: cls_matchall: Reflect HW offloading status · c7d2b2f5

Or Gerlitz authored Feb 16, 2017

Matchall support for the "in hw" offloading flags.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c7d2b2f5

net/sched: cls_flower: Reflect HW offload status · 55593960

Or Gerlitz authored Feb 16, 2017

Flower support for the "in hw" offloading flags.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

55593960

net/sched: Reflect HW offload status · e696028a

Or Gerlitz authored Feb 16, 2017

Currently there is no way of querying whether a filter is
offloaded to HW or not when using "both" policy (where none
of skip_sw or skip_hw flags are set by user-space).

Add two new flags, "in hw" and "not in hw" such that user
space can determine if a filter is actually offloaded to
hw or not. The "in hw" UAPI semantics was chosen so it's
similar to the "skip hw" flag logic.

If none of these two flags are set, this signals running
over older kernel.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e696028a

net/sched: cls_matchall: Dump the classifier flags · 7a335ada

Or Gerlitz authored Feb 16, 2017

The classifier flags are not dumped to user-space, do that.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a335ada

net/sched: cls_flower: Properly handle classifier flags dumping · 749e6720

Or Gerlitz authored Feb 16, 2017

Dump the classifier flags only if non zero and make sure to check
the return status of the handler that puts them into the netlink msg.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

749e6720

net: oki-semi: pch_gbe: use new api ethtool_{get|set}_link_ksettings · ad8e963c

Philippe Reynes authored Feb 16, 2017

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ad8e963c

net: ethernet: ti: cpsw: correct ale dev to cpsw · 9fe9aa0b

Ivan Khoronzhuk authored Feb 15, 2017

The ale is a property of cpsw, so change dev to cpsw->dev,
aka pdev->dev, to be consistent.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9fe9aa0b

net: nvidia: forcedeth: use new api ethtool_{get|set}_link_ksettings · 0fa9e289

Philippe Reynes authored Feb 14, 2017

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0fa9e289

Merge branch 'ptp-attribute-cleanup' · af6e2b5b

David S. Miller authored Feb 17, 2017

Dmitry Torokhov says:

====================
PTP attribute handling cleanup

PTP core was creating some attributes, such as "period" and "fifo", and the
entire "pins" attribute group, after creating class deevice, which creates
a race for userspace: uevent may arrive before all attributes are created.

This series of patches switches PTP to use is_visible() to control
visibility of attributes in a group, and device_create_with_groups() to
ensure that attributes are created before we notify userspace of a new
device.

v2:
- added Richard's acked-by to patch #1
- removed use of kmalloc_array in favor of kcalloc in patch #2 at
  Richard's request
- added a cover letter

v1:
- initial patch set
====================
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af6e2b5b

ptp: create "pins" together with the rest of attributes · 85a66e55

Dmitry Torokhov authored Feb 14, 2017

Let's switch to using device_create_with_groups(), which will allow us to
create "pins" attribute group together with the rest of ptp device
attributes, and before userspace gets notified about ptp device creation.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

85a66e55

ptp: use is_visible method to hide unused attributes · af59e717

Dmitry Torokhov authored Feb 14, 2017

Instead of creating selected attributes after the device is created (and
after userspace potentially seen uevent), lets use attribute group
is_visible() method to control which attributes are shown. This will allow
us to create all attributes (except "pins" group, which will be taken care
of later) before userspace gets notified about new ptp class device.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af59e717

ptp: use kcalloc when allocating arrays · 6f7aa56b

Dmitry Torokhov authored Feb 14, 2017

kcalloc is more semantically correct when allocating arrays of objects, and
overflow-safe.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6f7aa56b

ptp: do not explicitly set drvdata in ptp_clock_register() · 882f312d

Dmitry Torokhov authored Feb 14, 2017

We do not need explicitly call dev_set_drvdata(), as it is done for us by
device_create().
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

882f312d

mlx4: do not fire tasklet unless necessary · 01f0f425

Eric Dumazet authored Feb 10, 2017

All rx and rx netdev interrupts are handled by respectively
by mlx4_en_rx_irq() and mlx4_en_tx_irq() which simply schedule a NAPI.

But mlx4_eq_int() also fires a tasklet to service all items that were
queued via mlx4_add_cq_to_tasklet(), but this handler was not called
unless user cqe was handled.

This is very confusing, as "mpstat -I SCPU ..." show huge number of
tasklet invocations.

This patch saves this overhead, by carefully firing the tasklet directly
from mlx4_add_cq_to_tasklet(), removing four atomic operations per IRQ.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01f0f425

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next · 99d5ceee

David S. Miller authored Feb 16, 2017

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2017-02-16

1) Make struct xfrm_input_afinfo const, nothing writes to it.
   From Florian Westphal.

2) Remove all places that write to the afinfo policy backend
   and make the struct const then.
   From Florian Westphal.

3) Prepare for packet consuming gro callbacks and add
   ESP GRO handlers. ESP packets can be decapsulated
   at the GRO layer then. It saves a round through
   the stack for each ESP packet.

Please note that this has a merge coflict between commit

63fca65d ("net: add confirm_neigh method to dst_ops")

from net-next and

3d7d25a6 ("xfrm: policy: remove garbage_collect callback")
a2817d8b ("xfrm: policy: remove family field")

from ipsec-next.

The conflict can be solved as it is done in linux-next.

Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

99d5ceee

Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 5237b9dd

David S. Miller authored Feb 16, 2017

Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2017-02-16

This series contains updates to ixgbe only.

Tony updates the driver to advertise 2.5Gb and 5.0Gb if the adapter
supports it.

Stephen Hemminger renames our dcbnl_ops since it is global to
ixgbe_dcbnl_ops to avoid namespace issues.

Mark updates the driver version based on the recent changes.

Alex has the remainder of the changes, starting with consolidating
functions that represent logical steps in the receive process so we can
later update them more easily (and align with igb).  Modify the receive
path to only synchronize the length of the frame versus the entire buffer.
Provided performance improvements by adding support for
DMA_ATTR_SKIP_CPU_SYNC and DMA_ATTR_WEAK_ORDERING.  Also made additional
performance gains by batching the page count updates instead of doing
them one at a time.  Adjusted the receive path to use 3k buffers with
8k backing them in order to support build_skb with jumbo frames.  Made
additional driver improvements by using the length of the packet instead
of the DD status to determine if a new descriptor is ready to be
processed, which cuts down on reads.  To reduce code duplication, pulled
apart the receive path into separate functions.  Added support for
providing a buffer with headroom and tailroom to allow for shared info
for NET_SKB_PAD and NET_IP_ALIGN.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

5237b9dd

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 3f64116a
David S. Miller authored Feb 16, 2017

3f64116a

16 Feb, 2017 15 commits

cxgb4: Remove redundant code in t4_uld_clean_up() · f3caf861

Ganesh Goudar authored Feb 16, 2017

Remove variable rxq_info and also remove redundant assignment
to it.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f3caf861

cxgb4: Add new T5 and T6 pci device id's · 5d071c24

Ganesh Goudar authored Feb 16, 2017

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5d071c24

cxgb4: Increase max number of tc u32 links · 45da1ca2

Arjun V authored Feb 16, 2017

Make max number of supported tc u32 links equal to max number of filters
supported by hardware.
Signed-off-by: Arjun V <arjun@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

45da1ca2

Merge tag 'media/v4.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 4695daef

Linus Torvalds authored Feb 16, 2017

Pull media fix from Mauro Carvalho Chehab:
 "A regression fix that makes the Siano driver to work again after the
  CONFIG_VMAP_STACK change"

* tag 'media/v4.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] siano: make it work again with CONFIG_VMAP_STACK

4695daef

vfs: fix uninitialized flags in splice_to_pipe() · 5a81e6a1

Miklos Szeredi authored Feb 16, 2017

Flags (PIPE_BUF_FLAG_PACKET, PIPE_BUF_FLAG_GIFT) could remain on the
unused part of the pipe ring buffer.  Previously splice_to_pipe() left
the flags value alone, which could result in incorrect behavior.

Uninitialized flags appears to have been there from the introduction of
the splice syscall.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org> # 2.6.17+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5a81e6a1

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · 58f6eaee

Linus Torvalds authored Feb 16, 2017

Pull fuse fixes from Miklos Szeredi:
 "Fix a use after free bug introduced in 4.2 and using an uninitialized
  value introduced in 4.9"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: fix uninitialized flags in pipe_buffer
  fuse: fix use after free issue in fuse_dev_do_read()

58f6eaee

Merge tag 'pci-v4.10-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · aa6fba55

Linus Torvalds authored Feb 16, 2017

Pull PCI fix from Bjorn Helgaas:
 "Add back pcie_pme_remove() so we free the IRQ when removing PCIe port
  devices; previously the leaked IRQ caused an MSI BUG_ON"

* tag 'pci-v4.10-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI/PME: Restore pcie_pme_driver.remove

aa6fba55

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 3c7a9f32

Linus Torvalds authored Feb 16, 2017

Pull networking fixes from David Miller:

 1) In order to avoid problems in the future, make cgroup bpf overriding
    explicit using BPF_F_ALLOW_OVERRIDE. From Alexei Staovoitov.

 2) LLC sets skb->sk without proper skb->destructor and this explodes,
    fix from Eric Dumazet.

 3) Make sure when we have an ipv4 mapped source address, the
    destination is either also an ipv4 mapped address or
    ipv6_addr_any(). Fix from Jonathan T. Leighton.

 4) Avoid packet loss in fec driver by programming the multicast filter
    more intelligently. From Rui Sousa.

 5) Handle multiple threads invoking fanout_add(), fix from Eric
    Dumazet.

 6) Since we can invoke the TCP input path in process context, without
    BH being disabled, we have to accomodate that in the locking of the
    TCP probe. Also from Eric Dumazet.

 7) Fix erroneous emission of NETEVENT_DELAY_PROBE_TIME_UPDATE when we
    aren't even updating that sysctl value. From Marcus Huewe.

 8) Fix endian bugs in ibmvnic driver, from Thomas Falcon.

[ This is the second version of the pull that reverts the nested
  rhashtable changes that looked a bit too scary for this late in the
  release  - Linus ]

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
  rhashtable: Revert nested table changes.
  ibmvnic: Fix endian errors in error reporting output
  ibmvnic: Fix endian error when requesting device capabilities
  net: neigh: Fix netevent NETEVENT_DELAY_PROBE_TIME_UPDATE notification
  net: xilinx_emaclite: fix freezes due to unordered I/O
  net: xilinx_emaclite: fix receive buffer overflow
  bpf: kernel header files need to be copied into the tools directory
  tcp: tcp_probe: use spin_lock_bh()
  uapi: fix linux/if_pppol2tp.h userspace compilation errors
  packet: fix races in fanout_add()
  ibmvnic: Fix initial MTU settings
  net: ethernet: ti: cpsw: fix cpsw assignment in resume
  kcm: fix a null pointer dereference in kcm_sendmsg()
  net: fec: fix multicast filtering hardware setup
  ipv6: Handle IPv4-mapped src to in6addr_any dst.
  ipv6: Inhibit IPv4-mapped src address on the wire.
  net/mlx5e: Disable preemption when doing TC statistics upcall
  rhashtable: Add nested tables
  tipc: Fix tipc_sk_reinit race conditions
  gfs2: Use rhashtable walk interface in glock_hash_walk
  ...

3c7a9f32

fuse: fix uninitialized flags in pipe_buffer · 84588a93

Miklos Szeredi authored Feb 16, 2017

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: d82718e3 ("fuse_dev_splice_read(): switch to add_to_pipe()")
Cc: <stable@vger.kernel.org> # 4.9+

84588a93

ixgbe: Don't bother clearing buffer memory for descriptor rings · ffed21bc

Alexander Duyck authored Jan 17, 2017

This patch makes it so that we don't need to bother with clearing the
memory out for the descriptor rings. The general idea is to only free
buffers associated with buffers in use which are located between the
next_to_clean and next_to_use or next_to_alloc values. Everything outside
of those regions can be safely ignored since they should have no buffers
associated with them.

The advantage to doing things this way is that is should speed up bring-up
and tear-down of the rings. Specifically we can avoid the 512 or more
cycles required to memset the rings in tear-down. In the bring-up phase we
then clear the memory as a part of initialization. The general idea is
that the clearing in initialization can act as a prefetch of sorts for the
buffer info structures so they are in the local CPU when we go to populate
them. This should help to improve overall time needed to perform a
suspend/resume.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ffed21bc

ixgbe: Add support for build_skb · 6f429223

Alexander Duyck authored Jan 17, 2017

This patch adds build_skb support to the Rx path.  There are several
advantages to this change.

1.  It avoids the memcpy and skb->head allocation for small packets which
    improves performance by about 5% in my tests.
2.  It avoids the memcpy, skb->head allocation, and eth_get_headlen
    for larger packets improving performance by about 10% in my tests.
3.  For VXLAN packets it allows the full header to be in skb->data which
    improves the performance by as much as 30% in some of my tests.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

6f429223

ixgbe: Add private flag to control buffer mode · 2ccdf26f

Alexander Duyck authored Jan 17, 2017

Since there are potential drawbacks to the new Rx allocation approach I
thought it best to add a "chicken bit" so that we can turn the feature off
if in the event that a problem is found.

It also provides a means of validating the legacy Rx path in the event that
we are forced to fall back.  At some point in the future when we are
convinced we don't need it anymore we might be able to drop the legacy-rx
flag.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

2ccdf26f

ixgbe: Add support for padding packet · 2de6aa3a

Alexander Duyck authored Jan 17, 2017

This patch adds support for providing a buffer with headroom and tailroom
to allow for shared info, NET_SKB_PAD, and NET_IP_ALIGN. With this
combined with the DMA changes we can start using build_skb to build frames
around an incoming Rx buffer instead of having to memcpy the headers.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

2de6aa3a

ixgbe: Break out Rx buffer page management · 3fd21876

Alexander Duyck authored Jan 17, 2017

We are going to be expanding the number of Rx paths in the driver.  Instead
of duplicating all that code I am pulling it apart into separate functions
so that we don't have so much code duplication.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

3fd21876

ixgbe: Use length to determine if descriptor is done · c3630cc4

Alexander Duyck authored Jan 17, 2017

This change makes it so that we use the length of the packet instead of the
DD status bit to determine if a new descriptor is ready to be processed.
The obvious advantage is that it cuts down on reads as we don't really even
need the DD bit if going from a 0 to a non-zero value on size is enough to
inform us that the packet has been completed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

c3630cc4