Commits · 1c626cf472fce757a69a4c70f038867907b5b808 · Kirill Smelkov / linux

29 Sep, 2015 4 commits

brcmfmac: only call brcmf_cfg80211_detach() when attach was successful · 1c626cf4

Arend van Spriel authored Aug 26, 2015

In brcmf_bus_start() the function brcmf_cfg80211_attach() is called which
may fail. If this happens we should not call brcmf_cfg80211_detach() in
the failure path as it will result in NULL pointer dereference:

  brcmf_fweh_activate_events: Set event_msgs error (-5)
  brcmf_bus_start: failed: -5
  brcmf_sdio_firmware_callback: dongle is not responding
  BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
  IP: [<ffffffff811e8f08>] kernfs_find_ns+0x18/0xd0
  PGD 0
  Oops: 0000 [#1] SMP
  Modules linked in: brcmfmac(O) brcmutil(O) cfg80211 auth_rpcgss
  CPU: 1 PID: 45 Comm: kworker/1:1 Tainted: G           O
  Hardware name: Dell Inc. Latitude E6410/07XJP9, BIOS A07 02/15/2011
  Workqueue: events request_firmware_work_func
  task: ffff880036c09ac0 ti: ffff880036dd4000 task.ti: ffff880036dd4000
  RIP: 0010:[<ffffffff811e8f08>]  [<ffffffff811e8f08>] kernfs_find_ns+0x18/0xd0
  RSP: 0018:ffff880036dd7a28  EFLAGS: 00010246
  RAX: ffff880036c09ac0 RBX: 0000000000000000 RCX: 000000007fffffff
  RDX: 0000000000000000 RSI: ffffffff816578b9 RDI: 0000000000000000
  RBP: ffff880036dd7a48 R08: 0000000000000000 R09: ffff880036c0b340
  R10: 00000000000002ec R11: ffff880036dd7b08 R12: ffffffff816578b9
  R13: 0000000000000000 R14: ffffffff816578b9 R15: ffff8800c6c87000
  FS:  0000000000000000(0000) GS:ffff88012bc40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000068 CR3: 0000000001a0b000 CR4: 00000000000006e0
  Stack:
   0000000000000000 ffffffff816578b9 0000000000000000 ffff8800c0d003c8
   ffff880036dd7a78 ffffffff811e8ff5 0000000ffffffff1 ffffffff81a9b060
   ffff8800c789f880 ffff8800c0d00000 ffff880036dd7a98 ffffffff811ebe0d
  Call Trace:
   [<ffffffff811e8ff5>] kernfs_find_and_get_ns+0x35/0x60
   [<ffffffff811ebe0d>] sysfs_unmerge_group+0x1d/0x60
   [<ffffffff81404ef2>] dpm_sysfs_remove+0x22/0x60
   [<ffffffff813f9db9>] device_del+0x49/0x240
   [<ffffffff815da768>] rfkill_unregister+0x58/0xc0
   [<ffffffffa06bd91b>] wiphy_unregister+0xab/0x2f0 [cfg80211]
   [<ffffffffa0742fe3>] brcmf_cfg80211_detach+0x23/0x50 [brcmfmac]
   [<ffffffffa074d986>] brcmf_detach+0x86/0xe0 [brcmfmac]
   [<ffffffffa0757de8>] brcmf_sdio_remove+0x48/0x120 [brcmfmac]
   [<ffffffffa0758ed9>] brcmf_sdiod_remove+0x29/0xd0 [brcmfmac]
   [<ffffffffa0759031>] brcmf_ops_sdio_remove+0xb1/0x110 [brcmfmac]
   [<ffffffffa001c267>] sdio_bus_remove+0x37/0x100 [mmc_core]
   [<ffffffff813fe026>] __device_release_driver+0x96/0x130
   [<ffffffff813fe0e3>] device_release_driver+0x23/0x30
   [<ffffffffa0754bc8>] brcmf_sdio_firmware_callback+0x2a8/0x5d0 [brcmfmac]
   [<ffffffffa074deaf>] brcmf_fw_request_nvram_done+0x15f/0x5e0 [brcmfmac]
   [<ffffffff8140142f>] ? devres_add+0x3f/0x50
   [<ffffffff810642b5>] ? usermodehelper_read_unlock+0x15/0x20
   [<ffffffff81400000>] ? platform_match+0x70/0xa0
   [<ffffffff8140f400>] request_firmware_work_func+0x30/0x60
   [<ffffffff8106828c>] process_one_work+0x14c/0x3d0
   [<ffffffff8106862a>] worker_thread+0x11a/0x450
   [<ffffffff81068510>] ? process_one_work+0x3d0/0x3d0
   [<ffffffff8106d692>] kthread+0xd2/0xf0
   [<ffffffff8106d5c0>] ? kthread_create_on_node+0x180/0x180
   [<ffffffff815ed35f>] ret_from_fork+0x3f/0x70
   [<ffffffff8106d5c0>] ? kthread_create_on_node+0x180/0x180
  Code: e9 40 fe ff ff 48 89 d8 eb 87 66 0f 1f 84 00 00 00 00 00 66 66 66 66
	90 55 48 89 e5 41 56 49 89 f6 41 55 49 89 d5 31 d2 41 54 53 <0f> b7
	47 68 48 8b 5f 48 66 c1 e8 05 83 e0 01 4d 85 ed 0f b6 c8
  RIP  [<ffffffff811e8f08>] kernfs_find_ns+0x18/0xd0
   RSP <ffff880036dd7a28>
  CR2: 0000000000000068
  ---[ end trace 87d6ec0d3fe46740 ]---
Reported-by: Daniel (Deognyoun) Kim <dekim@broadcom.com>
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Franky (Zhenhui) Lin <frankyl@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
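
The failure-path rule described above can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct cfg`, `struct drv`, `cfg_attach()`, `cfg_detach()` and `bus_start()` are hypothetical stand-ins, not the brcmfmac functions, and the sketch only shows that detach is skipped when attach never produced an object.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for driver state; not the real brcmfmac types. */
struct cfg { int registered; };
struct drv { struct cfg *config; int detach_calls; };

/* Pretend attach: returns NULL on failure. */
static struct cfg *cfg_attach(int should_fail)
{
	struct cfg *c;

	if (should_fail)
		return NULL;
	c = calloc(1, sizeof(*c));
	if (c)
		c->registered = 1;
	return c;
}

/* Detach dereferences the config, so it must never run without one. */
static void cfg_detach(struct drv *d)
{
	assert(d->config != NULL);
	d->config->registered = 0;
	free(d->config);
	d->config = NULL;
	d->detach_calls++;
}

/* Returns 0 on success, -1 on failure. */
static int bus_start(struct drv *d, int attach_fails, int later_step_fails)
{
	d->config = cfg_attach(attach_fails);
	if (!d->config)
		goto fail;
	if (later_step_fails)
		goto fail;
	return 0;

fail:
	/* Only undo the attach when it actually succeeded. */
	if (d->config)
		cfg_detach(d);
	return -1;
}
```

With this shape, a failed attach leaves `detach_calls` at zero, while a failure after a successful attach still cleans up exactly once.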


brcmfmac: change parameters for brcmf_remove_interface() · ee6e3a34

Arend van Spriel authored Aug 26, 2015

Just pass the interface to be removed, i.e. the struct brcmf_if instance.
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Franky (Zhenhui) Lin <frankyl@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>


brcmfmac: make brcmf_proto_hdrpull() return struct brcmf_if instance · 796cfb65

Arend van Spriel authored Aug 26, 2015

Avoid spreading the ifidx in the driver, but have it return the
struct brcmf_if instance.
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Franky (Zhenhui) Lin <frankyl@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>


brcmfmac: consolidate ifp lookup in driver core · 75effb03

Arend van Spriel authored Aug 26, 2015

In the rx path the firmware provides an interface index which is used to
map to a struct brcmf_if instance. However, this involves some trick
that is done in two places. This is changed by having the driver core
provide a brcmf_get_ifp() function.
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Franky (Zhenhui) Lin <frankyl@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
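
As a rough sketch of what a single core-owned lookup helper could look like: the commit text does not describe the actual mapping trick, so this assumes a plain bounds-checked table lookup. `struct iface`, `struct drv_pub`, `MAX_IFS` and `get_ifp()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IFS 16	/* assumed table size, for illustration only */

struct iface { int ifidx; };				/* stand-in for struct brcmf_if */
struct drv_pub { struct iface *iflist[MAX_IFS]; };	/* stand-in for driver state */

/* One lookup helper, so rx callers no longer repeat the mapping logic. */
static struct iface *get_ifp(struct drv_pub *drvr, int ifidx)
{
	if (ifidx < 0 || ifidx >= MAX_IFS)
		return NULL;
	return drvr->iflist[ifidx];
}
```

Callers then only need to handle a NULL return instead of validating the index themselves.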


26 Sep, 2015 1 commit

Merge tag 'iwlwifi-next-for-kalle-2015-09-21' of... · 8f6c5b07

Kalle Valo authored Sep 26, 2015

Merge tag 'iwlwifi-next-for-kalle-2015-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next

* some debugfs improvements;
* fix signedness in beacon statistics;
* deinline some functions to reduce size when device tracing is enabled;
* filter beacons out in AP mode when no stations are associated;
* deprecate firmwares version -12;
* fix a runtime PM vs. legacy suspend race;
* one-liner fix for a ToF bug;
* clean-ups in the rx code;
* small debugging improvement;
* fix WoWLAN with new firmware versions;


21 Sep, 2015 12 commits

iwlwifi: mvm: add debug print for d0i3 exit indication · 7c014e35

Eliad Peller authored Sep 06, 2015

In order to verify d0i3 flow, add debug print to indicate
d0i3 exit was completed (right after tx was re-enabled),
along with the wakeup reasons.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>


iwlwifi: mvm: configure wowlan configuration only if connected · 183edd84

Eliad Peller authored Sep 01, 2015

Recent fw versions added an assert to make sure the wowlan configuration
is sent only when a station is connected.

Change the driver behavior to pass this configuration only
if we indeed have an ap station id (i.e. connected).
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>


iwlwifi: mvm: move RX API into its own file · ee6dbb29

Johannes Berg authored Sep 02, 2015

The RX API is currently mixed up into the general fw-api.h
file, but we're going to need to extend it significantly in
the future, so move it to its own file.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>


iwlwifi: mvm: remove some unused defines from RX API · 2df5328e

Johannes Berg authored Sep 02, 2015

Remove some unused values from the RX API; these were used
with older firmware API that didn't have the RX energy API,
support for which was removed a long time ago.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>


iwlwifi: mvm: remove pointless cfg_phy_cnt length check · abfd794c

Johannes Berg authored Sep 02, 2015

Since the driver can never configure the data here, this field
will always be reported as 0 by the firmware. Even if this was
not the case, however, it wouldn't matter since the extra data
would be added beyond the end of the phy_info structure we use
in the driver, so wouldn't harm anything in this code either.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>


iwlwifi: mvm: remove useless debug message from RX · 7f89a58e

Johannes Berg authored Sep 02, 2015

This message is useless - it's in the good case that always
happens so enabling it doesn't really help. Just remove it.
There are other ways to debug this (e.g. tracing) so there's
no need to add a message in the bad case.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>


iwlwifi: mvm: make sure AP is operating for ToF · da583fdf

Johannes Berg authored Aug 26, 2015

It's possible for an AP interface to be UP but not actually
operating (i.e. not beaconing etc.) - in this case it can't
actually do ToF, so check for it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>


iwlwifi: mvm: remove IWL_UCODE_TLV_API_STATS_V10 TLV flag · 38d5f66f

Emmanuel Grumbach authored Aug 26, 2015

This flag is set in all supported firmwares.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>


iwlwifi: mvm: remove IWL_UCODE_TLV_API_ASYNC_DTM TLV flag · 9d43fa4b

Emmanuel Grumbach authored Aug 26, 2015

This flag is set in all supported firmwares.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>


iwlwifi: mvm: remove IWL_UCODE_TLV_API_SINGLE_SCAN_EBS TLV flag · eb991c5e

Emmanuel Grumbach authored Aug 26, 2015

All the supported firmwares have this flag set.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>


iwlwifi: mvm: remove IWL_UCODE_TLV_API_TX_POWER_DEV TLV flag · 4d31eed1

Emmanuel Grumbach authored Aug 26, 2015

All the supported firmwares use the new API.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>


iwlwifi: mvm: remove IWL_UCODE_TLV_API_HDC_PHASE_0 TLV flag · 89ced540

Emmanuel Grumbach authored Aug 26, 2015

All the supported firmwares support the new API.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>


18 Sep, 2015 23 commits

ath9k_htc: introduce support for different fw versions · e904cf6f

Oleksij Rempel authored Sep 06, 2015

The current kernel supports only one fw name with theoretically only one
fw version, located in “firmware/htc_[9271|7010].fw”. That was fine as long as
we had only one fw version (1.3). After we released the new fw 1.4, we faced
a compatibility problem, which we decided to solve by firmware name and
location:
- the new firmware is now located in
	firmware/ath9k_htc/htc_[9271|7010]-1.4.0.fw
- the old version 1.3 should stay in the old place, so old kernels have no
	issues with it.
- new kernels including this patch should be able to try different
	supported (min..max) fw versions.
- new kernels should be able to support the old fw location too, at least for
	now.

At the same time this patch adds a new module option which should allow the
user to play with a development fw version without replacing the stable one.
If the user sets “ath9k_htc use_dev_fw=1”, the module will try to find
firmware/ath9k_htc/htc_[9271|7010]-1.dev.0.fw first and, if that fails, use
the stable version: for example...1.4.0.fw.
Signed-off-by: Oleksij Rempel <linux@rempel-privat.de>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
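
The lookup order described above can be sketched as a small name generator. This is an illustration, not the driver code: `fw_candidate()` and the `FW_*` macros are hypothetical, the min and max minor versions are both assumed to be 4, and names are given relative to the firmware directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FW_MAJOR	1
#define FW_MINOR_MIN	4	/* assumed lowest supported minor version */
#define FW_MINOR_MAX	4	/* assumed highest supported minor version */

/*
 * Write the index-th firmware name to try into buf.
 * Order: optional development build, then max..min versioned names,
 * then the legacy location. Returns 0, or -1 when the list is exhausted.
 */
static int fw_candidate(const char *chip, int use_dev_fw, int index,
			char *buf, size_t len)
{
	int minor;

	if (use_dev_fw) {
		if (index == 0) {
			snprintf(buf, len, "ath9k_htc/htc_%s-%d.dev.0.fw",
				 chip, FW_MAJOR);
			return 0;
		}
		index--;
	}

	minor = FW_MINOR_MAX - index;
	if (minor >= FW_MINOR_MIN) {
		snprintf(buf, len, "ath9k_htc/htc_%s-%d.%d.0.fw",
			 chip, FW_MAJOR, minor);
		return 0;
	}
	if (minor == FW_MINOR_MIN - 1) {
		/* Legacy location used by fw 1.3 and older kernels. */
		snprintf(buf, len, "htc_%s.fw", chip);
		return 0;
	}
	return -1;
}
```

A loader would call this with increasing `index` values and stop at the first name that can actually be loaded.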


sch_dsmark: improve memory locality · 47bbbb30

Eric Dumazet authored Sep 17, 2015

Memory placement in sch_dsmark is silly : Better place mask/value
in the same cache line.

Also, we can embed small arrays in the first cache line and
remove a potential cache miss.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


Merge branch 'bcmgenet-irq-coalesce' · 25354001

David S. Miller authored Sep 17, 2015

Florian Fainelli says:

====================
net: bcmgenet: Interrupt coalescing

This patch series adds support for interrupt coalescing for GENET
adapters.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>


net: bcmgenet: Implement RX coalescing control knobs · 4a29645b

Florian Fainelli authored Sep 16, 2015

Add support for the ethtool rx-frames coalescing parameter, which allows
defining the number of frames received per RX interrupt. The RDMA
engine supports a configurable timeout with a resolution of
approximately 8.192 us.

We can no longer enable the BDONE/PDONE interrupts as those would
fire for each packet/buffer received, which would defeat the MBDONE
interrupt purpose. The MBDONE interrupt is guaranteed to correspond to a
PDONE/BDONE interrupt when the threshold is set to 1.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
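
To make the 8.192 us timeout resolution concrete, here is a minimal conversion sketch. `usecs_to_ticks()` and `TIMEOUT_TICK_NS` are hypothetical names, and rounding up is an assumption made for illustration rather than something stated in the commit.

```c
#include <assert.h>

#define TIMEOUT_TICK_NS 8192u	/* ~8.192 us per tick, from the commit text */

/*
 * Convert microseconds to timer ticks, rounding up so that a non-zero
 * request never becomes a zero timeout (the rounding choice is assumed).
 */
static unsigned int usecs_to_ticks(unsigned int usecs)
{
	unsigned long long ns = (unsigned long long)usecs * 1000u;

	return (unsigned int)((ns + TIMEOUT_TICK_NS - 1) / TIMEOUT_TICK_NS);
}
```

For example, a request of 82 us does not fit into 10 ticks (81.92 us), so it is rounded up to 11 ticks.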


net: bcmgenet: Implement TX coalescing control knobs · 2f913070

Florian Fainelli authored Sep 16, 2015

Configuring the ethtool tx-frames property, which translates into N
packets before a TX interrupt, is the simplest configuration scheme
because it requires no locking at either the software or hardware
level, and is completely independent of the link speed. Since ethtool
does not allow per-tx queue coalescing parameters, we apply the same
setting to every transmit queue.

We can no longer enable the BDONE/PDONE interrupts as those would fire
for each packet/buffer received, which would defeat the MBDONE interrupt
purpose. The MBDONE interrupt is guaranteed to correspond to a
PDONE/BDONE interrupt when the threshold is set to 1, but offers
interrupt coalescing when the value is > 1.

Since the HW is configured to generate an interrupt when the ring
becomes empty, we have to deny any timeout/timer settings coming from
user-space to indicate we can only generate an interrupt every <N>
packets.

While we are at it, fix the DMA_INTR_THRESHOLD_MASK value which was off
by one bit (0xff vs. 0x1ff).
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
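
The effect of the one-bit mask difference, and of rejecting timer settings, can be sketched as below. The names are hypothetical, and the sketch assumes 0x1ff is the corrected 9-bit mask and 0xff the old value; the commit text only lists the two values.

```c
#include <assert.h>

#define OLD_THRESHOLD_MASK 0x0ffu	/* assumed old value: 8 bits */
#define NEW_THRESHOLD_MASK 0x1ffu	/* assumed fixed value: 9 bits */

/* Hypothetical subset of the coalescing parameters user space can pass. */
struct coalesce {
	unsigned int tx_frames;	/* packets per TX interrupt */
	unsigned int tx_usecs;	/* timer-based coalescing, unsupported here */
};

/* Returns 0 and stores the register value, or -1 for unsupported input. */
static int set_tx_coalesce(const struct coalesce *c, unsigned int *reg)
{
	if (c->tx_usecs != 0)
		return -1;	/* only "every N packets" can be expressed */
	if (c->tx_frames == 0 || c->tx_frames > NEW_THRESHOLD_MASK)
		return -1;
	*reg = c->tx_frames & NEW_THRESHOLD_MASK;
	return 0;
}
```

With the 8-bit mask a threshold of 256 would silently become 0, which is why the extra bit matters.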


lan78xx: Remove not defined MAC_CR_GMII_EN_ bit from MAC_CR. · 9110fe4a

Woojung.Huh@microchip.com authored Sep 16, 2015

Remove not defined MAC_CR_GMII_EN_ bit from MAC_CR.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


lan78xx: Create lan78xx_get_mdix_status() and lan78xx_set_mdix_status() for MDIX control. · 758c5c11

Woojung.Huh@microchip.com authored Sep 16, 2015

Create lan78xx_get_mdix_status() and lan78xx_set_mdix_status() for MDIX control.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


lan78xx: Remove phy defines in lan78xx.h and use defines in include/linux/microchipphy.h · bdfba55e

Woojung.Huh@microchip.com authored Sep 16, 2015

Remove phy defines in lan78xx.h and use defines in include/linux/microchipphy.h.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


lan78xx: Update to use phylib instead of mii_if_info. · ce85e13a

Woojung.Huh@microchip.com authored Sep 16, 2015

Update to use phylib instead of mii_if_info.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


lan78xx: Add PHYLIB and MICROCHIP_PHY as default config. · 05fe68c0

Woojung.Huh@microchip.com authored Sep 16, 2015

Add PHYLIB and MICROCHIP_PHY as default configuration for lan78xx.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


lan78xx: Check device ready bit (PMT_CTL_READY_) after reset the PHY · 6c595b03

Woojung.Huh@microchip.com authored Sep 16, 2015

Check the device ready bit (PMT_CTL_READY_) after resetting the PHY.
Depending on configuration, the device may not be ready even if PHY_RST_ is cleared.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: Initialize table in fib result · bde6f9de

David Ahern authored Sep 16, 2015

Sergey, Richard and Fabio reported an oops in ip_route_input_noref. e.g., from Richard:

[    0.877040] BUG: unable to handle kernel NULL pointer dereference at 0000000000000056
[    0.877597] IP: [<ffffffff8155b5e2>] ip_route_input_noref+0x1a2/0xb00
[    0.877597] PGD 3fa14067 PUD 3fa6e067 PMD 0
[    0.877597] Oops: 0000 [#1] SMP
[    0.877597] Modules linked in: virtio_net virtio_pci virtio_ring virtio
[    0.877597] CPU: 1 PID: 119 Comm: ifconfig Not tainted 4.2.0+ #1
[    0.877597] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.877597] task: ffff88003fab0bc0 ti: ffff88003faa8000 task.ti: ffff88003faa8000
[    0.877597] RIP: 0010:[<ffffffff8155b5e2>]  [<ffffffff8155b5e2>] ip_route_input_noref+0x1a2/0xb00
[    0.877597] RSP: 0018:ffff88003ed03ba0  EFLAGS: 00010202
[    0.877597] RAX: 0000000000000046 RBX: 00000000ffffff8f RCX: 0000000000000020
[    0.877597] RDX: ffff88003fab50b8 RSI: 0000000000000200 RDI: ffffffff8152b4b8
[    0.877597] RBP: ffff88003ed03c50 R08: 0000000000000000 R09: 0000000000000000
[    0.877597] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003fab6f00
[    0.877597] R13: ffff88003fab5000 R14: 0000000000000000 R15: ffffffff81cb5600
[    0.877597] FS:  00007f6de5751700(0000) GS:ffff88003ed00000(0000) knlGS:0000000000000000
[    0.877597] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.877597] CR2: 0000000000000056 CR3: 000000003fa6d000 CR4: 00000000000006e0
[    0.877597] Stack:
[    0.877597]  0000000000000000 0000000000000046 ffff88003fffa600 ffff88003ed03be0
[    0.877597]  ffff88003f9e2c00 697da8c0017da8c0 ffff880000000000 000000000007fd00
[    0.877597]  0000000000000000 0000000000000046 0000000000000000 0000000400000000
[    0.877597] Call Trace:
[    0.877597]  <IRQ>
[    0.877597]  [<ffffffff812bfa1f>] ? cpumask_next_and+0x2f/0x40
[    0.877597]  [<ffffffff8158e13c>] arp_process+0x39c/0x690
[    0.877597]  [<ffffffff8158e57e>] arp_rcv+0x13e/0x170
[    0.877597]  [<ffffffff8151feec>] __netif_receive_skb_core+0x60c/0xa00
[    0.877597]  [<ffffffff81515795>] ? __build_skb+0x25/0x100
[    0.877597]  [<ffffffff81515795>] ? __build_skb+0x25/0x100
[    0.877597]  [<ffffffff81521ff6>] __netif_receive_skb+0x16/0x70
[    0.877597]  [<ffffffff81522078>] netif_receive_skb_internal+0x28/0x90
[    0.877597]  [<ffffffff8152288f>] napi_gro_receive+0x7f/0xd0
[    0.877597]  [<ffffffffa0017906>] virtnet_receive+0x256/0x910 [virtio_net]
[    0.877597]  [<ffffffffa0017fd8>] virtnet_poll+0x18/0x80 [virtio_net]
[    0.877597]  [<ffffffff815234cd>] net_rx_action+0x1dd/0x2f0
[    0.877597]  [<ffffffff81053228>] __do_softirq+0x98/0x260
[    0.877597]  [<ffffffff8164969c>] do_softirq_own_stack+0x1c/0x30

The root cause is use of res.table uninitialized.

Thanks to Nikolay for noticing the uninitialized use amongst the maze of
gotos.

As Nikolay pointed out the second initialization is not required to fix
the oops, but rather to fix a related problem where a valid lookup should
be invalidated before creating the rth entry.

Fixes: b7503e0c ("net: Add FIB table id to rtable")
Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Reported-by: Richard Alpe <richard.alpe@ericsson.com>
Reported-by: Fabio Estevam <festevam@gmail.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
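
The class of bug named above (a field left unset on a path that jumps past the lookup) can be sketched as follows. This is not the code of ip_route_input_noref; the struct and function names are hypothetical stubs that only show why initializing the table pointer before any goto makes the later read safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stubs; not the kernel's fib_table / fib_result. */
struct fib_table_stub { unsigned int id; };
struct fib_result_stub { struct fib_table_stub *table; };

/* Returns the table id recorded for the route, or 0 when no table is set. */
static unsigned int route_input(int skip_lookup, struct fib_table_stub *tbl)
{
	struct fib_result_stub res;

	/* Initialize up front, before any path can jump to make_entry. */
	res.table = NULL;

	if (skip_lookup)
		goto make_entry;

	res.table = tbl;	/* normal path: the lookup fills in the table */

make_entry:
	/* Without the initialization, this read could see stack garbage. */
	return res.table ? res.table->id : 0;
}
```

Both paths now reach `make_entry` with a well-defined `res.table`.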


Merge branch 'bpf_avoid_clone' · 41a9802f

David S. Miller authored Sep 17, 2015

Alexei Starovoitov says:

====================
bpf: performance improvements

v1->v2: dropped redundant iff_up check in patch 2

At plumbers we discussed different options on how to get rid of skb_clone
from bpf_clone_redirect(), the patch 2 implements the best option.
Patch 1 adds 'integrated exts' to cls_bpf to improve performance by
combining simple actions into bpf classifier.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>


bpf: add bpf_redirect() helper · 27b29f63

Alexei Starovoitov authored Sep 15, 2015

Existing bpf_clone_redirect() helper clones skb before redirecting
it to RX or TX of destination netdev.
Introduce bpf_redirect() helper that does that without cloning.

Benchmarked with two hosts using 10G ixgbe NICs.
One host is doing line rate pktgen.
Another host is configured as:
$ tc qdisc add dev $dev ingress
$ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
   action bpf run object-file tcbpf1_kern.o section clone_redirect_xmit drop
so it receives the packet on $dev and immediately xmits it on $dev + 1
The section 'clone_redirect_xmit' in tcbpf1_kern.o file has the program
that does bpf_clone_redirect() and performance is 2.0 Mpps

$ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
   action bpf run object-file tcbpf1_kern.o section redirect_xmit drop
which is using bpf_redirect() - 2.4 Mpps

and using cls_bpf with integrated actions as:
$ tc filter add dev $dev root pref 10 \
  bpf run object-file tcbpf1_kern.o section redirect_xmit integ_act classid 1
performance is 2.5 Mpps

To summarize:
u32+act_bpf using clone_redirect - 2.0 Mpps
u32+act_bpf using redirect - 2.4 Mpps
cls_bpf using redirect - 2.5 Mpps

For comparison linux bridge in this setup is doing 2.1 Mpps
and ixgbe rx + drop in ip_rcv - 7.8 Mpps
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


cls_bpf: introduce integrated actions · 045efa82

Daniel Borkmann authored Sep 15, 2015

Often cls_bpf classifier is used with single action drop attached.
Optimize this use case and let cls_bpf return both classid and action.
For backwards compatibility reasons enable this feature under
TCA_BPF_FLAG_ACT_DIRECT flag.

Then more interesting programs like the following are easier to write:
int cls_bpf_prog(struct __sk_buff *skb)
{
  /* classify arp, ip, ipv6 into different traffic classes
   * and drop all other packets
   */
  switch (skb->protocol) {
  case htons(ETH_P_ARP):
    skb->tc_classid = 1;
    break;
  case htons(ETH_P_IP):
    skb->tc_classid = 2;
    break;
  case htons(ETH_P_IPV6):
    skb->tc_classid = 3;
    break;
  default:
    return TC_ACT_SHOT;
  }

  return TC_ACT_OK;
}

Joint work with Daniel Borkmann.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: only check perm protocol when register proto · f6c53334

Junwei Zhang authored Sep 18, 2015

The permanent protocol nodes are at the head of the list,
so only these nodes need to be checked.

Whether the new node is permanent or not,
insert it after the last permanent protocol node.

If the new node conflicts with an existing permanent node,
return an error.
Signed-off-by: Martin Zhang <martinbj2008@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
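
The registration rule described above can be sketched on a plain singly linked list. `struct proto_node` and `register_proto()` are hypothetical and much simpler than the kernel's protocol switch list; the sketch assumes a conflict means the same protocol number as an existing permanent node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; permanent nodes always form the list prefix. */
struct proto_node {
	int protocol;
	int permanent;
	struct proto_node *next;
};

/* Returns 0 on success, -1 if n conflicts with a permanent node. */
static int register_proto(struct proto_node **head, struct proto_node *n)
{
	struct proto_node **pos = head;

	/* Only the permanent prefix has to be scanned for conflicts. */
	while (*pos && (*pos)->permanent) {
		if ((*pos)->protocol == n->protocol)
			return -1;
		pos = &(*pos)->next;
	}

	/* pos now points just past the last permanent node: insert there. */
	n->next = *pos;
	*pos = n;
	return 0;
}
```

Because every insert lands right after the permanent prefix, the scan never has to walk the non-permanent tail.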


bonding: use l4 hash if available · 4b1b865e

Eric Dumazet authored Sep 15, 2015

If skb carries a l4 hash, no need to perform a flow dissection.

Performance is slightly better :

lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.39012e+06
lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.39393e+06
lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.39988e+06

After patch :

lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.43579e+06
lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.44304e+06
lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.44312e+06
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
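
The shortcut can be sketched as below: when the packet already carries an L4 hash, it is reused and the dissection step is skipped. `struct pkt`, `dissect_hash()`, `xmit_hash()` and `pick_slave()` are hypothetical stand-ins, and the fallback hash is a toy function rather than the kernel's flow dissector.

```c
#include <assert.h>

/* Hypothetical packet descriptor; not struct sk_buff. */
struct pkt {
	unsigned int hash;	/* precomputed hash, valid if l4_hash is set */
	int l4_hash;		/* non-zero when hash covers L4 ports */
	unsigned int saddr, daddr;
	unsigned short sport, dport;
};

static int dissect_calls;	/* counts how often the slow path runs */

/* Toy stand-in for flow dissection plus hashing. */
static unsigned int dissect_hash(const struct pkt *p)
{
	dissect_calls++;
	return p->saddr ^ p->daddr ^
	       (((unsigned int)p->sport << 16) | p->dport);
}

static unsigned int xmit_hash(const struct pkt *p)
{
	if (p->l4_hash)
		return p->hash;	/* fast path: no dissection needed */
	return dissect_hash(p);
}

static unsigned int pick_slave(const struct pkt *p, unsigned int slave_cnt)
{
	assert(slave_cnt > 0);
	return xmit_hash(p) % slave_cnt;
}
```

The counter makes it easy to confirm that the slow path is only taken for packets without a usable hash.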


tcp: provide skb->hash to synack packets · 58d607d3

Eric Dumazet authored Sep 15, 2015

In commit b73c3d0e ("net: Save TX flow hash in sock and set in skbuf
on xmit"), Tom provided a l4 hash to most outgoing TCP packets.

We'd like to provide one as well for SYNACK packets, so that all packets
of a given flow share same txhash, to later enable bonding driver to
also use skb->hash to perform slave selection.

Note that a SYNACK retransmit shuffles the tx hash, as Tom did
in commit 265f94ff ("net: Recompute sk_txhash on negative routing
advice") for established sockets.

This has the nice effect of making TCP flows resilient to some kinds of black
holes, even in the connection establishment phase.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


Merge branch 'nf_hook_netns' · bbe83731

David S. Miller authored Sep 17, 2015

Eric W. Biederman says:

====================
Passing net through the netfilter hooks

My primary goal with this patchset and its follow ups is to clean up the
network routing paths so that we do not look at the output device to
derive the network namespace.  My plan is to pass the network namespace
of the transmitting socket through the output path, to replace code that
looks at the output network device today.  Once that is done we can have
routes with output devices outside of the current network namespace.
Which should allow reception and transmission of packets in network
namespaces to be as fast as normal packet reception and transmission
with early demux disabled, because it will be the same code path.

Once skb_dst(skb)->dev is a little better under control I think it will
also be possible to use rcu to cleanup the ancient hack that sets
dst->dev to loopback_dev when a network device is removed.

The work to get there is a series of code cleanups.  I am starting with
passing net into the netfilter hooks and into the functions that are
called after the netfilter hooks.  This removes from netfilter the
need to guess which network namespace it is working on.

To get there I perform a series of minor prep patches so the big changes
at the end are possible to audit without getting lost in the noise.  In
particular I have a lot of patches computing net into a local variable
and then using it throughout the function.

So this patchset encompasses removing dead code, sorting out the _sk
functions that were added last time someone pushed a prototype change
through the post netfilter functions.  Cleaning up individual functions
use of the network namespace.  Passing net into the netfilter hooks.
Passing net into the post netfilter functions.  Using state->net in
the netfilter code where it is available and trivially usable.

Pablo, Dave I don't know whose tree this makes more sense to go
through.  I am assuming at least initially Pablo's as netfilter is
involved.  From what I have seen there will be a lot of back and forth
between the netfilter code paths and the routing code paths.

The patches are also available (against 4.3-rc1) at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/net-next.git master
====================
Signed-off-by: David S. Miller <davem@davemloft.net>


netfilter: Add blank lines in callers of netfilter hooks · be10de0a

Eric W. Biederman authored Sep 17, 2015

In code review it was noticed that I had failed to add some blank lines
in places where they are customarily used.  Taking a second look at the
code I have to agree blank lines would be nice so I have added them
here.
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


netfilter: Pass net into okfn · 0c4b51f0

Eric W. Biederman authored Sep 15, 2015

This is immediately motivated by the bridge code that chains functions that
call into netfilter.  Without passing net into the okfns the bridge code would
need to guess about the best expression for the network namespace to process
packets in.

As net is frequently one of the first things computed in continuation functions
after netfilter has done its job, passing in the desired network namespace is in
many cases a code simplification.

To support this change the function dst_output_okfn is introduced to
simplify passing dst_output as an okfn.  For the moment dst_output_okfn
just silently drops the struct net.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
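
The adapter idea can be sketched in user space like this. All types and functions below are simplified stubs with assumed signatures, not the kernel definitions; the point is only that the wrapper matches the new okfn prototype and ignores the namespace argument.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stubs; only pointers to them are passed around. */
struct net;
struct sock;
struct sk_buff { int sent; };	/* simplified stand-in */

/* New-style continuation prototype: net comes first. */
typedef int (*okfn_t)(struct net *, struct sock *, struct sk_buff *);

/* Stub for an output function that does not take a namespace. */
static int dst_output(struct sock *sk, struct sk_buff *skb)
{
	(void)sk;
	skb->sent++;
	return 0;
}

/* Adapter: fits the okfn prototype and silently drops the struct net. */
static int dst_output_okfn(struct net *net, struct sock *sk,
			   struct sk_buff *skb)
{
	(void)net;
	return dst_output(sk, skb);
}

/* Stand-in for a hook that accepts the packet and calls its continuation. */
static int run_hook(okfn_t okfn, struct net *net, struct sock *sk,
		    struct sk_buff *skb)
{
	return okfn(net, sk, skb);
}
```

Using an adapter keeps every okfn on one prototype while the output function itself stays unchanged for the moment.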


netfilter: Use nf_hook_state.net · 9dff2c96

Eric W. Biederman authored Sep 15, 2015

Instead of saying "net = dev_net(state->in?state->in:state->out)"
just say "state->net", as that information is now available,
much less confusing and much less error prone.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
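
The two ways of finding the namespace can be compared in a small sketch. The structs below are reduced stubs that reuse kernel-style names for readability; they are not the kernel definitions, and `hook_net_old()` / `hook_net_new()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stubs for illustration only. */
struct net { int id; };
struct net_device { struct net *nd_net; };
struct nf_hook_state {
	struct net_device *in;
	struct net_device *out;
	struct net *net;	/* namespace passed in by the caller */
};

static struct net *dev_net(const struct net_device *dev)
{
	return dev->nd_net;
}

/* Old style: guess the namespace from whichever device is present. */
static struct net *hook_net_old(const struct nf_hook_state *state)
{
	return dev_net(state->in ? state->in : state->out);
}

/* New style: use the namespace the caller already determined. */
static struct net *hook_net_new(const struct nf_hook_state *state)
{
	return state->net;
}
```

In the common case both helpers return the same namespace; the new form simply no longer depends on which device pointer happens to be set.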


netfilter: Pass struct net into the netfilter hooks · 29a26a56

Eric W. Biederman authored Sep 15, 2015

Pass a network namespace parameter into the netfilter hooks.  At the
call site of the netfilter hooks the path a packet is taking through
the network stack is well known which allows the network namespace to
be easily and reliably determined.

This allows the replacement of magic code like
"dev_net(state->in?:state->out)" that appears at the start of most
netfilter hooks with "state->net".

In almost all cases the network namespace passed in is derived
from the first network device passed in, guaranteeing those
paths will not see any changes in practice.

The exceptions are:
xfrm/xfrm_output.c:xfrm_output_resume()         xs_net(skb_dst(skb)->xfrm)
ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont()      ip_vs_conn_net(cp)
ipvs/ip_vs_xmit.c:ip_vs_send_or_cont()          ip_vs_conn_net(cp)
ipv4/raw.c:raw_send_hdrinc()                    sock_net(sk)
ipv6/ip6_output.c:ip6_xmit()			sock_net(sk)
ipv6/ndisc.c:ndisc_send_skb()                   dev_net(skb->dev) not dev_net(dst->dev)
ipv6/raw.c:raw6_send_hdrinc()                   sock_net(sk)
br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev

In all cases these exceptions seem to be a better expression for the
network namespace the packet is being processed in than the historic
"dev_net(in?in:out)".  I am documenting them in case something odd
pops up and someone starts trying to track down what happened.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
