- 14 Apr, 2014 26 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller authored
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains three Netfilter fixes for your net tree, they are: * Fix missing generation sequence initialization which results in a splat if lockdep is enabled, it was introduced in the recent works to improve nf_conntrack scalability, from Andrey Vagin. * Don't flush the GRE keymap list in nf_conntrack when the pptp helper is disabled otherwise this crashes due to a double release, from Andrey Vagin. * Fix nf_tables cmp fast in big endian, from Patrick McHardy. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Yasevich authored
Sometimes, when the packet arrives at skb_mac_gso_segment() its skb->mac_len already accounts for some of the mac lenght headers in the packet. This seems to happen when forwarding through and OpenSSL tunnel. When we start looking for any vlan headers in skb_network_protocol() we seem to ignore any of the already known mac headers and start with an ETH_HLEN. This results in an incorrect offset, dropped TSO frames and general slowness of the connection. We can start counting from the known skb->mac_len and return at least that much if all mac level headers are known and accounted for. Fixes: 53d6471c (net: Account for all vlan headers in skb_mac_gso_segment) CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Daniel Borkman <dborkman@redhat.com> Tested-by: Martin Filip <nexus+kernel@smoula.net> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
This reverts commit ef2820a7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer") as it introduced a serious performance regression on SCTP over IPv4 and IPv6, though a not as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs. Current state: [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64 Time: Fri, 11 Apr 2014 17:56:21 GMT Connecting to host 192.168.241.3, port 5201 Cookie: Lab200slot2.1397238981.812898.548918 [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec (etc) [root@Lab200slot2 ~]# iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64 Time: Fri, 11 Apr 2014 19:08:41 GMT Connecting to host 2001:db8:0:f101::1, port 5201 Cookie: Lab200slot2.1397243321.714295.2b3f7c [ 4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201 Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.00 sec 169 MBytes 1.42 Gbits/sec [ 4] 1.00-2.00 sec 201 MBytes 1.69 Gbits/sec [ 4] 2.00-3.00 sec 188 MBytes 1.58 Gbits/sec [ 4] 3.00-4.00 sec 174 MBytes 1.46 Gbits/sec [ 4] 4.00-5.00 sec 165 MBytes 1.39 Gbits/sec [ 4] 5.00-6.00 sec 199 MBytes 1.67 Gbits/sec [ 4] 6.00-7.00 sec 163 MBytes 1.36 Gbits/sec [ 4] 7.00-8.00 sec 174 MBytes 1.46 Gbits/sec [ 4] 8.00-9.00 sec 193 MBytes 1.62 Gbits/sec [ 4] 9.00-10.00 sec 196 MBytes 1.65 Gbits/sec [ 4] 10.00-11.00 sec 157 MBytes 1.31 Gbits/sec [ 4] 11.00-12.00 sec 175 MBytes 1.47 Gbits/sec [ 4] 12.00-13.00 sec 192 MBytes 1.61 Gbits/sec [ 4] 13.00-14.00 sec 199 MBytes 1.67 Gbits/sec (etc) After patch: [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64 Time: Mon, 14 Apr 2014 16:40:48 GMT Connecting to host 192.168.240.3, port 5201 Cookie: Lab200slot2.1397493648.413274.65e131 [ 4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.00 sec 240 MBytes 2.02 Gbits/sec [ 4] 1.00-2.00 sec 239 MBytes 2.01 Gbits/sec [ 4] 2.00-3.00 sec 240 MBytes 2.01 Gbits/sec [ 4] 3.00-4.00 sec 239 MBytes 2.00 Gbits/sec [ 4] 4.00-5.00 sec 245 MBytes 2.05 Gbits/sec [ 4] 5.00-6.00 sec 240 MBytes 2.01 Gbits/sec [ 4] 6.00-7.00 sec 240 MBytes 2.02 Gbits/sec [ 4] 7.00-8.00 sec 239 MBytes 2.01 Gbits/sec With the reverted patch applied, the SCTP/IPv4 performance is back to normal on latest upstream for IPv4 and IPv6 and has same throughput as 3.4.2 test kernel, steady and interval reports are smooth again. Fixes: ef2820a7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer") Reported-by: Peter Butler <pbutler@sonusnet.com> Reported-by: Dongsheng Song <dongsheng.song@gmail.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Peter Butler <pbutler@sonusnet.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com> Cc: Alexander Sverdlin <alexander.sverdlin@nsn.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Steve Wise authored
Hardware needs the local device mac address to support hw loopback for rdma loopback connections. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
While reviewing seccomp code, we found that BPF_S_ANC_SECCOMP_LD_W has been wrongly decoded by commit a8fc9277 ("sk-filter: Add ability to get socket filter program (v2)") into the opcode BPF_LD|BPF_B|BPF_ABS although it should have been decoded as BPF_LD|BPF_W|BPF_ABS. In practice, this should not have much side-effect though, as such conversion is/was being done through prctl(2) PR_SET_SECCOMP. Reverse operation PR_GET_SECCOMP will only return the current seccomp mode, but not the filter itself. Since the transition to the new BPF infrastructure, it's also not used anymore, so we can simply remove this as it's unreachable. Fixes: a8fc9277 ("sk-filter: Add ability to get socket filter program (v2)") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Linus reports that on 32-bit x86 Chromium throws the following seccomp resp. audit log messages: audit: type=1326 audit(1397359304.356:28108): auid=500 uid=500 gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023 pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0 syscall=172 compat=0 ip=0xb2dd9852 code=0x30000 audit: type=1326 audit(1397359304.356:28109): auid=500 uid=500 gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023 pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0 syscall=5 compat=0 ip=0xb2dd9852 code=0x50000 These audit messages are being triggered via audit_seccomp() through __secure_computing() in seccomp mode (BPF) filter with seccomp return codes 0x30000 (== SECCOMP_RET_TRAP) and 0x50000 (== SECCOMP_RET_ERRNO) during filter runtime. Moreover, Linus reports that x86_64 Chromium seems fine. The underlying issue that explains this is that the implementation of populate_seccomp_data() is wrong. Our seccomp data structure sd that is being shared with user ABI is: struct seccomp_data { int nr; __u32 arch; __u64 instruction_pointer; __u64 args[6]; }; Therefore, a simple cast to 'unsigned long *' for storing the value of the syscall argument via syscall_get_arguments() is just wrong as on 32-bit x86 (or any other 32bit arch), it would result in storing a0-a5 at wrong offsets in args[] member, and thus i) could leak stack memory to user space and ii) tampers with the logic of seccomp BPF programs that read out and check for syscall arguments: syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]); Tested on 32-bit x86 with Google Chrome, unfortunately only via remote test machine through slow ssh X forwarding, but it fixes the issue on my side. So fix it up by storing args in type correct variables, gcc is clever and optimizes the copy away in other cases, e.g. x86_64. Fixes: bd4cf0ed ("net: filter: rework/optimize internal BPF interpreter's instruction set") Reported-and-bisected-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Shahed Shaikh says: ==================== qlcnic: Bug fixes This patch series contains following bug fixes - * Send INIT_NIC_FUNC mailbox command as first mailbox * Fix a panic because of uninitialized delayed_work. * Fix inconsistent calculation of max rings count. * Fix PVID configuration issue. Driver needs to clear older PVID before adding new one. * Fix QLogic application/driver interface by packing vNIC information array. * Fix a crash when user tries to disable SR-IOV while VFs are still assigned to VMs. Please apply to net. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Manish Chopra authored
o While disabling SR-IOV when VFs are assigned to VMs causes host crash so return -EPERM when user request to disable SR-IOV using pci sysfs in case of VFs are assigned to VMs. Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jitendra Kalsaria authored
o Application expect vNIC number as the array index but driver interface return configuration in array index form. o Pack the vNIC information array in the buffer such that application can access it using vNIC number as the array index. Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jitendra Kalsaria authored
Clear older PVID before adding a newer PVID to the eSwicth port Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shahed Shaikh authored
Do not read max rings count from qlcnic_get_nic_info(). Use driver defined values for 82xx adapters. In case of 83xx adapters, use minimum of firmware provided and driver defined values. Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sucheta Chakraborty authored
o INIT_NIC_FUNC should be first mailbox sent. Sending DCB capability and parameter query commands after that command. Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sucheta Chakraborty authored
o AEN event was being received before initializing delayed_work struct and handlers for it. This was resulting in crash. This patch fixes it. Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Sathya Perla says: ==================== be2net: patch set Patch 1/2 is a v2 of a patch that was submitted earlier (as a part of a different patch-set). v2 incorporates a suggestion given by David Laight for how long to poll for pending TX completions while disabling a device. Patch 2/2 fixes a crash in be_remove()->be_close() path after be2net has aborted an EEH error recovery due to a permanant failure. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kalesh AP authored
In the EEH error recovery path, when a permanent failure occurs, we clean up adapter structure (i.e. destroy queues etc) by calling be_clear() and return PCI_ERS_RESULT_DISCONNECT. After this the stack tries to remove device from bus and calls be_remove() which invokes netdev_unregister()->be_close(). be_close() operating on destroyed queues results in a NULL dereference. This patch fixes this problem by introducing a flag to keep track of the setup state. Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vasundhara Volam authored
be_close() currently waits for a max of 200ms to receive all pending TX compls. This timeout value was roughly calculated based on 10G transmission speeds and the TX queue depth. This timeout may not be enough when the link is operating at lower speeds or in multi-channel/SR-IOV configs with TX-rate limiting setting. It is hard to calculate a "proper timeout value" that works in all configurations. This patch solves this problem by continuing to reap TX completions till the HW is completely silent for a period of 10ms or a HW error is detected. v2: implements the new scheme (as suggested by David Laight) instead of just waiting longer than 200ms for reaping all completions. Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amir Vadai authored
Fix in commit [1] is not sufficient since a deferred VF initialization could happen after pci_enable_sriov() is finished, but before the PF is fully initialized. Need to prevent VFs from initializing till the PF is fully ready and comm channel is operational. [1] - 97989356 "net/mlx4_core: mlx4_init_slave() shouldn't access comm channel before PF is ready" CC: Stuart Hayes <Stuart_Hayes@Dell.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel J Blueman authored
When CONFIG_PM_SLEEP isn't enabled, bnx2_suspend/resume are unused; don't build them when they aren't used. Signed-off-by: Daniel J Blueman <daniel@quora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Francois reported that setting big mtu on loopback device could prevent tcp sessions making progress. We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets. We must limit the IPv6 MTU to (65535 + 40) bytes in theory. Tested: ifconfig lo mtu 70000 netperf -H ::1 Before patch : Throughput : 0.05 Mbits After patch : Throughput : 35484 Mbits Reported-by: Francois WELLENREITER <f.wellenreiter@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Patrick McHardy authored
nft_cmp_fast is used for equality comparisions of size <= 4. For comparisions of size < 4 byte a mask is calculated that is applied to both the data from userspace (during initialization) and the register value (during runtime). Both values are stored using (in effect) memcpy to a memory area that is then interpreted as u32 by nft_cmp_fast. This works fine on little endian since smaller types have the same base address, however on big endian this is not true and the smaller types are interpreted as a big number with trailing zero bytes. The mask therefore must not include the lower bytes, but the higher bytes on big endian. Add a helper function that does a cpu_to_le32 to switch the bytes on big endian. Since we're dealing with a mask of just consequitive bits, this works out fine. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Andrey Vagin authored
[ 251.920788] INFO: trying to register non-static key. [ 251.921386] the code is fine but needs lockdep annotation. [ 251.921386] turning off the locking correctness validator. [ 251.921386] CPU: 2 PID: 15715 Comm: socket_listen Not tainted 3.14.0+ #294 [ 251.921386] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 251.921386] 0000000000000000 000000009d18c210 ffff880075f039b8 ffffffff816b7ecd [ 251.921386] ffffffff822c3b10 ffff880075f039c8 ffffffff816b36f4 ffff880075f03aa0 [ 251.921386] ffffffff810c65ff ffffffff810c4a85 00000000fffffe01 ffffffffa0075172 [ 251.921386] Call Trace: [ 251.921386] [<ffffffff816b7ecd>] dump_stack+0x45/0x56 [ 251.921386] [<ffffffff816b36f4>] register_lock_class.part.24+0x38/0x3c [ 251.921386] [<ffffffff810c65ff>] __lock_acquire+0x168f/0x1b40 [ 251.921386] [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0 [ 251.921386] [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat] [ 251.921386] [<ffffffff816c1215>] ? _raw_spin_unlock_bh+0x35/0x40 [ 251.921386] [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat] [ 251.921386] [<ffffffff810c7272>] lock_acquire+0xa2/0x120 [ 251.921386] [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffffa0055989>] __nf_conntrack_confirm+0x129/0x410 [nf_conntrack] [ 251.921386] [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffffa008ab90>] ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815d8c5a>] nf_iterate+0xaa/0xc0 [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815d8d14>] nf_hook_slow+0xa4/0x190 [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815e98f2>] ip_output+0x92/0x100 [ 251.921386] [<ffffffff815e8df9>] ip_local_out+0x29/0x90 [ 251.921386] [<ffffffff815e9240>] ip_queue_xmit+0x170/0x4c0 [ 251.921386] [<ffffffff815e90d5>] ? ip_queue_xmit+0x5/0x4c0 [ 251.921386] [<ffffffff81601208>] tcp_transmit_skb+0x498/0x960 [ 251.921386] [<ffffffff81602d82>] tcp_connect+0x812/0x960 [ 251.921386] [<ffffffff810e3dc5>] ? ktime_get_real+0x25/0x70 [ 251.921386] [<ffffffff8159ea2a>] ? secure_tcp_sequence_number+0x6a/0xc0 [ 251.921386] [<ffffffff81606f57>] tcp_v4_connect+0x317/0x470 [ 251.921386] [<ffffffff8161f645>] __inet_stream_connect+0xb5/0x330 [ 251.921386] [<ffffffff8158dfc3>] ? lock_sock_nested+0x33/0xa0 [ 251.921386] [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10 [ 251.921386] [<ffffffff81078885>] ? __local_bh_enable_ip+0x75/0xe0 [ 251.921386] [<ffffffff8161f8f8>] inet_stream_connect+0x38/0x50 [ 251.921386] [<ffffffff8158b157>] SYSC_connect+0xe7/0x120 [ 251.921386] [<ffffffff810e3789>] ? current_kernel_time+0x69/0xd0 [ 251.921386] [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0 [ 251.921386] [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10 [ 251.921386] [<ffffffff8158c36e>] SyS_connect+0xe/0x10 [ 251.921386] [<ffffffff816caf69>] system_call_fastpath+0x16/0x1b [ 312.014104] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=60003 jiffies, g=42359, c=42358, q=333) [ 312.015097] INFO: Stall ended before state dump start Fixes: 93bb0ceb ("netfilter: conntrack: remove central spinlock nf_conntrack_lock") Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Mathias Krause authored
The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check for a minimal message length before testing the supplied offset to be within the bounds of the message. This allows the subtraction of the nla header to underflow and therefore -- as the data type is unsigned -- allowing far to big offset and length values for the search of the netlink attribute. The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is also wrong. It has the minuend and subtrahend mixed up, therefore calculates a huge length value, allowing to overrun the end of the message while looking for the netlink attribute. The following three BPF snippets will trigger the bugs when attached to a UNIX datagram socket and parsing a message with length 1, 2 or 3. ,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]-- | ld #0x87654321 | ldx #42 | ld #nla | ret a `--- ,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]-- | ld #0x87654321 | ldx #42 | ld #nlan | ret a `--- ,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]-- | ; (needs a fake netlink header at offset 0) | ld #0 | ldx #42 | ld #nlan | ret a `--- Fix the first issue by ensuring the message length fulfills the minimal size constrains of a nla header. Fix the second bug by getting the math for the remainder calculation right. Fixes: 4738c1db ("[SKFILTER]: Add SKF_ADF_NLATTR instruction") Fixes: d214c753 ("filter: add SKF_AD_NLATTR_NEST to look for nested..") Cc: Patrick McHardy <kaber@trash.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Anastasov authored
Extend commit 13378cad ("ipv4: Change rt->rt_iif encoding.") from 3.6 to return valid RTA_IIF on 'ip route get ... iif DEVICE' instead of rt_iif 0 which is displayed as 'iif *'. inet_iif is not appropriate to use because skb_iif is not set. Use the skb->dev->ifindex instead. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Petazzoni authored
This reverts commit e3a8786c. While this commit allows to use the mvneta driver as a module on some configurations, it breaks other configurations even if mvneta is used built-in. This breakage is due to the fact that on some RGMII platforms, the PCS bit has to be set, and on some other platforms, it has to be cleared. At the moment, we lack informations to know exactly the significance of this bit (the datasheet only says "enables PCS"), and so we can't produce a patch that will work on all platforms at this point. And since this change is breaking the network completely for many users, it's much better to revert it for now. We'll come back later with a proper fix that takes into account all platforms. Basically: * Armada XP GP is configured as RGMII-ID, and needs the PCS bit to be set. * Armada 370 Mirabox is configured as RGMII-ID, and needs the PCS bit to be cleared. And at the moment, we don't know how to make the distinction between those two cases. One hint is that the Armada XP GP appears in fact to be using a QSGMII connection with the PHY (Quad-SGMII), but configuring it as SGMII doesn't work, while RGMII-ID works. This needs more investigation, but in the mean time, let's unbreak the network for all those users. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Reported-by: Arnaud Ebalard <arno@natisbad.org> Reported-by: Alexander Reuter <Alexander.Reuter@gmx.net> Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=73401 Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Yang authored
pci_match_id() just match the static pci_device_id, which may return NULL if someone binds the driver to a device manually using /sys/bus/pci/drivers/.../new_id. This patch wrap up a helper function __mlx4_remove_one() which does the tear down function but preserve the drv_data. Functions like mlx4_pci_err_detected() and mlx4_restart_one() will call this one with out releasing drvdata. Fixes: 97a5221f "net/mlx4_core: pass pci_device_id.driver_data to __mlx4_init_one during reset". CC: Bjorn Helgaas <bhelgaas@google.com> CC: Amir Vadai <amirv@mellanox.com> CC: Jack Morgenstein <jackm@dev.mellanox.co.il> CC: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wang, Xiaoming authored
Plug a group_info refcount leak in ping_init. group_info is only needed during initialization and the code failed to release the reference on exit. While here move grabbing the reference to a place where it is actually needed. Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> Signed-off-by: Zhang Dongxing <dongxing.zhang@intel.com> Signed-off-by: xiaoming wang <xiaoming.wang@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 13 Apr, 2014 8 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuildLinus Torvalds authored
Pull misc kbuild changes from Michal Marek: "Here is the non-critical part of kbuild: - One bogus coccinelle check removed, one check fixed not to suggest the obsolete PTR_RET macro - scripts/tags.sh does not index the generated *.mod.c files - new objdiff tool to list differences between two versions of an object file - A fix for scripts/bootgraph.pl" * 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: scripts/coccinelle: Use PTR_ERR_OR_ZERO scripts/bootgraph.pl: Add graphic header scripts: objdiff: detect object code changes between two commits Coccicheck: Remove memcpy to struct assignment test scripts/tags.sh: Ignore *.mod.c
-
Mikulas Patocka authored
This patch fixes I/O errors with the sym53c8xx_2 driver when the disk returns QUEUE FULL status. When the controller encounters an error (including QUEUE FULL or BUSY status), it aborts all not yet submitted requests in the function sym_dequeue_from_squeue. This function aborts them with DID_SOFT_ERROR. If the disk has full tag queue, the request that caused the overflow is aborted with QUEUE FULL status (and the scsi midlayer properly retries it until it is accepted by the disk), but the sym53c8xx_2 driver aborts the following requests with DID_SOFT_ERROR --- for them, the midlayer does just a few retries and then signals the error up to sd. The result is that disk returning QUEUE FULL causes request failures. The error was reproduced on 53c895 with COMPAQ BD03685A24 disk (rebranded ST336607LC) with command queue 48 or 64 tags. The disk has 64 tags, but under some access patterns it return QUEUE FULL when there are less than 64 pending tags. The SCSI specification allows returning QUEUE FULL anytime and it is up to the host to retry. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Paul Mackerras authored
Commit 8f619b54 ("powerpc/ppc64: Do not turn AIL (reloc-on interrupts) too early") added code to set the AIL bit in the LPCR without checking whether the kernel is running in hypervisor mode. The result is that when the kernel is running as a guest (i.e., under PowerKVM or PowerVM), the processor takes a privileged instruction interrupt at that point, causing a panic. The visible result is that the kernel hangs after printing "returning from prom_init". This fixes it by checking for hypervisor mode being available before setting LPCR. If we are not in hypervisor mode, we enable relocation-on interrupts later in pSeries_setup_arch using the H_SET_MODE hcall. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
Commits 11d4616b ("futex: revert back to the explicit waiter counting code") and 69cd9eba ("futex: avoid race between requeue and wake") changed some of the finer details of how we think about futexes. One was a late fix and the other a consequence of overlooking the whole requeuing logic. The first change caused our documentation to be incorrect, and the second made us aware that we need to explicitly add more details to it. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds authored
Pull yet more networking updates from David Miller: 1) Various fixes to the new Redpine Signals wireless driver, from Fariya Fatima. 2) L2TP PPP connect code takes PMTU from the wrong socket, fix from Dmitry Petukhov. 3) UFO and TSO packets differ in whether they include the protocol header in gso_size, account for that in skb_gso_transport_seglen(). From Florian Westphal. 4) If VLAN untagging fails, we double free the SKB in the bridging output path. From Toshiaki Makita. 5) Several call sites of sk->sk_data_ready() were referencing an SKB just added to the socket receive queue in order to calculate the second argument via skb->len. This is dangerous because the moment the skb is added to the receive queue it can be consumed in another context and freed up. It turns out also that none of the sk->sk_data_ready() implementations even care about this second argument. So just kill it off and thus fix all these use-after-free bugs as a side effect. 6) Fix inverted test in tcp_v6_send_response(), from Lorenzo Colitti. 7) pktgen needs to do locking properly for LLTX devices, from Daniel Borkmann. 8) xen-netfront driver initializes TX array entries in RX loop :-) From Vincenzo Maffione. 9) After refactoring, some tunnel drivers allow a tunnel to be configured on top itself. Fix from Nicolas Dichtel. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits) vti: don't allow to add the same tunnel twice gre: don't allow to add the same tunnel twice drivers: net: xen-netfront: fix array initialization bug pktgen: be friendly to LLTX devices r8152: check RTL8152_UNPLUG net: sun4i-emac: add promiscuous support net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO net: ipv6: Fix oif in TCP SYN+ACK route lookup. drivers: net: cpsw: enable interrupts after napi enable and clearing previous interrupts drivers: net: cpsw: discard all packets received when interface is down net: Fix use after free by removing length arg from sk_data_ready callbacks. Drivers: net: hyperv: Address UDP checksum issues Drivers: net: hyperv: Negotiate suitable ndis version for offload support Drivers: net: hyperv: Allocate memory for all possible per-pecket information bridge: Fix double free and memory leak around br_allowed_ingress bonding: Remove debug_fs files when module init fails i40evf: program RSS LUT correctly i40evf: remove open-coded skb_cow_head ixgb: remove open-coded skb_cow_head igbvf: remove open-coded skb_cow_head ...
-
Linus Torvalds authored
Merge tag 'blackfin-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/realmz6/blackfin-linux Pull blackfin updates from Steven Miao: "Code cleanup, some previously ignored patches, and bug fixes" * tag 'blackfin-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/realmz6/blackfin-linux: blackfin: cleanup board files bf609: clock: drop unused clock bit set/clear functions Blackfin: bf537: rename "CONFIG_ADT75" Blackfin: bf537: rename "CONFIG_AD7314" Blackfin: bf537: rename ad2s120x ->ad2s1200 blackfin: bf537: fix typo "CONFIG_SND_SOC_ADV80X_MODULE" blackfin: dma: current count mmr is read only bfin_crc: Move architecture independant crc header file out of the blackfin folder. bf54x: drop unuesd HOST status,control,timeout registers bit define macros blackfin: portmux: cleanup head file Blackfin: remove "config IP_CHECKSUM_L1" blackfin: Remove GENERIC_GPIO config option again blackfin:Use generic /proc/interrupts implementation blackfin: bf60x: fix typo "CONFIG_PM_BFIN_WAKE_PA15_POL"
-
Linus Torvalds authored
Merge tag 'remoteproc-3.15-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc Pull remoteproc cleanups from Ohad Ben-Cohen: "Several remoteproc cleanup patches coming from Jingoo Han, Julia Lawall and Uwe Kleine-König" * tag 'remoteproc-3.15-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc: remoteproc/ste_modem: staticize local symbols remoteproc/davinci: simplify use of devm_ioremap_resource remoteproc/davinci: drop needless devm_clk_put
-
git://git.linuxfoundation.org/llvmlinux/kernelLinus Torvalds authored
Pull llvm patches from Behan Webster: "These are some initial updates to support compiling the kernel with clang. These patches have been through the proper reviews to the best of my ability, and have been soaking in linux-next for a few weeks. These patches by themselves still do not completely allow clang to be used with the kernel code, but lay the foundation for other patches which are still under review. Several other of the LLVMLinux patches have been already added via maintainer trees" * tag 'llvmlinux-for-v3.15' of git://git.linuxfoundation.org/llvmlinux/kernel: x86: LLVMLinux: Fix "incomplete type const struct x86cpu_device_id" x86 kbuild: LLVMLinux: More cc-options added for clang x86, acpi: LLVMLinux: Remove nested functions from Thinkpad ACPI LLVMLinux: Add support for clang to compiler.h and new compiler-clang.h LLVMLinux: Remove warning about returning an uninitialized variable kbuild: LLVMLinux: Fix LINUX_COMPILER definition script for compilation with clang Documentation: LLVMLinux: Update Documentation/dontdiff kbuild: LLVMLinux: Adapt warnings for compilation with clang kbuild: LLVMLinux: Add Kbuild support for building kernel with Clang
-
- 12 Apr, 2014 6 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pendingLinus Torvalds authored
Pull SCSI target updates from Nicholas Bellinger: "Here are the target pending updates for v3.15-rc1. Apologies in advance for waiting until the second to last day of the merge window to send these out. The highlights this round include: - iser-target support for T10 PI (DIF) offloads (Sagi + Or) - Fix Task Aborted Status (TAS) handling in target-core (Alex Leung) - Pass in transport supported PI at session initialization (Sagi + MKP + nab) - Add WRITE_INSERT + READ_STRIP T10 PI support in target-core (nab + Sagi) - Fix iscsi-target ERL=2 ASYNC_EVENT connection pointer bug (nab) - Fix tcm_fc use-after-free of ft_tpg (Andy Grover) - Use correct ib_sg_dma primitives in ib_isert (Mike Marciniszyn) Also, note the virtio-scsi + vhost-scsi changes to expose T10 PI metadata into KVM guest have been left-out for now, as there where a few comments from MST + Paolo that where not able to be addressed in time for v3.15. Please expect this feature for v3.16-rc1" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (43 commits) ib_srpt: Use correct ib_sg_dma primitives target/tcm_fc: Rename ft_tport_create to ft_tport_get target/tcm_fc: Rename ft_{add,del}_lport to {add,del}_wwn target/tcm_fc: Rename structs and list members for clarity target/tcm_fc: Limit to 1 TPG per wwn target/tcm_fc: Don't export ft_lport_list target/tcm_fc: Fix use-after-free of ft_tpg target: Add check to prevent Abort Task from aborting itself target: Enable READ_STRIP emulation in target_complete_ok_work target/sbc: Add sbc_dif_read_strip software emulation target: Enable WRITE_INSERT emulation in target_execute_cmd target/sbc: Add sbc_dif_generate software emulation target/sbc: Only expose PI read_cap16 bits when supported by fabric target/spc: Only expose PI mode page bits when supported by fabric target/spc: Only expose PI inquiry bits when supported by fabric target: Pass in transport supported PI at session initialization target/iblock: Fix double bioset_integrity_free bug Target/sbc: Initialize COMPARE_AND_WRITE write_sg scatterlist target/rd: T10-Dif: RAM disk is allocating more space than required. iscsi-target: Fix ERL=2 ASYNC_EVENT connection pointer bug ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-mediaLinus Torvalds authored
Pull media fixes from Mauro Carvalho Chehab: "A series of bug fix patches for v3.15-rc1. Most are just driver fixes. There are some changes at remote controller core level, fixing some definitions on a new API added for Kernel v3.15. It also adds the missing include at include/uapi/linux/v4l2-common.h, to allow its compilation on userspace, as pointed by you" * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (24 commits) [media] gpsca: remove the risk of a division by zero [media] stk1160: warrant a NUL terminated string [media] v4l: ti-vpe: retain v4l2_buffer flags for captured buffers [media] v4l: ti-vpe: Set correct field parameter for output and capture buffers [media] v4l: ti-vpe: zero out reserved fields in try_fmt [media] v4l: ti-vpe: Fix initial configuration queue data [media] v4l: ti-vpe: Use correct bus_info name for the device in querycap [media] v4l: ti-vpe: report correct capabilities in querycap [media] v4l: ti-vpe: Allow usage of smaller images [media] v4l: ti-vpe: Use video_device_release_empty [media] v4l: ti-vpe: Make sure in job_ready that we have the needed number of dst_bufs [media] lgdt3305: include sleep functionality in lgdt3304_ops [media] drx-j: use customise option correctly [media] m88rs2000: fix sparse static warnings [media] r820t: fix size and init values [media] rc-core: remove generic scancode filter [media] rc-core: split dev->s_filter [media] rc-core: do not change 32bit NEC scancode format for now [media] rtl28xxu: remove duplicate ID 0458:707f Genius TVGo DVB-T03 [media] xc2028: add missing break to switch ...
-
git://github.com/jonmason/ntbLinus Torvalds authored
Pull PCIe non-transparent bridge fixes and features from Jon Mason: "NTB driver bug fixes to address issues in list traversal, skb leak in ntb_netdev, a typo, and a leak of msix entries in the error path. Clean ups of the event handling logic, as well as a overall style cleanup. Finally, the driver was converted to use the new pci_enable_msix_range logic (and the refactoring to go along with it)" * tag 'ntb-3.15' of git://github.com/jonmason/ntb: ntb: Use pci_enable_msix_range() instead of pci_enable_msix() ntb: Split ntb_setup_msix() into separate BWD/SNB routines ntb: Use pci_msix_vec_count() to obtain number of MSI-Xs NTB: Code Style Clean-up NTB: client event cleanup ntb: Fix leakage of ntb_device::msix_entries[] array NTB: Fix typo in setting one translation register ntb_netdev: Fix skb free issue in open ntb_netdev: Fix list_for_each_entry exit issue
-
Linus Torvalds authored
The vfs merge caused a latent bug to show up: In file included from fs/ceph/super.h:4:0, from fs/ceph/ioctl.c:3: include/linux/ceph/ceph_debug.h:4:0: warning: "pr_fmt" redefined [enabled by default] #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt ^ In file included from include/linux/kernel.h:13:0, from include/linux/uio.h:12, from include/linux/socket.h:7, from include/uapi/linux/in.h:22, from include/linux/in.h:23, from fs/ceph/ioctl.c:1: include/linux/printk.h:214:0: note: this is the location of the previous definition #define pr_fmt(fmt) fmt ^ where the reason is that <linux/ceph_debug.h> is included much too late for the "pr_fmt()" define. The include of <linux/ceph_debug.h> needs to be the first include in the file, but fs/ceph/ioctl.c had for some reason missed that, and it wasn't noticeable until some unrelated header file changes brought in an indirect earlier include of <linux/kernel.h>. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull vfs updates from Al Viro: "The first vfs pile, with deep apologies for being very late in this window. Assorted cleanups and fixes, plus a large preparatory part of iov_iter work. There's a lot more of that, but it'll probably go into the next merge window - it *does* shape up nicely, removes a lot of boilerplate, gets rid of locking inconsistencie between aio_write and splice_write and I hope to get Kent's direct-io rewrite merged into the same queue, but some of the stuff after this point is having (mostly trivial) conflicts with the things already merged into mainline and with some I want more testing. This one passes LTP and xfstests without regressions, in addition to usual beating. BTW, readahead02 in ltp syscalls testsuite has started giving failures since "mm/readahead.c: fix readahead failure for memoryless NUMA nodes and limit readahead pages" - might be a false positive, might be a real regression..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) missing bits of "splice: fix racy pipe->buffers uses" cifs: fix the race in cifs_writev() ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure kill generic_file_buffered_write() ocfs2_file_aio_write(): switch to generic_perform_write() ceph_aio_write(): switch to generic_perform_write() xfs_file_buffered_aio_write(): switch to generic_perform_write() export generic_perform_write(), start getting rid of generic_file_buffer_write() generic_file_direct_write(): get rid of ppos argument btrfs_file_aio_write(): get rid of ppos kill the 5th argument of generic_file_buffered_write() kill the 4th argument of __generic_file_aio_write() lustre: don't open-code kernel_recvmsg() ocfs2: don't open-code kernel_recvmsg() drbd: don't open-code kernel_recvmsg() constify blk_rq_map_user_iov() and friends lustre: switch to kernel_sendmsg() ocfs2: don't open-code kernel_sendmsg() take iov_iter stuff to mm/iov_iter.c process_vm_access: tidy up a bit ...
-
David S. Miller authored
Nicolas Dichtel says: ==================== tunnels: don't allow to add the same tunnel twice This series fixes the check of an existing tunnel with the same parameters when a new tunnel is added. I've checked all users of ip_tunnel_newlink(): gre, gretap, ipip and vti. The bug exists only for gre and vti. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-