- 27 Feb, 2015 12 commits
-
-
Eric Dumazet authored
[ Upstream commit bdbbb852 ] In commit be9f4a44 ("ipv4: tcp: remove per net tcp_sock") I tried to address contention on a socket lock, but the solution I chose was horrible: commit 3a7c384f ("ipv4: tcp: unicast_sock should not land outside of TCP stack") addressed a selinux regression. commit 0980e56e ("ipv4: tcp: set unicast_sock uc_ttl to -1") took care of another regression. commit b5ec8eea ("ipv4: fix ip_send_skb()") fixed another regression. commit 811230cd ("tcp: ipv4: initialize unicast_sock sk_pacing_rate") was another shot in the dark. Really, just use a proper socket per cpu, and remove the skb_orphan() call, to re-enable flow control. This solves a serious problem with the FQ packet scheduler when used in hostile environments, as we do not want to allocate a flow structure for every RST packet sent in response to a spoofed packet. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
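A rough sketch of what "a proper socket per cpu" amounts to (field and helper names follow the upstream patch as far as known, but treat them as assumptions; error unwinding is omitted):

    /* Create one kernel control socket per possible CPU for sending
     * RST/ACK replies, instead of sharing a single unicast_sock. */
    static int __net_init tcp_sk_init(struct net *net)
    {
            int res, cpu;

            net->ipv4.tcp_sk = alloc_percpu(struct sock *);
            if (!net->ipv4.tcp_sk)
                    return -ENOMEM;

            for_each_possible_cpu(cpu) {
                    struct sock *sk;

                    res = inet_ctl_sock_create(&sk, PF_INET, SOCK_RAW,
                                               IPPROTO_TCP, net);
                    if (res)
                            return res;     /* the real code unwinds here */
                    *per_cpu_ptr(net->ipv4.tcp_sk, cpu) = sk;
            }
            return 0;
    }

The reply path can then pick up the current CPU's socket with this_cpu_ptr() without any lock contention and without orphaning the skb.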
-
Eric Dumazet authored
[ Upstream commit 811230cd ] When I added the sk_pacing_rate field, I forgot to initialize its value in the per cpu unicast_sock used in ip_send_unicast_reply(). This means that for sch_fq users, RST packets, or ACK packets sent on behalf of TIME_WAIT sockets might be sent too slowly or even dropped once we reach the per-flow limit. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 95bd09eb ("tcp: TSO packets automatic sizing") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Roopa Prabhu authored
[ Upstream commit 59ccaaaa ] Reported in: https://bugzilla.kernel.org/show_bug.cgi?id=92081 This patch avoids calling rtnl_notify if the device ndo_bridge_getlink handler does not return any bytes in the skb. Alternatively, the skb->len check can be moved inside rtnl_notify. For the bridge vlan case described in 92081, a fix is also needed in the bridge driver to generate a proper notification; that will be done in a subsequent patch. v2: rebase patch on net tree Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
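A minimal sketch of the guard this describes (simplified fragment, not the exact upstream diff):

    /* Only notify if ndo_bridge_getlink() actually filled in attributes;
     * an empty message is useless to listeners. */
    if (skb->len)
            rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, GFP_ATOMIC);
    else
            kfree_skb(skb);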
-
Hannes Frederic Sowa authored
[ Upstream commit 6e9e16e6 ] Lubomir Rintel reported that during replacing a route the interface reference counter isn't correctly decremented. To quote bug <https://bugzilla.kernel.org/show_bug.cgi?id=91941>: | [root@rhel7-5 lkundrak]# sh -x lal | + ip link add dev0 type dummy | + ip link set dev0 up | + ip link add dev1 type dummy | + ip link set dev1 up | + ip addr add 2001:db8:8086::2/64 dev dev0 | + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20 | + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10 | + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20 | + ip link del dev0 type dummy | Message from syslogd@rhel7-5 at Jan 23 10:54:41 ... | kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2 | | Message from syslogd@rhel7-5 at Jan 23 10:54:51 ... | kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2 During replacement of a rt6_info we must walk all parent nodes and check if the to be replaced rt6_info got propagated. If so, replace it with an alive one. Fixes: 4a287eba ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag") Reported-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Tested-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
subashab@codeaurora.org authored
[ Upstream commit fc752f1f ] An exception is seen in the ICMP ping receive path where the skb destructor sock_rfree() tries to access a freed socket. This happens because ping_rcv() releases the socket reference with sock_put() and this internally frees up the socket. Later, icmp_rcv() will try to free the skb; as part of this, the skb destructor is called, which leads to a kernel panic because the socket was already freed in ping_rcv(). -->|exception -007|sk_mem_uncharge -007|sock_rfree -008|skb_release_head_state -009|skb_release_all -009|__kfree_skb -010|kfree_skb -011|icmp_rcv -012|ip_local_deliver_finish Fix this incorrect free by cloning the skb and processing the clone instead. This patch was suggested by Eric Dumazet. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
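A minimal sketch of the idea (illustrative fragment, not the exact upstream diff):

    /* Hand the ping code a clone, so the original skb and its destructor
     * remain owned by icmp_rcv() even after ping_rcv() drops the last
     * socket reference. */
    struct sk_buff *clone = skb_clone(skb, GFP_ATOMIC);

    if (clone)
            ping_queue_rcv_skb(sk, clone);
    /* icmp_rcv() keeps freeing the original skb as before */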
-
Herbert Xu authored
[ Upstream commit 86f3cddb ] While working on rhashtable walking I noticed that the UDP diag dumping code is buggy. In particular, the socket skipping within a chain never happens, even though we record the number of sockets that should be skipped. As this code was supposedly copied from TCP, this patch does what TCP does and resets num before we walk a chain. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
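A sketch of the resumable-dump pattern being restored (fragment; dump_one() and the surrounding variables are placeholders standing in for the real diag code):

    /* 'num' counts entries inside the current chain and must restart at 0
     * for every chain, otherwise the "skip the first s_num entries" resume
     * logic never skips anything. */
    for (slot = s_slot; slot <= table->mask; s_num = 0, slot++) {
            num = 0;                        /* the per-chain reset this patch adds */
            sk_nulls_for_each(sk, node, &table->hash[slot].head) {
                    if (num < s_num) {      /* already dumped in an earlier pass */
                            num++;
                            continue;
                    }
                    if (dump_one(skb, cb, sk) < 0)
                            goto done;
                    num++;
            }
    }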
-
Hannes Frederic Sowa authored
[ Upstream commit df4d9254 ] Not caching dst_entries which cause redirects could be exploited by hosts on the same subnet, causing a severe DoS attack. This effect has been aggravated since commit f8864972 ("ipv4: fix dst race in sk_dst_get()"). Lookups causing redirects will be allocated with DST_NOCACHE set which will force dst_release to free them via RCU. Unfortunately waiting for the RCU grace period just takes too long, we can end up with >1M dst_entries waiting to be released and the system will run OOM. rcuos threads cannot catch up under high softirq load. Attaching the flag to emit a redirect later on to the specific skb allows us to cache those dst_entries, thus reducing the pressure on allocation and deallocation. This issue was discovered by Marcelo Leitner. Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: Marcelo Leitner <mleitner@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Borkmann authored
[ Upstream commit 600ddd68 ] When hitting an INIT collision case during the 4WHS with AUTH enabled, as already described in detail in commit 1be9a950 ("net: sctp: inherit auth_capable on INIT collisions"), it can happen that we occasionally still remotely trigger the following panic on the server side which seems to have been uncovered after the fix from commit 1be9a950 ... [ 533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff [ 533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230 [ 533.940559] PGD 5030f2067 PUD 0 [ 533.957104] Oops: 0000 [#1] SMP [ 533.974283] Modules linked in: sctp mlx4_en [...] [ 534.939704] Call Trace: [ 534.951833] [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0 [ 534.984213] [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0 [ 535.015025] [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170 [ 535.045661] [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0 [ 535.074593] [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50 [ 535.105239] [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp] [ 535.138606] [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0 [ 535.166848] [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b ... or depending on the application, for example this one: [ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff [ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0 [ 1370.054568] PGD 633c94067 PUD 0 [ 1370.070446] Oops: 0000 [#1] SMP [ 1370.085010] Modules linked in: sctp kvm_amd kvm [...] [ 1370.963431] Call Trace: [ 1370.974632] [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960 [ 1371.000863] [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960 [ 1371.027154] [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170 [ 1371.054679] [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130 [ 1371.080183] [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b With slab debugging enabled, we can see that the poison has been overwritten: [ 669.826368] BUG kmalloc-128 (Tainted: G W ): Poison overwritten [ 669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b [ 669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494 [ 669.826424] __slab_alloc+0x4bf/0x566 [ 669.826433] __kmalloc+0x280/0x310 [ 669.826453] sctp_auth_create_key+0x23/0x50 [sctp] [ 669.826471] sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp] [ 669.826488] sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp] [ 669.826505] sctp_do_sm+0x29d/0x17c0 [sctp] [...] [ 669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494 [ 669.826635] __slab_free+0x39/0x2a8 [ 669.826643] kfree+0x1d6/0x230 [ 669.826650] kzfree+0x31/0x40 [ 669.826666] sctp_auth_key_put+0x19/0x20 [sctp] [ 669.826681] sctp_assoc_update+0x1ee/0x2d0 [sctp] [ 669.826695] sctp_do_sm+0x674/0x17c0 [sctp] Since this only triggers in some collision-cases with AUTH, the problem at heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice when having refcnt 1, once directly in sctp_assoc_update() and yet again from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on the already kzfree'd memory, which is also consistent with the observation of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected at a later point in time when poison is checked on new allocation). Reference counting of auth keys revisited: Shared keys for AUTH chunks are being stored in endpoints and associations in the endpoint_shared_keys list.
On endpoint creation, a null key is being added; on association creation, all endpoint shared keys are being cached and thus cloned over to the association. struct sctp_shared_key only holds a pointer to the actual key bytes, that is, struct sctp_auth_bytes which keeps track of users internally through refcounting. Naturally, on assoc or endpoint destruction, sctp_shared_key are being destroyed directly and the reference on sctp_auth_bytes dropped. User space can add keys to either list via setsockopt(2) through struct sctp_authkey and by passing that to sctp_auth_set_key() which replaces or adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes with refcount 1 and in case of replacement drops the reference on the old sctp_auth_bytes. A key can be set active from user space through setsockopt() on the id via sctp_auth_set_active_key(), which iterates through either endpoint_shared_keys list and, in case of an assoc, invokes (one of various places) sctp_auth_asoc_init_active_key(). sctp_auth_asoc_init_active_key() computes the actual secret from local's and peer's random, hmac and shared key parameters and returns a new key directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops the reference if there was a previous one. The secret on which we eventually double-drop the ref comes from sctp_auth_asoc_set_secret() with an initial refcount of 1, and that refcount stays unchanged in sctp_assoc_update(). This key is later used by the crypto layer to set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac(). To close the loop: asoc->asoc_shared_key is freshly allocated secret material and independent of the sctp_shared_key management keeping track of only shared keys in endpoints and assocs. Hence, also commit 4184b2a7 ("net: sctp: fix memory leak in auth key management") is independent of this bug here since it concerns a different layer (though the same structures are being used eventually). asoc->asoc_shared_key is reference dropped correctly on assoc destruction in sctp_association_free() and when active keys are being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount of 1. Hence, it's freed prematurely in sctp_assoc_update(). The simple fix is to remove that sctp_auth_key_put() from there, which fixes these panics. Fixes: 730fc3d0 ("[SCTP]: Implete SCTP-AUTH parameter processing") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 6088beef ] NAPI poll logic now enforces that a poller returns exactly the budget when it wants to be called again. If a driver limits TX completion, it has to return budget as well when the limit is hit, not the number of received packets. Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: d75b1ade ("net: less interrupt masking in NAPI") Cc: Manish Chopra <manish.chopra@qlogic.com> Acked-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
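The rule being applied is easy to show with a small stand-alone sketch (plain C illustration, not the driver's actual code):

    #include <stdbool.h>

    /* A NAPI-style poll callback must return the full budget whenever it
     * wants to be polled again, e.g. because a TX-completion limit was
     * hit, even if it received fewer than 'budget' packets. */
    static int example_poll(int budget, int rx_done, bool tx_limit_hit)
    {
            if (tx_limit_hit)
                    return budget;  /* "call me again": stays on the poll list */

            return rx_done;         /* rx_done < budget means "I'm done" */
    }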
-
Hagen Paul Pfeifer authored
[ Upstream commit 9d289715 ] Reduce the attack vector and stop generating IPv6 Fragment Headers for paths with an MTU smaller than the minimum required IPv6 MTU size (1280 bytes) - called atomic fragments. See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1] for more information and how this "feature" can be misused. [1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00 Signed-off-by: Fernando Gont <fgont@si6networks.com> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit ac64da0b ] softnet_data.input_pkt_queue is protected by a spinlock that we must hold when transferring packets from the victim queue to an active one. This is because other cpus could still be trying to enqueue packets into the victim queue. A second problem is that when we transfer the NAPI poll_list from the victim to the current cpu, we absolutely need to special-case the percpu backlog, because we do not want to add complex locking to protect process_queue: only the owner cpu is allowed to manipulate it, unless the cpu is offline. Based on an initial patch from Prasad Sodagudi & Subash Abhinov Kasiviswanathan. This version is better because we do not slow down packet processing, only make migration safer. Reported-by: Prasad Sodagudi <psodagud@codeaurora.org> Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Willem de Bruijn authored
[ Upstream commit f812116b ] The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That structure is defined and allocated on the stack as struct { struct sock_extended_err ee; struct sockaddr_in(6) offender; } errhdr; The second part is only initialized for certain SO_EE_ORIGIN values. Always initialize it completely. An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that would return uninitialized bytes. Signed-off-by: Willem de Bruijn <willemb@google.com> ---- Also verified that there is no padding between errhdr.ee and errhdr.offender that could leak additional kernel data. Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
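The idea is simple enough to show as a stand-alone user-space illustration (compilable on Linux; this is not the kernel's code, just the same zero-before-partial-fill pattern):

    #include <errno.h>
    #include <string.h>
    #include <netinet/in.h>
    #include <linux/errqueue.h>

    /* Zero the whole on-stack error header before filling it: the
     * 'offender' part is only written for some SO_EE_ORIGIN values and
     * would otherwise expose uninitialized stack bytes. */
    void build_errhdr_example(void)
    {
            struct {
                    struct sock_extended_err ee;
                    struct sockaddr_in       offender;
            } errhdr;

            memset(&errhdr, 0, sizeof(errhdr));      /* always initialize fully */
            errhdr.ee.ee_errno  = EMSGSIZE;          /* e.g. an MTU-exceeded error */
            errhdr.ee.ee_origin = SO_EE_ORIGIN_LOCAL;
            /* errhdr.offender intentionally left zeroed in this case */
            (void)errhdr;
    }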
-
- 11 Feb, 2015 19 commits
-
-
Greg Kroah-Hartman authored
-
Mathias Krause authored
The backport of commit 5d26a105 ("crypto: prefix module autoloading with "crypto-"") lost the MODULE_ALIAS_CRYPTO() annotation of crc32c.c. Add it to fix the reported filesystem related regressions. Signed-off-by: Mathias Krause <minipli@googlemail.com> Reported-by: Philip Müller <philm@manjaro.org> Cc: Kees Cook <keescook@chromium.org> Cc: Rob McCathie <rob@manjaro.org> Cc: Luis Henriques <luis.henriques@canonical.com> Cc: Kamal Mostafa <kamal@canonical.com> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
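The lost annotation is a single line in crypto/crc32c.c along these lines, which lets "crc32c" requests autoload the module under the new "crypto-"-prefixed alias scheme:

    MODULE_ALIAS_CRYPTO("crc32c");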
-
Andy Lutomirski authored
commit d974baa3 upstream. CR4 isn't constant; at least the TSD and PCE bits can vary. TBH, treating CR0 and CR3 as constant scares me a bit, too, but it looks like it's correct. This adds a branch and a read from cr4 to each vm entry. Because it is extremely likely that consecutive entries into the same vcpu will have the same host cr4 value, this fixes up the vmcs instead of restoring cr4 after the fact. A subsequent patch will add a kernel-wide cr4 shadow, reducing the overhead in the common case to just two memory reads and a branch. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [wangkai: Backport to 3.10: adjust context] Signed-off-by: Wang Kai <morgan.wang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
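A sketch of the approach (simplified; the name of the cached field on the vmx side is an assumption):

    /* On each VM entry, re-read the host's CR4 and refresh HOST_CR4 in the
     * VMCS only if it changed since the last entry. */
    unsigned long cr4 = read_cr4();

    if (unlikely(cr4 != vmx->host_state.vmcs_host_cr4)) {
            vmcs_writel(HOST_CR4, cr4);
            vmx->host_state.vmcs_host_cr4 = cr4;
    }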
-
Petr Matousek authored
commit a642fc30 upstream. On systems with invvpid instruction support (corresponding bit in IA32_VMX_EPT_VPID_CAP MSR is set) guest invocation of invvpid causes vm exit, which is currently not handled and results in propagation of unknown exit to userspace. Fix this by installing an invvpid vm exit handler. This is CVE-2014-3646. Cc: stable@vger.kernel.org Signed-off-by: Petr Matousek <pmatouse@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [wangkai: Backport to 3.10: adjust context] Signed-off-by: Wang Kai <morgan.wang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
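The handler itself only has to turn the unexpected exit into a #UD injected into the guest; a sketch (the table-entry line is shown as a comment):

    static int handle_invvpid(struct kvm_vcpu *vcpu)
    {
            /* nested VPID is not exposed to the guest, so INVVPID is #UD */
            kvm_queue_exception(vcpu, UD_VECTOR);
            return 1;
    }

    /* registered as: [EXIT_REASON_INVVPID] = handle_invvpid, */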
-
Lai Jiangshan authored
commit 4bee9686 upstream. The following race exists in the smpboot percpu threads management: CPU0 CPU1 cpu_up(2) get_online_cpus(); smpboot_create_threads(2); smpboot_register_percpu_thread(); for_each_online_cpu(); __smpboot_create_thread(); __cpu_up(2); This results in a missing per cpu thread for the newly onlined cpu2 and in a NULL pointer dereference on a consecutive offline of that cpu. Protect smpboot_register_percpu_thread() with get_online_cpus() to prevent that. [ tglx: Massaged changelog and removed the change in smpboot_unregister_percpu_thread() because that's an optimization and therefore not stable material. ] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/1406777421-12830-1-git-send-email-laijs@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
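A simplified sketch of the locking the fix adds (names follow kernel/smpboot.c as referenced above; error handling trimmed):

    /* Hold the CPU hotplug read lock across the walk so no CPU can come
     * online between for_each_online_cpu() and the list insertion. */
    get_online_cpus();
    mutex_lock(&smpboot_threads_lock);
    for_each_online_cpu(cpu)
            __smpboot_create_thread(plug_thread, cpu);
    list_add(&plug_thread->list, &hotplug_threads);
    mutex_unlock(&smpboot_threads_lock);
    put_online_cpus();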
-
Takashi Iwai authored
commit 4161b450 upstream. When ak4114 work calls its callback and the callback invokes ak4114_reinit(), it stalls due to flush_delayed_work(). For avoiding this, control the reentrance by introducing a refcount. Also flush_delayed_work() is replaced with cancel_delayed_work_sync(). The exactly same bug is present in ak4113.c and fixed as well. Reported-by: Pavel Hofman <pavel.hofman@ivitera.com> Acked-by: Jaroslav Kysela <perex@perex.cz> Tested-by: Pavel Hofman <pavel.hofman@ivitera.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Nelson authored
commit 58cc9c9a upstream. To quote from section 1.3.1 of the data sheet: The SGTL5000 has an internal reset that is deasserted 8 SYS_MCLK cycles after all power rails have been brought up. After this time, communication can start ... 1.0us represents 8 SYS_MCLK cycles at the minimum 8.0 MHz SYS_MCLK. Signed-off-by: Eric Nelson <eric.nelson@boundarydevices.com> Reviewed-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Bo Shen authored
commit a43bd7e1 upstream. According to the I2S specification: - WS = 0, channel 1 (left) - WS = 1, channel 2 (right) So the start event should be the TF/RF falling edge. Reported-by: Songjun Wu <songjun.wu@atmel.com> Signed-off-by: Bo Shen <voice.shen@atmel.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
karl beldan authored
commit 9ce35779 upstream. Fixed commit added from64to32 under _#ifndef do_csum_ but used it under _#ifndef csum_tcpudp_nofold_, breaking some builds (Fengguang's robot reported TILEGX's). Move from64to32 under the latter. Fixes: 150ae0e9 ("lib/checksum.c: fix carry in csum_tcpudp_nofold") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Cc: Eric Dumazet <edumazet@google.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dmitry Monakhov authored
commit a41537e6 upstream. O_DIRECT flags can be toggled via fcntl(F_SETFL). But this value is checked twice, inside ext4_file_write_iter() and __generic_file_write(), which results in a BUG_ON inside ext4_direct_IO. Let's initialize iocb->private unconditionally. TESTCASE: xfstest:generic/036 https://patchwork.ozlabs.org/patch/402445/ #TYPICAL STACK TRACE: kernel BUG at fs/ext4/inode.c:2960! invalid opcode: 0000 [#1] SMP Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod CPU: 6 PID: 5505 Comm: aio-dio-fcntl-r Not tainted 3.17.0-rc2-00176-gff5c017 #161 Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011 task: ffff88080e95a7c0 ti: ffff88080f908000 task.ti: ffff88080f908000 RIP: 0010:[<ffffffff811fabf2>] [<ffffffff811fabf2>] ext4_direct_IO+0x162/0x3d0 RSP: 0018:ffff88080f90bb58 EFLAGS: 00010246 RAX: 0000000000000400 RBX: ffff88080fdb2a28 RCX: 00000000a802c818 RDX: 0000040000080000 RSI: ffff88080d8aeb80 RDI: 0000000000000001 RBP: ffff88080f90bbc8 R08: 0000000000000000 R09: 0000000000001581 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88080d8aeb80 R13: ffff88080f90bbf8 R14: ffff88080fdb28c8 R15: ffff88080fdb2a28 FS: 00007f23b2055700(0000) GS:ffff880818400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f23b2045000 CR3: 000000080cedf000 CR4: 00000000000407e0 Stack: ffff88080f90bb98 0000000000000000 7ffffffffffffffe ffff88080fdb2c30 0000000000000200 0000000000000200 0000000000000001 0000000000000200 ffff88080f90bbc8 ffff88080fdb2c30 ffff88080f90be08 0000000000000200 Call Trace: [<ffffffff8112ca9d>] generic_file_direct_write+0xed/0x180 [<ffffffff8112f2b2>] __generic_file_write_iter+0x222/0x370 [<ffffffff811f495b>] ext4_file_write_iter+0x34b/0x400 [<ffffffff811bd709>] ? aio_run_iocb+0x239/0x410 [<ffffffff811bd709>] ? aio_run_iocb+0x239/0x410 [<ffffffff810990e5>] ? local_clock+0x25/0x30 [<ffffffff810abd94>] ? __lock_acquire+0x274/0x700 [<ffffffff811f4610>] ? ext4_unwritten_wait+0xb0/0xb0 [<ffffffff811bd756>] aio_run_iocb+0x286/0x410 [<ffffffff810990e5>] ? local_clock+0x25/0x30 [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190 [<ffffffff811bc05b>] ? lookup_ioctx+0x4b/0xf0 [<ffffffff811bde3b>] do_io_submit+0x55b/0x740 [<ffffffff811bdcaa>] ? do_io_submit+0x3ca/0x740 [<ffffffff811be030>] SyS_io_submit+0x10/0x20 [<ffffffff815ce192>] system_call_fastpath+0x16/0x1b Code: 01 48 8b 80 f0 01 00 00 48 8b 18 49 8b 45 10 0f 85 f1 01 00 00 48 03 45 c8 48 3b 43 48 0f 8f e3 01 00 00 49 83 7c 24 18 00 75 04 <0f> 0b eb fe f0 ff 83 ec 01 00 00 49 8b 44 24 18 8b 00 85 c0 89 RIP [<ffffffff811fabf2>] ext4_direct_IO+0x162/0x3d0 RSP <ffff88080f90bb58> Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> [hujianyang: Backported to 3.10 - Move initialization of iocb->private to ext4_file_write() as we don't have ext4_file_write_iter(), which is introduced by commit 9b884164. - Adjust context to make 'overwrite' changes apply to ext4_file_dio_write() as ext4_file_dio_write() is not moved into ext4_file_write()] Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mark Rutland authored
commit 44b82b77 upstream. Commit d7a49086 (arm64: cpuinfo: print info for all CPUs) attempted to clean up /proc/cpuinfo, but due to concerns regarding further changes was reverted in commit 5e39977e (Revert "arm64: cpuinfo: print info for all CPUs"). There are two major issues with the arm64 /proc/cpuinfo format currently: * The "Features" line describes (only) the 64-bit hwcaps, which is problematic for some 32-bit applications which attempt to parse it. As the same names are used for analogous ISA features (e.g. aes) despite these generally being architecturally unrelated, it is not possible to simply append the 64-bit and 32-bit hwcaps in a manner that might not be misleading to some applications. Various potential solutions have appeared in vendor kernels. Typically the format of the Features line varies depending on whether the task is 32-bit. * Information is only printed regarding a single CPU. This does not match the ARM format, and does not provide sufficient information in big.LITTLE systems where CPUs are heterogeneous. The CPU information printed is queried from the current CPU's registers, which is racy w.r.t. cross-cpu migration. This patch attempts to solve these issues. The following changes are made: * When a task with a LINUX32 personality attempts to read /proc/cpuinfo, the "Features" line contains the decoded 32-bit hwcaps, as with the arm port. Otherwise, the decoded 64-bit hwcaps are shown. This aligns with the behaviour of COMPAT_UTS_MACHINE and COMPAT_ELF_PLATFORM. In the absence of compat support, the Features line is empty. The set of hwcaps injected into a task's auxval is unaffected. * Properties are printed per-cpu, as with the ARM port. The per-cpu information is queried from pre-recorded cpu information (as used by the sanity checks). * As with the previous attempt at fixing up /proc/cpuinfo, the hardware field is removed. The only users so far are 32-bit applications tied to particular boards, so no portable applications should be affected, and this should prevent future tying to particular boards. The following differences remain: * No model_name is printed, as this cannot be queried from the hardware and cannot be provided in a stable fashion. Use of the CPU {implementor,variant,part,revision} fields is sufficient to identify a CPU and is portable across arm and arm64. * The following system-wide properties are not provided, as they are not possible to provide generally. Programs relying on these are already tied to particular (32-bit only) boards: - Hardware - Revision - Serial No software has yet been identified for which these remaining differences are problematic. Cc: Greg Hackmann <ghackmann@google.com> Cc: Ian Campbell <ijc@hellion.org.uk> Cc: Serban Constantinescu <serban.constantinescu@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: cross-distro@lists.linaro.org Cc: linux-api@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Acked-by: Catalin Marinas <catalin.marinas@arm.com> [Mark: backport to v3.10.x] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ryusuke Konishi authored
commit 7ef3ff2f upstream. Nilfs2 eventually hangs in a stress test with fsstress program. This issue was caused by the following deadlock over I_SYNC flag between nilfs_segctor_thread() and writeback_sb_inodes(): nilfs_segctor_thread() nilfs_segctor_thread_construct() nilfs_segctor_unlock() nilfs_dispose_list() iput() iput_final() evict() inode_wait_for_writeback() * wait for I_SYNC flag writeback_sb_inodes() * set I_SYNC flag on inode->i_state __writeback_single_inode() do_writepages() nilfs_writepages() nilfs_construct_dsync_segment() nilfs_segctor_sync() * wait for completion of segment constructor inode_sync_complete() * clear I_SYNC flag after __writeback_single_inode() completed writeback_sb_inodes() calls do_writepages() for dirty inodes after setting I_SYNC flag on inode->i_state. do_writepages() in turn calls nilfs_writepages(), which can run segment constructor and wait for its completion. On the other hand, segment constructor calls iput(), which can call evict() and wait for the I_SYNC flag on inode_wait_for_writeback(). Since segment constructor doesn't know when I_SYNC will be set, it cannot know whether iput() will block or not unless inode->i_nlink has a non-zero count. We can prevent evict() from being called in iput() by implementing sop->drop_inode(), but it's not preferable to leave inodes with i_nlink == 0 for long periods because it even defers file truncation and inode deallocation. So, this instead resolves the deadlock by calling iput() asynchronously with a workqueue for inodes with i_nlink == 0. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
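A rough sketch of the deferred-iput idea (the struct and helper names here are made up for illustration; nilfs2's actual implementation differs in detail):

    struct deferred_iput {
            struct work_struct work;
            struct inode *inode;
    };

    static void deferred_iput_fn(struct work_struct *work)
    {
            struct deferred_iput *di = container_of(work, struct deferred_iput, work);

            iput(di->inode);        /* may block in evict(); safe outside the constructor */
            kfree(di);
    }

    static void defer_iput(struct inode *inode)
    {
            struct deferred_iput *di = kmalloc(sizeof(*di), GFP_NOFS);

            if (!di) {
                    iput(inode);    /* fall back to a direct put */
                    return;
            }
            di->inode = inode;
            INIT_WORK(&di->work, deferred_iput_fn);
            queue_work(system_wq, &di->work);
    }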
-
karl beldan authored
commit 150ae0e9 upstream. The carry from the 64->32bits folding was dropped, e.g with: saddr=0xFFFFFFFF daddr=0xFF0000FF len=0xFFFF proto=0 sum=1, csum_tcpudp_nofold returned 0 instead of 1. Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Mike Frysinger <vapier@gentoo.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
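The carry bug is easy to demonstrate stand-alone (user-space C illustration of the folding helper described above, not the kernel source itself):

    #include <stdint.h>
    #include <stdio.h>

    /* Folding a 64-bit sum into 32 bits needs a second add so that a carry
     * out of the first fold is not simply truncated away. */
    static uint32_t from64to32(uint64_t x)
    {
            x = (x & 0xffffffff) + (x >> 32);   /* first fold: may still carry */
            x = (x & 0xffffffff) + (x >> 32);   /* second fold: absorb the carry */
            return (uint32_t)x;
    }

    int main(void)
    {
            uint64_t sum = 0xffffffffffffffffULL;

            /* a single-fold version would yield 0xfffffffe (carry lost) */
            printf("%#x\n", from64to32(sum));   /* prints 0xffffffff */
            return 0;
    }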
-
Shiraz Hashim authored
commit 23aaed66 upstream. walk_page_range() silently skips vmas having VM_PFNMAP set, which leads to undesirable behaviour for the caller of walk_page_range. Userspace applications get the wrong data, so the effect ranges from merely confusing users (if the applications just display the data) to sometimes killing processes (if the applications act on virtual addresses misinterpreted due to the wrong data). For example for pagemap_read, when no callbacks are called against VM_PFNMAP vmas, pagemap_read may prepare pagemap data for the next virtual address range at the wrong index. Eventually userspace may get wrong pagemap data for a task. Corresponding to a VM_PFNMAP marked vma region, the kernel may report mappings from subsequent vma regions. User space in turn may account more pages (than there really are) to the task. In my case I was using procmem, procrack (Android utility) which uses the pagemap interface to account RSS pages of a task. Due to this bug it was giving a wrong picture for vmas (with VM_PFNMAP set). Fixes: a9ff785e ("mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas") Signed-off-by: Shiraz Hashim <shashim@codeaurora.org> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hemmo Nieminen authored
commit c7754e75 upstream. As printk() invocation can cause e.g. a TLB miss, printk() cannot be called before the exception handlers have been properly initialized. This can happen e.g. when netconsole has been loaded as a kernel module and the TLB table has been cleared when a CPU was offline. Call cpu_report() in start_secondary() only after the exception handlers have been initialized to fix this. Without the patch the kernel will randomly either lockup or crash after a CPU is onlined and the console driver is a module. Signed-off-by: Hemmo Nieminen <hemmo.nieminen@iki.fi> Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/8953/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Felix Fietkau authored
commit a3e6c1ef upstream. If the irq_chip does not define .irq_disable, any call to disable_irq will defer disabling the IRQ until it fires while marked as disabled. This assumes that the handler function checks for this condition, which handle_percpu_irq does not. In this case, calling disable_irq leads to an IRQ storm, if the interrupt fires while disabled. This optimization is only useful when disabling the IRQ is slow, which is not true for the MIPS CPU IRQ. Disable this optimization by implementing .irq_disable and .irq_enable. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/8949/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Charlotte Richardson authored
commit 51ac3d2f upstream. NEC OEMs the same platforms as Stratus does, which have multiple devices on some PCIe buses under downstream ports. Link: https://bugzilla.kernel.org/show_bug.cgi?id=51331 Fixes: 1278998f ("PCI: Work around Stratus ftServer broken PCIe hierarchy (fix DMI check)") Signed-off-by: Charlotte Richardson <charlotte.richardson@stratus.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Myron Stowe <myron.stowe@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Johan Hovold authored
commit 49d2ca84 upstream. Fix memory leak in the gpio sysfs interface due to failure to drop reference to device returned by class_find_device when setting the gpio-line polarity. Fixes: 07697461 ("gpiolib: add support for changing value polarity in sysfs") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Johan Hovold authored
commit 0f303db0 upstream. Fix memory leak in the gpio sysfs interface due to failure to drop reference to device returned by class_find_device when creating a link. Fixes: a4177ee7 ("gpiolib: allow exported GPIO nodes to be named using sysfs links") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
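Both this and the previous gpiolib fix add the same missing reference drop; a sketch of the pattern (do_sysfs_work() is a stand-in for the real link-creation or polarity code):

    /* class_find_device() returns its device with an extra reference
     * taken, so every exit path must pair it with put_device(). */
    struct device *dev;

    dev = class_find_device(&gpio_class, NULL, desc, match_export);
    if (!dev)
            return -ENODEV;

    status = do_sysfs_work(dev);
    put_device(dev);                /* the drop these fixes add */
    return status;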
-
- 06 Feb, 2015 9 commits
-
-
Greg Kroah-Hartman authored
-
Nicholas Bellinger authored
commit 046ba642 upstream. This patch drops the arbitrary maximum I/O size limit in sbc_parse_cdb(), which currently for fabric_max_sectors is hardcoded to 8192 (4 MB for 512-byte sector devices), and for hw_max_sectors is a backend driver dependent value. This limit is problematic because Linux initiators have only recently started to honor block limits MAXIMUM TRANSFER LENGTH, and other non-Linux based initiators (eg: MSFT Fibre Channel) can also generate I/Os larger than 4 MB in size. Currently when this happens, the following message will appear on the target resulting in I/Os being returned with non recoverable status: SCSI OP 28h with too big sectors 16384 exceeds fabric_max_sectors: 8192 Instead, drop both [fabric,hw]_max_sector checks in sbc_parse_cdb(), and convert the existing hw_max_sectors into a purely informational attribute used to represent the granularity that backend driver and/or subsystem code is splitting I/Os upon. Also, update FILEIO with an explicit FD_MAX_BYTES check in fd_execute_rw() to deal with the one special iovec limitation case. v2 changes: - Drop hw_max_sectors check in sbc_parse_cdb() Reported-by: Lance Gropper <lance.gropper@qosserver.com> Reported-by: Stefan Priebe <s.priebe@profihost.ag> Cc: Christoph Hellwig <hch@lst.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sagi Grimberg authored
commit b02efbfc upstream. In situations such as bond failover, the new session establishment implicitly invokes the termination of the old connection. So, we don't want to wait for the old connection's wait_conn to completely terminate before we accept the new connection and post a login response. The solution is to defer the comp_wait completion and the conn_put to a work so wait_conn will effectively be non-blocking (flush errors are assumed to come very fast). We allocate isert_release_wq with WQ_UNBOUND and WQ_UNBOUND_MAX_ACTIVE to spread the concurrency of release works. Reported-by: Slava Shwartsman <valyushash@gmail.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sagi Grimberg authored
commit ca6c1d82 upstream. The np listener cm_id will also get an ADDR_CHANGE event upcall (in case it is bound to a specific IP). Handle it correctly by creating a new cm_id and implicitly destroying the old one. Since this is the second event a listener np cm_id may encounter, we move the np cm_id event handling to a routine. Squashed: iser-target: Move cma_id setup to a function Reported-by: Slava Shwartsman <valyushash@gmail.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sagi Grimberg authored
commit 19e2090f upstream. Take the isert_conn pointer from cm_id->qp->qp_context. This will allow us to know that the cm_id context is always the network portal. This will make the cm_id event check (connection or network portal) more reliable. In order to avoid a NULL dereference in cma_id->qp->qp_context we destroy the qp after we destroy the cm_id (and make the dereference safe). Session establishment/teardown sequences can happen in parallel; we should take into account that connected_handler might race with the connection teardown flow. Also, protect the isert_conn->conn_device->active_qps decrement within the error path during QP creation failure and the normal teardown path in isert_connect_release(). Squashed: iser-target: Decrement completion context active_qps in error flow Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sagi Grimberg authored
commit 2371e5da upstream. There is no point in accepting a new CM request only when we are completely done with the last iscsi login. Instead we accept immediately; this will also cause the CM connection to reach connected state and the initiator is allowed to send the first login. We mark that we got the initial login and let the iscsi layer pick it up when it gets there. This reduces the parallel login sequence by a factor of more than 4 (and more for multi-login) and also prevents the initiator (who does all logins in parallel) from giving up on login timeout expiration. In order to support a multiple login requests sequence (CHAP) we call isert_rx_login_req from isert_rx_completion instead of letting isert_get_login_rx call it. Squashed: iser-target: Use kref_get_unless_zero in connected_handler iser-target: Acquire conn_mutex when changing connection state iser-target: Reject connect request in failure path Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sagi Grimberg authored
commit 128e9cc8 upstream. The ISER_CONN_UP state is not sufficient to know if we should wait for completion of flush errors and the disconnected_handler event. Instead, split it into 2 states: - ISER_CONN_UP: Got to the CM connected phase. This state indicates that we need to wait for a CM disconnect event before going to teardown. - ISER_CONN_FULL_FEATURE: Got to the full feature phase after we posted the login response. This state indicates that we posted recv buffers and we need to wait for flush completions before going to teardown. Also avoid deferring the disconnected handler to a work, and handle it within the disconnected handler. More work is needed here to handle the DEVICE_REMOVAL event correctly (cleanup all resources). Squashed: iser-target: Don't deffer disconnected handler to a work Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sagi Grimberg authored
commit 954f2372 upstream. Since commit 0fc4ea70 ("Target/iser: Don't put isert_conn inside disconnected handler") we put the conn kref in isert_wait_conn, so we need .wait_conn to be invoked also in the error path. Introduce a call to isert_conn_terminate (called under lock) which transitions the connection state to TERMINATING and calls rdma_disconnect. If the state is already terminating, just bail out (termination has already started). Also, make sure to destroy the connection when getting a connect error event if we didn't get to connected (state UP). Same for the handling of REJECTED and UNREACHABLE cma events. Squashed: iscsi-target: Add call to wait_conn in establishment error flow Reported-by: Slava Shwartsman <valyushash@gmail.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nicholas Bellinger authored
commit 46243860 upstream. While looking at hch's recent conversion to drop the MSG_*_TAG definitions, I noticed a long-standing bug in vhost-scsi where the VIRTIO_SCSI_S_* attribute definitions were incorrectly being passed directly into target_submit_cmd_map_sgls(). This patch adds the missing virtio-scsi to TCM/SAM task attribute conversion. Cc: Christoph Hellwig <hch@lst.de> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
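A sketch of the missing conversion (helper name as in the upstream patch to the best of our knowledge; constant names are assumed to match the 3.10-era virtio-scsi and SCSI headers):

    /* Map the guest-supplied virtio-scsi task attribute to the SAM/TCM
     * tag value instead of passing the virtio constant through raw. */
    static int vhost_scsi_to_tcm_attr(int attr)
    {
            switch (attr) {
            case VIRTIO_SCSI_S_ORDERED:
                    return MSG_ORDERED_TAG;
            case VIRTIO_SCSI_S_HEAD:
                    return MSG_HEAD_TAG;
            case VIRTIO_SCSI_S_ACA:
                    return MSG_ACA_TAG;
            case VIRTIO_SCSI_S_SIMPLE:
            default:
                    return MSG_SIMPLE_TAG;
            }
    }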
-