- 28 Oct, 2015 34 commits
-
-
David Howells authored
commit f05819df upstream. The following sequence of commands: i=`keyctl add user a a @s` keyctl request2 keyring foo bar @t keyctl unlink $i @s tries to invoke an upcall to instantiate a keyring if one doesn't already exist by that name within the user's keyring set. However, if the upcall fails, the code sets keyring->type_data.reject_error to -ENOKEY or some other error code. When the key is garbage collected, the key destroy function is called unconditionally and keyring_destroy() uses list_empty() on keyring->type_data.link - which is in a union with reject_error. Subsequently, the kernel tries to unlink the keyring from the keyring names list - which oopses like this: BUG: unable to handle kernel paging request at 00000000ffffff8a IP: [<ffffffff8126e051>] keyring_destroy+0x3d/0x88 ... Workqueue: events key_garbage_collector ... RIP: 0010:[<ffffffff8126e051>] keyring_destroy+0x3d/0x88 RSP: 0018:ffff88003e2f3d30 EFLAGS: 00010203 RAX: 00000000ffffff82 RBX: ffff88003bf1a900 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 000000003bfc6901 RDI: ffffffff81a73a40 RBP: ffff88003e2f3d38 R08: 0000000000000152 R09: 0000000000000000 R10: ffff88003e2f3c18 R11: 000000000000865b R12: ffff88003bf1a900 R13: 0000000000000000 R14: ffff88003bf1a908 R15: ffff88003e2f4000 ... CR2: 00000000ffffff8a CR3: 000000003e3ec000 CR4: 00000000000006f0 ... Call Trace: [<ffffffff8126c756>] key_gc_unused_keys.constprop.1+0x5d/0x10f [<ffffffff8126ca71>] key_garbage_collector+0x1fa/0x351 [<ffffffff8105ec9b>] process_one_work+0x28e/0x547 [<ffffffff8105fd17>] worker_thread+0x26e/0x361 [<ffffffff8105faa9>] ? rescuer_thread+0x2a8/0x2a8 [<ffffffff810648ad>] kthread+0xf3/0xfb [<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2 [<ffffffff815f2ccf>] ret_from_fork+0x3f/0x70 [<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2 Note the value in RAX. This is a 32-bit representation of -ENOKEY. The solution is to only call ->destroy() if the key was successfully instantiated. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Cc: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
David Howells authored
commit 94c4554b upstream. There appears to be a race between: (1) key_gc_unused_keys() which frees key->security and then calls keyring_destroy() to unlink the name from the name list (2) find_keyring_by_name() which calls key_permission(), thus accessing key->security, on a key before checking to see whether the key usage is 0 (ie. the key is dead and might be cleaned up). Fix this by calling ->destroy() before cleaning up the core key data - including key->security. Reported-by: Petr Matousek <pmatouse@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Sabrina Dubroca authored
Without this length argument, we can read past the end of the iovec in memcpy_toiovec because we have no way of knowing the total length of the iovec's buffers. This is needed for stable kernels where 89c22d8c ("net: Fix skb csum races when peeking") has been backported but that don't have the ioviter conversion, which is almost all the stable trees <= 3.18. This also fixes a kernel crash for NFS servers when the client uses -onfsvers=3,proto=udp to mount the export. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> [ luis: backported to 3.16: - dropped changes to net/rxrpc/ar-recvmsg.c ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Arad, Ronen authored
commit db65a3aa upstream. netlink_dump() allocates skb based on the calculated min_dump_alloc or a per socket max_recvmsg_len. min_alloc_size is maximum space required for any single netdev attributes as calculated by rtnl_calcit(). max_recvmsg_len tracks the user provided buffer to netlink_recvmsg. It is capped at 16KiB. The intention is to avoid small allocations and to minimize the number of calls required to obtain dump information for all net devices. netlink_dump packs as many small messages as could fit within an skb that was sized for the largest single netdev information. The actual space available within an skb is larger than what is requested. It could be much larger and up to near 2x with align to next power of 2 approach. Allowing netlink_dump to use all the space available within the allocated skb increases the buffer size a user has to provide to avoid truncaion (i.e. MSG_TRUNG flag set). It was observed that with many VLANs configured on at least one netdev, a larger buffer of near 64KiB was necessary to avoid "Message truncated" error in "ip link" or "bridge [-c[ompressvlans]] vlan show" when min_alloc_size was only little over 32KiB. This patch trims skb to allocated size in order to allow the user to avoid truncation with more reasonable buffer size. Signed-off-by: Ronen Arad <ronen.arad@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Konstantin Khlebnikov authored
commit 598c12d0 upstream. When openvswitch tries allocate memory from offline numa node 0: stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO, 0) It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid)) [ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h This patch disables numa affinity in this case. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Joe Perches authored
commit 077cb37f upstream. It seems that kernel memory can leak into userspace by a kmalloc, ethtool_get_strings, then copy_to_user sequence. Avoid this by using kcalloc to zero fill the copied buffer. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Guillaume Nault authored
commit e6740165 upstream. Since commit 2b018d57 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release"), pppoe_release() calls dev_put(po->pppoe_dev) if sk is in the PPPOX_ZOMBIE state. But pppoe_flush_dev() can set sk->sk_state to PPPOX_ZOMBIE _and_ reset po->pppoe_dev to NULL. This leads to the following oops: [ 570.140800] BUG: unable to handle kernel NULL pointer dereference at 00000000000004e0 [ 570.142931] IP: [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe] [ 570.144601] PGD 3d119067 PUD 3dbc1067 PMD 0 [ 570.144601] Oops: 0000 [#1] SMP [ 570.144601] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppoe pppox ppp_generic slhc loop crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper acpi_cpufreq evdev serio_raw processor button ext4 crc16 mbcache jbd2 virtio_net virtio_blk virtio_pci virtio_ring virtio [ 570.144601] CPU: 1 PID: 15738 Comm: ppp-apitest Not tainted 4.2.0 #1 [ 570.144601] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 570.144601] task: ffff88003d30d600 ti: ffff880036b60000 task.ti: ffff880036b60000 [ 570.144601] RIP: 0010:[<ffffffffa018c701>] [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe] [ 570.144601] RSP: 0018:ffff880036b63e08 EFLAGS: 00010202 [ 570.144601] RAX: 0000000000000000 RBX: ffff880034340000 RCX: 0000000000000206 [ 570.144601] RDX: 0000000000000006 RSI: ffff88003d30dd20 RDI: ffff88003d30dd20 [ 570.144601] RBP: ffff880036b63e28 R08: 0000000000000001 R09: 0000000000000000 [ 570.144601] R10: 00007ffee9b50420 R11: ffff880034340078 R12: ffff8800387ec780 [ 570.144601] R13: ffff8800387ec7b0 R14: ffff88003e222aa0 R15: ffff8800387ec7b0 [ 570.144601] FS: 00007f5672f48700(0000) GS:ffff88003fc80000(0000) knlGS:0000000000000000 [ 570.144601] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 570.144601] CR2: 00000000000004e0 CR3: 0000000037f7e000 CR4: 00000000000406a0 [ 570.144601] Stack: [ 570.144601] ffffffffa018f240 ffff8800387ec780 ffffffffa018f240 ffff8800387ec7b0 [ 570.144601] ffff880036b63e48 ffffffff812caabe ffff880039e4e000 0000000000000008 [ 570.144601] ffff880036b63e58 ffffffff812cabad ffff880036b63ea8 ffffffff811347f5 [ 570.144601] Call Trace: [ 570.144601] [<ffffffff812caabe>] sock_release+0x1a/0x75 [ 570.144601] [<ffffffff812cabad>] sock_close+0xd/0x11 [ 570.144601] [<ffffffff811347f5>] __fput+0xff/0x1a5 [ 570.144601] [<ffffffff811348cb>] ____fput+0x9/0xb [ 570.144601] [<ffffffff81056682>] task_work_run+0x66/0x90 [ 570.144601] [<ffffffff8100189e>] prepare_exit_to_usermode+0x8c/0xa7 [ 570.144601] [<ffffffff81001a26>] syscall_return_slowpath+0x16d/0x19b [ 570.144601] [<ffffffff813babb1>] int_ret_from_sys_call+0x25/0x9f [ 570.144601] Code: 48 8b 83 c8 01 00 00 a8 01 74 12 48 89 df e8 8b 27 14 e1 b8 f7 ff ff ff e9 b7 00 00 00 8a 43 12 a8 0b 74 1c 48 8b 83 a8 04 00 00 <48> 8b 80 e0 04 00 00 65 ff 08 48 c7 83 a8 04 00 00 00 00 00 00 [ 570.144601] RIP [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe] [ 570.144601] RSP <ffff880036b63e08> [ 570.144601] CR2: 00000000000004e0 [ 570.200518] ---[ end trace 46956baf17349563 ]--- pppoe_flush_dev() has no reason to override sk->sk_state with PPPOX_ZOMBIE. pppox_unbind_sock() already sets sk->sk_state to PPPOX_DEAD, which is the correct state given that sk is unbound and po->pppoe_dev is NULL. Fixes: 2b018d57 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release") Tested-by: Oleksii Berezhniak <core@irc.lg.ua> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Eric Dumazet authored
commit c7c49b8f upstream. Greg reported crashes hitting the following check in __sk_backlog_rcv() BUG_ON(!sock_flag(sk, SOCK_MEMALLOC)); The pfmemalloc bit is currently checked in sk_filter(). This works correctly for TCP, because sk_filter() is ran in tcp_v[46]_rcv() before hitting the prequeue or backlog checks. For UDP or other protocols, this does not work, because the sk_filter() is ran from sock_queue_rcv_skb(), which might be called _after_ backlog queuing if socket is owned by user by the time packet is processed by softirq handler. Fixes: b4b9e355 ("netvm: set PF_MEMALLOC as appropriate during SKB processing") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Greg Thelen <gthelen@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Pravin B Shelar authored
commit 31b33dfb upstream. Earlier patch 6ae459bd tried to detect void ckecksum partial skb by comparing pull length to checksum offset. But it does not work for all cases since checksum-offset depends on updates to skb->data. Following patch fixes it by validating checksum start offset after skb-data pointer is updated. Negative value of checksum offset start means there is no need to checksum. Fixes: 6ae459bd ("skbuff: Fix skb checksum flag on skb pull") Reported-by: Andrew Vagin <avagin@odin.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Pravin B Shelar authored
commit 6ae459bd upstream. VXLAN device can receive skb with checksum partial. But the checksum offset could be in outer header which is pulled on receive. This results in negative checksum offset for the skb. Such skb can cause the assert failure in skb_checksum_help(). Following patch fixes the bug by setting checksum-none while pulling outer header. Following is the kernel panic msg from old kernel hitting the bug. ------------[ cut here ]------------ kernel BUG at net/core/dev.c:1906! RIP: 0010:[<ffffffff81518034>] skb_checksum_help+0x144/0x150 Call Trace: <IRQ> [<ffffffffa0164c28>] queue_userspace_packet+0x408/0x470 [openvswitch] [<ffffffffa016614d>] ovs_dp_upcall+0x5d/0x60 [openvswitch] [<ffffffffa0166236>] ovs_dp_process_packet_with_key+0xe6/0x100 [openvswitch] [<ffffffffa016629b>] ovs_dp_process_received_packet+0x4b/0x80 [openvswitch] [<ffffffffa016c51a>] ovs_vport_receive+0x2a/0x30 [openvswitch] [<ffffffffa0171383>] vxlan_rcv+0x53/0x60 [openvswitch] [<ffffffffa01734cb>] vxlan_udp_encap_recv+0x8b/0xf0 [openvswitch] [<ffffffff8157addc>] udp_queue_rcv_skb+0x2dc/0x3b0 [<ffffffff8157b56f>] __udp4_lib_rcv+0x1cf/0x6c0 [<ffffffff8157ba7a>] udp_rcv+0x1a/0x20 [<ffffffff8154fdbd>] ip_local_deliver_finish+0xdd/0x280 [<ffffffff81550128>] ip_local_deliver+0x88/0x90 [<ffffffff8154fa7d>] ip_rcv_finish+0x10d/0x370 [<ffffffff81550365>] ip_rcv+0x235/0x300 [<ffffffff8151ba1d>] __netif_receive_skb+0x55d/0x620 [<ffffffff8151c360>] netif_receive_skb+0x80/0x90 [<ffffffff81459935>] virtnet_poll+0x555/0x6f0 [<ffffffff8151cd04>] net_rx_action+0x134/0x290 [<ffffffff810683d8>] __do_softirq+0xa8/0x210 [<ffffffff8162fe6c>] call_softirq+0x1c/0x30 [<ffffffff810161a5>] do_softirq+0x65/0xa0 [<ffffffff810687be>] irq_exit+0x8e/0xb0 [<ffffffff81630733>] do_IRQ+0x63/0xe0 [<ffffffff81625f2e>] common_interrupt+0x6e/0x6e Reported-by: Anupam Chanda <achanda@vmware.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Andrey Vagin authored
commit e9193d60 upstream. Now send with MSG_PEEK can return data from multiple SKBs. Unfortunately we take into account the peek offset for each skb, that is wrong. We need to apply the peek offset only once. In addition, the peek offset should be used only if MSG_PEEK is set. Cc: "David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING Cc: Eric Dumazet <edumazet@google.com> (commit_signer:1/14=7%) Cc: Aaron Conole <aconole@bytheb.org> Fixes: 9f389e35 ("af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag") Signed-off-by: Andrey Vagin <avagin@openvz.org> Tested-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Aaron Conole authored
commit 9f389e35 upstream. AF_UNIX sockets now return multiple skbs from recv() when MSG_PEEK flag is set. This is referenced in kernel bugzilla #12323 @ https://bugzilla.kernel.org/show_bug.cgi?id=12323 As described both in the BZ and lkml thread @ http://lkml.org/lkml/2008/1/8/444 calling recv() with MSG_PEEK on an AF_UNIX socket only reads a single skb, where the desired effect is to return as much skb data has been queued, until hitting the recv buffer size (whichever comes first). The modified MSG_PEEK path will now move to the next skb in the tree and jump to the again: label, rather than following the natural loop structure. This requires duplicating some of the loop head actions. This was tested using the python socketpair python code attached to the bugzilla issue. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net> [ luis: backported to 3.16: used davem's backport to 3.14 ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Aaron Conole authored
commit 4613012d upstream. As suggested by Eric Dumazet this change replaces the complaints by the compiler when misusing the API. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Alexander Couzens authored
commit 06a15f51 upstream. There is a small chance that tunnel_free() is called before tunnel->del_work scheduled resulting in a zero pointer dereference. Signed-off-by: Alexander Couzens <lynxis@fe80.eu> Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Nikhil Badola authored
commit f8786a91 upstream. Incoming packets in high speed are randomly corrupted by h/w resulting in multiple errors. This workaround makes FS as default mode in all affected socs by disabling HS chirp signalling.This errata does not affect FS and LS mode. Forces all HS devices to connect in FS mode for all socs affected by this erratum: P3041 and P2041 rev 1.0 and 1.1 P5020 and P5010 rev 1.0 and 2.0 P5040, P1010 and T4240 rev 1.0 Signed-off-by: Ramneek Mehresh <ramneek.mehresh@freescale.com> Signed-off-by: Nikhil Badola <nikhil.badola@freescale.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Nikhil Badola authored
commit 523f1dec upstream. USB controller version-2.5 requires to enable internal UTMI phy and program PTS field in PORTSC register before asserting controller reset. This is must for successful resetting of the controller and subsequent enumeration of usb devices Signed-off-by: Nikhil Badola <nikhil.badola@freescale.com> Signed-off-by: Suresh Gupta <suresh.gupta@freescale.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Manfred Spraul authored
commit e8577d1f upstream. ipc_addid() makes a new ipc identifier visible to everyone. New objects start as locked, so that the caller can complete the initialization after the call. Within struct sem_array, at least sma->sem_base and sma->sem_nsems are accessed without any locks, therefore this approach doesn't work. Thus: Move the ipc_addid() to the end of the initialization. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Reported-by: Rik van Riel <riel@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
lucien authored
commit f648f807 upstream. Commit f8d96052 ("sctp: Enforce retransmission limit during shutdown") fixed a problem with excessive retransmissions in the SHUTDOWN_PENDING by not resetting the association overall_error_count. This allowed the association to better enforce assoc.max_retrans limit. However, the same issue still exists when the association is in SHUTDOWN_RECEIVED state. In this state, HB-ACKs will continue to reset the overall_error_count for the association would extend the lifetime of association unnecessarily. This patch solves this by resetting the overall_error_count whenever the current state is small then SCTP_STATE_SHUTDOWN_PENDING. As a small side-effect, we end up also handling SCTP_STATE_SHUTDOWN_ACK_SENT and SCTP_STATE_SHUTDOWN_SENT states, but they are not really impacted because we disable Heartbeats in those states. Fixes: Commit f8d96052 ("sctp: Enforce retransmission limit during shutdown") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Jann Horn authored
commit fbb18169 upstream. It was possible for an attacking user to trick root (or another user) into writing his coredumps into an attacker-readable, pre-existing file using rename() or link(), causing the disclosure of secret data from the victim process' virtual memory. Depending on the configuration, it was also possible to trick root into overwriting system files with coredumps. Fix that issue by never writing coredumps into existing files. Requirements for the attack: - The attack only applies if the victim's process has a nonzero RLIMIT_CORE and is dumpable. - The attacker can trick the victim into coredumping into an attacker-writable directory D, either because the core_pattern is relative and the victim's cwd is attacker-writable or because an absolute core_pattern pointing to a world-writable directory is used. - The attacker has one of these: A: on a system with protected_hardlinks=0: execute access to a folder containing a victim-owned, attacker-readable file on the same partition as D, and the victim-owned file will be deleted before the main part of the attack takes place. (In practice, there are lots of files that fulfill this condition, e.g. entries in Debian's /var/lib/dpkg/info/.) This does not apply to most Linux systems because most distros set protected_hardlinks=1. B: on a system with protected_hardlinks=1: execute access to a folder containing a victim-owned, attacker-readable and attacker-writable file on the same partition as D, and the victim-owned file will be deleted before the main part of the attack takes place. (This seems to be uncommon.) C: on any system, independent of protected_hardlinks: write access to a non-sticky folder containing a victim-owned, attacker-readable file on the same partition as D (This seems to be uncommon.) The basic idea is that the attacker moves the victim-owned file to where he expects the victim process to dump its core. The victim process dumps its core into the existing file, and the attacker reads the coredump from it. If the attacker can't move the file because he does not have write access to the containing directory, he can instead link the file to a directory he controls, then wait for the original link to the file to be deleted (because the kernel checks that the link count of the corefile is 1). A less reliable variant that requires D to be non-sticky works with link() and does not require deletion of the original link: link() the file into D, but then unlink() it directly before the kernel performs the link count check. On systems with protected_hardlinks=0, this variant allows an attacker to not only gain information from coredumps, but also clobber existing, victim-writable files with coredumps. (This could theoretically lead to a privilege escalation.) Signed-off-by: Jann Horn <jann@thejh.net> Cc: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Thomas Hellstrom authored
commit ed7d78b2 upstream. The commit "drm/vmwgfx: Fix up user_dmabuf refcounting", while fixing a kernel crash introduced a NULL pointer dereference on older hardware. Fix this. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Joonsoo Kim authored
commit 03a2d2a3 upstream. Commit description is copied from the original post of this bug: http://comments.gmane.org/gmane.linux.kernel.mm/135349 Kernels after v3.9 use kmalloc_size(INDEX_NODE + 1) to get the next larger cache size than the size index INDEX_NODE mapping. In kernels 3.9 and earlier we used malloc_sizes[INDEX_L3 + 1].cs_size. However, sometimes we can't get the right output we expected via kmalloc_size(INDEX_NODE + 1), causing a BUG(). The mapping table in the latest kernel is like: index = {0, 1, 2 , 3, 4, 5, 6, n} size = {0, 96, 192, 8, 16, 32, 64, 2^n} The mapping table before 3.10 is like this: index = {0 , 1 , 2, 3, 4 , 5 , 6, n} size = {32, 64, 96, 128, 192, 256, 512, 2^(n+3)} The problem on my mips64 machine is as follows: (1) When configured DEBUG_SLAB && DEBUG_PAGEALLOC && DEBUG_LOCK_ALLOC && DEBUG_SPINLOCK, the sizeof(struct kmem_cache_node) will be "150", and the macro INDEX_NODE turns out to be "2": #define INDEX_NODE kmalloc_index(sizeof(struct kmem_cache_node)) (2) Then the result of kmalloc_size(INDEX_NODE + 1) is 8. (3) Then "if(size >= kmalloc_size(INDEX_NODE + 1)" will lead to "size = PAGE_SIZE". (4) Then "if ((size >= (PAGE_SIZE >> 3))" test will be satisfied and "flags |= CFLGS_OFF_SLAB" will be covered. (5) if (flags & CFLGS_OFF_SLAB)" test will be satisfied and will go to "cachep->slabp_cache = kmalloc_slab(slab_size, 0u)", and the result here may be NULL while kernel bootup. (6) Finally,"BUG_ON(ZERO_OR_NULL_PTR(cachep->slabp_cache));" causes the BUG info as the following shows (may be only mips64 has this problem): This patch fixes the problem of kmalloc_size(INDEX_NODE + 1) and removes the BUG by adding 'size >= 256' check to guarantee that all necessary small sized slabs are initialized regardless sequence of slab size in mapping table. Fixes: e3366016 ("slab: Use common kmalloc_index/kmalloc_size...") Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reported-by: Liuhailong <liu.hailong6@zte.com.cn> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Andy Shevchenko authored
commit 6bea0f6d upstream. In case we have less than maximum allowed channels (8) and autoconfiguration is enabled the DWC_PARAMS read is wrong because it uses different arithmetic to what is needed for channel priority setup. Re-do the caclulations properly. This now works on AVR32 board well. Fixes: fed2574b (dw_dmac: introduce software emulation of LLP transfers) Cc: yitian.bu@tangramtek.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
John Stultz authored
commit 67dfae0c upstream. This patch fixes one cases where abs() was being used with 64-bit nanosecond values, where the result may be capped at 32-bits. This potentially could cause watchdog false negatives on 32-bit systems, so this patch addresses the issue by using abs64(). Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1442279124-7309-2-git-send-email-john.stultz@linaro.orgSigned-off-by: Thomas Gleixner <tglx@linutronix.de> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Stephen Smalley authored
commit ab76f7b4 upstream. Unused space between the end of __ex_table and the start of rodata can be left W+x in the kernel page tables. Extend the setting of the NX bit to cover this gap by starting from text_end rather than rodata_start. Before: ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd 0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte 0xffffffff81754000-0xffffffff81800000 688K RW GLB x pte 0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd 0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte 0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte 0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd 0xffffffff82200000-0xffffffffa0000000 478M pmd After: ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd 0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte 0xffffffff81754000-0xffffffff81800000 688K RW GLB NX pte 0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd 0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte 0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte 0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd 0xffffffff82200000-0xffffffffa0000000 478M pmd Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.govSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Mel Gorman authored
commit 2f84a899 upstream. SunDong reported the following on https://bugzilla.kernel.org/show_bug.cgi?id=103841 I think I find a linux bug, I have the test cases is constructed. I can stable recurring problems in fedora22(4.0.4) kernel version, arch for x86_64. I construct transparent huge page, when the parent and child process with MAP_SHARE, MAP_PRIVATE way to access the same huge page area, it has the opportunity to lead to huge page copy on write failure, and then it will munmap the child corresponding mmap area, but then the child mmap area with VM_MAYSHARE attributes, child process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags functions (vma - > vm_flags & VM_MAYSHARE). There were a number of problems with the report (e.g. it's hugetlbfs that triggers this, not transparent huge pages) but it was fundamentally correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that looks like this vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000 next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800 prot 8000000000000027 anon_vma (null) vm_ops ffffffff8182a7a0 pgoff 0 file ffff88106bdb9800 private_data (null) flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb) ------------ kernel BUG at mm/hugetlb.c:462! SMP Modules linked in: xt_pkttype xt_LOG xt_limit [..] CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1 Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012 set_vma_resv_flags+0x2d/0x30 The VM_BUG_ON is correct because private and shared mappings have different reservation accounting but the warning clearly shows that the VMA is shared. When a private COW fails to allocate a new page then only the process that created the VMA gets the page -- all the children unmap the page. If the children access that data in the future then they get killed. The problem is that the same file is mapped shared and private. During the COW, the allocation fails, the VMAs are traversed to unmap the other private pages but a shared VMA is found and the bug is triggered. This patch identifies such VMAs and skips them. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: SunDong <sund_sky@126.com> Reviewed-by: Michal Hocko <mhocko@suse.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: David Rientjes <rientjes@google.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Dirk Müller authored
commit d2922422 upstream. The cpu feature flags are not ever going to change, so warning everytime can cause a lot of kernel log spam (in our case more than 10GB/hour). The warning seems to only occur when nested virtualization is enabled, so it's probably triggered by a KVM bug. This is a sensible and safe change anyway, and the KVM bug fix might not be suitable for stable releases anyway. Signed-off-by: Dirk Mueller <dmueller@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Matt Fleming authored
commit a5caa209 upstream. Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced that signals that the firmware PE/COFF loader supports splitting code and data sections of PE/COFF images into separate EFI memory map entries. This allows the kernel to map those regions with strict memory protections, e.g. EFI_MEMORY_RO for code, EFI_MEMORY_XP for data, etc. Unfortunately, an unwritten requirement of this new feature is that the regions need to be mapped with the same offsets relative to each other as observed in the EFI memory map. If this is not done crashes like this may occur, BUG: unable to handle kernel paging request at fffffffefe6086dd IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd Call Trace: [<ffffffff8104c90e>] efi_call+0x7e/0x100 [<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90 [<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70 [<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392 [<ffffffff81f37e1b>] start_kernel+0x38a/0x417 [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef Here 0xfffffffefe6086dd refers to an address the firmware expects to be mapped but which the OS never claimed was mapped. The issue is that included in these regions are relative addresses to other regions which were emitted by the firmware toolchain before the "splitting" of sections occurred at runtime. Needless to say, we don't satisfy this unwritten requirement on x86_64 and instead map the EFI memory map entries in reverse order. The above crash is almost certainly triggerable with any kernel newer than v3.13 because that's when we rewrote the EFI runtime region mapping code, in commit d2f7cbe7 ("x86/efi: Runtime services virtual mapping"). For kernel versions before v3.13 things may work by pure luck depending on the fragmentation of the kernel virtual address space at the time we map the EFI regions. Instead of mapping the EFI memory map entries in reverse order, where entry N has a higher virtual address than entry N+1, map them in the same order as they appear in the EFI memory map to preserve this relative offset between regions. This patch has been kept as small as possible with the intention that it should be applied aggressively to stable and distribution kernels. It is very much a bugfix rather than support for a new feature, since when EFI_PROPERTIES_TABLE is enabled we must map things as outlined above to even boot - we have no way of asking the firmware not to split the code/data regions. In fact, this patch doesn't even make use of the more strict memory protections available in UEFI v2.5. That will come later. Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chun-Yi <jlee@suse.com> Cc: Dave Young <dyoung@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: James Bottomley <JBottomley@Odin.com> Cc: Lee, Chun-Yi <jlee@suse.com> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443218539-7610-2-git-send-email-matt@codeblueprint.co.ukSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Ben Hutchings authored
commit 95c2b175 upstream. Per-IRQ directories in procfs are created only when a handler is first added to the irqdesc, not when the irqdesc is created. In the case of a shared IRQ, multiple tasks can race to create a directory. This race condition seems to have been present forever, but is easier to hit with async probing. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Link: http://lkml.kernel.org/r/1443266636.2004.2.camel@decadent.org.ukSigned-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Fabiano Fidêncio authored
commit 8d0d9401 upstream. When disabling/enabling a crtc the primary area must be updated independently of which crtc has been disabled/enabled. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1264735Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Linus Torvalds authored
commit b9a53227 upstream. As reported by Dmitry Vyukov, we really shouldn't do ipc_addid() before having initialized the IPC object state. Yes, we initialize the IPC object in a locked state, but with all the lockless RCU lookup work, that IPC object lock no longer means that the state cannot be seen. We already did this for the IPC semaphore code (see commit e8577d1f: "ipc/sem.c: fully initialize sem_array before making it visible") but we clearly forgot about msg and shm. Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Paul Burton authored
commit 7a63076d upstream. The CONFIG_MIPS_MT symbol can be selected by CONFIG_MIPS_VPE_LOADER in addition to CONFIG_MIPS_MT_SMP. We only want MT code in the CPS SMP boot vector if we're using MT for SMP. Thus switch the config symbol we ifdef against to CONFIG_MIPS_MT_SMP. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/10867/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Paul Burton authored
commit a5b0f6db upstream. The MT-specific code in mips_cps_boot_vpes can safely be omitted from kernels which don't support MT, with the default VPE==0 case being used as it would be after the has_mt (Config3.MT) check failed at runtime. Discarding the code entirely will save us a few bytes & allow cleaner handling of MT ASE instructions by later patches. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/10866/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Paul Burton authored
commit 1e5fb282 upstream. The has_mt macro ended with a branch, leaving its callers with a delay slot that would be executed if Config3.MT is not set. However it would not be executed if Config3 (or earlier Config registers) don't exist which makes it somewhat inconsistent at best. Fill the delay slot in the macro & fix the mips_cps_boot_vpes caller appropriately. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/10865/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
James Hogan authored
commit 53960059 upstream. If there is a DMA zone (usually 24bit = 16MB I believe), but no DMA32 zone, as is the case for some 32-bit kernels, then massage_gfp_flags() will cause DMA memory allocated for devices with a 32..63-bit coherent_dma_mask to fall back to using __GFP_DMA, even though there may only be 32-bits of physical address available anyway. Correct that case to compare against a mask the size of phys_addr_t instead of always using a 64-bit mask. Signed-off-by: James Hogan <james.hogan@imgtec.com> Fixes: a2e715a8 ("MIPS: DMA: Fix computation of DMA flags from device's coherent_dma_mask.") Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/9610/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
- 19 Oct, 2015 6 commits
-
-
shengyong authored
commit 7c7feb2e upstream. UBI: attaching mtd1 to ubi0 UBI: scanning is finished UBI error: init_volumes: not enough PEBs, required 706, available 686 UBI error: ubi_wl_init: no enough physical eraseblocks (-20, need 1) UBI error: ubi_attach_mtd_dev: failed to attach mtd1, error -12 <= NOT ENOMEM UBI error: ubi_init: cannot attach mtd1 If available PEBs are not enough when initializing volumes, return -ENOSPC directly. If available PEBs are not enough when initializing WL, return -ENOSPC instead of -ENOMEM. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: David Gstir <david@sigma-star.at> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Richard Weinberger authored
commit 281fda27 upstream. Make sure that data_size is less than LEB size. Otherwise a handcrafted UBI image is able to trigger an out of bounds memory access in ubi_compare_lebs(). Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: David Gstir <david@sigma-star.at> [ luis: backported to 3.16: - no ubi_device parameter for the ubi_err() macro in 3.16 ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Andreas Schwab authored
commit 8474ba74 upstream. Make sure the compiler does not modify arguments of syscall functions. This can happen if the compiler generates a tailcall to another function. For example, without asmlinkage_protect sys_openat is compiled into this function: sys_openat: clr.l %d0 move.w 18(%sp),%d0 move.l %d0,16(%sp) jbra do_sys_open Note how the fourth argument is modified in place, modifying the register %d4 that gets restored from this stack slot when the function returns to user-space. The caller may expect the register to be unmodified across system calls. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Adrian Hunter authored
commit b5cabbcb upstream. A copy of /proc/kcore containing the kernel text can be made to the buildid cache. e.g. perf buildid-cache -v -k /proc/kcore To workaround objdump limitations, a copy is also made when annotating against /proc/kcore. The copying process stops working from libelf about v1.62 onwards (the problem was found with v1.63). The cause is that a call to gelf_getphdr() in kcore__add_phdr() fails because additional validation has been added to gelf_getphdr(). The use of gelf_getphdr() is a misguided attempt to get default initialization of the Gelf_Phdr structure. That should not be necessary because every member of the Gelf_Phdr structure is subsequently assigned. So just remove the call to gelf_getphdr(). Similarly, a call to gelf_getehdr() in gelf_kcore__init() can be removed also. Committer notes: Note to stable@kernel.org, from Adrian in the cover letter for this patchkit: The "Fix copying of /proc/kcore" problem goes back to v3.13 if you think it is important enough for stable. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443089122-19082-3-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Kapileshwar Singh authored
commit c2e4b24f upstream. When a trace recorded on a 32-bit device is processed with a 64-bit binary, the higher 32-bits of the address need to ignored. The lack of this results in the output of the 64-bit pointer value to the trace as the 32-bit address lookup fails in find_printk(). Before: burn-1778 [003] 548.600305: bputs: 0xc0046db2s: 2cec5c058d98c After: burn-1778 [003] 548.600305: bputs: 0xc0046db2s: RT throttling activated The problem occurs in PRINT_FIELD when the field is recognized as a pointer to a string (of the type const char *) Heterogeneous architectures cases below can arise and should be handled: * Traces recorded using 32-bit addresses processed on a 64-bit machine * Traces recorded using 64-bit addresses processed on a 32-bit machine Reported-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Kapileshwar Singh <kapileshwar.singh@arm.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: David Ahern <dsahern@gmail.com> Cc: Javi Merino <javi.merino@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1442928123-13824-1-git-send-email-kapileshwar.singh@arm.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Prarit Bhargava authored
commit 7180dddf upstream. The kernel may delay interrupts for a long time which can result in timers being delayed. If this occurs the intel_pstate driver will crash with a divide by zero error: divide error: 0000 [#1] SMP Modules linked in: btrfs zlib_deflate raid6_pq xor msdos ext4 mbcache jbd2 binfmt_misc arc4 md4 nls_utf8 cifs dns_resolver tcp_lp bnep bluetooth rfkill fuse dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ftp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables intel_powerclamp coretemp vfat fat kvm_intel iTCO_wdt iTCO_vendor_support ipmi_devintf sr_mod kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel cdc_ether lrw usbnet cdrom mii gf128mul glue_helper ablk_helper cryptd lpc_ich mfd_core pcspkr sb_edac edac_core ipmi_si ipmi_msghandler ioatdma wmi shpchp acpi_pad nfsd auth_rpcgss nfs_acl lockd uinput dm_multipath sunrpc xfs libcrc32c usb_storage sd_mod crc_t10dif crct10dif_common ixgbe mgag200 syscopyarea sysfillrect sysimgblt mdio drm_kms_helper ttm igb drm ptp pps_core dca i2c_algo_bit megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 113 PID: 0 Comm: swapper/113 Tainted: G W -------------- 3.10.0-229.1.2.el7.x86_64 #1 Hardware name: IBM x3950 X6 -[3837AC2]-/00FN827, BIOS -[A8E112BUS-1.00]- 08/27/2014 task: ffff880fe8abe660 ti: ffff880fe8ae4000 task.ti: ffff880fe8ae4000 RIP: 0010:[<ffffffff814a9279>] [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0 RSP: 0018:ffff883fff4e3db8 EFLAGS: 00010206 RAX: 0000000027100000 RBX: ffff883fe6965100 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000010 RDI: 000000002e53632d RBP: ffff883fff4e3e20 R08: 000e6f69a5a125c0 R09: ffff883fe84ec001 R10: 0000000000000002 R11: 0000000000000005 R12: 00000000000049f5 R13: 0000000000271000 R14: 00000000000049f5 R15: 0000000000000246 FS: 0000000000000000(0000) GS:ffff883fff4e0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7668601000 CR3: 000000000190a000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff883fff4e3e58 ffffffff81099dc1 0000000000000086 0000000000000071 ffff883fff4f3680 0000000000000071 fbdc8a965e33afee ffffffff810b69dd ffff883fe84ec000 ffff883fe6965108 0000000000000100 ffffffff814a9100 Call Trace: <IRQ> [<ffffffff81099dc1>] ? run_posix_cpu_timers+0x51/0x840 [<ffffffff810b69dd>] ? trigger_load_balance+0x5d/0x200 [<ffffffff814a9100>] ? pid_param_set+0x130/0x130 [<ffffffff8107df56>] call_timer_fn+0x36/0x110 [<ffffffff814a9100>] ? pid_param_set+0x130/0x130 [<ffffffff8107fdcf>] run_timer_softirq+0x21f/0x320 [<ffffffff81077b2f>] __do_softirq+0xef/0x280 [<ffffffff816156dc>] call_softirq+0x1c/0x30 [<ffffffff81015d95>] do_softirq+0x65/0xa0 [<ffffffff81077ec5>] irq_exit+0x115/0x120 [<ffffffff81616355>] smp_apic_timer_interrupt+0x45/0x60 [<ffffffff81614a1d>] apic_timer_interrupt+0x6d/0x80 <EOI> [<ffffffff814a9c32>] ? cpuidle_enter_state+0x52/0xc0 [<ffffffff814a9c28>] ? cpuidle_enter_state+0x48/0xc0 [<ffffffff814a9d65>] cpuidle_idle_call+0xc5/0x200 [<ffffffff8101d14e>] arch_cpu_idle+0xe/0x30 [<ffffffff810c67c1>] cpu_startup_entry+0xf1/0x290 [<ffffffff8104228a>] start_secondary+0x1ba/0x230 Code: 42 0f 00 45 89 e6 48 01 c2 43 8d 44 6d 00 39 d0 73 26 49 c1 e5 08 89 d2 4d 63 f4 49 63 c5 48 c1 e2 08 48 c1 e0 08 48 63 ca 48 99 <48> f7 f9 48 98 4c 0f af f0 49 c1 ee 08 8b 43 78 c1 e0 08 44 29 RIP [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0 RSP <ffff883fff4e3db8> The kernel values for cpudata for CPU 113 were: struct cpudata { cpu = 113, timer = { entry = { next = 0x0, prev = 0xdead000000200200 }, expires = 8357799745, base = 0xffff883fe84ec001, function = 0xffffffff814a9100 <intel_pstate_timer_func>, data = 18446612406765768960, <snip> i_gain = 0, d_gain = 0, deadband = 0, last_err = 22489 }, last_sample_time = { tv64 = 4063132438017305 }, prev_aperf = 287326796397463, prev_mperf = 251427432090198, sample = { core_pct_busy = 23081, aperf = 2937407, mperf = 3257884, freq = 2524484, time = { tv64 = 4063149215234118 } } } which results in the time between samples = last_sample_time - sample.time = 4063149215234118 - 4063132438017305 = 16777216813 which is 16.777 seconds. The duration between reads of the APERF and MPERF registers overflowed a s32 sized integer in intel_pstate_get_scaled_busy()'s call to div_fp(). The result is that int_tofp(duration_us) == 0, and the kernel attempts to divide by 0. While the kernel shouldn't be delaying for a long time, it can and does happen and the intel_pstate driver should not panic in this situation. This patch changes the div_fp() function to use div64_s64() to allow for "long" division. This will avoid the overflow condition on long delays. [v2]: use div64_s64() in div_fp() Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Renninger <trenn@suse.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-