- 09 Sep, 2016 4 commits
-
-
Dave Hansen authored
This patch adds two new system calls:

    int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
    int pkey_free(int pkey);

These implement an "allocator" for the protection keys themselves, which can be thought of as analogous to the allocator that the kernel has for file descriptors. The kernel tracks which numbers are in use, and only allows operations on keys that are valid. A key which was not obtained by pkey_alloc() may not, for instance, be passed to pkey_mprotect().

These system calls are also very important given the kernel's use of pkeys to implement execute-only support. They help ensure that userspace can never assume that it has control of a key unless it first asks the kernel. The kernel does not promise to preserve PKRU (rights register) contents except for allocated pkeys.

The 'init_access_rights' argument to pkey_alloc() specifies the rights that will be established for the returned pkey. For instance:

    pkey = pkey_alloc(flags, PKEY_DENY_WRITE);

will allocate 'pkey', but also sets the bits in PKRU[1] such that writing to 'pkey' is already denied.

The kernel does not prevent pkey_free() from successfully freeing in-use pkeys (those still assigned to a memory range by pkey_mprotect()). It would be expensive to implement the checks for this, so we instead say, "Just don't do it" since sane software will never do it anyway.

Any piece of userspace calling pkey_alloc() needs to be prepared for it to fail. Why? pkey_alloc() returns the same error code (ENOSPC) when there are no pkeys and when pkeys are unsupported. They can be unsupported for a whole host of reasons, so apps must be prepared for this. Also, libraries or LD_PRELOADs might steal keys before an application gets access to them.

This allocation mechanism could be implemented in userspace. Even if we did it in userspace, we would still need additional user/kernel interfaces to tell userspace which keys are being used by the kernel internally (such as for execute-only mappings). Having the kernel provide this facility completely removes the need for these additional interfaces, or having an implementation of this in userspace at all.

Note that we have to make changes to all of the architectures that do not use mman-common.h because we use the new PKEY_DENY_ACCESS/WRITE macros in arch-independent code.

[1] PKRU is the Protection Key Rights User register. It is a usermode-accessible register that controls whether writes and/or access to each individual pkey is allowed or denied.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave@sr71.net>
Cc: arnd@arndb.de
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163015.444FE75F@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
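The allocator behaviour described in this commit message can be illustrated with a small userspace model. This is only a sketch, not the kernel's implementation: the struct, the model_* function names, the numeric values of the PKEY_DENY_* macros and the -EINVAL return are assumptions made for illustration. It assumes 16 keys (4 bits, as on x86) and two PKRU bits per key (access-disable, then write-disable).

```c
#include <errno.h>
#include <stdint.h>

#define NR_PKEYS         16   /* 4 bits of key on x86 */
#define PKEY_DENY_ACCESS 0x1  /* assumed value; name taken from the commit message */
#define PKEY_DENY_WRITE  0x2  /* assumed value; name taken from the commit message */

/* Hypothetical per-process state: which keys are handed out, plus a PKRU copy. */
struct pkey_state {
    uint16_t alloc_map;  /* bit N set => key N is allocated */
    uint32_t pkru;       /* two rights bits per key: bit 2N = deny access, 2N+1 = deny write */
};

static void pkey_state_init(struct pkey_state *s)
{
    s->alloc_map = 1u;   /* key 0 is the default key and is always allocated */
    s->pkru = 0;
}

/* Returns a newly allocated key, or a negative errno value. */
static int model_pkey_alloc(struct pkey_state *s, unsigned long init_access_rights)
{
    if (init_access_rights & ~(unsigned long)(PKEY_DENY_ACCESS | PKEY_DENY_WRITE))
        return -EINVAL;

    for (int key = 1; key < NR_PKEYS; key++) {
        if (s->alloc_map & (1u << key))
            continue;
        s->alloc_map |= (uint16_t)(1u << key);
        /* Establish the initial rights for the returned key. */
        s->pkru &= ~(3u << (2 * key));
        s->pkru |= (uint32_t)init_access_rights << (2 * key);
        return key;
    }
    return -ENOSPC;      /* no keys left (the real call also uses this when unsupported) */
}

/* Frees a key; like the real call, it does not check for mappings still using it. */
static int model_pkey_free(struct pkey_state *s, int pkey)
{
    if (pkey <= 0 || pkey >= NR_PKEYS || !(s->alloc_map & (1u << pkey)))
        return -EINVAL;
    s->alloc_map &= (uint16_t)~(1u << pkey);
    return 0;
}
```

A caller would run pkey_state_init() once, then treat any negative return from model_pkey_alloc() as "no key available", which mirrors the advice that userspace must always be prepared for allocation to fail.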
-
Dave Hansen authored
Today, mprotect() takes 4 bits of data: PROT_READ/WRITE/EXEC/NONE. Three of those bits: READ/WRITE/EXEC get translated directly into vma->vm_flags by calc_vm_prot_bits(). If a bit is unset in mprotect()'s 'prot' argument then it must be cleared in vma->vm_flags during the mprotect() call.

We do this clearing today by first calculating the VMA flags we want set, then clearing the ones we do not want to inherit from the original VMA:

    vm_flags = calc_vm_prot_bits(prot, key);
    ...
    newflags = vm_flags;
    newflags |= (vma->vm_flags & ~(VM_READ | VM_WRITE | VM_EXEC));

However, we *also* want to mask off the original VMA's vm_flags in which we store the protection key.

To do that, this patch adds a new macro:

    ARCH_VM_PKEY_FLAGS

which allows the architecture to specify additional bits that it would like cleared. We use that to ensure that the VM_PKEY_BIT* bits get cleared.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave@sr71.net>
Cc: arnd@arndb.de
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163013.E48D6981@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
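To see why the extra mask matters, here is a minimal standalone sketch of the flag calculation. The MODEL_* names, the model_* functions and the position of the pkey bits (four bits starting at bit 32) are assumptions for illustration; only the overall shape follows the snippet in the commit message.

```c
#include <stdint.h>

/* Low protection bits, mirroring the kernel's VM_READ/VM_WRITE/VM_EXEC/VM_SHARED. */
#define VM_READ   UINT64_C(0x1)
#define VM_WRITE  UINT64_C(0x2)
#define VM_EXEC   UINT64_C(0x4)
#define VM_SHARED UINT64_C(0x8)

/* Assumed for this sketch: the pkey lives in four high bits of vm_flags. */
#define MODEL_PKEY_SHIFT   32
#define ARCH_VM_PKEY_FLAGS (UINT64_C(0xf) << MODEL_PKEY_SHIFT)

#define MODEL_PROT_READ  0x1u
#define MODEL_PROT_WRITE 0x2u
#define MODEL_PROT_EXEC  0x4u

/* Translate prot bits and a key into vm_flags bits. */
static uint64_t model_calc_vm_prot_bits(unsigned prot, unsigned pkey)
{
    uint64_t flags = 0;

    if (prot & MODEL_PROT_READ)
        flags |= VM_READ;
    if (prot & MODEL_PROT_WRITE)
        flags |= VM_WRITE;
    if (prot & MODEL_PROT_EXEC)
        flags |= VM_EXEC;
    flags |= (uint64_t)(pkey & 0xfu) << MODEL_PKEY_SHIFT;
    return flags;
}

/* Compute the new vm_flags for an mprotect()-style call. */
static uint64_t model_mprotect_flags(uint64_t old_flags, unsigned prot, unsigned pkey)
{
    /* Bits that must NOT be inherited from the old VMA. */
    uint64_t mask_off_old = VM_READ | VM_WRITE | VM_EXEC | ARCH_VM_PKEY_FLAGS;
    uint64_t newflags = model_calc_vm_prot_bits(prot, pkey);

    newflags |= old_flags & ~mask_off_old;
    return newflags;
}
```

Without ARCH_VM_PKEY_FLAGS in the mask, a VMA that previously carried key 6 and is now given key 3 would end up with the OR of both (7), which is why the old key bits have to be cleared first.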
-
Dave Hansen authored
pkey_mprotect() is just like mprotect(), except it also takes a protection key as an argument. On systems that do not support protection keys, it still works, but requires that key=0. Otherwise it does exactly what mprotect() does.

I expect it to get used like this, if you want to guarantee that any mapping you create can *never* be accessed without the right protection keys set up:

    int real_prot = PROT_READ|PROT_WRITE;
    pkey = pkey_alloc(0, PKEY_DENY_ACCESS);
    ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
    ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);

This way, there is *no* window where the mapping is accessible since it was always either PROT_NONE or had a protection key set that denied all access.

We settled on 'unsigned long' for the type of the key here. We only need 4 bits on x86 today, but I figured that other architectures might need some more space.

Semantically, we have a bit of a problem if we combine this syscall with our previously-introduced execute-only support: What do we do when we mix execute-only pkey use with pkey_mprotect() use? For instance:

    pkey_mprotect(ptr, PAGE_SIZE, PROT_WRITE, 6); // set pkey=6
    mprotect(ptr, PAGE_SIZE, PROT_EXEC);  // set pkey=X_ONLY_PKEY?
    mprotect(ptr, PAGE_SIZE, PROT_WRITE); // is pkey=6 again?

To solve that, we make the plain-mprotect()-initiated execute-only support only apply to VMAs that have the default protection key (0) set on them.

Proposed semantics:

1. protection key 0 is special and represents the default, "unassigned" protection key. It is always allocated.
2. mprotect() never affects a mapping's pkey_mprotect()-assigned protection key. A protection key of 0 (even if set explicitly) represents an unassigned protection key.
   2a. mprotect(PROT_EXEC) on a mapping with an assigned protection key may or may not result in a mapping with execute-only properties. pkey_mprotect() plus pkey_set() on all threads should be used to _guarantee_ execute-only semantics if this is not a strong enough semantic.
3. mprotect(PROT_EXEC) may result in an "execute-only" mapping. The kernel will internally attempt to allocate and dedicate a protection key for the purpose of execute-only mappings. This may not be possible in cases where there are no free protection keys available. It can also happen, of course, in situations where there is no hardware support for protection keys.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave@sr71.net>
Cc: arnd@arndb.de
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163012.3DDD36C4@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
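The proposed semantics can be modelled in a few lines of C. This is a toy model, not kernel code: struct model_vma, the model_* functions and the exec_key_available parameter are hypothetical names, and rule 2a ("may or may not") is resolved here by assuming that a mapping with an assigned key never becomes execute-only through plain mprotect().

```c
#include <stdbool.h>

#define MODEL_PROT_READ  0x1u
#define MODEL_PROT_WRITE 0x2u
#define MODEL_PROT_EXEC  0x4u

/* Hypothetical stand-in for the parts of a VMA that matter here. */
struct model_vma {
    unsigned prot;   /* current PROT_* bits */
    int pkey;        /* key assigned via pkey_mprotect(); 0 means "unassigned" */
    bool exec_only;  /* true if the kernel's internal execute-only key is in effect */
};

/* pkey_mprotect(): sets both the protection bits and the assigned key. */
static void model_pkey_mprotect(struct model_vma *v, unsigned prot, int pkey)
{
    v->prot = prot;
    v->pkey = pkey;
    v->exec_only = false;
}

/*
 * Plain mprotect(): never touches the assigned key (rule 2). Execute-only
 * is only attempted for mappings with the default key 0, and only if the
 * kernel managed to dedicate a key for that purpose (rule 3).
 */
static void model_mprotect(struct model_vma *v, unsigned prot, bool exec_key_available)
{
    v->prot = prot;
    v->exec_only = (prot == MODEL_PROT_EXEC) && v->pkey == 0 && exec_key_available;
}
```

Running the three-call sequence from the commit message through this model leaves the mapping with pkey 6 throughout, which is the answer the proposed semantics give to the "is pkey=6 again?" question.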
-
Dave Hansen authored
PF_PK means that a memory access violated the protection key access restrictions. It is unconditionally an access_error() because the permissions set on the VMA don't matter (the PKRU value overrides it), and we never "resolve" PK faults (like how a COW can "resolve" a write fault).

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave@sr71.net>
Cc: arnd@arndb.de
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163010.DD1FE1ED@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
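A simplified sketch of the rule described here: if the fault's error code has the protection-key bit set, the fault is an access error no matter what the VMA permits. The function name and the MODEL_VM_* constants are made up for illustration, and the remaining checks are deliberately reduced compared with what the real fault handler does.

```c
#include <stdbool.h>

/* x86 page fault error code bits used by this sketch. */
#define PF_WRITE (1u << 1)  /* fault was caused by a write */
#define PF_PK    (1u << 5)  /* fault was caused by a protection key violation */

#define MODEL_VM_READ  0x1u
#define MODEL_VM_WRITE 0x2u
#define MODEL_VM_EXEC  0x4u

/* Returns true if the fault cannot be resolved and must be reported. */
static bool model_access_error(unsigned error_code, unsigned vm_flags)
{
    /* PK faults are never "resolved": VMA permissions are irrelevant. */
    if (error_code & PF_PK)
        return true;

    /* A write fault is only acceptable on a writable VMA. */
    if (error_code & PF_WRITE)
        return !(vm_flags & MODEL_VM_WRITE);

    /* Otherwise the VMA must allow at least some kind of access. */
    return !(vm_flags & (MODEL_VM_READ | MODEL_VM_WRITE | MODEL_VM_EXEC));
}
```

The early return is the point of the example: a write fault on a fully writable VMA is still an error when PF_PK is set.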
-
- 04 Sep, 2016 4 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull x86 fix from Thomas Gleixner: "A single fix for an AMD erratum so machines without a BIOS fix work" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/AMD: Apply erratum 665 on machines without a BIOS fix
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull timer fixes from Thomas Gleixner: "Two fixlets from the timers department: - A fix for scheduler stalls in the tick idle code affecting NOHZ_FULL kernels - A trivial compile fix" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick/nohz: Fix softlockup on scheduler stalls in kvm guest clocksource/drivers/atmel-pit: Fix compilation error
-
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Linus Torvalds authored
Pull device mapper fixes from Mike Snitzer: - a stable fix in both DM crypt and DM log-writes for too large bios (as generated by bcache) - two other stable fixes for DM log-writes - a stable fix for a DM crypt bug that could result in freeing pointers from uninitialized memory in the tfm allocation error path - a DM bufio cleanup to discontinue using create_singlethread_workqueue() * tag 'dm-4.8-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm bufio: remove use of deprecated create_singlethread_workqueue() dm crypt: fix free of bad values after tfm allocation failure dm crypt: fix error with too large bios dm log writes: fix check of kthread_run() return value dm log writes: fix bug with too large bios dm log writes: move IO accounting earlier to fix error path
-
- 03 Sep, 2016 9 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Linus Torvalds authored
Pull btrfs fixes from Chris Mason: "I'm still prepping a set of fixes for btrfs fsync, just nailing down a hard to trigger memory corruption. For now, these are tested and ready." * 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: fix one bug that process may endlessly wait for ticket in wait_reserve_ticket() Btrfs: fix endless loop in balancing block groups Btrfs: kill invalid ASSERT() in process_all_refs()
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds authored
Pull arm64 fixes from Catalin Marinas: "arm64 and arm/perf fixes: - arm64 fix: debug exception unmasking on the CPU resume path - ARM PMU fixes: memory leak on error path and NULL pointer dereference" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: kernel: Fix unmasked debug exceptions when restoring mdscr_el1 drivers/perf: arm_pmu: Fix NULL pointer dereference during probe drivers/perf: arm_pmu: Fix leak in error path
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Linus Torvalds authored
Pull char/misc driver fixes from Greg KH: "Here are a number of small driver fixes for 4.8-rc5. The largest thing here is deleting an obsolete driver, drivers/misc/bh1780gli.c, as the functionality of it was replaced by an iio driver a while ago. The other fixes are things that have been reported, or reverts of broken stuff (the binder change). All of these changes have been in linux-next for a while with no reported issues" * tag 'char-misc-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: thunderbolt: Don't declare Falcon Ridge unsupported thunderbolt: Add support for INTEL_FALCON_RIDGE_2C controller. thunderbolt: Fix resume quirk for Falcon Ridge 4C. lkdtm: Mark lkdtm_rodata_do_nothing() notrace mei: me: disable driver on SPT SPS firmware Revert "android: binder: fix dangling pointer comparison" drivers/iio/light/Kconfig: SENSORS_BH1780 cleanup android: binder: fix dangling pointer comparison misc: delete bh1780 driver
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Linus Torvalds authored
Pull driver core fixes from Greg KH: "Here are three small fixes for 4.8-rc5. One for sysfs, one for kernfs, and one documentation fix, all for reported issues. All of these have been in linux-next for a while" * tag 'driver-core-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: sysfs: correctly handle read offset on PREALLOC attrs documentation: drivers/core/of: fix name of of_node symlink kernfs: don't depend on d_find_any_alias() when generating notifications
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Linus Torvalds authored
Pull staging/IIO driver fixes from Greg KH: "Here are a number of small fixes for staging and IIO drivers that resolve reported problems. Full details are in the shortlog. All of these have been in linux-next with no reported issues" * tag 'staging-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (35 commits) arm: dts: rockchip: add reset node for the exist saradc SoCs arm64: dts: rockchip: add reset saradc node for rk3368 SoCs iio: adc: rockchip_saradc: reset saradc controller before programming it iio: accel: kxsd9: Fix raw read return iio: adc: ti_am335x_adc: Increase timeout value waiting for ADC sample iio: adc: ti_am335x_adc: Protect FIFO1 from concurrent access include/linux: fix excess fence.h kernel-doc notation staging: wilc1000: correctly check if associatedsta has not been found staging: wilc1000: NULL dereference on error staging: wilc1000: txq_event: Fix coding error MAINTAINERS: Add file patterns for ion device tree bindings MAINTAINERS: Update maintainer entry for wilc1000 iio: chemical: atlas-ph-sensor: fix typo in val assignment iio: fix sched WARNING "do not call blocking ops when !TASK_RUNNING" staging: comedi: ni_mio_common: fix AO inttrig backwards compatibility staging: comedi: dt2811: fix a precedence bug staging: comedi: adv_pci1760: Do not return EINVAL for CMDF_ROUND_DOWN. staging: comedi: ni_mio_common: fix wrong insn_write handler staging: comedi: comedi_test: fix timer race conditions staging: comedi: daqboard2000: bug fix board type matching code ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds authored
Pull serial driver fixes from Greg KH: "Here are some small serial driver fixes for 4.8-rc5. One fixes an oft-reported build issue with the fintek driver, another reverts a patch that was causing problems, one fixes a crash, and some new device ids were added. All of these have been in linux-next for a while" * tag 'tty-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: 8250: added acces i/o products quad and octal serial cards serial: 8250_mid: fix divide error bug if baud rate is 0 Revert "tty/serial/8250: use mctrl_gpio helpers" 8250/fintek: rename IRQ_MODE macro
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds authored
Pull USB/PHY fixes from Greg KH: "Here are some USB and PHY driver fixes for 4.8-rc5 Nothing major, lots of little fixes for reported bugs, and a build fix for a missing .h file that the phy drivers needed. All of these have been in linux-next for a while with no reported issues" * tag 'usb-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (24 commits) usb: musb: Fix locking errors for host only mode usb: dwc3: gadget: always decrement by 1 usb: dwc3: debug: fix ep name on trace output usb: gadget: udc: core: don't starve DMA resources USB: serial: option: add WeTelecom 0x6802 and 0x6803 products USB: avoid left shift by -1 USB: fix typo in wMaxPacketSize validation usb: gadget: Add the gserial port checking in gs_start_tx() usb: dwc3: gadget: don't rely on jiffies while holding spinlock usb: gadget: fsl_qe_udc: signedness bug in qe_get_frame() usb: gadget: function: f_rndis: socket buffer may be NULL usb: gadget: function: f_eem: socket buffer may be NULL usb: renesas_usbhs: gadget: fix return value check in usbhs_mod_gadget_probe() usb: dwc2: Add reset control to dwc2 usb: dwc3: core: allow device to runtime_suspend several times usb: dwc3: pci: runtime_resume child device USB: serial: option: add WeTelecom WM-D200 usb: chipidea: udc: don't touch DP when controller is in host mode USB: serial: mos7840: fix non-atomic allocation in write path USB: serial: mos7720: fix non-atomic allocation in write path ...
-
Linus Torvalds authored
In commit 8ead9dd5 ("devpts: more pty driver interface cleanups") I made devpts_get_priv() just return the dentry->fs_data directly. And because I thought it wouldn't happen, I added a warning if you ever saw a pts node that wasn't on devpts.

And no, that warning never triggered under any actual real use, but you can trigger it by creating nonsensical pts nodes by hand.

So just revert the warning, and make devpts_get_priv() return NULL for that case like it used to.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org # 4.6+
Cc: "Eric W Biederman" <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
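The behaviour this commit restores amounts to a filesystem check before the private data is handed out. A rough standalone sketch follows; the model_* types are hypothetical stand-ins for the kernel's dentry and super_block, and the field name d_fsdata is an assumption about what the message calls fs_data.

```c
#include <stddef.h>

#define DEVPTS_SUPER_MAGIC 0x1cd1UL  /* magic number identifying a devpts superblock */

/* Hypothetical, heavily reduced versions of the kernel structures. */
struct model_super_block {
    unsigned long s_magic;
};

struct model_dentry {
    struct model_super_block *d_sb;
    void *d_fsdata;   /* per-dentry private data */
};

/* Return the private data only for nodes that really live on devpts. */
static void *model_devpts_get_priv(const struct model_dentry *dentry)
{
    if (dentry->d_sb->s_magic != DEVPTS_SUPER_MAGIC)
        return NULL;  /* hand-made pts node on some other filesystem: no warning */
    return dentry->d_fsdata;
}
```

Callers then only need to handle a NULL return, rather than relying on a warning that userspace can trigger at will.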
-
git://git.kernel.dk/linux-block
Linus Torvalds authored
Pull block fixes from Jens Axboe: "A collection of fixes for the nvme over fabrics code" * 'for-linus' of git://git.kernel.dk/linux-block: nvme-rdma: Get rid of redundant defines nvme-rdma: Get rid of duplicate variable nvme: fabrics drivers don't need the nvme-pci driver nvme-fabrics: get a reference when reusing a nvme_host structure nvme-fabrics: change NQN UUID to big-endian format nvme-loop: set sqsize to 0-based value, per spec nvme-rdma: fix sqsize/hsqsize per spec fabrics: define admin sqsize min default, per spec nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize nvmet-rdma: Fix use after free nvme-rdma: initialize ret to zero to avoid returning garbage
-
- 02 Sep, 2016 23 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Linus Torvalds authored
Pull TPM bugfix from James Morris. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: tpm: invalid self test error message
-
Jarkko Sakkinen authored
The driver emits an invalid self-test error message even though the init succeeds. Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Fixes: cae8b441 ("tpm: Factor out common startup code") Reviewed-by: James Morris <james.l.morris@oracle.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds authored
Pull ACPI fixes from Rafael Wysocki: "Two stable-candidate fixes for the ACPI early device probing code added during the 4.4 cycle, one fixing a typo in a stub macro used when CONFIG_ACPI is unset and one that prevents sleeping functions from being called under a spinlock (Lorenzo Pieralisi)" * tag 'acpi-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / drivers: replace acpi_probe_lock spinlock with mutex ACPI / drivers: fix typo in ACPI_DECLARE_PROBE_ENTRY macro
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds authored
Pull power management fixes from Rafael Wysocki: "This includes a stable-candidate cpufreq-dt driver problem fix and annotations of tracepoints in the runtime PM framework. Specifics: - Fix the definition of the cpufreq-dt driver's machines table introduced during the 4.7 cycle that should be NULL-terminated, but the termination entry is missing from it (Wei Yongjun). - Annotate tracepoints in the runtime PM framework's core so as to allow the functions containing them to be called from the idle code path without causing RCU to complain about illegal usage (Paul McKenney)" * tag 'pm-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / runtime: Add _rcuidle suffix to allow rpm_idle() use from idle PM / runtime: Add _rcuidle suffix to allow rpm_resume() to be called from idle cpufreq: dt: Add terminate entry for of_device_id tables
-
Rafael J. Wysocki authored
* pm-cpufreq-fixes: cpufreq: dt: Add terminate entry for of_device_id tables * pm-core-fixes: PM / runtime: Add _rcuidle suffix to allow rpm_idle() use from idle PM / runtime: Add _rcuidle suffix to allow rpm_resume() to be called from idle
-
Lorenzo Pieralisi authored
Commit e647b532 ("ACPI: Add early device probing infrastructure") introduced code that allows inserting driver specific struct acpi_probe_entry probe entries into ACPI linker sections (one per-subsystem, eg irqchip, clocksource) that are then walked to retrieve the data and function hooks required to probe the respective kernel components. Probing for all entries in a section is triggered through the __acpi_probe_device_table() function, that in turn, according to the table ID a given probe entry reports parses the table with the function retrieved from the respective section structures (ie struct acpi_probe_entry). Owing to the current ACPI table parsing implementation, the __acpi_probe_device_table() function has to share global variables with the acpi_match_madt() function, so in order to guarantee mutual exclusion locking is required between the two functions. Current kernel code implements the locking through the acpi_probe_lock spinlock; this has the side effect of requiring all code called within the lock (ie struct acpi_probe_entry.probe_{table/subtbl} hooks) not to sleep. However, kernel subsystems that make use of the early probing infrastructure are relying on kernel APIs that may sleep (eg irq_domain_alloc_fwnode(), among others) in the function calls pointed at by struct acpi_probe_entry.{probe_table/subtbl} entries (eg gic_v2_acpi_init()), which is a bug. Since __acpi_probe_device_table() is called from context that is allowed to sleep the acpi_probe_lock spinlock can be replaced with a mutex; this fixes the issue whilst still guaranteeing mutual exclusion. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Fixes: e647b532 (ACPI: Add early device probing infrastructure) Cc: 4.4+ <stable@vger.kernel.org> # 4.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
Lorenzo Pieralisi authored
When the ACPI_DECLARE_PROBE_ENTRY macro was added in commit e647b532 ("ACPI: Add early device probing infrastructure"), a stub macro adding an unused entry was added for the !CONFIG_ACPI Kconfig option case to make sure kernel code making use of the macro did not require to be guarded within CONFIG_ACPI in order to be compiled. The stub macro was never used since all kernel code that defines ACPI_DECLARE_PROBE_ENTRY entries is currently guarded within CONFIG_ACPI; it contains a typo that should be nonetheless fixed. Fix the typo in the stub (ie !CONFIG_ACPI) ACPI_DECLARE_PROBE_ENTRY() macro so that it can actually be used if needed. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Fixes: e647b532 (ACPI: Add early device probing infrastructure) Cc: 4.4+ <stable@vger.kernel.org> # 4.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
Emanuel Czirai authored
AMD F12h machines have an erratum which can cause DIV/IDIV to behave unpredictably. The workaround is to set MSRC001_1029[31], but sometimes there is no BIOS update containing that workaround, so let's do it ourselves unconditionally. It is simple enough.

[ Borislav: Wrote commit message. ]

Signed-off-by: Emanuel Czirai <icanrealizeum@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yaowu Xu <yaowu@google.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160902053550.18097-1-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
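The workaround itself is a single read-modify-write of one MSR bit. The sketch below shows only the bit arithmetic on a plain 64-bit value; the macro and function names are invented for illustration, and reading or writing the actual MSR (which needs ring 0) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MSR_ADDR    0xC0011029u  /* MSRC001_1029, as named in the commit message */
#define ERRATUM_665_BIT   31           /* bit that enables the workaround */

/* Returns the MSR value with the workaround bit set. */
static uint64_t model_apply_erratum_665(uint64_t msr_value)
{
    return msr_value | (UINT64_C(1) << ERRATUM_665_BIT);
}

/* True if the BIOS (or an earlier call) already applied the workaround. */
static bool model_erratum_665_applied(uint64_t msr_value)
{
    return (msr_value >> ERRATUM_665_BIT) & 1u;
}
```

Because setting an already-set bit changes nothing, applying the workaround unconditionally is harmless on machines whose BIOS has done it already.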
-
Steven Rostedt authored
Łukasz Daniluk reported that on a RHEL kernel his machine would lock up after enabling the function tracer. I asked him to bisect the functions within available_filter_functions, which he did, and it came down to three:

    _paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64()

It was found that this is only an issue when noreplace-paravirt is added to the kernel command line.

This means that those functions are most likely called within critical sections of the function tracer, and must not be traced.

In newer kernels _paravirt_nop() is defined within gcc asm(), and is no longer an issue. But both _paravirt_ident_{32,64}() cause the following splat when they are traced:

    mm/pgtable-generic.c:33: bad pmd ffff8800d2435150(0000000001d00054)
    mm/pgtable-generic.c:33: bad pmd ffff8800d3624190(0000000001d00070)
    mm/pgtable-generic.c:33: bad pmd ffff8800d36a5110(0000000001d00054)
    mm/pgtable-generic.c:33: bad pmd ffff880118eb1450(0000000001d00054)
    NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
[systemd-journal:469] Modules linked in: e1000e CPU: 2 PID: 469 Comm: systemd-journal Not tainted 4.6.0-rc4-test+ #513 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 task: ffff880118f740c0 ti: ffff8800d4aec000 task.ti: ffff8800d4aec000 RIP: 0010:[<ffffffff81134148>] [<ffffffff81134148>] queued_spin_lock_slowpath+0x118/0x1a0 RSP: 0018:ffff8800d4aefb90 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011eb16d40 RDX: ffffffff82485760 RSI: 000000001f288820 RDI: ffffea0000008030 RBP: ffff8800d4aefb90 R08: 00000000000c0000 R09: 0000000000000000 R10: ffffffff821c8e0e R11: 0000000000000000 R12: ffff880000200fb8 R13: 00007f7a4e3f7000 R14: ffffea000303f600 R15: ffff8800d4b562e0 FS: 00007f7a4e3d7840(0000) GS:ffff88011eb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7a4e3f7000 CR3: 00000000d3e71000 CR4: 00000000001406e0 Call Trace: _raw_spin_lock+0x27/0x30 handle_pte_fault+0x13db/0x16b0 handle_mm_fault+0x312/0x670 __do_page_fault+0x1b1/0x4e0 do_page_fault+0x22/0x30 page_fault+0x28/0x30 __vfs_read+0x28/0xe0 vfs_read+0x86/0x130 SyS_read+0x46/0xa0 entry_SYSCALL_64_fastpath+0x1e/0xa8 Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 6d 01 00 48 03 14 c5 80 6a 5d 82 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 8b Reported-by: Łukasz Daniluk <lukasz.daniluk@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Linus Torvalds authored
Pull overlayfs fixes from Miklos Szeredi: "Most of this is regression fixes for posix acl behavior introduced in 4.8-rc1 (these were caught by the pjd-fstest suite). There are also miscellaneous fixes marked as stable material and cleanups. Other than overlayfs code, it touches <linux/fs.h> to add a constant with which to disable posix acl caching. No changes needed to the actual caching code, it automatically does the right thing, although later we may want to optimize this case. I'm now testing overlayfs with the following test suites to catch regressions: - unionmount-testsuite - xfstests - pjd-fstest" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: update doc ovl: listxattr: use strnlen() ovl: Switch to generic_getxattr ovl: copyattr after setting POSIX ACL ovl: Switch to generic_removexattr ovl: Get rid of ovl_xattr_noacl_handlers array ovl: Fix OVL_XATTR_PREFIX ovl: fix spelling mistake: "directries" -> "directories" ovl: don't cache acl on overlay layer ovl: use cached acl on underlying layer ovl: proper cleanup of workdir ovl: remove posix_acl_default from workdir ovl: handle umask and posix_acl_default correctly on creation ovl: don't copy up opaqueness
-
James Morse authored
Changes to make the resume from cpu_suspend() code behave more like secondary boot caused debug exceptions to be unmasked early by __cpu_setup(). We then go on to restore mdscr_el1 in cpu_do_resume(), potentially taking break or watch points based on uninitialised registers. Mask debug exceptions in cpu_do_resume(), which is specific to resume from cpu_suspend(). Debug exceptions will be restored to their original state by local_dbg_restore() in cpu_suspend(), which runs after hw_breakpoint_restore() has re-initialised the other registers. Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Fixes: cabe1c81 ("arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va") Cc: <stable@vger.kernel.org> # 4.7+ Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Stefan Wahren authored
Patch 7f1d642f ("drivers/perf: arm-pmu: Fix handling of SPI lacking interrupt-affinity property") unintentionally also fixes perf_event support for bcm2835, which doesn't have PMU interrupts. Unfortunately this change introduces a NULL pointer dereference on bcm2835, because irq_is_percpu() always expects to be called with a valid IRQ. So fix this regression by validating the IRQ first. Tested-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Fixes: 7f1d642f ("drivers/perf: arm-pmu: Fix handling of SPI lacking "interrupt-affinity" property") Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Stefan Wahren authored
In case of an IRQ type mismatch in of_pmu_irq_cfg() the device node for interrupt affinity isn't freed. So fix this issue by calling of_node_put(). Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Fixes: fa8ad788 ("arm: perf: factor arm_pmu core out to drivers") Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
git://git.infradead.org/users/vkoul/slave-dma
Linus Torvalds authored
Pull dmaengine fixes from Vinod Koul: "The fixes this time are all in drivers: - possible NULL dereference in img-mdc - correct device identity for free_irq in at_xdmac - missing of_node_put() in fsl probe - fix debug log and hotchain corner case for pxa-dma - fix checking hardware bits in isr in usb dmac" * tag 'dmaengine-fix-4.8-rc5' of git://git.infradead.org/users/vkoul/slave-dma: dmaengine: img-mdc: fix a possible NULL dereference dmaengine: at_xdmac: fix to pass correct device identity to free_irq() dmaengine: fsl_raid: add missing of_node_put() in fsl_re_probe() dmaengine: pxa_dma: fix debug message dmaengine: pxa_dma: fix hotchain corner case dmaengine: usb-dmac: check CHCR.DE bit in usb_dmac_isr_channel()
-
git://people.freedesktop.org/~airlied/linux
Linus Torvalds authored
Pull drm fixes from Dave Airlie: "Contains fixes for imx, amdgpu, vc4, msm and one nouveau ACPI fix" * tag 'drm-fixes-for-4.8-rc5' of git://people.freedesktop.org/~airlied/linux: drm/amdgpu: record error code when ring test failed drm/amd/amdgpu: compute ring test fail during S4 on CI drm/amd/amdgpu: sdma resume fail during S4 on CI drm/nouveau/acpi: use DSM if bridge does not support D3cold drm/imx: fix crtc vblank state regression drm/imx: Add active plane reconfiguration support drm/msm: protect against faults from copy_from_user() in submit ioctl drm/msm: fix use of copy_from_user() while holding spinlock drm/vc4: Fix oops when userspace hands in a bad BO. drm/vc4: Fix overflow mem unreferencing when the binner runs dry. drm/vc4: Free hang state before destroying BO cache. drm/vc4: Fix handling of a pm_runtime_get_sync() success case. drm/vc4: Use drm_malloc_ab to fix large rendering jobs. drm/vc4: Use drm_free_large() on handles to match its allocation.
-
Wanpeng Li authored
tick_nohz_start_idle() is prevented to be called if the idle tick can't be stopped since commit 1f3b0f82 ("tick/nohz: Optimize nohz idle enter"). As a result, after suspend/resume the host machine, full dynticks kvm guest will softlockup: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [swapper/0:0] Call Trace: default_idle+0x31/0x1a0 arch_cpu_idle+0xf/0x20 default_idle_call+0x2a/0x50 cpu_startup_entry+0x39b/0x4d0 rest_init+0x138/0x140 ? rest_init+0x5/0x140 start_kernel+0x4c1/0x4ce ? set_init_arg+0x55/0x55 ? early_idt_handler_array+0x120/0x120 x86_64_start_reservations+0x24/0x26 x86_64_start_kernel+0x142/0x14f In addition, cat /proc/stat | grep cpu in guest or host: cpu 398 16 5049 15754 5490 0 1 46 0 0 cpu0 206 5 450 0 0 0 1 14 0 0 cpu1 81 0 3937 3149 1514 0 0 9 0 0 cpu2 45 6 332 6052 2243 0 0 11 0 0 cpu3 65 2 328 6552 1732 0 0 11 0 0 The idle and iowait states are weird 0 for cpu0(housekeeping). The bug is present in both guest and host kernels, and they both have cpu0's idle and iowait states issue, however, host kernel's suspend/resume path etc will touch watchdog to avoid the softlockup. - The watchdog will not be touched in tick_nohz_stop_idle path (need be touched since the scheduler stall is expected) if idle_active flags are not detected. - The idle and iowait states will not be accounted when exit idle loop (resched or interrupt) if idle start time and idle_active flags are not set. This patch fixes it by reverting commit 1f3b0f82 since can't stop idle tick doesn't mean can't be idle. 
Fixes: 1f3b0f82 ("tick/nohz: Optimize nohz idle enter")
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Sanjeev Yadav <sanjeev.yadav@spreadtrum.com>
Cc: Gaurav Jindal <gaurav.jindal@spreadtrum.com>
Cc: stable@vger.kernel.org
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/1472798303-4154-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
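
The accounting problem described in this commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct tick_model and the model_* functions are hypothetical names invented for illustration, not the kernel's struct tick_sched or its real functions, and the model covers only the idle-time bookkeeping, not the watchdog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for per-CPU tick state; NOT the kernel's struct tick_sched. */
struct tick_model {
	bool idle_active;        /* set once idle accounting has started */
	uint64_t idle_entrytime; /* timestamp recorded at idle entry */
	uint64_t idle_sleeptime; /* accumulated idle time */
};

/* Models the role the commit message gives tick_nohz_start_idle(). */
static void model_start_idle(struct tick_model *ts, uint64_t now)
{
	ts->idle_entrytime = now;
	ts->idle_active = true;
}

/* With 1f3b0f82 applied: accounting starts only if the tick can be stopped. */
static void model_idle_enter_optimized(struct tick_model *ts, uint64_t now,
				       bool can_stop_tick)
{
	if (can_stop_tick)
		model_start_idle(ts, now);
}

/* After the revert: accounting always starts at idle entry. */
static void model_idle_enter_reverted(struct tick_model *ts, uint64_t now,
				      bool can_stop_tick)
{
	(void)can_stop_tick; /* stopping the tick is a separate decision */
	model_start_idle(ts, now);
}

/* Idle exit: time is accounted only if idle_active was set on entry. */
static void model_idle_exit(struct tick_model *ts, uint64_t now)
{
	if (!ts->idle_active)
		return;
	assert(now >= ts->idle_entrytime);
	ts->idle_sleeptime += now - ts->idle_entrytime;
	ts->idle_active = false;
}
```

In this model, a CPU that cannot stop its tick (such as the housekeeping cpu0) accumulates no idle time on the optimized path, which matches the zero idle and iowait columns shown above. On the reverted path the same enter/exit pair accounts the full interval.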
-
https://github.com/anholt/linux
Dave Airlie authored
This pull request brings in fixes for VC4 3D in 4.8, most of which are covered by testcases.

* tag 'drm-vc4-fixes-2016-08-29' of https://github.com/anholt/linux:
  drm/vc4: Fix oops when userspace hands in a bad BO.
  drm/vc4: Fix overflow mem unreferencing when the binner runs dry.
  drm/vc4: Free hang state before destroying BO cache.
  drm/vc4: Fix handling of a pm_runtime_get_sync() success case.
  drm/vc4: Use drm_malloc_ab to fix large rendering jobs.
  drm/vc4: Use drm_free_large() on handles to match its allocation.
-
git://git.pengutronix.de/git/pza/linux
Dave Airlie authored
imx-drm atomic modeset regression fixes

- add active plane reconfiguration support
- add back crtc vblank state reporting

* tag 'imx-drm-fixes-2016-08-30' of git://git.pengutronix.de/git/pza/linux:
  drm/imx: fix crtc vblank state regression
  drm/imx: Add active plane reconfiguration support
-
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Linus Torvalds authored
Pull clk fixes from Stephen Boyd:
 "A collection of small fixes for various SoC vendor clk drivers"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: rockchip: mark aclk_emmc_noc as a critical clock on rk3399
  clk: tegra: remove TEGRA_PLL_USE_LOCK for PLLD/PLLD2
  clk: rockchip: fix incorrect GATE bits for {c, g}pll_aclk_perihp_src on rk3399
  clk: rockchip: fix incorrect aclk_emmc source gate bits on rk3399
  clk: renesas: r8a7795: Fix SD clocks
  clk: rockchip: fix rk3399 aclk_vio gate bit
  clk: sunxi-ng: Fix inverted test condition in ccu_helper_wait_for_lock
-
Linus Torvalds authored
Merge fixes from Andrew Morton:
 "14 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  rapidio/tsi721: fix incorrect detection of address translation condition
  rapidio/documentation/mport_cdev: add missing parameter description
  kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
  MAINTAINERS: Vladimir has moved
  mm, mempolicy: task->mempolicy must be NULL before dropping final reference
  printk/nmi: avoid direct printk()-s from __printk_nmi_flush()
  treewide: remove references to the now unnecessary DEFINE_PCI_DEVICE_TABLE
  drivers/scsi/wd719x.c: remove last declaration using DEFINE_PCI_DEVICE_TABLE
  mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator
  lib/test_hash.c: fix warning in preprocessor symbol evaluation
  lib/test_hash.c: fix warning in two-dimensional array init
  kconfig: tinyconfig: provide whole choice blocks to avoid warnings
  kexec: fix double-free when failing to relocate the purgatory
  mm, oom: prevent premature OOM killer invocation for high order request
-
Alexandre Bounine authored
Fix the incorrect condition used to identify involvement of an address translation mechanism. This bug results in a NULL pointer kernel crash dump when mapping of an inbound RapidIO address range is requested within an existing aperture.

Link: http://lkml.kernel.org/r/20160901173144.2983-1-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Cc: <stable@vger.kernel.org> [4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexandre Bounine authored
Add missing description for rio_mport_cdev driver parameter 'dma_timeout'. This patch is applicable to kernel versions starting from v4.6.

Link: http://lkml.kernel.org/r/20160901173104.2928-1-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
Commit fec1d011 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit") has caused a subtle regression in nscd, which uses CLONE_CHILD_CLEARTID to clear the nscd_certainly_running flag in the shared databases so that the clients are notified when nscd is restarted. Now, when nscd uses a non-persistent database, clients that have it mapped keep thinking the database is being updated by nscd, when in fact nscd has created a new (anonymous) one (for non-persistent databases it uses an unlinked file as backend).

The original proposal for the CLONE_CHILD_CLEARTID change claimed (https://lkml.org/lkml/2006/10/25/233):

: The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
: on behalf of pthread_create() library calls. This feature is used to
: request that the kernel clear the thread-id in user space (at an address
: provided in the syscall) when the thread disassociates itself from the
: address space, which is done in mm_release().
:
: Unfortunately, when a multi-threaded process incurs a core dump (such as
: from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
: the other threads, which then proceed to clear their user-space tids
: before synchronizing in exit_mm() with the start of core dumping. This
: misrepresents the state of process's address space at the time of the
: SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
: problems (misleading him/her to conclude that the threads had gone away
: before the fault).
:
: The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
: core dump has been initiated.

The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269) seems to have a larger scope than the original patch asked for. It seems that limiting the scope of the check to core dumping should work for the SIGSEGV issue described above.
[Changelog partly based on Andreas' description]

Fixes: fec1d011 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
Link: http://lkml.kernel.org/r/1471968749-26173-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Tested-by: William Preston <wpreston@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Andreas Schwab <schwab@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
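
The change in scope described in this changelog can be sketched as a pair of predicates. This is a simplified userspace model, not the kernel code: struct exit_model and both function names are hypothetical, and the real check may involve further conditions that the changelog does not spell out.

```c
#include <stdbool.h>

/* Hypothetical summary of how a task is exiting; not a kernel structure. */
struct exit_model {
	bool killed_by_signal;  /* "abnormal exit" in the sense of fec1d011 */
	bool core_dump_started; /* a core dump has been initiated */
};

/* Scope after fec1d011: skip the tid clear on any signal-induced exit. */
static bool clears_child_tid_before(const struct exit_model *e)
{
	return !e->killed_by_signal;
}

/* Scope after this fix: skip the tid clear only while dumping core. */
static bool clears_child_tid_after(const struct exit_model *e)
{
	return !e->core_dump_started;
}
```

Assuming an nscd restart terminates the old process with a signal but without a core dump, the first predicate suppresses the clear (the regression), while the second performs it. For a core-dumping process, both predicates still suppress the clear, which preserves the debugging behaviour the 2006 proposal asked for.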
-