1. 01 Aug, 2014 1 commit
    • Jan Kara's avatar
      timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks · 504d5874
      Jan Kara authored
      clockevents_increase_min_delta() calls printk() from under
      hrtimer_bases.lock. That causes lock inversion on scheduler locks because
      printk() can call into the scheduler. Lockdep puts it as:
      
      ======================================================
      [ INFO: possible circular locking dependency detected ]
      3.15.0-rc8-06195-g939f04be #2 Not tainted
      -------------------------------------------------------
      trinity-main/74 is trying to acquire lock:
       (&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c
      
      but task is already holding lock:
       (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #5 (hrtimer_bases.lock){-.-...}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
             [<8103c918>] __hrtimer_start_range_ns+0x1c/0x197
             [<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85
             [<81080792>] task_clock_event_start+0x3a/0x3f
             [<810807a4>] task_clock_event_add+0xd/0x14
             [<8108259a>] event_sched_in+0xb6/0x17a
             [<810826a2>] group_sched_in+0x44/0x122
             [<81082885>] ctx_sched_in.isra.67+0x105/0x11f
             [<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b
             [<81082bf6>] __perf_install_in_context+0x8b/0xa3
             [<8107eb8e>] remote_function+0x12/0x2a
             [<8105f5af>] smp_call_function_single+0x2d/0x53
             [<8107e17d>] task_function_call+0x30/0x36
             [<8107fb82>] perf_install_in_context+0x87/0xbb
             [<810852c9>] SYSC_perf_event_open+0x5c6/0x701
             [<810856f9>] SyS_perf_event_open+0x17/0x19
             [<8142f8ee>] syscall_call+0x7/0xb
      
      -> #4 (&ctx->lock){......}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f04c>] _raw_spin_lock+0x21/0x30
             [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
             [<8142cacc>] __schedule+0x4c6/0x4cb
             [<8142cae0>] schedule+0xf/0x11
             [<8142f9a6>] work_resched+0x5/0x30
      
      -> #3 (&rq->lock){-.-.-.}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f04c>] _raw_spin_lock+0x21/0x30
             [<81040873>] __task_rq_lock+0x33/0x3a
             [<8104184c>] wake_up_new_task+0x25/0xc2
             [<8102474b>] do_fork+0x15c/0x2a0
             [<810248a9>] kernel_thread+0x1a/0x1f
             [<814232a2>] rest_init+0x1a/0x10e
             [<817af949>] start_kernel+0x303/0x308
             [<817af2ab>] i386_start_kernel+0x79/0x7d
      
      -> #2 (&p->pi_lock){-.-...}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
             [<810413dd>] try_to_wake_up+0x1d/0xd6
             [<810414cd>] default_wake_function+0xb/0xd
             [<810461f3>] __wake_up_common+0x39/0x59
             [<81046346>] __wake_up+0x29/0x3b
             [<811b8733>] tty_wakeup+0x49/0x51
             [<811c3568>] uart_write_wakeup+0x17/0x19
             [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
             [<811c5f28>] serial8250_handle_irq+0x54/0x6a
             [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
             [<811c56d8>] serial8250_interrupt+0x38/0x9e
             [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
             [<81051296>] handle_irq_event+0x2c/0x43
             [<81052cee>] handle_level_irq+0x57/0x80
             [<81002a72>] handle_irq+0x46/0x5c
             [<810027df>] do_IRQ+0x32/0x89
             [<8143036e>] common_interrupt+0x2e/0x33
             [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
             [<811c25a4>] uart_start+0x2d/0x32
             [<811c2c04>] uart_write+0xc7/0xd6
             [<811bc6f6>] n_tty_write+0xb8/0x35e
             [<811b9beb>] tty_write+0x163/0x1e4
             [<811b9cd9>] redirected_tty_write+0x6d/0x75
             [<810b6ed6>] vfs_write+0x75/0xb0
             [<810b7265>] SyS_write+0x44/0x77
             [<8142f8ee>] syscall_call+0x7/0xb
      
      -> #1 (&tty->write_wait){-.....}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
             [<81046332>] __wake_up+0x15/0x3b
             [<811b8733>] tty_wakeup+0x49/0x51
             [<811c3568>] uart_write_wakeup+0x17/0x19
             [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
             [<811c5f28>] serial8250_handle_irq+0x54/0x6a
             [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
             [<811c56d8>] serial8250_interrupt+0x38/0x9e
             [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
             [<81051296>] handle_irq_event+0x2c/0x43
             [<81052cee>] handle_level_irq+0x57/0x80
             [<81002a72>] handle_irq+0x46/0x5c
             [<810027df>] do_IRQ+0x32/0x89
             [<8143036e>] common_interrupt+0x2e/0x33
             [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
             [<811c25a4>] uart_start+0x2d/0x32
             [<811c2c04>] uart_write+0xc7/0xd6
             [<811bc6f6>] n_tty_write+0xb8/0x35e
             [<811b9beb>] tty_write+0x163/0x1e4
             [<811b9cd9>] redirected_tty_write+0x6d/0x75
             [<810b6ed6>] vfs_write+0x75/0xb0
             [<810b7265>] SyS_write+0x44/0x77
             [<8142f8ee>] syscall_call+0x7/0xb
      
      -> #0 (&port_lock_key){-.....}:
             [<8104a62d>] __lock_acquire+0x9ea/0xc6d
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
             [<811c60be>] serial8250_console_write+0x8c/0x10c
             [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
             [<8104f5d5>] console_unlock+0x1d7/0x398
             [<8104fb70>] vprintk_emit+0x3da/0x3e4
             [<81425f76>] printk+0x17/0x19
             [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
             [<8105c548>] clockevents_program_event+0xe7/0xf3
             [<8105cc1c>] tick_program_event+0x1e/0x23
             [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
             [<8103c49e>] __remove_hrtimer+0x5b/0x79
             [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
             [<8103cb4b>] hrtimer_cancel+0xd/0x18
             [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
             [<81080705>] task_clock_event_stop+0x20/0x64
             [<81080756>] task_clock_event_del+0xd/0xf
             [<81081350>] event_sched_out+0xab/0x11e
             [<810813e0>] group_sched_out+0x1d/0x66
             [<81081682>] ctx_sched_out+0xaf/0xbf
             [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
             [<8142cacc>] __schedule+0x4c6/0x4cb
             [<8142cae0>] schedule+0xf/0x11
             [<8142f9a6>] work_resched+0x5/0x30
      
      other info that might help us debug this:
      
      Chain exists of:
        &port_lock_key --> &ctx->lock --> hrtimer_bases.lock
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(hrtimer_bases.lock);
                                     lock(&ctx->lock);
                                     lock(hrtimer_bases.lock);
        lock(&port_lock_key);
      
       *** DEADLOCK ***
      
      4 locks held by trinity-main/74:
       #0:  (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb
       #1:  (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
       #2:  (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
       #3:  (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4
      
      stack backtrace:
      CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04be #2
       00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570
       8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0
       8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003
      Call Trace:
       [<81426f69>] dump_stack+0x16/0x18
       [<81425a99>] print_circular_bug+0x18f/0x19c
       [<8104a62d>] __lock_acquire+0x9ea/0xc6d
       [<8104a942>] lock_acquire+0x92/0x101
       [<811c60be>] ? serial8250_console_write+0x8c/0x10c
       [<811c6032>] ? wait_for_xmitr+0x76/0x76
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<811c60be>] ? serial8250_console_write+0x8c/0x10c
       [<811c60be>] serial8250_console_write+0x8c/0x10c
       [<8104af87>] ? lock_release+0x191/0x223
       [<811c6032>] ? wait_for_xmitr+0x76/0x76
       [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
       [<8104f5d5>] console_unlock+0x1d7/0x398
       [<8104fb70>] vprintk_emit+0x3da/0x3e4
       [<81425f76>] printk+0x17/0x19
       [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
       [<8105cc1c>] tick_program_event+0x1e/0x23
       [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
       [<8103c49e>] __remove_hrtimer+0x5b/0x79
       [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
       [<8103cb4b>] hrtimer_cancel+0xd/0x18
       [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
       [<81080705>] task_clock_event_stop+0x20/0x64
       [<81080756>] task_clock_event_del+0xd/0xf
       [<81081350>] event_sched_out+0xab/0x11e
       [<810813e0>] group_sched_out+0x1d/0x66
       [<81081682>] ctx_sched_out+0xaf/0xbf
       [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
       [<8104416d>] ? __dequeue_entity+0x23/0x27
       [<81044505>] ? pick_next_task_fair+0xb1/0x120
       [<8142cacc>] __schedule+0x4c6/0x4cb
       [<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108
       [<810475b0>] ? trace_hardirqs_off+0xb/0xd
       [<81056346>] ? rcu_irq_exit+0x64/0x77
      
      Fix the problem by using printk_deferred() which does not call into the
      scheduler.
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      504d5874
  2. 24 Jul, 2014 6 commits
  3. 23 Jul, 2014 18 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · b292d6b5
      Linus Torvalds authored
      Pull input layer fixes from Dmitry Torokhov:
       "A few fixups for the input subsystem"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: document INPUT_PROP_TOPBUTTONPAD
        Input: fix defuzzing logic
        Input: sirfsoc-onkey - fix GPL v2 license string typo
        Input: st-keyscan - fix 'defined but not used' compiler warnings
        Input: synaptics - add min/max quirk for pnp-id LEN2002 (Edge E531)
        Input: i8042 - add Acer Aspire 5710 to nomux blacklist
        Input: ti_am335x_tsc - warn about incorrect spelling
        Input: wacom - cleanup multitouch code when touch_max is 2
      b292d6b5
    • Linus Torvalds's avatar
      Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · 7442cf9a
      Linus Torvalds authored
      Pull powerpc fixes from Ben Herrenschmidt:
       "Here is a handful of powerpc fixes for 3.16.  They are all pretty
        simple and self contained and should still make this release"
      
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
        powerpc: use _GLOBAL_TOC for memmove
        powerpc/pseries: dynamically added OF nodes need to call of_node_init
        powerpc: subpage_protect: Increase the array size to take care of 64TB
        powerpc: Fix bugs in emulate_step()
        powerpc: Disable doorbells on Power8 DD1.x
      7442cf9a
    • Linus Torvalds's avatar
      Merge tag 'urgent-slab-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm · 355cb093
      Linus Torvalds authored
      Pull slab fix from Mike Snitzer:
       "This fixes the broken duplicate slab name check in
        kmem_cache_sanity_check() that has been repeatedly reported (as
        recently as today against Fedora rawhide).
      
        Pekka seemed to have it staged for a late 3.15-rc in his 'slab/urgent'
        branch but never sent a pull request, see:
            https://lkml.org/lkml/2014/5/23/648"
      
      * tag 'urgent-slab-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        slab_common: fix the check for duplicate slab names
      355cb093
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew Morton) · ed4a1084
      Linus Torvalds authored
      Merge fixes from Andrew Morton:
       "10 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: hugetlb: fix copy_hugetlb_page_range()
        simple_xattr: permit 0-size extended attributes
        mm/fs: fix pessimization in hole-punching pagecache
        shmem: fix splicing from a hole while it's punched
        shmem: fix faulting into a hole, not taking i_mutex
        mm: do not call do_fault_around for non-linear fault
        sh: also try passing -m4-nofpu for SH2A builds
        zram: avoid lockdep splat by revalidate_disk
        mm/rmap.c: fix pgoff calculation to handle hugepage correctly
        coredump: fix the setting of PF_DUMPCORE
      ed4a1084
    • Naoya Horiguchi's avatar
      mm: hugetlb: fix copy_hugetlb_page_range() · 0253d634
      Naoya Horiguchi authored
      Commit 4a705fef ("hugetlb: fix copy_hugetlb_page_range() to handle
      migration/hwpoisoned entry") changed the order of
      huge_ptep_set_wrprotect() and huge_ptep_get(), which leads to breakage
      in some workloads like hugepage-backed heap allocation via libhugetlbfs.
      This patch fixes it.
      
      The test program for the problem is shown below:
      
        $ cat heap.c
        #include <unistd.h>
        #include <stdlib.h>
        #include <string.h>
      
        #define HPS 0x200000
      
        int main() {
        	int i;
        	char *p = malloc(HPS);
        	memset(p, '1', HPS);
        	for (i = 0; i < 5; i++) {
        		if (!fork()) {
        			memset(p, '2', HPS);
        			p = malloc(HPS);
        			memset(p, '3', HPS);
        			free(p);
        			return 0;
        		}
        	}
        	sleep(1);
        	free(p);
        	return 0;
        }
      
        $ export HUGETLB_MORECORE=yes ; export HUGETLB_NO_PREFAULT= ; hugectl --heap ./heap
      
      Fixes 4a705fef ("hugetlb: fix copy_hugetlb_page_range() to handle
      migration/hwpoisoned entry"), so is applicable to -stable kernels which
      include it.
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reported-by: default avatarGuillaume Morin <guillaume@morinfr.org>
      Suggested-by: default avatarGuillaume Morin <guillaume@morinfr.org>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>	[2.6.37+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0253d634
    • Hugh Dickins's avatar
      simple_xattr: permit 0-size extended attributes · 4e66d445
      Hugh Dickins authored
      If a filesystem uses simple_xattr to support user extended attributes,
      LTP setxattr01 and xfstests generic/062 fail with "Cannot allocate
      memory": simple_xattr_alloc()'s wrap-around test mistakenly excludes
      values of zero size.  Fix that off-by-one (but apparently no filesystem
      needs them yet).
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4e66d445
    • Hugh Dickins's avatar
      mm/fs: fix pessimization in hole-punching pagecache · 792ceaef
      Hugh Dickins authored
      I wanted to revert my v3.1 commit d0823576 ("mm: pincer in
      truncate_inode_pages_range"), to keep truncate_inode_pages_range() in
      synch with shmem_undo_range(); but have stepped back - a change to
      hole-punching in truncate_inode_pages_range() is a change to
      hole-punching in every filesystem (except tmpfs) that supports it.
      
      If there's a logical proof why no filesystem can depend for its own
      correctness on the pincer guarantee in truncate_inode_pages_range() - an
      instant when the entire hole is removed from pagecache - then let's
      revisit later.  But the evidence is that only tmpfs suffered from the
      livelock, and we have no intention of extending hole-punch to ramfs.  So
      for now just add a few comments (to match or differ from those in
      shmem_undo_range()), and fix one silliness noticed in d0823576...
      
      Its "index == start" addition to the hole-punch termination test was
      incomplete: it opened a way for the end condition to be missed, and the
      loop go on looking through the radix_tree, all the way to end of file.
      Fix that pessimization by resetting index when detected in inner loop.
      
      Note that it's actually hard to hit this case, without the obsessive
      concurrent faulting that trinity does: normally all pages are removed in
      the initial trylock_page() pass, and this loop finds nothing to do.  I
      had to "#if 0" out the initial pass to reproduce bug and test fix.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Lukas Czerner <lczerner@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      792ceaef
    • Hugh Dickins's avatar
      shmem: fix splicing from a hole while it's punched · b1a36650
      Hugh Dickins authored
      shmem_fault() is the actual culprit in trinity's hole-punch starvation,
      and the most significant cause of such problems: since a page faulted is
      one that then appears page_mapped(), needing unmap_mapping_range() and
      i_mmap_mutex to be unmapped again.
      
      But it is not the only way in which a page can be brought into a hole in
      the radix_tree while that hole is being punched; and Vlastimil's testing
      implies that if enough other processors are busy filling in the hole,
      then shmem_undo_range() can be kept from completing indefinitely.
      
      shmem_file_splice_read() is the main other user of SGP_CACHE, which can
      instantiate shmem pagecache pages in the read-only case (without holding
      i_mutex, so perhaps concurrently with a hole-punch).  Probably it's
      silly not to use SGP_READ already (using the ZERO_PAGE for holes): which
      ought to be safe, but might bring surprises - not a change to be rushed.
      
      shmem_read_mapping_page_gfp() is an internal interface used by
      drivers/gpu/drm GEM (and next by uprobes): it should be okay.  And
      shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when
      called internally by the kernel (perhaps for a stacking filesystem,
      which might rely on holes to be reserved): it's unclear whether it could
      be provoked to keep hole-punch busy or not.
      
      We could apply the same umbrella as now used in shmem_fault() to
      shmem_file_splice_read() and the others; but it looks ugly, and use over
      a range raises questions - should it actually be per page? can these get
      starved themselves?
      
      The origin of this part of the problem is my v3.1 commit d0823576
      ("mm: pincer in truncate_inode_pages_range"), once it was duplicated
      into shmem.c.  It seemed like a nice idea at the time, to ensure
      (barring RCU lookup fuzziness) that there's an instant when the entire
      hole is empty; but the indefinitely repeated scans to ensure that make
      it vulnerable.
      
      Revert that "enhancement" to hole-punch from shmem_undo_range(), but
      retain the unproblematic rescanning when it's truncating; add a couple
      of comments there.
      
      Remove the "indices[0] >= end" test: that is now handled satisfactorily
      by the inner loop, and mem_cgroup_uncharge_start()/end() are too light
      to be worth avoiding here.
      
      But if we do not always loop indefinitely, we do need to handle the case
      of swap swizzled back to page before shmem_free_swap() gets it: add a
      retry for that case, as suggested by Konstantin Khlebnikov; and for the
      case of page swizzled back to swap, as suggested by Johannes Weiner.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Suggested-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lukas Czerner <lczerner@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: <stable@vger.kernel.org>	[3.1+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b1a36650
    • Hugh Dickins's avatar
      shmem: fix faulting into a hole, not taking i_mutex · 8e205f77
      Hugh Dickins authored
      Commit f00cdc6d ("shmem: fix faulting into a hole while it's
      punched") was buggy: Sasha sent a lockdep report to remind us that
      grabbing i_mutex in the fault path is a no-no (write syscall may already
      hold i_mutex while faulting user buffer).
      
      We tried a completely different approach (see following patch) but that
      proved inadequate: good enough for a rational workload, but not good
      enough against trinity - which forks off so many mappings of the object
      that contention on i_mmap_mutex while hole-puncher holds i_mutex builds
      into serious starvation when concurrent faults force the puncher to fall
      back to single-page unmap_mapping_range() searches of the i_mmap tree.
      
      So return to the original umbrella approach, but keep away from i_mutex
      this time.  We really don't want to bloat every shmem inode with a new
      mutex or completion, just to protect this unlikely case from trinity.
      So extend the original with wait_queue_head on stack at the hole-punch
      end, and wait_queue item on the stack at the fault end.
      
      This involves further use of i_lock to guard against the races: lockdep
      has been happy so far, and I see fs/inode.c:unlock_new_inode() holds
      i_lock around wake_up_bit(), which is comparable to what we do here.
      i_lock is more convenient, but we could switch to shmem's info->lock.
      
      This issue has been tagged with CVE-2014-4171, which will require commit
      f00cdc6d and this and the following patch to be backported: we
      suggest to 3.1+, though in fact the trinity forkbomb effect might go
      back as far as 2.6.16, when madvise(,,MADV_REMOVE) came in - or might
      not, since much has changed, with i_mmap_mutex a spinlock before 3.0.
      Anyone running trinity on 3.0 and earlier? I don't think we need care.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lukas Czerner <lczerner@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: <stable@vger.kernel.org>	[3.1+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8e205f77
    • Konstantin Khlebnikov's avatar
      mm: do not call do_fault_around for non-linear fault · c118678b
      Konstantin Khlebnikov authored
      Ingo Korb reported that "repeated mapping of the same file on tmpfs
      using remap_file_pages sometimes triggers a BUG at mm/filemap.c:202 when
      the process exits".
      
      He bisected the bug to d7c17551 ("mm: implement ->map_pages for
      shmem/tmpfs"), although the bug was actually added by commit
      8c6e50b0 ("mm: introduce vm_ops->map_pages()").
      
      The problem is caused by calling do_fault_around for a _non-linear_
      fault.  In this case pgoff is shifted and might become negative during
      calculation.
      
      Faulting around non-linear page-fault makes no sense and breaks the
      logic in do_fault_around because pgoff is shifted.
      Signed-off-by: default avatarKonstantin Khlebnikov <koct9i@gmail.com>
      Reported-by: default avatarIngo Korb <ingo.korb@tu-dortmund.de>
      Tested-by: default avatarIngo Korb <ingo.korb@tu-dortmund.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Ning Qu <quning@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>	[3.15.x]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c118678b
    • Geert Uytterhoeven's avatar
      sh: also try passing -m4-nofpu for SH2A builds · b1923b55
      Geert Uytterhoeven authored
      When compiling a SH2A kernel (e.g.  se7206_defconfig or rsk7203_defconfig)
      using sh4-linux-gcc, linking fails with:
      
        net/built-in.o: In function `__sk_run_filter':
        net/core/filter.c:566: undefined reference to `__fpscr_values'
        net/core/filter.c:269: undefined reference to `__fpscr_values'
        ...
        net/built-in.o:net/core/filter.c:580: more undefined references to `__fpscr_values' follow
      
      This happens because sh4-linux-gcc doesn't support the "-m2a-nofpu",
      which is thus filtered out by "$(call cc-option, ...)".
      
      As compiling using sh4-linux-gcc is useful for compile coverage, also
      try passing "-m4-nofpu" (which is presumably filtered out when using a
      real sh2a-linux toolchain) to disable the generation of FPU instructions
      and references to __fpscr_values[].
      Signed-off-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Tony Breeds <tony@bakeyournoodle.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Magnus Damm <magnus.damm@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b1923b55
    • Minchan Kim's avatar
      zram: avoid lockdep splat by revalidate_disk · b4c5c609
      Minchan Kim authored
      Sasha reported lockdep warning [1] introduced by [2].
      
      It could be fixed by doing disk revalidation out of the init_lock.  It's
      okay because disk capacity change is protected by init_lock so that
      revalidate_disk always sees up-to-date value so there is no race.
      
      [1] https://lkml.org/lkml/2014/7/3/735
      [2] zram: revalidate disk after capacity change
      
      Fixes 2e32baea ("zram: revalidate disk after capacity change").
      Signed-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: "Alexander E. Patrakov" <patrakov@gmail.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b4c5c609
    • Naoya Horiguchi's avatar
      mm/rmap.c: fix pgoff calculation to handle hugepage correctly · a0f7a756
      Naoya Horiguchi authored
      I triggered VM_BUG_ON() in vma_address() when I tried to migrate an
      anonymous hugepage with mbind() in the kernel v3.16-rc3.  This is
      because pgoff's calculation in rmap_walk_anon() fails to consider
      compound_order() only to have an incorrect value.
      
      This patch introduces page_to_pgoff(), which gets the page's offset in
      PAGE_CACHE_SIZE.
      
      Kirill pointed out that page cache tree should natively handle
      hugepages, and in order to make hugetlbfs fit it, page->index of
      hugetlbfs page should be in PAGE_CACHE_SIZE.  This is beyond this patch,
      but page_to_pgoff() contains the point to be fixed in a single function.
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a0f7a756
    • Silesh C V's avatar
      coredump: fix the setting of PF_DUMPCORE · aed8adb7
      Silesh C V authored
      Commit 079148b9 ("coredump: factor out the setting of PF_DUMPCORE")
      cleaned up the setting of PF_DUMPCORE by removing it from all the
      linux_binfmt->core_dump() and moving it to zap_threads().But this ended
      up clearing all the previously set flags.  This causes issues during
      core generation when tsk->flags is checked again (eg.  for PF_USED_MATH
      to dump floating point registers).  Fix this.
      Signed-off-by: default avatarSilesh C V <svellattu@mvista.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Mandeep Singh Baines <msb@chromium.org>
      Cc: <stable@vger.kernel.org>	[3.10+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      aed8adb7
    • Kinglong Mee's avatar
      NFSD: Fix crash encoding lock reply on 32-bit · f98bac5a
      Kinglong Mee authored
      Commit 8c7424cf "nfsd4: don't try to encode conflicting owner if low
      on space" forgot to free conf->data in nfsd4_encode_lockt and before
      sign conf->data to NULL in nfsd4_encode_lock_denied, causing a leak.
      
      Worse, kfree() can be called on an uninitialized pointer in the case of
      a succesful lock (or one that fails for a reason other than a conflict).
      
      (Note that lock->lk_denied.ld_owner.data appears it should be zero here,
      until you notice that it's one arm of a union the other arm of which is
      written to in the succesful case by the
      
      	memcpy(&lock->lk_resp_stateid, &lock_stp->st_stid.sc_stateid,
      	                                sizeof(stateid_t));
      
      in nfsd4_lock().  In the 32-bit case this overwrites ld_owner.data.)
      Signed-off-by: default avatarKinglong Mee <kinglongmee@gmail.com>
      Fixes: 8c7424cf ""nfsd4: don't try to encode conflicting owner if low on space"
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      f98bac5a
    • Tejun Heo's avatar
      libata: introduce ata_host->n_tags to avoid oops on SAS controllers · 1a112d10
      Tejun Heo authored
      1871ee13 ("libata: support the ata host which implements a queue
      depth less than 32") directly used ata_port->scsi_host->can_queue from
      ata_qc_new() to determine the number of tags supported by the host;
      unfortunately, SAS controllers doing SATA don't initialize ->scsi_host
      leading to the following oops.
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
       IP: [<ffffffff814e0618>] ata_qc_new_init+0x188/0x1b0
       PGD 0
       Oops: 0002 [#1] SMP
       Modules linked in: isci libsas scsi_transport_sas mgag200 drm_kms_helper ttm
       CPU: 1 PID: 518 Comm: udevd Not tainted 3.16.0-rc6+ #62
       Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
       task: ffff880c1a00b280 ti: ffff88061a000000 task.ti: ffff88061a000000
       RIP: 0010:[<ffffffff814e0618>]  [<ffffffff814e0618>] ata_qc_new_init+0x188/0x1b0
       RSP: 0018:ffff88061a003ae8  EFLAGS: 00010012
       RAX: 0000000000000001 RBX: ffff88000241ca80 RCX: 00000000000000fa
       RDX: 0000000000000020 RSI: 0000000000000020 RDI: ffff8806194aa298
       RBP: ffff88061a003ae8 R08: ffff8806194a8000 R09: 0000000000000000
       R10: 0000000000000000 R11: ffff88000241ca80 R12: ffff88061ad58200
       R13: ffff8806194aa298 R14: ffffffff814e67a0 R15: ffff8806194a8000
       FS:  00007f3ad7fe3840(0000) GS:ffff880627620000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000058 CR3: 000000061a118000 CR4: 00000000001407e0
       Stack:
        ffff88061a003b20 ffffffff814e96e1 ffff88000241ca80 ffff88061ad58200
        ffff8800b6bf6000 ffff880c1c988000 ffff880619903850 ffff88061a003b68
        ffffffffa0056ce1 ffff88061a003b48 0000000013d6e6f8 ffff88000241ca80
       Call Trace:
        [<ffffffff814e96e1>] ata_sas_queuecmd+0xa1/0x430
        [<ffffffffa0056ce1>] sas_queuecommand+0x191/0x220 [libsas]
        [<ffffffff8149afee>] scsi_dispatch_cmd+0x10e/0x300
        [<ffffffff814a3bc5>] scsi_request_fn+0x2f5/0x550
        [<ffffffff81317613>] __blk_run_queue+0x33/0x40
        [<ffffffff8131781a>] queue_unplugged+0x2a/0x90
        [<ffffffff8131ceb4>] blk_flush_plug_list+0x1b4/0x210
        [<ffffffff8131d274>] blk_finish_plug+0x14/0x50
        [<ffffffff8117eaa8>] __do_page_cache_readahead+0x198/0x1f0
        [<ffffffff8117ee21>] force_page_cache_readahead+0x31/0x50
        [<ffffffff8117ee7e>] page_cache_sync_readahead+0x3e/0x50
        [<ffffffff81172ac6>] generic_file_read_iter+0x496/0x5a0
        [<ffffffff81219897>] blkdev_read_iter+0x37/0x40
        [<ffffffff811e307e>] new_sync_read+0x7e/0xb0
        [<ffffffff811e3734>] vfs_read+0x94/0x170
        [<ffffffff811e43c6>] SyS_read+0x46/0xb0
        [<ffffffff811e33d1>] ? SyS_lseek+0x91/0xb0
        [<ffffffff8171ee29>] system_call_fastpath+0x16/0x1b
       Code: 00 00 00 88 50 29 83 7f 08 01 19 d2 83 e2 f0 83 ea 50 88 50 34 c6 81 1d 02 00 00 40 c6 81 17 02 00 00 00 5d c3 66 0f 1f 44 00 00 <89> 14 25 58 00 00 00
      
      Fix it by introducing ata_host->n_tags which is initialized to
      ATA_MAX_QUEUE - 1 in ata_host_init() for SAS controllers and set to
      scsi_host_template->can_queue in ata_host_register() for !SAS ones.
      As SAS hosts are never registered, this will give them the same
      ATA_MAX_QUEUE - 1 as before.  Note that we can't use
      scsi_host->can_queue directly for SAS hosts anyway as they can go
      higher than the libata maximum.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarMike Qiu <qiudayu@linux.vnet.ibm.com>
      Reported-by: default avatarJesse Brandeburg <jesse.brandeburg@gmail.com>
      Reported-by: default avatarPeter Hurley <peter@hurleysoftware.com>
      Reported-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Tested-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Fixes: 1871ee13 ("libata: support the ata host which implements a queue depth less than 32")
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: stable@vger.kernel.org
      1a112d10
    • Catalin Marinas's avatar
      arm64: Create non-empty ZONE_DMA when DRAM starts above 4GB · d50314a6
      Catalin Marinas authored
      ZONE_DMA is created to allow 32-bit only devices to access memory in the
      absence of an IOMMU. On systems where the memory starts above 4GB, it is
      expected that some devices have a DMA offset hardwired to be able to
      access the bottom of the memory. Linux currently supports DT bindings
      for the DMA offsets but they are not (easily) available early during
      boot.
      
      This patch tries to guess a DMA offset and assumes that ZONE_DMA
      corresponds to the 32-bit mask above the start of DRAM.
      
      Fixes: 2d5a5612 (arm64: Limit the CMA buffer to 32-bit if ZONE_DMA)
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Reported-by: default avatarMark Salter <msalter@redhat.com>
      Tested-by: default avatarMark Salter <msalter@redhat.com>
      Tested-by: default avatarAnup Patel <anup.patel@linaro.org>
      d50314a6
    • Peter Hutterer's avatar
  4. 22 Jul, 2014 13 commits
    • Mike Snitzer's avatar
      Merge branch 'slab/urgent' of... · 45ccaf47
      Mike Snitzer authored
      Merge branch 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux into for-3.16-rcX
      45ccaf47
    • Li Zhong's avatar
      powerpc: use _GLOBAL_TOC for memmove · 6f5405bc
      Li Zhong authored
      memmove may be called from module code copy_pages(btrfs), and it may
      call memcpy, which may call back to C code, so it needs to use
      _GLOBAL_TOC to set up r2 correctly.
      
      This fixes following error when I tried to boot an le guest:
      
      Vector: 300 (Data Access) at [c000000073f97210]
          pc: c000000000015004: enable_kernel_altivec+0x24/0x80
          lr: c000000000058fbc: enter_vmx_copy+0x3c/0x60
          sp: c000000073f97490
         msr: 8000000002009033
         dar: d000000001d50170
       dsisr: 40000000
        current = 0xc0000000734c0000
        paca    = 0xc00000000fff0000	 softe: 0	 irq_happened: 0x01
          pid   = 815, comm = mktemp
      enter ? for help
      [c000000073f974f0] c000000000058fbc enter_vmx_copy+0x3c/0x60
      [c000000073f97510] c000000000057d34 memcpy_power7+0x274/0x840
      [c000000073f97610] d000000001c3179c copy_pages+0xfc/0x110 [btrfs]
      [c000000073f97660] d000000001c3c248 memcpy_extent_buffer+0xe8/0x160 [btrfs]
      [c000000073f97700] d000000001be4be8 setup_items_for_insert+0x208/0x4a0 [btrfs]
      [c000000073f97820] d000000001be50b4 btrfs_insert_empty_items+0xf4/0x140 [btrfs]
      [c000000073f97890] d000000001bfed30 insert_with_overflow+0x70/0x180 [btrfs]
      [c000000073f97900] d000000001bff174 btrfs_insert_dir_item+0x114/0x2f0 [btrfs]
      [c000000073f979a0] d000000001c1f92c btrfs_add_link+0x10c/0x370 [btrfs]
      [c000000073f97a40] d000000001c20e94 btrfs_create+0x204/0x270 [btrfs]
      [c000000073f97b00] c00000000026d438 vfs_create+0x178/0x210
      [c000000073f97b50] c000000000270a70 do_last+0x9f0/0xe90
      [c000000073f97c20] c000000000271010 path_openat+0x100/0x810
      [c000000073f97ce0] c000000000272ea8 do_filp_open+0x58/0xd0
      [c000000073f97dc0] c00000000025ade8 do_sys_open+0x1b8/0x300
      [c000000073f97e30] c00000000000a008 syscall_exit+0x0/0x7c
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      6f5405bc
    • Tyrel Datwyler's avatar
      powerpc/pseries: dynamically added OF nodes need to call of_node_init · 97a9a717
      Tyrel Datwyler authored
      Commit 75b57ecf refactored device tree nodes to use kobjects such that they
      can be exposed via /sysfs. A secondary commit 0829f6d1 furthered this rework
      by moving the kobect initialization logic out of of_node_add into its own
      of_node_init function. The inital commit removed the existing kref_init calls
      in the pseries dlpar code with the assumption kobject initialization would
      occur in of_node_add. The second commit had the side effect of triggering a
      BUG_ON during DLPAR, migration and suspend/resume operations as a result of
      dynamically added nodes being uninitialized.
      
      This patch fixes this by adding of_node_init calls in place of the previously
      removed kref_init calls.
      
      Fixes: 0829f6d1 ("of: device_node kobject lifecycle fixes")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Acked-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Acked-by: default avatarGrant Likely <grant.likely@linaro.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      97a9a717
    • Aneesh Kumar K.V's avatar
      powerpc: subpage_protect: Increase the array size to take care of 64TB · dad6f37c
      Aneesh Kumar K.V authored
      We now support TASK_SIZE of 16TB, hence the array should be 8.
      
      Fixes the below crash:
      
      Unable to handle kernel paging request for data at address 0x000100bd
      Faulting instruction address: 0xc00000000004f914
      cpu 0x13: Vector: 300 (Data Access) at [c000000fea75fa90]
          pc: c00000000004f914: .sys_subpage_prot+0x2d4/0x5c0
          lr: c00000000004fb5c: .sys_subpage_prot+0x51c/0x5c0
          sp: c000000fea75fd10
         msr: 9000000000009032
         dar: 100bd
       dsisr: 40000000
        current = 0xc000000fea6ae490
        paca    = 0xc00000000fb8ab00   softe: 0        irq_happened: 0x00
          pid   = 8237, comm = a.out
      enter ? for help
      [c000000fea75fe30] c00000000000a164 syscall_exit+0x0/0x98
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      dad6f37c
    • Paul Mackerras's avatar
      powerpc: Fix bugs in emulate_step() · e698b966
      Paul Mackerras authored
      This fixes some bugs in emulate_step().  First, the setting of the carry
      bit for the arithmetic right-shift instructions was not correct on 64-bit
      machines because we were masking with a mask of type int rather than
      unsigned long.  Secondly, the sld (shift left doubleword) instruction was
      using the wrong instruction field for the register containing the shift
      count.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      e698b966
    • Joel Stanley's avatar
      powerpc: Disable doorbells on Power8 DD1.x · bd6ba351
      Joel Stanley authored
      These processors do not currently support doorbell IPIs, so remove them
      from the feature list if we are at DD 1.xx for the 0x004d part.
      
      This fixes a regression caused by d4e58e59 (powerpc/powernv: Enable
      POWER8 doorbell IPIs). With that patch the kernel would hang at boot
      when calling smp_call_function_many, as the doorbell would not be
      received by the target CPUs:
      
        .smp_call_function_many+0x2bc/0x3c0 (unreliable)
        .on_each_cpu_mask+0x30/0x100
        .cpuidle_register_driver+0x158/0x1a0
        .cpuidle_register+0x2c/0x110
        .powernv_processor_idle_init+0x23c/0x2c0
        .do_one_initcall+0xd4/0x260
        .kernel_init_freeable+0x25c/0x33c
        .kernel_init+0x1c/0x120
        .ret_from_kernel_thread+0x58/0x7c
      
      Fixes: d4e58e59 (powerpc/powernv: Enable POWER8 doorbell IPIs)
      Signed-off-by: default avatarJoel Stanley <joel@jms.id.au>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      bd6ba351
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 15ba2236
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Null termination fix in dns_resolver got the pointer dereferncing
          wrong, fix from Ben Hutchings.
      
       2) ip_options_compile() has a benign but real buffer overflow when
          parsing options.  From Eric Dumazet.
      
       3) Table updates can crash in netfilter's nftables if none of the state
          flags indicate an actual change, from Pablo Neira Ayuso.
      
       4) Fix race in nf_tables dumping, also from Pablo.
      
       5) GRE-GRO support broke the forwarding path because the segmentation
          state was not fully initialized in these paths, from Jerry Chu.
      
       6) sunvnet driver leaks objects and potentially crashes on module
          unload, from Sowmini Varadhan.
      
       7) We can accidently generate the same handle for several u32
          classifier filters, fix from Cong Wang.
      
       8) Several edge case bug fixes in fragment handling in xen-netback,
          from Zoltan Kiss.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits)
        ipv4: fix buffer overflow in ip_options_compile()
        batman-adv: fix TT VLAN inconsistency on VLAN re-add
        batman-adv: drop QinQ claim frames in bridge loop avoidance
        dns_resolver: Null-terminate the right string
        xen-netback: Fix pointer incrementation to avoid incorrect logging
        xen-netback: Fix releasing header slot on error path
        xen-netback: Fix releasing frag_list skbs in error path
        xen-netback: Fix handling frag_list on grant op error path
        net_sched: avoid generating same handle for u32 filters
        net: huawei_cdc_ncm: add "subclass 3" devices
        net: qmi_wwan: add two Sierra Wireless/Netgear devices
        wan/x25_asy: integer overflow in x25_asy_change_mtu()
        net: ppp: fix creating PPP pass and active filters
        net/mlx4_en: cq->irq_desc wasn't set in legacy EQ's
        sunvnet: clean up objects created in vnet_new() on vnet_exit()
        r8169: Enable RX_MULTI_EN for RTL_GIGA_MAC_VER_40
        net-gre-gro: Fix a bug that breaks the forwarding path
        netfilter: nf_tables: 64bit stats need some extra synchronization
        netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stale
        netfilter: nf_tables: safe RCU iteration on list when dumping
        ...
      15ba2236
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 89faa06e
      Linus Torvalds authored
      Pull sparc fix from David Miller:
       "Need to hook up the new renameat2 system call"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc: Hook up renameat2 syscall.
      89faa06e
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide · 14867719
      Linus Torvalds authored
      Pull IDE fixes from David Miller:
       - fix interrupt registry for some Atari IDE chipsets.
       - adjust Kconfig dependencies for x86_32 specific chips.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide:
        ide: Fix SC1200 dependencies
        ide: Fix CS5520 and CS5530 dependencies
        m68k/atari - ide: do not register interrupt if host->get_lock is set
      14867719
    • Linus Torvalds's avatar
      Merge tag 'trace-fixes-v3.16-rc6' of... · 8dcc3be2
      Linus Torvalds authored
      Merge tag 'trace-fixes-v3.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
      
      Pull trace fix from Steven Rostedt:
       "Tony Luck found that using the "uptime" trace clock that uses jiffies
        as a counter was converted to nanoseconds (silly), and after 1 hour 11
        minutes and 34 seconds, this monotonic clock would wrap, causing havoc
        with the tracing system and making the clock useless.
      
        He converted that clock to use jiffies_64 and made it into a counter
        instead of nanosecond conversions, and displayed the clock with the
        straight jiffy count, which works much better than it did in the past"
      
      * tag 'trace-fixes-v3.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Fix wraparound problems in "uptime" trace clock
      8dcc3be2
    • David S. Miller's avatar
      sparc: Hook up renameat2 syscall. · 26053926
      David S. Miller authored
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      26053926
    • David S. Miller's avatar
      Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge · 850717ef
      David S. Miller authored
      Antonio Quartulli says:
      
      ====================
      pull request [net]: batman-adv 20140721
      
      here you have two fixes that we have been testing for quite some time
      (this is why they arrived a bit late in the rc cycle).
      
      Patch 1) ensures that BLA packets get dropped and not forwarded to the
      mesh even if they reach batman-adv within QinQ frames. Forwarding them
      into the mesh means messing up with the TT database of other nodes which
      can generate all kind of unexpected behaviours during route computation.
      
      Patch 2) avoids a couple of race conditions triggered upon fast VLAN
      deletion-addition. Such race conditions are pretty dangerous because
      they not only create inconsistencies in the TT database of the nodes
      in the network, but such scenario is also unrecoverable (unless
      nodes are rebooted).
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      850717ef
    • Eric Dumazet's avatar
      ipv4: fix buffer overflow in ip_options_compile() · 10ec9472
      Eric Dumazet authored
      There is a benign buffer overflow in ip_options_compile spotted by
      AddressSanitizer[1] :
      
      Its benign because we always can access one extra byte in skb->head
      (because header is followed by struct skb_shared_info), and in this case
      this byte is not even used.
      
      [28504.910798] ==================================================================
      [28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile
      [28504.913170] Read of size 1 by thread T15843:
      [28504.914026]  [<ffffffff81802f91>] ip_options_compile+0x121/0x9c0
      [28504.915394]  [<ffffffff81804a0d>] ip_options_get_from_user+0xad/0x120
      [28504.916843]  [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
      [28504.918175]  [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
      [28504.919490]  [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
      [28504.920835]  [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
      [28504.922208]  [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
      [28504.923459]  [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
      [28504.924722]
      [28504.925106] Allocated by thread T15843:
      [28504.925815]  [<ffffffff81804995>] ip_options_get_from_user+0x35/0x120
      [28504.926884]  [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
      [28504.927975]  [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
      [28504.929175]  [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
      [28504.930400]  [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
      [28504.931677]  [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
      [28504.932851]  [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
      [28504.934018]
      [28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right
      [28504.934377]  of 40-byte region [ffff880026382800, ffff880026382828)
      [28504.937144]
      [28504.937474] Memory state around the buggy address:
      [28504.938430]  ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.939884]  ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.941294]  ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.942504]  ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.943483]  ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.945573]                         ^
      [28504.946277]  ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.094949]  ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.096114]  ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.097116]  ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.098472]  ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.099804] Legend:
      [28505.100269]  f - 8 freed bytes
      [28505.100884]  r - 8 redzone bytes
      [28505.101649]  . - 8 allocated bytes
      [28505.102406]  x=1..7 - x allocated bytes + (8-x) redzone bytes
      [28505.103637] ==================================================================
      
      [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernelSigned-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      10ec9472
  5. 21 Jul, 2014 2 commits
    • Linus Torvalds's avatar
      Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 67dd8f35
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "A series of driver fixes:
         - fix DVB-S tuning with tda1071
         - fix tuner probe on af9035 when the device has a bad eeprom
         - some fixes for the new si2168/2157 drivers
         - one Kconfig build fix (for omap4iss)
         - fixes at vpif error path
         - don't lock saa7134 ioctl at driver's base core level, as it now
           uses V4L2 and VB2 locking schema
         - fix audio at hdpvr driver
         - fix the aspect ratio at the digital timings table
         - one new USB ID (at gspca_pac7302): Genius i-Look 317 webcam"
      
      * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        [media] gspca_pac7302: Add new usb-id for Genius i-Look 317
        [media] tda10071: fix returned symbol rate calculation
        [media] tda10071: fix spec inversion reporting
        [media] tda10071: add missing DVB-S2/PSK-8 FEC AUTO
        [media] tda10071: force modulation to QPSK on DVB-S
        [media] hdpvr: fix two audio bugs
        [media] davinci: vpif: missing unlocks on error
        [media] af9035: override tuner id when bad value set into eeprom
        [media] saa7134: use unlocked_ioctl instead of ioctl
        [media] media: v4l2-core: v4l2-dv-timings.c: Cleaning up code wrong value used in aspect ratio
        [media] si2168: firmware download fix
        [media] si2157: add one missing parenthesis
        [media] si2168: add one missing parenthesis
        [media] staging: tighten omap4iss dependencies
      67dd8f35
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 6890ad4b
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Final block fixes for 3.16
      
        Four small fixes that should go into 3.16, have been queued up for a
        bit and delayed due to vacation and other euro duties.  But here they
        are.  The pull request contains:
      
         - Fix for a reported crash with shared tagging on SCSI from Christoph
      
         - A regression fix for drbd.  From Lars Ellenberg.
      
         - Hooking up the compat ioctl for BLKZEROOUT, which requires no
           translation.  From Mikulas.
      
      - A fix for a regression where we woud crash on queue exit if the
        root_blkg is gone/not there. From Tejun"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        block: provide compat ioctl for BLKZEROOUT
        blkcg: don't call into policy draining if root_blkg is already gone
        drbd: fix regression 'out of mem, failed to invoke fence-peer helper'
        block: don't assume last put of shared tags is for the host
      6890ad4b