1. 02 Feb, 2016 40 commits
    • panic: release stale console lock to always get the logbuf printed out · c7afd462
      Vitaly Kuznetsov authored
      commit 08d78658 upstream.
      
      In some cases we may end up killing the CPU holding the console lock
      while still having valuable data in logbuf. E.g. I'm observing the
      following:
      
      - A crash happens on one CPU while console_unlock() is running on
        another.
      
      - console_unlock() tries to print out the buffer before releasing the
        lock, and on a slow console this takes time.
      
      - Meanwhile, the crashing CPU issues many printk() calls with valuable
        data (which go to the logbuf) and sends IPIs to all other CPUs.
      
      - console_unlock() finishes printing the previous chunk and enables
        interrupts before trying to print out the rest; the CPU catches the
        IPI and never releases the console lock.
      
      This is not the only possible case: in VT/fb subsystems we have many other
      console_lock()/console_unlock() users.  Non-masked interrupts (or
      receiving NMI in case of extreme slowness) will have the same result.
      Getting the whole console buffer printed out on crash should be top
      priority.
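      A user-space sketch of the idea follows. It uses a hypothetical console_state model and hypothetical helpers (con_trylock(), con_unlock(), panic_flush()), none of which are kernel APIs. On panic, the code tries to take the lock and then releases it whatever the result. The unlock path therefore prints the pending buffer even if the lock's owner is a CPU that will never run again. Like console_unlock() as described above, the sketch flushes the buffer as part of releasing the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified model of the console lock and the log buffer. */
struct console_state {
    bool locked;        /* may be held by a CPU that was stopped by an IPI */
    char logbuf[256];   /* messages not yet printed */
    char printed[256];  /* what reached the (simulated) console */
};

static bool con_trylock(struct console_state *c)
{
    if (c->locked)
        return false;
    c->locked = true;
    return true;
}

/* Like console_unlock(): flush pending messages, then drop the lock. */
static void con_unlock(struct console_state *c)
{
    assert(strlen(c->printed) + strlen(c->logbuf) < sizeof(c->printed));
    strcat(c->printed, c->logbuf);
    c->logbuf[0] = '\0';
    c->locked = false;
}

/* Panic path: ignore a stale owner and flush anyway. */
static void panic_flush(struct console_state *c)
{
    (void)con_trylock(c);  /* may fail if a stopped CPU holds the lock */
    con_unlock(c);         /* release regardless, printing the buffer */
}
```

      Forcibly releasing a lock is tolerable here only because the system is going down and nothing else is expected to rely on it afterwards.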
      
      [akpm@linux-foundation.org: tweak comment text]
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • memcg: only free spare array when readers are done · b3bcee19
      Martijn Coenen authored
      commit 6611d8d7 upstream.
      
      A spare array holding mem cgroup threshold events is kept around to make
      sure we can always safely deregister an event and have an array to store
      the new set of events in.
      
      In the scenario where we're going from 1 to 0 registered events, the
      pointer to the primary array containing 1 event is copied to the spare
      slot, and then the spare slot is freed because no events are left.
      However, it is freed before calling synchronize_rcu(), which means
      readers may still be accessing threshold->primary after it is freed.
      
      Fixed by only freeing after synchronize_rcu().
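      The required ordering is: unpublish the array, wait for readers, then free. The sketch below shows only that ordering and is not real RCU. wait_for_readers() is a hypothetical stub standing in for synchronize_rcu(), and struct thresholds is a simplified, hypothetical stand-in for the memcg threshold structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: readers dereference primary; spare is a parking slot. */
struct thresholds {
    int *primary;
    int *spare;
    int readers_active;   /* stand-in for RCU readers that may see old data */
};

/* Hypothetical stub for synchronize_rcu(): returns once old readers are gone. */
static void wait_for_readers(struct thresholds *t)
{
    t->readers_active = 0;
}

/* Going from 1 registered event to 0. */
static void unregister_last_event(struct thresholds *t)
{
    int *old = t->primary;

    t->primary = NULL;            /* publish "no events" to new readers */
    t->spare = old;               /* old array parked in the spare slot */

    wait_for_readers(t);          /* existing readers may still use 'old' */

    assert(t->readers_active == 0);
    free(t->spare);               /* safe only after the grace period */
    t->spare = NULL;
}
```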
      Signed-off-by: Martijn Coenen <maco@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • mm: soft-offline: check return value in second __get_any_page() call · bd22318f
      Naoya Horiguchi authored
      commit d96b339f upstream.
      
      I saw the following BUG_ON triggered in a testcase where a process calls
      madvise(MADV_SOFT_OFFLINE) on THPs while a background process
      repeatedly runs the migratepages command on the first process
      (ping-ponging its pages among different NUMA nodes):
      
         Soft offlining page 0x60000 at 0x700000600000
         __get_any_page: 0x60000 free buddy page
         page:ffffea0001800000 count:0 mapcount:-127 mapping:          (null) index:0x1
         flags: 0x1fffc0000000000()
         page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0)
         ------------[ cut here ]------------
         kernel BUG at /src/linux-dev/include/linux/mm.h:342!
         invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
         Modules linked in: cfg80211 rfkill crc32c_intel serio_raw virtio_balloon i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi
         CPU: 3 PID: 3035 Comm: test_alloc_gene Tainted: G           O    4.4.0-rc8-v4.4-rc8-160107-1501-00000-rc8+ #74
         Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
         task: ffff88007c63d5c0 ti: ffff88007c210000 task.ti: ffff88007c210000
         RIP: 0010:[<ffffffff8118998c>]  [<ffffffff8118998c>] put_page+0x5c/0x60
         RSP: 0018:ffff88007c213e00  EFLAGS: 00010246
         Call Trace:
           put_hwpoison_page+0x4e/0x80
           soft_offline_page+0x501/0x520
           SyS_madvise+0x6bc/0x6f0
           entry_SYSCALL_64_fastpath+0x12/0x6a
         Code: 8b fc ff ff 5b 5d c3 48 89 df e8 b0 fa ff ff 48 89 df 31 f6 e8 c6 7d ff ff 5b 5d c3 48 c7 c6 08 54 a2 81 48 89 df e8 a4 c5 01 00 <0f> 0b 66 90 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 8b 47
         RIP  [<ffffffff8118998c>] put_page+0x5c/0x60
          RSP <ffff88007c213e00>
      
      The root cause resides in get_any_page(), which retries getting a
      refcount on the page to be soft-offlined.  This function calls
      put_hwpoison_page(), expecting the target page to be put back on the
      LRU list.  But the page can also be freed to the buddy allocator, so
      the second check needs to handle that case.
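      Below is a user-space sketch of the corrected second check. It assumes, as the subject line suggests, that the fix acts on the return value of the second __get_any_page() call: the reference is dropped only when that call actually took one (here, returned 1). struct page_model, get_any_page_once(), put_page_model() and retry_get_page() are hypothetical stand-ins, and the return-value convention is assumed for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical page model: a refcount and an "on LRU" flag. */
struct page_model {
    int refcount;
    bool on_lru;
};

/* Stand-in for __get_any_page(): 1 = reference taken, 0 = free buddy page. */
static int get_any_page_once(struct page_model *p)
{
    if (p->refcount == 0)
        return 0;                 /* free page: no reference taken */
    p->refcount++;
    return 1;
}

static void put_page_model(struct page_model *p)
{
    assert(p->refcount > 0);      /* mirrors the VM_BUG_ON_PAGE in the log */
    p->refcount--;
}

/* Second attempt: only drop a reference that was actually taken. */
static int retry_get_page(struct page_model *p)
{
    int ret = get_any_page_once(p);

    if (ret == 1 && !p->on_lru) {
        put_page_model(p);        /* drop the reference we just took */
        return -EIO;
    }
    return ret;                   /* 0: page was freed to buddy meanwhile */
}
```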
      
      Fixes: af8fae7c ("mm/memory-failure.c: clean up soft_offline_page()")
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • zram: try vmalloc() after kmalloc() · 75b79536
      Kyeongdon Kim authored
      commit d913897a upstream.
      
      When using LZ4 with multiple compression streams for zram swap, we saw
      page allocation failure messages while running system tests.  They
      appeared not just once but several times (2-5 times per test), and some
      of the failures occurred repeatedly on order-3 allocation attempts.
      
      To allocate the private data for parallel compression, we need to call
      kzalloc() with order 2/3 at runtime (lzo/lz4).  If no order-2/3 block
      is available at that moment, the page allocation fails.  This patch
      uses vmalloc() as a fallback for kmalloc(), which prevents the page
      allocation failure warning.
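      The fallback pattern can be sketched in user space with hypothetical allocator hooks. One hook stands in for kzalloc(), which needs physically contiguous pages; the other stands in for vmalloc(), which does not. The sketch records which path succeeded. In the kernel, kvfree() frees memory from either path, so no such bookkeeping is needed there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical hooks standing in for kzalloc() and vmalloc(). */
typedef void *(*alloc_fn)(size_t size);

struct fallback_alloc {
    alloc_fn contiguous;  /* fast path; may fail for high-order sizes */
    alloc_fn fallback;    /* slower path that does not need contiguous pages */
};

struct buffer {
    void *mem;
    bool from_fallback;   /* which path succeeded */
};

static struct buffer alloc_with_fallback(const struct fallback_alloc *a, size_t size)
{
    assert(a != NULL);
    struct buffer b = { a->contiguous(size), false };

    if (b.mem == NULL) {
        b.mem = a->fallback(size);
        b.from_fallback = (b.mem != NULL);
    }
    return b;
}

/* Demo hooks: simulate a fragmented system where the fast path fails. */
static void *always_fail(size_t size)
{
    (void)size;
    return NULL;
}

static void *zeroed_alloc(size_t size)
{
    return calloc(1, size);
}
```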
      
      With this change, we no longer saw the warning in our tests, and it
      also reduced process startup latency by about 60-120 ms in each case.
      
      For reference, a call trace:
      
          Binder_1: page allocation failure: order:3, mode:0x10c0d0
          CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty #20
          Call trace:
            dump_backtrace+0x0/0x270
            show_stack+0x10/0x1c
            dump_stack+0x1c/0x28
            warn_alloc_failed+0xfc/0x11c
            __alloc_pages_nodemask+0x724/0x7f0
            __get_free_pages+0x14/0x5c
            kmalloc_order_trace+0x38/0xd8
            zcomp_lz4_create+0x2c/0x38
            zcomp_strm_alloc+0x34/0x78
            zcomp_strm_multi_find+0x124/0x1ec
            zcomp_strm_find+0xc/0x18
            zram_bvec_rw+0x2fc/0x780
            zram_make_request+0x25c/0x2d4
            generic_make_request+0x80/0xbc
            submit_bio+0xa4/0x15c
            __swap_writepage+0x218/0x230
            swap_writepage+0x3c/0x4c
            shrink_page_list+0x51c/0x8d0
            shrink_inactive_list+0x3f8/0x60c
            shrink_lruvec+0x33c/0x4cc
            shrink_zone+0x3c/0x100
            try_to_free_pages+0x2b8/0x54c
            __alloc_pages_nodemask+0x514/0x7f0
            __get_free_pages+0x14/0x5c
            proc_info_read+0x50/0xe4
            vfs_read+0xa0/0x12c
            SyS_read+0x44/0x74
          DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
               0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB
      
      [minchan@kernel.org: change vmalloc gfp and adding comment about gfp]
      [sergey.senozhatsky@gmail.com: tweak comments and styles]
      Signed-off-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • zram/zcomp: use GFP_NOIO to allocate streams · 3b1cecbc
      Sergey Senozhatsky authored
      commit 3d5fe03a upstream.
      
      We can end up allocating a new compression stream with GFP_KERNEL from
      within the IO path, which may result in nested (recursive) IO
      operations.  That can cause problems if the IO path in question is a
      reclaimer holding locks that the nested IOs would deadlock on.
      
      Allocate streams and working memory using GFP_NOIO flag, forbidding
      recursive IO and FS operations.
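      The effect of the flag choice can be sketched with hypothetical bit values; the real gfp_t bits differ. The relationship mirrors the kernel's definitions: GFP_KERNEL adds the IO and FS bits on top of what GFP_NOIO allows. As a result, reclaim triggered by a GFP_NOIO allocation cannot start new IO or call into a filesystem.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration only. */
enum {
    DEMO_GFP_RECLAIM = 1 << 0,   /* may sleep and reclaim memory */
    DEMO_GFP_IO      = 1 << 1,   /* reclaim may start block IO */
    DEMO_GFP_FS      = 1 << 2,   /* reclaim may call into filesystems */
};

#define DEMO_GFP_KERNEL (DEMO_GFP_RECLAIM | DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_GFP_NOIO   (DEMO_GFP_RECLAIM)

/* Could reclaim triggered by this allocation recurse into the IO path? */
static bool may_recurse_into_io(unsigned int gfp)
{
    return (gfp & (DEMO_GFP_IO | DEMO_GFP_FS)) != 0;
}

/* Flags for an allocation made from inside the IO path. */
static unsigned int io_path_gfp(unsigned int requested)
{
    unsigned int gfp = requested & ~(unsigned int)(DEMO_GFP_IO | DEMO_GFP_FS);

    assert(!may_recurse_into_io(gfp));
    return gfp;
}
```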
      
      An example:
      
        inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
        git/20158 [HC0[0]:SC0[0]:HE1:SE1] takes:
         (jbd2_handle){+.+.?.}, at:  start_this_handle+0x4ca/0x555
        {IN-RECLAIM_FS-W} state was registered at:
           __lock_acquire+0x8da/0x117b
           lock_acquire+0x10c/0x1a7
           start_this_handle+0x52d/0x555
           jbd2__journal_start+0xb4/0x237
           __ext4_journal_start_sb+0x108/0x17e
           ext4_dirty_inode+0x32/0x61
           __mark_inode_dirty+0x16b/0x60c
           iput+0x11e/0x274
           __dentry_kill+0x148/0x1b8
           shrink_dentry_list+0x274/0x44a
           prune_dcache_sb+0x4a/0x55
           super_cache_scan+0xfc/0x176
           shrink_slab.part.14.constprop.25+0x2a2/0x4d3
           shrink_zone+0x74/0x140
           kswapd+0x6b7/0x930
           kthread+0x107/0x10f
           ret_from_fork+0x3f/0x70
        irq event stamp: 138297
        hardirqs last  enabled at (138297):  debug_check_no_locks_freed+0x113/0x12f
        hardirqs last disabled at (138296):  debug_check_no_locks_freed+0x33/0x12f
        softirqs last  enabled at (137818):  __do_softirq+0x2d3/0x3e9
        softirqs last disabled at (137813):  irq_exit+0x41/0x95
      
                     other info that might help us debug this:
         Possible unsafe locking scenario:
               CPU0
               ----
          lock(jbd2_handle);
          <Interrupt>
            lock(jbd2_handle);
      
                      *** DEADLOCK ***
        5 locks held by git/20158:
         #0:  (sb_writers#7){.+.+.+}, at: [<ffffffff81155411>] mnt_want_write+0x24/0x4b
         #1:  (&type->i_mutex_dir_key#2/1){+.+.+.}, at: [<ffffffff81145087>] lock_rename+0xd9/0xe3
         #2:  (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [<ffffffff8114f8e2>] lock_two_nondirectories+0x3f/0x6b
         #3:  (&sb->s_type->i_mutex_key#11/4){+.+.+.}, at: [<ffffffff8114f909>] lock_two_nondirectories+0x66/0x6b
         #4:  (jbd2_handle){+.+.?.}, at: [<ffffffff811e31db>] start_this_handle+0x4ca/0x555
      
                     stack backtrace:
        CPU: 2 PID: 20158 Comm: git Not tainted 4.1.0-rc7-next-20150615-dbg-00016-g8bdf555-dirty #211
        Call Trace:
          dump_stack+0x4c/0x6e
          mark_lock+0x384/0x56d
          mark_held_locks+0x5f/0x76
          lockdep_trace_alloc+0xb2/0xb5
          kmem_cache_alloc_trace+0x32/0x1e2
          zcomp_strm_alloc+0x25/0x73 [zram]
          zcomp_strm_multi_find+0xe7/0x173 [zram]
          zcomp_strm_find+0xc/0xe [zram]
          zram_bvec_rw+0x2ca/0x7e0 [zram]
          zram_make_request+0x1fa/0x301 [zram]
          generic_make_request+0x9c/0xdb
          submit_bio+0xf7/0x120
          ext4_io_submit+0x2e/0x43
          ext4_bio_write_page+0x1b7/0x300
          mpage_submit_page+0x60/0x77
          mpage_map_and_submit_buffers+0x10f/0x21d
          ext4_writepages+0xc8c/0xe1b
          do_writepages+0x23/0x2c
          __filemap_fdatawrite_range+0x84/0x8b
          filemap_flush+0x1c/0x1e
          ext4_alloc_da_blocks+0xb8/0x117
          ext4_rename+0x132/0x6dc
          ? mark_held_locks+0x5f/0x76
          ext4_rename2+0x29/0x2b
          vfs_rename+0x540/0x636
          SyS_renameat2+0x359/0x44d
          SyS_rename+0x1e/0x20
          entry_SYSCALL_64_fastpath+0x12/0x6f
      
      [minchan@kernel.org: add stable mark]
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Kyeongdon Kim <kyeongdon.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • ALSA: timer: Harden slave timer list handling · c8ca4c5a
      Takashi Iwai authored
      commit b5a663aa upstream.
      
      A slave timer instance might still be accessed in a racy way while
      the master instance is being operated on, as it lacks locking.  Since
      the master operation is mostly protected with timer->lock, we should
      take that lock while changing the slave instance, too.  Also, some
      linked lists (active_list and ack_list) of slave instances aren't
      unlinked immediately at stopping or closing, and this may lead to
      unexpected accesses.
      
      This patch tries to address these issues.  It takes timer->lock
      (either the master's or the slave's, which are equivalent) in a few
      places.  To avoid a deadlock, we ensure that the global
      slave_active_lock is always taken before any timer lock.
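      The ordering rule can be illustrated with a small rank-based checker. This is a hypothetical single-threaded sketch, not anything ALSA uses; the kernel relies on lockdep to catch such violations. Rank 0 models the global slave_active_lock and rank 1 a per-timer timer->lock. A lock may only be taken while the caller holds locks of lower rank.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranked lock: rank 0 = slave_active_lock, rank 1 = timer->lock. */
struct ranked_lock {
    const char *name;
    int rank;
    int saved_highest;    /* restored on unlock; LIFO release assumed */
};

static int highest_held = -1;   /* single-threaded demo: one CPU's view */

/* Refuse any acquisition that would invert the documented order. */
static bool ordered_lock(struct ranked_lock *l)
{
    if (l->rank <= highest_held)
        return false;
    l->saved_highest = highest_held;
    highest_held = l->rank;
    return true;
}

static void ordered_unlock(struct ranked_lock *l)
{
    assert(highest_held == l->rank);   /* release in reverse order */
    highest_held = l->saved_highest;
}
```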
      
      Also, ack and active_list of slave instances are properly unlinked at
      snd_timer_stop() and snd_timer_close().
      
      Last but not least, remove the superfluous call of _snd_timer_stop()
      at removing slave links.  This is a noop, and calling it may confuse
      readers wrt locking.  Further cleanup will follow in a later patch.
      
      We have received use-after-free reports from the syzkaller fuzzer,
      and this hopefully fixes those issues.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • ocfs2/dlm: ignore cleaning the migration mle that is inuse · d6536cc5
      xuejiufei authored
      commit bef5502d upstream.
      
      We have found that the migration source triggers a BUG because the
      refcount of the mle is already zero before a put, when the target goes
      down during migration.  The situation is as follows:
      
      dlm_migrate_lockres
        dlm_add_migration_mle
        dlm_mark_lockres_migrating
        dlm_get_mle_inuse
        <<<<<< Now the refcount of the mle is 2.
        dlm_send_one_lockres and wait for the target to become the
        new master.
        <<<<<< o2hb detects that the target is down and cleans up the
        migration mle. Now the refcount is 1.
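      The refcount sequence above can be modeled in a few lines. This is a hypothetical sketch, not the ocfs2 code; struct mle_model and the helper names are invented for illustration. It assumes the fix named in the subject line: the node-down cleanup skips an mle that the migration path has marked in use. Without that check, the cleanup's put plus the migration path's two puts would underflow the count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a migration mle (master list entry). */
struct mle_model {
    int refs;
    bool inuse;      /* set by the migrating thread (dlm_get_mle_inuse) */
    bool bad_put;    /* records an underflow instead of BUG()ing */
};

static void mle_put(struct mle_model *m)
{
    if (m->refs == 0) {
        m->bad_put = true;        /* corresponds to "ERROR: bad mle: " */
        return;
    }
    m->refs--;
}

/* Heartbeat (node-down) cleanup: skip an mle the migration still uses. */
static void cleanup_on_node_down(struct mle_model *m)
{
    if (m->inuse)
        return;                   /* the migrating thread will drop it */
    mle_put(m);
}

/* The migrating thread notices the target died and drops both references. */
static void migrate_abort(struct mle_model *m)
{
    assert(m->inuse);
    m->inuse = false;
    mle_put(m);                   /* reference from the in-use get */
    mle_put(m);                   /* reference from adding the migration mle */
}
```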
      
      dlm_migrate_lockres is woken, finds that the target has gone down, and
      puts the mle twice, which triggers the BUG with the following message:
      
        "ERROR: bad mle: ".
      Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
      Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • scripts/bloat-o-meter: fix python3 syntax error · f0e30712
      Sergey Senozhatsky authored
      commit 72214a24 upstream.
      
      In Python3+ print is a function so the old syntax is not correct
      anymore:
      
        $ ./scripts/bloat-o-meter vmlinux.o vmlinux.o.old
          File "./scripts/bloat-o-meter", line 61
            print "add/remove: %s/%s grow/shrink: %s/%s up/down: %s/%s (%s)" % \
                                                                           ^
        SyntaxError: invalid syntax
      
      Fix by calling print as a function.
      
      Tested on Python 2.7.11 and 3.5.1.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • dma-debug: switch check from _text to _stext · 8e2097d9
      Laura Abbott authored
      commit ea535e41 upstream.
      
      In include/asm-generic/sections.h:
      
        /*
         * Usage guidelines:
         * _text, _data: architecture specific, don't use them in arch-independent code
         * [_stext, _etext]: contains .text.* sections, may also contain .rodata.*
         *                   and/or .init.* sections
      
      _text is not guaranteed across architectures.  Architectures such as ARM
      may reuse parts which are not actually text and erroneously trigger a bug.
      Switch to using _stext which is guaranteed to contain text sections.
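      The check boils down to a range-overlap test against the kernel text; the fix changes only which symbol supplies the lower bound. Below is a user-space sketch in which struct text_bounds and its values are hypothetical stand-ins for _stext and _etext.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [addr, addr + len) intersects [start, end). */
static bool overlap(uintptr_t addr, size_t len, uintptr_t start, uintptr_t end)
{
    uintptr_t addr_end = addr + len;

    return !(addr_end <= start || addr >= end);
}

/* Hypothetical stand-ins for the linker symbols _stext and _etext. */
struct text_bounds {
    uintptr_t stext;
    uintptr_t etext;
};

/* Would a DMA mapping of [addr, addr + len) touch kernel text? */
static bool maps_kernel_text(const struct text_bounds *t, uintptr_t addr, size_t len)
{
    assert(t->stext <= t->etext);
    return overlap(addr, len, t->stext, t->etext);
}
```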
      
      Came out of https://lkml.kernel.org/g/<567B1176.4000106@redhat.com>
      Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • m32r: fix m32104ut_defconfig build fail · 2c90425d
      Sudip Mukherjee authored
      commit 601f1db6 upstream.
      
      The build of m32104ut_defconfig for the m32r arch had been failing for
      a long time with the error:
      
        ERROR: "memory_start" [fs/udf/udf.ko] undefined!
        ERROR: "memory_end" [fs/udf/udf.ko] undefined!
        ERROR: "memory_end" [drivers/scsi/sg.ko] undefined!
        ERROR: "memory_start" [drivers/scsi/sg.ko] undefined!
        ERROR: "memory_end" [drivers/i2c/i2c-dev.ko] undefined!
        ERROR: "memory_start" [drivers/i2c/i2c-dev.ko] undefined!
      
      As other architectures do, export the symbols to fix the error.
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • cifs_dbg() outputs an uninitialized buffer in cifs_readdir() · c9c482ef
      Vasily Averin authored
      commit 01b9b0b2 upstream.
      
      In some cases tmp_buf is not filled in by cifs_filldir() and stays
      uninitialized, so printing it with the "%s" specifier can leak the
      contents of kernel memory.  If the old contents of the buffer do not
      contain a '\0', the access can run beyond the end of the allocated
      object and crash the host.
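      One way to make such a debug print safe is to hand out a zero-filled buffer, so an unfilled buffer reads as an empty string. In the kernel, kzalloc() returns zeroed memory. The sketch below assumes this approach. fill_name() is a hypothetical stand-in for cifs_filldir(), and NAME_BUF_LEN is an arbitrary size.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_BUF_LEN 256   /* hypothetical size */

/* Stand-in for cifs_filldir(): some paths return without writing the name. */
static int fill_name(char *buf, size_t len, const char *name)
{
    assert(len > 0);
    if (name == NULL)
        return -1;                 /* buffer left untouched */
    snprintf(buf, len, "%s", name);
    return 0;
}

/* Zero-filled allocation: an unfilled buffer is a valid empty string. */
static char *alloc_name_buf(void)
{
    return calloc(1, NAME_BUF_LEN);
}
```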
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Steve French <sfrench@localhost.localdomain>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • cifs: fix race between call_async() and reconnect() · eb2c033d
      Rabin Vincent authored
      commit 820962dc upstream.
      
      cifs_call_async() queues the MID to the pending list and calls
      smb_send_rqst().  If smb_send_rqst() performs a partial send, it sets
      the tcpStatus to CifsNeedReconnect and returns an error code to
      cifs_call_async().  In this case, cifs_call_async() removes the MID
      from the list and returns to the caller.
      
      However, cifs_call_async() releases the server mutex _before_ removing
      the MID.  This means that a cifs_reconnect() can race with this function
      and manage to remove the MID from the list and delete the entry before
      cifs_call_async() calls cifs_delete_mid().  This leads to various
      crashes due to the use after free in cifs_delete_mid().
      
      Task1				Task2
      
      cifs_call_async():
       - rc = -EAGAIN
       - mutex_unlock(srv_mutex)
      
      				cifs_reconnect():
      				 - mutex_lock(srv_mutex)
      				 - mutex_unlock(srv_mutex)
      				 - list_delete(mid)
      				 - mid->callback()
      				 	cifs_writev_callback():
      				 		- mutex_lock(srv_mutex)
      						- delete(mid)
      				 		- mutex_unlock(srv_mutex)
      
       - cifs_delete_mid(mid) <---- use after free
      
      Fix this by removing the MID in cifs_call_async() before releasing the
      srv_mutex.  Also hold the srv_mutex in cifs_reconnect() until the MIDs
      are moved out of the pending list.
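      The corrected ordering can be modeled single-threaded. The failing sender unlinks the MID while still holding srv_mutex, and reconnect only takes over MIDs it finds on the pending list under the same mutex. All names here are hypothetical. The model counts deletions in `frees` instead of freeing memory, and it checks a flag where the real reconnect code walks the list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-threaded model of a MID and the server state. */
struct mid_model {
    bool on_pending_list;
    int frees;                      /* must end up exactly 1 */
};

struct server_model {
    bool srv_mutex_held;
};

static void delete_mid(struct mid_model *m)
{
    m->frees++;
}

/* Failure path of call_async: unlink the MID *before* dropping the mutex. */
static void call_async_send_failed(struct server_model *s, struct mid_model *m)
{
    assert(s->srv_mutex_held);
    m->on_pending_list = false;     /* reconnect can no longer find it */
    s->srv_mutex_held = false;      /* mutex_unlock(srv_mutex) */
    delete_mid(m);
}

/* reconnect: under the mutex, take over only MIDs still on the list. */
static void reconnect(struct server_model *s, struct mid_model *m)
{
    s->srv_mutex_held = true;
    bool mine = m->on_pending_list;
    if (mine)
        m->on_pending_list = false;
    s->srv_mutex_held = false;
    if (mine)
        delete_mid(m);              /* in CIFS this happens via the callback */
}
```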
      Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
      Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
      Signed-off-by: Steve French <sfrench@localhost.localdomain>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • sparc64: fix incorrect sign extension in sys_sparc64_personality · b51db3dc
      Dmitry V. Levin authored
      commit 525fd5a9 upstream.
      
      The value returned by sys_personality has type "long int".
      It is saved to a variable of type "int", which is not a problem
      yet because the type of task_struct->personality is "unsigned int".
      The problem is the sign extension from "int" to "long int"
      that happens on return from sys_sparc64_personality.
      
      For example, a userspace call personality((unsigned) -EINVAL) will
      cause any subsequent personality call, including an absolutely
      harmless read-only personality(0xffffffff) call, to fail with
      errno set to EINVAL.
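      The sign-extension problem can be reproduced in user space with a hypothetical stand-in for sys_personality(). The sketch uses int64_t for the 64-bit "long" so it behaves the same on any host. It assumes the fix is to keep the value in a 64-bit type instead of an int. Converting an out-of-range value to int is implementation-defined in C; GCC and Clang wrap it, and the comments assume that behavior.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(int) == 4, "sketch assumes a 32-bit int");

/* Stand-in for sys_personality(): the old unsigned value, widened. */
static int64_t fake_sys_personality(uint32_t old)
{
    return (int64_t)old;          /* zero-extended: never negative */
}

/* Buggy wrapper: the result passes through an int and is sign-extended. */
static int64_t wrapper_via_int(uint32_t old)
{
    int ret = (int)fake_sys_personality(old);  /* wraps on GCC/Clang */
    return ret;                   /* 0xffffffea becomes -22 (-EINVAL) */
}

/* Fixed wrapper: keep the full 64-bit value. */
static int64_t wrapper_via_long(uint32_t old)
{
    int64_t ret = fake_sys_personality(old);
    return ret;
}
```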
      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • ALSA: timer: Fix race among timer ioctls · ffa534e3
      Takashi Iwai authored
      commit af368027 upstream.
      
      ALSA timer ioctls have an open race, and this may lead to a
      use-after-free of the timer instance object.  A simplistic fix is to
      make each ioctl exclusive.  We already have tread_sem for controlling
      the tread; extend it into a global mutex applied to each ioctl.
      
      The downside is, of course, worse concurrency.  But these ioctls
      aren't meant to be accessed in parallel anyway, so it should be fine
      to serialize them.
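      The serialization idea can be sketched with a POSIX mutex in place of the kernel mutex; the function names and the counter are hypothetical. Every ioctl runs under one global lock, so no ioctl can free the timer instance while another is still using it.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model: one global mutex serializes every timer ioctl. */
static pthread_mutex_t ioctl_lock = PTHREAD_MUTEX_INITIALIZER;
static int in_ioctl;              /* ioctls currently inside the body */

static long timer_ioctl_body(unsigned int cmd)
{
    in_ioctl++;
    assert(in_ioctl == 1);        /* the global mutex admits one at a time */
    /* ... operate on the timer instance; no other ioctl can free it ... */
    in_ioctl--;
    return (long)cmd;             /* placeholder result */
}

static long timer_ioctl(unsigned int cmd)
{
    long ret;

    pthread_mutex_lock(&ioctl_lock);
    ret = timer_ioctl_body(cmd);
    pthread_mutex_unlock(&ioctl_lock);
    return ret;
}
```

      If several threads call timer_ioctl() concurrently, the assertion in the body still holds, because the mutex admits only one of them at a time.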
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • mmc: mmci: fix an ages old detection error · c072e2fd
      Linus Walleij authored
      commit 0bcb7efd upstream.
      
      commit 4956e109 ("ARM: 6244/1: mmci: add variant data and default
      MCICLOCK support") added variant data for ARM, U300 and Ux500 variants.
      The Nomadik NHK8815/8820 variant was erroneously labeled as a U300
      variant, and when the proper Nomadik variant was later introduced in
      commit 34fd4213 ("ARM: 7378/1: mmci: add support for the Nomadik MMCI
      variant") this was not fixed. Let's say this fixes the latter commit, as
      there was no proper Nomadik support until then.
      
      Fixes: 34fd4213 ("ARM: 7378/1: mmci: add support for the Nomadik...")
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • dmaengine: dw: fix cyclic transfer callbacks · 14e23fa4
      Mans Rullgard authored
      commit 2895b2ca upstream.
      
      Cyclic transfer callbacks rely on block completion interrupts which were
      disabled in commit ff7b05f2 ("dmaengine/dw_dmac: Don't handle block
      interrupts").  This re-enables block interrupts so the cyclic callbacks
      can work.  Other transfer types are not affected as they set the INT_EN
      bit only on the last block.
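      The sketch below is a hypothetical model of which block-completion interrupts reach the driver; the struct and flag are illustrative, not the dw_dmac register layout. Cyclic transfers set INT_EN on every block and need block interrupts unmasked to get per-period callbacks. Other transfers set it only on the last block, so in this model unmasking adds at most one block interrupt for them.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 16

/* Hypothetical descriptor model, not the real hardware layout. */
struct block_desc {
    bool int_en;                  /* interrupt when this block completes */
};

/* Cyclic transfers want a callback per period; others only at the end. */
static void setup_blocks(struct block_desc *d, int n, bool cyclic)
{
    assert(n > 0 && n <= MAX_BLOCKS);
    for (int i = 0; i < n; i++)
        d[i].int_en = cyclic || (i == n - 1);
}

/* Count block-completion interrupts that actually reach the driver. */
static int count_block_irqs(const struct block_desc *d, int n, bool block_irq_unmasked)
{
    int irqs = 0;

    for (int i = 0; i < n; i++)
        if (d[i].int_en && block_irq_unmasked)
            irqs++;
    return irqs;
}
```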
      
      Fixes: ff7b05f2 ("dmaengine/dw_dmac: Don't handle block interrupts")
      Signed-off-by: Mans Rullgard <mans@mansr.com>
      Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • dmaengine: dw: fix cyclic transfer setup · e1f8b209
      Mans Rullgard authored
      commit df3bb8a0 upstream.
      
      Commit 61e183f8 ("dmaengine/dw_dmac: Reconfigure interrupt and
      chan_cfg register on resume") moved some channel initialisation to
      a new function which must be called before starting a transfer.
      
      This updates dw_dma_cyclic_start() to use dwc_dostart() like the other
      modes, thus ensuring dwc_initialize() gets called and removing some code
      duplication.
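      The refactoring can be sketched as routing every start path through one function that initializes the channel first. struct chan_model and the helpers are hypothetical, and the sketch assumes initialization is skipped when it has already been done. chan_resume() simulates the loss of configuration that makes re-initialization necessary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical channel model; logic simplified for illustration. */
struct chan_model {
    bool initialized;             /* interrupt masks and config programmed */
    bool running;
};

/* Idempotent: (re)program the channel if it is not configured. */
static void chan_initialize(struct chan_model *c)
{
    if (c->initialized)
        return;
    c->initialized = true;
}

/* Single start path shared by all transfer modes. */
static void chan_dostart(struct chan_model *c)
{
    chan_initialize(c);
    assert(c->initialized);
    c->running = true;
}

static void slave_start(struct chan_model *c)  { chan_dostart(c); }
static void cyclic_start(struct chan_model *c) { chan_dostart(c); }

/* Simulated suspend/resume that loses the channel configuration. */
static void chan_resume(struct chan_model *c)
{
    c->initialized = false;
    c->running = false;
}
```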
      
      Fixes: 61e183f8 ("dmaengine/dw_dmac: Reconfigure interrupt and chan_cfg register on resume")
      Signed-off-by: Mans Rullgard <mans@mansr.com>
      Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • ALSA: timer: Fix double unlink of active_list · 4a7ff8dc
      Takashi Iwai authored
      commit ee8413b0 upstream.
      
      An ALSA timer instance object has a couple of linked lists, and they
      are unlinked unconditionally in snd_timer_stop().  Meanwhile,
      snd_timer_interrupt() also unlinks the instance, but it calls
      list_del(), which does not reinitialize the element's own list head,
      so the element still looks linked.  This results in a double unlink,
      which was caught by the syzkaller fuzzer.
      
      The fix is to use the list_del_init() variant there, too.
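      The minimal user-space re-implementation below is modeled on the semantics of <linux/list.h>. It shows why list_del_init() makes a second unlink harmless: after the first call the node points at itself, so unlinking it again rewrites only its own pointers. With plain list_del(), the node keeps stale (in the kernel, poisoned) pointers, and a second unlink writes through them.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink and re-point the node at itself, so a second unlink is a no-op. */
static void list_del_init(struct list_head *n)
{
    assert(n->next->prev == n && n->prev->next == n);  /* sanity check */
    n->prev->next = n->next;
    n->next->prev = n->prev;
    init_list_head(n);
}
```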
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • x86/mm: Improve switch_mm() barrier comments · 31d0fd05
      Andy Lutomirski authored
      commit 4eaffdd5 upstream.
      
      My previous comments were still a bit confusing and there was a
      typo. Fix it up.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 71b3c126 ("x86/mm: Add barriers and document switch_mm()-vs-flush synchronization")
      Link: http://lkml.kernel.org/r/0a0b43cdcdd241c5faaaecfbcc91a155ddedc9a1.1452631609.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • powerpc/module: Handle R_PPC64_ENTRY relocations · cd8014be
      Ulrich Weigand authored
      commit a61674bd upstream.
      
      GCC 6 will include changes to generated code with -mcmodel=large,
      which is used to build kernel modules on powerpc64le.  This was
      necessary because the large model is supposed to allow arbitrary
      sizes and locations of the code and data sections, but the ELFv2
      global entry point prolog still made the unconditional assumption
      that the TOC associated with any particular function can be found
      within 2 GB of the function entry point:
      
      func:
      	addis r2,r12,(.TOC.-func)@ha
      	addi  r2,r2,(.TOC.-func)@l
      	.localentry func, .-func
      
      To remove this assumption, GCC will now instead generate this global
      entry point prolog sequence when using -mcmodel=large:
      
      	.quad .TOC.-func
      func:
      	.reloc ., R_PPC64_ENTRY
      	ld    r2, -8(r12)
      	add   r2, r2, r12
      	.localentry func, .-func
      
      The new .reloc triggers an optimization in the linker that will
      replace this new prolog with the original code (see above) if the
      linker determines that the distance between .TOC. and func is in
      range after all.
      
      Since this new relocation is now present in module object files,
      the kernel module loader is required to handle them too.  This
      patch adds support for the new relocation and implements the
      same optimization done by the GNU linker.
      Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • scripts/recordmcount.pl: support data in text section on powerpc · 6a3c0e4e
      Ulrich Weigand authored
      commit 2e50c4be upstream.
      
      If a text section starts out with a data blob before the first
      function start label, the disassembly parsing done in recordmcount.pl
      gets confused on powerpc, leading to the creation of corrupted module
      objects.
      
      This was not a problem so far since the compiler would never create
      such text sections.  However, this has changed with a recent change
      in GCC 6 to support distances of > 2GB between a function and its
      associated TOC in the ELFv2 ABI, exposing this problem.
      
      There is already code in recordmcount.pl to handle such data blobs
      on the sparc64 platform.  This patch uses the same method to handle
      those on powerpc as well.
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • Helge Deller
      parisc: Fix __ARCH_SI_PREAMBLE_SIZE · 65bffd7e
      Helge Deller authored
      commit e60fc5aa upstream.
      
      On a 64-bit kernel build the compiler aligns the _sifields union in
      struct siginfo_t on a 64-bit boundary. The __ARCH_SI_PREAMBLE_SIZE
      define compensates for this alignment, which fixes the wait testcase
      of the strace package.
      
      The symptom of a wrong __ARCH_SI_PREAMBLE_SIZE value is that the
      _sigchld.si_stime field is not copied correctly, so after
      copy_siginfo() it contains uninitialized values.
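      The effect can be reproduced in user space with a simplified layout.
      This is a sketch under stated assumptions. struct demo_siginfo is a
      hypothetical stand-in for siginfo_t, and _Alignas(8) forces the 64-bit
      union alignment described above. demo_copy() assumes a copy of
      "preamble + sizeof the _sigchld member". With a preamble of 12 instead
      of 16, that copy ends 4 bytes early and truncates si_stime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the _sigchld member. */
struct demo_sigchld {
    int32_t  pid;
    uint32_t uid;
    int32_t  status;
    int64_t  utime;
    int64_t  stime;    /* last member: the first to be cut off */
};

/* Union aligned to 8 bytes, as on a 64-bit build. */
union demo_sifields {
    _Alignas(8) int32_t pad[29];
    struct demo_sigchld sigchld;
};

struct demo_siginfo {
    int32_t si_signo;
    int32_t si_errno;
    int32_t si_code;
    union demo_sifields fields;  /* starts at 16, not 12, due to the alignment */
};

/* Copy the preamble plus the _sigchld member (assumed copy_siginfo() pattern). */
static void demo_copy(struct demo_siginfo *to, const struct demo_siginfo *from,
                      size_t preamble_size)
{
    memcpy(to, from, preamble_size + sizeof(from->fields.sigchld));
}
```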
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • Minchan Kim
      virtio_balloon: fix race between migration and ballooning · 9795914c
      Minchan Kim authored
      commit 21ea9fb6 upstream.
      
      In balloon_page_dequeue, pages_lock should cover the whole loop
      (i.e., list_for_each_entry_safe). Otherwise, the cursor page could
      be isolated by compaction, and the list_del() done by isolation could
      poison page->lru.{prev,next}, so the loop could end up accessing a
      bad address, as in the trace below. This patch fixes the bug.
      
      general protection fault: 0000 [#1] SMP
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 2 PID: 82 Comm: vballoon Not tainted 4.4.0-rc5-mm1-access_bit+ #1906
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff8800a7ff0000 ti: ffff8800a7fec000 task.ti: ffff8800a7fec000
      RIP: 0010:[<ffffffff8115e754>]  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
      RSP: 0018:ffff8800a7fefdc0  EFLAGS: 00010246
      RAX: ffff88013fff9a70 RBX: ffffea000056fe00 RCX: 0000000000002b7d
      RDX: ffff88013fff9a70 RSI: ffffea000056fe00 RDI: ffff88013fff9a68
      RBP: ffff8800a7fefde8 R08: ffffea000056fda0 R09: 0000000000000000
      R10: ffff8800a7fefd90 R11: 0000000000000001 R12: dead0000000000e0
      R13: ffffea000056fe20 R14: ffff880138809070 R15: ffff880138809060
      FS:  0000000000000000(0000) GS:ffff88013fc40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00007f229c10e000 CR3: 00000000b8b53000 CR4: 00000000000006a0
      Stack:
       0000000000000100 ffff880138809088 ffff880138809000 ffff880138809060
       0000000000000046 ffff8800a7fefe28 ffffffff812c86d3 ffff880138809020
       ffff880138809000 fffffffffff91900 0000000000000100 ffff880138809060
      Call Trace:
       [<ffffffff812c86d3>] leak_balloon+0x93/0x1a0
       [<ffffffff812c8bc7>] balloon+0x217/0x2a0
       [<ffffffff8143739e>] ? __schedule+0x31e/0x8b0
       [<ffffffff81078160>] ? abort_exclusive_wait+0xb0/0xb0
       [<ffffffff812c89b0>] ? update_balloon_stats+0xf0/0xf0
       [<ffffffff8105b6e9>] kthread+0xc9/0xe0
       [<ffffffff8105b620>] ? kthread_park+0x60/0x60
       [<ffffffff8143b4af>] ret_from_fork+0x3f/0x70
       [<ffffffff8105b620>] ? kthread_park+0x60/0x60
      Code: 8d 60 e0 0f 84 af 00 00 00 48 8b 43 20 a8 01 75 3b 48 89 d8 f0 0f ba 28 00 72 10 48 8b 03 f6 c4 08 75 2f 48 89 df e8 8c 83 f9 ff <49> 8b 44 24 20 4d 8d 6c 24 20 48 83 e8 20 4d 39 f5 74 7a 4c 89
      RIP  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
       RSP <ffff8800a7fefdc0>
      ---[ end trace 43cf28060d708d5f ]---
      Kernel panic - not syncing: Fatal exception
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Kernel Offset: disabled
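      The fix is about lock scope. The lock must be held for the whole list
      walk, not only around individual steps. Below is a minimal user-space
      sketch. A POSIX mutex stands in for the kernel lock, and a hypothetical
      singly linked demo_page list stands in for page->lru. The "locked" flag
      models trylock_page() failing. Because demo_isolate() (modelling
      isolation by compaction) takes the same lock, it cannot unlink the
      cursor while the walk is in progress.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_page {
    struct demo_page *next;
    bool locked;                  /* stand-in for trylock_page() failing */
};

struct demo_balloon {
    pthread_mutex_t pages_lock;   /* protects the list, including the walk */
    struct demo_page *head;
};

/* Dequeue the first unlocked page; the lock covers the entire loop. */
static struct demo_page *demo_dequeue(struct demo_balloon *b)
{
    struct demo_page *found = NULL;

    pthread_mutex_lock(&b->pages_lock);
    for (struct demo_page **pp = &b->head; *pp; pp = &(*pp)->next) {
        if ((*pp)->locked)
            continue;             /* skip pages someone else holds */
        found = *pp;
        *pp = found->next;        /* unlink while still under the lock */
        found->next = NULL;
        break;
    }
    pthread_mutex_unlock(&b->pages_lock);
    return found;
}

/* Isolation unlinks a page under the same lock, so it cannot race the walk. */
static bool demo_isolate(struct demo_balloon *b, struct demo_page *page)
{
    bool removed = false;

    pthread_mutex_lock(&b->pages_lock);
    for (struct demo_page **pp = &b->head; *pp; pp = &(*pp)->next) {
        if (*pp == page) {
            *pp = page->next;
            page->next = NULL;
            removed = true;
            break;
        }
    }
    pthread_mutex_unlock(&b->pages_lock);
    return removed;
}
```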
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Rafael Aquini <aquini@redhat.com>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • Minchan Kim
      virtio_balloon: fix race by fill and leak · 9f0f4835
      Minchan Kim authored
      commit f68b992b upstream.
      
      While working on compaction-related changes, I encountered a bug
      with ballooning.
      
      With repeated inflating and deflating cycles, guest memory (i.e.,
      cat /proc/meminfo | grep MemTotal) decreases and can't be
      recovered.
      
      The reason is that balloon_lock doesn't cover release_pages_balloon,
      so struct virtio_balloon fields can be overwritten by a racing
      fill_balloon() (e.g., vb->*pfns could be critical).
      
      This patch fixes it in my test.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • Takashi Iwai
      ALSA: seq: Fix race at timer setup and close · 712df95d
      Takashi Iwai authored
      commit 3567eb6a upstream.
      
      ALSA sequencer code has an open race between the timer setup ioctl and
      the close of the client.  This was triggered by syzkaller fuzzer, and
      a use-after-free was caught there as a result.
      
      This patch papers over it by adding a proper queue->timer_mutex lock
      around the timer-related calls in the relevant code path.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • Takashi Iwai
      ALSA: seq: Fix missing NULL check at remove_events ioctl · f9d70229
      Takashi Iwai authored
      commit 030e2c78 upstream.
      
      snd_seq_ioctl_remove_events() calls snd_seq_fifo_clear()
      unconditionally even if there is no FIFO assigned, and this leads to
      an Oops due to NULL dereference.  The fix is just to add a proper NULL
      check.
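      The shape of the fix is a guard before the call. This is a
      hypothetical sketch. demo_client, demo_fifo and the helpers are
      illustrative names, not the ALSA sequencer's actual structures.

```c
#include <assert.h>
#include <stddef.h>

struct demo_fifo {
    int cells;                    /* number of queued events */
};

struct demo_client {
    struct demo_fifo *fifo;       /* NULL when no FIFO is assigned */
};

static void demo_fifo_clear(struct demo_fifo *f)
{
    f->cells = 0;                 /* dereferences f: must not be NULL */
}

/* Drop queued events, but only if the client actually has a FIFO. */
static void demo_remove_events(struct demo_client *client)
{
    if (client->fifo)
        demo_fifo_clear(client->fifo);
}
```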
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • Mario Kleiner
      x86/reboot/quirks: Add iMac10,1 to pci_reboot_dmi_table[] · bba25b2c
      Mario Kleiner authored
      commit 2f0c0b2d upstream.
      
      Without the reboot=pci method, the iMac 10,1 simply
      hangs after printing "Restarting system" at the point
      when it should reboot. This fixes it.
      Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1450466646-26663-1-git-send-email-mario.kleiner.de@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • Benjamin Tissoires
      Input: elantech - mark protocols v2 and v3 as semi-mt · ed3e56dc
      Benjamin Tissoires authored
      commit 6544a1df upstream.
      
      When using protocol v2 or v3 hardware, elantech uses the function
      elantech_report_semi_mt_data() to report data. These devices are rather
      creepy because if num_finger is 3, (x2,y2) is (0,0). Yes, only one valid
      touch is reported.
      
      Anyway, userspace (libinput) is now confused by these (0,0) touches,
      detects them as palms, and rejects them.
      
      Commit 3c0213d1 ("Input: elantech - fix semi-mt protocol for v3 HW")
      was sufficient for xf86-input-synaptics and for libinput before it gained
      palm rejection. Now we need to actually tell libinput that this device is
      a semi-mt one and it should not rely on the actual values of the 2 touches.
      Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • Roman Volkov
      clocksource/drivers/vt8500: Increase the minimum delta · bb039541
      Roman Volkov authored
      commit f9eccf24 upstream.
      
      The vt8500 clocksource driver declares itself capable of handling a
      minimum delay of 4 cycles by passing that value into
      clockevents_config_and_register(). However, vt8500_timer_set_next_event()
      requires the passed cycles value to be at least 16. The impact is that
      userspace hangs in nanosleep() calls with small delay intervals.
      
      This problem is reproducible in Linux 4.2 starting from:
      c6eb3f70 ('hrtimer: Get rid of hrtimer softirq')
      
      From Russell King, more detailed explanation:
      
      "It's a speciality of the StrongARM/PXA hardware. It takes a certain
      number of OSCR cycles for the value written to hit the compare registers.
      So, if a very small delta is written (eg, the compare register is written
      with a value of OSCR + 1), the OSCR will have incremented past this value
      before it hits the underlying hardware. The result is, that you end up
      waiting a very long time for the OSCR to wrap before the event fires.
      
      So, we introduce a check in set_next_event() to detect this and return
      -ETIME if the calculated delta is too small, which causes the generic
      clockevents code to retry after adding the min_delta specified in
      clockevents_config_and_register() to the current time value.
      
      min_delta must be sufficient that we don't re-trip the -ETIME check - if
      we do, we will return -ETIME, forward the next event time, try to set it,
      return -ETIME again, and basically lock the system up. So, min_delta
      must be larger than the check inside set_next_event(). A factor of two
      was chosen to ensure that this situation would never occur.
      
      The PXA code worked on PXA systems for years, and I'd suggest no one
      changes this mechanism without access to a wide range of PXA systems,
      otherwise they're risking breakage."
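      The interplay between the -ETIME check and min_delta can be modelled
      in a few lines. This is a sketch under stated assumptions. The driver
      check is reduced to "reject fewer than 16 cycles", and the capped retry
      loop is a simplified, hypothetical stand-in for the generic clockevents
      code. The helper names are also hypothetical. With min_delta = 4, every
      retry trips -ETIME again. With min_delta = 2 * 16, the first retry
      succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_MIN_OSCR_DELTA 16u   /* threshold from the text above */

/* Driver side: refuse deltas too small to reach the compare register in time. */
static int demo_set_next_event(uint32_t cycles)
{
    if (cycles < DEMO_MIN_OSCR_DELTA)
        return -ETIME;
    return 0;
}

/* Core side (simplified): on -ETIME, retry with min_delta, at most max_tries times. */
static int demo_program_event(uint32_t delta, uint32_t min_delta, int max_tries,
                              int *tries)
{
    int ret;

    *tries = 0;
    do {
        ret = demo_set_next_event(*tries == 0 ? delta : min_delta);
        (*tries)++;
    } while (ret == -ETIME && *tries < max_tries);
    return ret;
}
```

      The real code has no retry cap, so a min_delta below the driver's
      threshold would keep looping, which is the lock-up described in the
      quote.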
      
      Cc: Russell King <linux@arm.linux.org.uk>
      Acked-by: Alexey Charkov <alchark@gmail.com>
      Signed-off-by: Roman Volkov <rvolkov@v1ros.org>
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • Dave Chinner
      xfs: handle dquot buffer readahead in log recovery correctly · e4eb7061
      Dave Chinner authored
      commit 7d6a13f0 upstream.
      
      When we do dquot readahead in log recovery, we do not use a verifier
      as the underlying buffer may not have dquots in it. e.g. the
      allocation operation hasn't yet been replayed. Hence we do not want
      to fail recovery because we detect an operation to be replayed has
      not been run yet. This problem was addressed for inodes in commit
      d8914002 ("xfs: inode buffers may not be valid during recovery
      readahead") but the problem was not recognised to exist for dquots
      and their buffers as the dquot readahead did not have a verifier.
      
      The result of not using a verifier is that when the buffer is then
      next read to replay a dquot modification, the dquot buffer verifier
      will only be attached to the buffer if *readahead is not complete*.
      Hence we can read the buffer, replay the dquot changes and then add
      it to the delwri submission list without it having a verifier
      attached to it. This then generates warnings in xfs_buf_ioapply(),
      which catches and warns about this case.
      
      Fix this and make it handle the same readahead verifier error cases
      as for inode buffers by adding a new readahead verifier that has a
      write operation as well as a read operation that marks the buffer as
      not done if any corruption is detected.  Also make sure we don't run
      readahead if the dquot buffer has been marked as cancelled by
      recovery.
      
      This will result in readahead either succeeding and the buffer
      having a valid write verifier, or readahead failing and the buffer
      state requiring the subsequent read to resubmit the IO with the new
      verifier.  In either case, this will result in the buffer always
      ending up with a valid write verifier on it.
      
      Note: we also need to fix the inode buffer readahead error handling
      to mark the buffer with EIO. Brian noticed during review that the code I
      copied from there was wrong, so fix it at the same time. Add comments
      linking the two functions that handle readahead verifier errors
      together so we don't forget this behavioural link in future.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      [ luis: backported to 3.16:
        - struct xfs_buf_ops does not have a 'name' field in 3.16
        - adjusted context ]
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • Dave Chinner
      xfs: inode recovery readahead can race with inode buffer creation · 15814082
      Dave Chinner authored
      commit b79f4a1c upstream.
      
      When we do inode readahead in log recovery, we can do the
      readahead before we've replayed the icreate transaction that stamps
      the buffer with inode cores. The inode readahead verifier catches
      this and marks the buffer as !done to indicate that it doesn't yet
      contain valid inodes.
      
      In adding buffer error notification (i.e. setting b_error = -EIO at
      the same time as we clear the done flag) to such a readahead
      verifier failure, we can then get subsequent inode recovery failing
      with this error:
      
      XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32
      
      This occurs when readahead completion races with icreate item replay
      such as:
      
      	inode readahead
      		find buffer
      		lock buffer
      		submit RA io
      	....
      	icreate recovery
      	    xfs_trans_get_buffer
      		find buffer
      		lock buffer
      		<blocks on RA completion>
      	.....
      	<ra completion>
      		fails verifier
      		clear XBF_DONE
      		set bp->b_error = -EIO
      		release and unlock buffer
      	<icreate gains lock>
      	icreate initialises buffer
      	marks buffer as done
      	adds buffer to delayed write queue
      	releases buffer
      
      At this point, we have an initialised inode buffer that is up to
      date but has an -EIO state registered against it. When we finally
      get to recovering an inode in that buffer:
      
      	inode item recovery
      	    xfs_trans_read_buffer
      		find buffer
      		lock buffer
      		sees XBF_DONE is set, returns buffer
      	    sees bp->b_error is set
      		fail log recovery!
      
      Essentially, we need xfs_trans_get_buf_map() to clear the error status of
      the buffer when doing a lookup. This function returns uninitialised
      buffers, so the buffer returned cannot be in an error state and
      none of the code that uses this function expects b_error to be set
      on return. Indeed, there is an ASSERT(!bp->b_error); in the
      transaction case in xfs_trans_get_buf_map() that would have caught
      this if log recovery used transactions....
      
      This patch firstly changes the inode readahead failure to set -EIO
      on the buffer, and secondly changes xfs_buf_get_map() to never
      return a buffer with an error state set so this first change doesn't
      cause unexpected log recovery failures.
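      The ordering problem and the fix can be sketched with a toy buffer.
      Everything here is hypothetical, including demo_buf, the helpers, and
      the use of -EAGAIN as a placeholder for "would resubmit I/O". It only
      models the sequence in the timelines above:
      - a failed readahead leaves done == false with error == -EIO;
      - a "get" lookup for a buffer the caller will initialise clears the
        stale error;
      - a later read of the now-done buffer succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct demo_buf {
    bool done;    /* contents are valid (XBF_DONE in the text) */
    int  error;   /* I/O error state, e.g. -EIO from readahead */
};

/* Readahead completion whose verifier found no valid inodes yet. */
static void demo_readahead_failed(struct demo_buf *bp)
{
    bp->done = false;
    bp->error = -EIO;
}

/* "Get" lookup: the caller initialises the buffer, so clear any stale error. */
static struct demo_buf *demo_get_buf(struct demo_buf *bp)
{
    bp->error = 0;
    return bp;
}

/* "Read" lookup: a done buffer is returned with whatever error it carries. */
static int demo_read_buf(struct demo_buf *bp)
{
    if (!bp->done)
        return -EAGAIN;   /* placeholder: a real read would resubmit the I/O */
    return bp->error;
}
```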
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      [ luis: backported to 3.16:
        - file rename: fs/xfs/libxfs/xfs_inode_buf.c -> fs/xfs/xfs_inode_buf.c
        - adjusted context ]
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • Ard Biesheuvel
      s390: fix normalization bug in exception table sorting · 5dcdcf49
      Ard Biesheuvel authored
      commit bcb7825a upstream.
      
      The normalization pass in the sorting routine of the relative exception
      table serves two purposes:
      - it ensures that the address fields of the exception table entries are
        fully ordered, so that no ambiguities arise between entries with
        identical instruction offsets (i.e., when two instructions that are
        exactly 8 bytes apart each have an exception table entry associated with
        them)
      - it ensures that the offsets of both the instruction and the fixup fields
        of each entry are relative to their final location after sorting.
      
      Commit eb608fb3 ("s390/exceptions: switch to relative exception table
      entries") ported the relative exception table format from x86, but modified
      the sorting routine to only normalize the instruction offset field and not
      the fixup offset field. The result is that the fixup offset of each entry
      will be relative to the original location of the entry before sorting,
      likely leading to crashes when those entries are dereferenced.
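      Both normalization steps can be shown on a toy table. This sketch
      follows the relative format described above, with these assumptions:
      - each entry is two 32-bit offsets, each relative to its own field;
      - positions are byte offsets from the start of the table;
      - demo_sort_extable() is a hypothetical name.
      The point of the fix is that the fixup field gets the same
      normalize/denormalize treatment as insn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_extable_entry {
    int32_t insn;    /* instruction target minus the offset of this field */
    int32_t fixup;   /* fixup target minus the offset of this field */
};

static int demo_cmp_insn(const void *a, const void *b)
{
    int32_t x = ((const struct demo_extable_entry *)a)->insn;
    int32_t y = ((const struct demo_extable_entry *)b)->insn;

    return (x > y) - (x < y);
}

/* Sort a relative table in place; both fields must be normalized. */
static void demo_sort_extable(struct demo_extable_entry *tab, size_t n)
{
    const int32_t esz = (int32_t)sizeof(*tab);
    const int32_t foff = (int32_t)offsetof(struct demo_extable_entry, fixup);

    /* Normalize: make both offsets relative to the start of the table. */
    for (size_t i = 0; i < n; i++) {
        tab[i].insn  += (int32_t)i * esz;
        tab[i].fixup += (int32_t)i * esz + foff;  /* the step the buggy version skipped */
    }
    qsort(tab, n, sizeof(*tab), demo_cmp_insn);
    /* Denormalize: make both offsets relative to each entry's new location. */
    for (size_t i = 0; i < n; i++) {
        tab[i].insn  -= (int32_t)i * esz;
        tab[i].fixup -= (int32_t)i * esz + foff;
    }
}
```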
      
      Fixes: eb608fb3 ("s390/exceptions: switch to relative exception table entries")
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • H.J. Lu
      x86/boot: Double BOOT_HEAP_SIZE to 64KB · 4b9cc54a
      H.J. Lu authored
      commit 8c31902c upstream.
      
      When decompressing the kernel image during x86 bootup, allocating memory
      for the ELF program headers may run out of heap space, which leads
      to a system halt.  This patch doubles BOOT_HEAP_SIZE to 64KB.
      
      Tested with a 32-bit kernel that failed to boot without this patch.
      Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • Andy Lutomirski
      x86/mm: Add barriers and document switch_mm()-vs-flush synchronization · bab48cc4
      Andy Lutomirski authored
      commit 71b3c126 upstream.
      
      When switch_mm() activates a new PGD, it also sets a bit that
      tells other CPUs that the PGD is in use so that TLB flush IPIs
      will be sent.  In order for that to work correctly, the bit
      needs to be visible prior to loading the PGD and therefore
      starting to fill the local TLB.
      
      Document all the barriers that make this work correctly and add
      a couple that were missing.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      [ luis: backported to 3.16:
        - dropped N/A comment in flush_tlb_mm_range()
        - adjusted context ]
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • Ben Skeggs
      drm/nouveau/kms: take mode_config mutex in connector hotplug path · 86fa80c5
      Ben Skeggs authored
      commit 0a882cad upstream.
      
      fdo#93634
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • Vegard Nossum
      uml: flush stdout before forking · 10563759
      Vegard Nossum authored
      commit 0754fb29 upstream.
      
      I was seeing some really weird behaviour where piping UML's output
      somewhere would cause output to get duplicated:
      
        $ ./vmlinux | head -n 40
        Checking that ptrace can change system call numbers...Core dump limits :
                soft - 0
                hard - NONE
        OK
        Checking syscall emulation patch for ptrace...Core dump limits :
                soft - 0
                hard - NONE
        OK
        Checking advanced syscall emulation patch for ptrace...Core dump limits :
                soft - 0
                hard - NONE
        OK
        Core dump limits :
                soft - 0
                hard - NONE
      
      This is because these tests do a fork() which duplicates the non-empty
      stdout buffer, then glibc flushes the duplicated buffer as each child
      exits.
      
      A simple workaround is to flush before forking.
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • Vegard Nossum
      uml: fix hostfs mknod() · c4a74527
      Vegard Nossum authored
      commit 9f2dfda2 upstream.
      
      An inverted return value check in hostfs_mknod() caused the function
      to return success after handling it as an error (and cleaning up).
      
      It resulted in the following segfault when trying to bind() a named
      unix socket:
      
        Pid: 198, comm: a.out Not tainted 4.4.0-rc4
        RIP: 0033:[<0000000061077df6>]
        RSP: 00000000daae5d60  EFLAGS: 00010202
        RAX: 0000000000000000 RBX: 000000006092a460 RCX: 00000000dfc54208
        RDX: 0000000061073ef1 RSI: 0000000000000070 RDI: 00000000e027d600
        RBP: 00000000daae5de0 R08: 00000000da980ac0 R09: 0000000000000000
        R10: 0000000000000003 R11: 00007fb1ae08f72a R12: 0000000000000000
        R13: 000000006092a460 R14: 00000000daaa97c0 R15: 00000000daaa9a88
        Kernel panic - not syncing: Kernel mode fault at addr 0x40, ip 0x61077df6
        CPU: 0 PID: 198 Comm: a.out Not tainted 4.4.0-rc4 #1
        Stack:
         e027d620 dfc54208 0000006f da981398
         61bee000 0000c1ed daae5de0 0000006e
         e027d620 dfcd4208 00000005 6092a460
        Call Trace:
         [<60dedc67>] SyS_bind+0xf7/0x110
         [<600587be>] handle_syscall+0x7e/0x80
         [<60066ad7>] userspace+0x3e7/0x4e0
         [<6006321f>] ? save_registers+0x1f/0x40
         [<6006c88e>] ? arch_prctl+0x1be/0x1f0
         [<60054985>] fork_handler+0x85/0x90
      
      Let's also get rid of the "cosmic ray protection" while we're at it.
      
      Fixes: e9193059 "hostfs: fix races in dentry_name() and inode_name()"
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • Mikulas Patocka
      dm snapshot: fix hung bios when copy error occurs · e33972a9
      Mikulas Patocka authored
      commit 385277bf upstream.
      
      When there is an error copying a chunk, dm-snapshot can incorrectly hold
      associated bios indefinitely, resulting in hung IO.
      
      The function copy_callback sets pe->error if there was an error copying the
      chunk, and then calls complete_exception.  complete_exception calls
      pending_complete on error, otherwise it calls commit_exception with
      commit_callback (and commit_callback calls complete_exception).
      
      The persistent exception store (dm-snap-persistent.c) assumes that calls
      to prepare_exception and commit_exception are paired.
      persistent_prepare_exception increases ps->pending_count and
      persistent_commit_exception decreases it.
      
      If there is a copy error, persistent_prepare_exception is called but
      persistent_commit_exception is not.  This results in the variable
      ps->pending_count never returning to zero and that causes some pending
      exceptions (and their associated bios) to be held forever.
      
      Fix this by unconditionally calling commit_exception regardless of
      whether the copy was successful.  A new "valid" parameter is added to
      commit_exception -- when the copy fails this parameter is set to zero so
      that the chunk that failed to copy (and all following chunks) is not
      recorded in the snapshot store.  Also, remove commit_callback now that
      it is merely a wrapper around pending_complete.
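      The pairing rule can be sketched as a counter. This is a hypothetical
      model, not dm-snapshot's code. demo_prepare() and demo_commit() stand
      in for persistent_prepare_exception and persistent_commit_exception,
      and valid plays the role of the new parameter. A failed copy still
      commits, so pending_count drops back to zero. It also marks the store
      invalid, so that chunk and all later ones are not recorded.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_store {
    int  pending_count;   /* prepared but not yet committed exceptions */
    int  recorded;        /* exceptions written to the store */
    bool valid;           /* false once any copy has failed */
};

static void demo_prepare(struct demo_store *ps)
{
    ps->pending_count++;
}

/* Always called, successful copy or not, so prepare/commit stay paired. */
static void demo_commit(struct demo_store *ps, bool valid)
{
    if (!valid)
        ps->valid = false;        /* stop recording from this chunk on */
    if (ps->valid)
        ps->recorded++;
    ps->pending_count--;
}

/* Copy completion: commit unconditionally, passing whether the copy worked. */
static void demo_copy_callback(struct demo_store *ps, bool copy_error)
{
    demo_commit(ps, !copy_error);
}
```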
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • Vinod Koul
      ASoC: compress: Fix compress device direction check · 93155d78
      Vinod Koul authored
      commit a1068045 upstream.
      
      The detection of direction for compress was only taking into account codec
      capabilities and not CPU ones. Fix this by checking the CPU side capabilities
      as well.
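      The corrected check can be sketched with hypothetical types.
      demo_stream_caps reduces each DAI's playback or capture description to
      a channels_min field. Treating zero as "direction not supported" is an
      assumption of this sketch. A direction is offered only if both the
      codec side and the CPU side support it.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_stream_caps {
    unsigned int channels_min;   /* 0 means the direction is unsupported */
};

struct demo_dai {
    struct demo_stream_caps playback;
    struct demo_stream_caps capture;
};

/* Require the capability on both ends of the link, not just the codec. */
static void demo_compress_dirs(const struct demo_dai *codec, const struct demo_dai *cpu,
                               bool *playback, bool *capture)
{
    *playback = codec->playback.channels_min && cpu->playback.channels_min;
    *capture  = codec->capture.channels_min && cpu->capture.channels_min;
}
```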
      Tested-by: Ashish Panwar <ashish.panwar@intel.com>
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    • Jeff Layton
      locks: fix unlock when fcntl_setlk races with a close · 993e1630
      Jeff Layton authored
      commit 7f3697e2 upstream.
      
      Dmitry reported that he was able to reproduce the WARN_ON_ONCE that
      fires in locks_free_lock_context when the flc_posix list isn't empty.
      
      The problem turns out to be that we're basically rebuilding the
      file_lock from scratch in fcntl_setlk when we discover that the setlk
      has raced with a close. If the l_whence field is SEEK_CUR or SEEK_END,
      then we may end up with fl_start and fl_end values that differ from
      when the lock was initially set, if the file position or length of the
      file has changed in the interim.
      
      Fix this by just reusing the same lock request structure, and simply
      overriding the fl_type value with F_UNLCK as appropriate. That ensures that
      we really are unlocking the lock that was initially set.
      
      While we're there, make sure that we do pop a WARN_ON_ONCE if the
      removal ever fails. Also return -EBADF in this event, since that's
      what we would have returned if the close had happened earlier.
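      Why rebuilding the request can miss the original lock is easiest to see
      by resolving a SEEK_END request twice. This is a sketch with
      hypothetical helpers. demo_resolve() is a simplified model that
      converts an l_whence/l_start/l_len triple into an absolute byte range
      and assumes l_len > 0. It is not the kernel's conversion code. Reusing
      the already-resolved request and only switching its type to F_UNLCK
      keeps the range stable when the file size changes in between.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>

struct demo_lock {
    short type;   /* F_RDLCK, F_WRLCK or F_UNLCK */
    long  start;  /* first byte covered (absolute) */
    long  end;    /* last byte covered (absolute, inclusive) */
};

/* Turn a whence-relative request into an absolute range (assumes l_len > 0). */
static struct demo_lock demo_resolve(short type, int whence, long l_start, long l_len,
                                     long file_pos, long file_size)
{
    long base = 0;                   /* SEEK_SET */

    if (whence == SEEK_CUR)
        base = file_pos;
    else if (whence == SEEK_END)
        base = file_size;

    struct demo_lock fl = { type, base + l_start, base + l_start + l_len - 1 };
    return fl;
}

/* The fix: keep the already-resolved range and only flip the type. */
static void demo_make_unlock(struct demo_lock *fl)
{
    fl->type = F_UNLCK;
}
```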
      
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Fixes: c293621b (stale POSIX lock handling)
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>