1. 10 Feb, 2016 40 commits
    • Herbert Xu's avatar
      crypto: af_alg - Forbid bind(2) when nokey child sockets are present · bcfa8411
      Herbert Xu authored
      [ Upstream commit a6a48c56 ]
      
      This patch forbids the calling of bind(2) when there are child
      sockets created by accept(2) in existence, even if they are created
      on the nokey path.
      
      This is needed as those child sockets have references to the tfm
      object which bind(2) will destroy.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      bcfa8411
    • Herbert Xu's avatar
      crypto: algif_hash - Remove custom release parent function · 96507c59
      Herbert Xu authored
      [ Upstream commit f1d84af1 ]
      
      This patch removes the custom release parent function as the
      generic af_alg_release_parent now works for nokey sockets too.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      96507c59
    • Herbert Xu's avatar
      crypto: af_alg - Allow af_af_alg_release_parent to be called on nokey path · eb1ab004
      Herbert Xu authored
      [ Upstream commit 6a935170 ]
      
      This patch allows af_alg_release_parent to be called even for
      nokey sockets.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      eb1ab004
    • Herbert Xu's avatar
      crypto: algif_hash - Require setkey before accept(2) · cec8983e
      Herbert Xu authored
      [ Upstream commit 6de62f15 ]
      
      Hash implementations that require a key may crash if you use
      them without setting a key.  This patch adds the necessary checks
      so that if you do attempt to use them without a key that we return
      -ENOKEY instead of proceeding.
      
      This patch also adds a compatibility path to support old applications
      that do acept(2) before setkey.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      cec8983e
    • Herbert Xu's avatar
      crypto: hash - Add crypto_ahash_has_setkey · 8c781796
      Herbert Xu authored
      [ Upstream commit a5596d63 ]
      
      This patch adds a way for ahash users to determine whether a key
      is required by a crypto_ahash transform.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8c781796
    • Alexander Aring's avatar
      mac802154: fix typo IEEE802515 to IEEE802154 · cfc8c439
      Alexander Aring authored
      [ Upstream commit 57205c14 ]
      
      This patch fixs a typo in address filter defines from IEEE802515 to
      IEEE802154.
      Signed-off-by: default avatarAlexander Aring <alex.aring@gmail.com>
      Cc: Alan Ott <alan@signal11.us>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      cfc8c439
    • Herbert Xu's avatar
      crypto: af_alg - Add nokey compatibility path · 3db68eba
      Herbert Xu authored
      [ Upstream commit 37766586 ]
      
      This patch adds a compatibility path to support old applications
      that do acept(2) before setkey.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      3db68eba
    • Herbert Xu's avatar
      crypto: af_alg - Fix socket double-free when accept fails · ac9c75f8
      Herbert Xu authored
      [ Upstream commit a383292c ]
      
      When we fail an accept(2) call we will end up freeing the socket
      twice, once due to the direct sk_free call and once again through
      newsock.
      
      This patch fixes this by removing the sk_free call.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ac9c75f8
    • Herbert Xu's avatar
      crypto: af_alg - Disallow bind/setkey/... after accept(2) · f857638d
      Herbert Xu authored
      [ Upstream commit c840ac6a ]
      
      Each af_alg parent socket obtained by socket(2) corresponds to a
      tfm object once bind(2) has succeeded.  An accept(2) call on that
      parent socket creates a context which then uses the tfm object.
      
      Therefore as long as any child sockets created by accept(2) exist
      the parent socket must not be modified or freed.
      
      This patch guarantees this by using locks and a reference count
      on the parent socket.  Any attempt to modify the parent socket will
      fail with EBUSY.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f857638d
    • Tejun Heo's avatar
      printk: do cond_resched() between lines while outputting to consoles · 9044efa9
      Tejun Heo authored
      [ Upstream commit 8d91f8b1 ]
      
      @console_may_schedule tracks whether console_sem was acquired through
      lock or trylock.  If the former, we're inside a sleepable context and
      console_conditional_schedule() performs cond_resched().  This allows
      console drivers which use console_lock for synchronization to yield
      while performing time-consuming operations such as scrolling.
      
      However, the actual console outputting is performed while holding
      irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
      before starting outputting lines.  Also, only a few drivers call
      console_conditional_schedule() to begin with.  This means that when a
      lot of lines need to be output by console_unlock(), for example on a
      console registration, the task doing console_unlock() may not yield for
      a long time on a non-preemptible kernel.
      
      If this happens with a slow console devices, for example a serial
      console, the outputting task may occupy the cpu for a very long time.
      Long enough to trigger softlockup and/or RCU stall warnings, which in
      turn pile more messages, sometimes enough to trigger the next cycle of
      warnings incapacitating the system.
      
      Fix it by making console_unlock() insert cond_resched() between lines if
      @console_may_schedule.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarCalvin Owens <calvinowens@fb.com>
      Acked-by: default avatarJan Kara <jack@suse.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Kyle McMartin <kyle@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      9044efa9
    • Vitaly Kuznetsov's avatar
      kernel/panic.c: turn off locks debug before releasing console lock · 6db3dd4b
      Vitaly Kuznetsov authored
      [ Upstream commit 7625b3a0 ]
      
      Commit 08d78658 ("panic: release stale console lock to always get the
      logbuf printed out") introduced an unwanted bad unlock balance report when
      panic() is called directly and not from OOPS (e.g.  from out_of_memory()).
      The difference is that in case of OOPS we disable locks debug in
      oops_enter() and on direct panic call nobody does that.
      
      Fixes: 08d78658 ("panic: release stale console lock to always get the logbuf printed out")
      Reported-by: default avatarkernel test robot <ying.huang@linux.intel.com>
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      6db3dd4b
    • Vitaly Kuznetsov's avatar
      panic: release stale console lock to always get the logbuf printed out · e3191b65
      Vitaly Kuznetsov authored
      [ Upstream commit 08d78658 ]
      
      In some cases we may end up killing the CPU holding the console lock
      while still having valuable data in logbuf. E.g. I'm observing the
      following:
      
      - A crash is happening on one CPU and console_unlock() is being called on
        some other.
      
      - console_unlock() tries to print out the buffer before releasing the lock
        and on slow console it takes time.
      
      - in the meanwhile crashing CPU does lots of printk()-s with valuable data
        (which go to the logbuf) and sends IPIs to all other CPUs.
      
      - console_unlock() finishes printing previous chunk and enables interrupts
        before trying to print out the rest, the CPU catches the IPI and never
        releases console lock.
      
      This is not the only possible case: in VT/fb subsystems we have many other
      console_lock()/console_unlock() users.  Non-masked interrupts (or
      receiving NMI in case of extreme slowness) will have the same result.
      Getting the whole console buffer printed out on crash should be top
      priority.
      
      [akpm@linux-foundation.org: tweak comment text]
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e3191b65
    • Martijn Coenen's avatar
      memcg: only free spare array when readers are done · ab0d430c
      Martijn Coenen authored
      [ Upstream commit 6611d8d7 ]
      
      A spare array holding mem cgroup threshold events is kept around to make
      sure we can always safely deregister an event and have an array to store
      the new set of events in.
      
      In the scenario where we're going from 1 to 0 registered events, the
      pointer to the primary array containing 1 event is copied to the spare
      slot, and then the spare slot is freed because no events are left.
      However, it is freed before calling synchronize_rcu(), which means
      readers may still be accessing threshold->primary after it is freed.
      
      Fixed by only freeing after synchronize_rcu().
      Signed-off-by: default avatarMartijn Coenen <maco@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ab0d430c
    • Naoya Horiguchi's avatar
      mm: soft-offline: check return value in second __get_any_page() call · a30cd72c
      Naoya Horiguchi authored
      [ Upstream commit d96b339f ]
      
      I saw the following BUG_ON triggered in a testcase where a process calls
      madvise(MADV_SOFT_OFFLINE) on thps, along with a background process that
      calls migratepages command repeatedly (doing ping-pong among different
      NUMA nodes) for the first process:
      
         Soft offlining page 0x60000 at 0x700000600000
         __get_any_page: 0x60000 free buddy page
         page:ffffea0001800000 count:0 mapcount:-127 mapping:          (null) index:0x1
         flags: 0x1fffc0000000000()
         page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0)
         ------------[ cut here ]------------
         kernel BUG at /src/linux-dev/include/linux/mm.h:342!
         invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
         Modules linked in: cfg80211 rfkill crc32c_intel serio_raw virtio_balloon i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi
         CPU: 3 PID: 3035 Comm: test_alloc_gene Tainted: G           O    4.4.0-rc8-v4.4-rc8-160107-1501-00000-rc8+ #74
         Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
         task: ffff88007c63d5c0 ti: ffff88007c210000 task.ti: ffff88007c210000
         RIP: 0010:[<ffffffff8118998c>]  [<ffffffff8118998c>] put_page+0x5c/0x60
         RSP: 0018:ffff88007c213e00  EFLAGS: 00010246
         Call Trace:
           put_hwpoison_page+0x4e/0x80
           soft_offline_page+0x501/0x520
           SyS_madvise+0x6bc/0x6f0
           entry_SYSCALL_64_fastpath+0x12/0x6a
         Code: 8b fc ff ff 5b 5d c3 48 89 df e8 b0 fa ff ff 48 89 df 31 f6 e8 c6 7d ff ff 5b 5d c3 48 c7 c6 08 54 a2 81 48 89 df e8 a4 c5 01 00 <0f> 0b 66 90 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 8b 47
         RIP  [<ffffffff8118998c>] put_page+0x5c/0x60
          RSP <ffff88007c213e00>
      
      The root cause resides in get_any_page() which retries to get a refcount
      of the page to be soft-offlined.  This function calls
      put_hwpoison_page(), expecting that the target page is putback to LRU
      list.  But it can be also freed to buddy.  So the second check need to
      care about such case.
      
      Fixes: af8fae7c ("mm/memory-failure.c: clean up soft_offline_page()")
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>	[3.9+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a30cd72c
    • Kyeongdon Kim's avatar
      zram: try vmalloc() after kmalloc() · ce516bcf
      Kyeongdon Kim authored
      [ Upstream commit d913897a ]
      
      When we're using LZ4 multi compression streams for zram swap, we found
      out page allocation failure message in system running test.  That was
      not only once, but a few(2 - 5 times per test).  Also, some failure
      cases were continually occurring to try allocation order 3.
      
      In order to make parallel compression private data, we should call
      kzalloc() with order 2/3 in runtime(lzo/lz4).  But if there is no order
      2/3 size memory to allocate in that time, page allocation fails.  This
      patch makes to use vmalloc() as fallback of kmalloc(), this prevents
      page alloc failure warning.
      
      After using this, we never found warning message in running test, also
      It could reduce process startup latency about 60-120ms in each case.
      
      For reference a call trace :
      
          Binder_1: page allocation failure: order:3, mode:0x10c0d0
          CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty #20
          Call trace:
            dump_backtrace+0x0/0x270
            show_stack+0x10/0x1c
            dump_stack+0x1c/0x28
            warn_alloc_failed+0xfc/0x11c
            __alloc_pages_nodemask+0x724/0x7f0
            __get_free_pages+0x14/0x5c
            kmalloc_order_trace+0x38/0xd8
            zcomp_lz4_create+0x2c/0x38
            zcomp_strm_alloc+0x34/0x78
            zcomp_strm_multi_find+0x124/0x1ec
            zcomp_strm_find+0xc/0x18
            zram_bvec_rw+0x2fc/0x780
            zram_make_request+0x25c/0x2d4
            generic_make_request+0x80/0xbc
            submit_bio+0xa4/0x15c
            __swap_writepage+0x218/0x230
            swap_writepage+0x3c/0x4c
            shrink_page_list+0x51c/0x8d0
            shrink_inactive_list+0x3f8/0x60c
            shrink_lruvec+0x33c/0x4cc
            shrink_zone+0x3c/0x100
            try_to_free_pages+0x2b8/0x54c
            __alloc_pages_nodemask+0x514/0x7f0
            __get_free_pages+0x14/0x5c
            proc_info_read+0x50/0xe4
            vfs_read+0xa0/0x12c
            SyS_read+0x44/0x74
          DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
               0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB
      
      [minchan@kernel.org: change vmalloc gfp and adding comment about gfp]
      [sergey.senozhatsky@gmail.com: tweak comments and styles]
      Signed-off-by: default avatarKyeongdon Kim <kyeongdon.kim@lge.com>
      Signed-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Acked-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ce516bcf
    • Sergey Senozhatsky's avatar
      zram/zcomp: use GFP_NOIO to allocate streams · 40942069
      Sergey Senozhatsky authored
      [ Upstream commit 3d5fe03a ]
      
      We can end up allocating a new compression stream with GFP_KERNEL from
      within the IO path, which may result is nested (recursive) IO
      operations.  That can introduce problems if the IO path in question is a
      reclaimer, holding some locks that will deadlock nested IOs.
      
      Allocate streams and working memory using GFP_NOIO flag, forbidding
      recursive IO and FS operations.
      
      An example:
      
        inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
        git/20158 [HC0[0]:SC0[0]:HE1:SE1] takes:
         (jbd2_handle){+.+.?.}, at:  start_this_handle+0x4ca/0x555
        {IN-RECLAIM_FS-W} state was registered at:
           __lock_acquire+0x8da/0x117b
           lock_acquire+0x10c/0x1a7
           start_this_handle+0x52d/0x555
           jbd2__journal_start+0xb4/0x237
           __ext4_journal_start_sb+0x108/0x17e
           ext4_dirty_inode+0x32/0x61
           __mark_inode_dirty+0x16b/0x60c
           iput+0x11e/0x274
           __dentry_kill+0x148/0x1b8
           shrink_dentry_list+0x274/0x44a
           prune_dcache_sb+0x4a/0x55
           super_cache_scan+0xfc/0x176
           shrink_slab.part.14.constprop.25+0x2a2/0x4d3
           shrink_zone+0x74/0x140
           kswapd+0x6b7/0x930
           kthread+0x107/0x10f
           ret_from_fork+0x3f/0x70
        irq event stamp: 138297
        hardirqs last  enabled at (138297):  debug_check_no_locks_freed+0x113/0x12f
        hardirqs last disabled at (138296):  debug_check_no_locks_freed+0x33/0x12f
        softirqs last  enabled at (137818):  __do_softirq+0x2d3/0x3e9
        softirqs last disabled at (137813):  irq_exit+0x41/0x95
      
                     other info that might help us debug this:
         Possible unsafe locking scenario:
               CPU0
               ----
          lock(jbd2_handle);
          <Interrupt>
            lock(jbd2_handle);
      
                      *** DEADLOCK ***
        5 locks held by git/20158:
         #0:  (sb_writers#7){.+.+.+}, at: [<ffffffff81155411>] mnt_want_write+0x24/0x4b
         #1:  (&type->i_mutex_dir_key#2/1){+.+.+.}, at: [<ffffffff81145087>] lock_rename+0xd9/0xe3
         #2:  (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [<ffffffff8114f8e2>] lock_two_nondirectories+0x3f/0x6b
         #3:  (&sb->s_type->i_mutex_key#11/4){+.+.+.}, at: [<ffffffff8114f909>] lock_two_nondirectories+0x66/0x6b
         #4:  (jbd2_handle){+.+.?.}, at: [<ffffffff811e31db>] start_this_handle+0x4ca/0x555
      
                     stack backtrace:
        CPU: 2 PID: 20158 Comm: git Not tainted 4.1.0-rc7-next-20150615-dbg-00016-g8bdf555-dirty #211
        Call Trace:
          dump_stack+0x4c/0x6e
          mark_lock+0x384/0x56d
          mark_held_locks+0x5f/0x76
          lockdep_trace_alloc+0xb2/0xb5
          kmem_cache_alloc_trace+0x32/0x1e2
          zcomp_strm_alloc+0x25/0x73 [zram]
          zcomp_strm_multi_find+0xe7/0x173 [zram]
          zcomp_strm_find+0xc/0xe [zram]
          zram_bvec_rw+0x2ca/0x7e0 [zram]
          zram_make_request+0x1fa/0x301 [zram]
          generic_make_request+0x9c/0xdb
          submit_bio+0xf7/0x120
          ext4_io_submit+0x2e/0x43
          ext4_bio_write_page+0x1b7/0x300
          mpage_submit_page+0x60/0x77
          mpage_map_and_submit_buffers+0x10f/0x21d
          ext4_writepages+0xc8c/0xe1b
          do_writepages+0x23/0x2c
          __filemap_fdatawrite_range+0x84/0x8b
          filemap_flush+0x1c/0x1e
          ext4_alloc_da_blocks+0xb8/0x117
          ext4_rename+0x132/0x6dc
          ? mark_held_locks+0x5f/0x76
          ext4_rename2+0x29/0x2b
          vfs_rename+0x540/0x636
          SyS_renameat2+0x359/0x44d
          SyS_rename+0x1e/0x20
          entry_SYSCALL_64_fastpath+0x12/0x6f
      
      [minchan@kernel.org: add stable mark]
      Signed-off-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: Kyeongdon Kim <kyeongdon.kim@lge.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      40942069
    • xuejiufei's avatar
      ocfs2/dlm: ignore cleaning the migration mle that is inuse · 4fd204b4
      xuejiufei authored
      [ Upstream commit bef5502d ]
      
      We have found that migration source will trigger a BUG that the refcount
      of mle is already zero before put when the target is down during
      migration.  The situation is as follows:
      
      dlm_migrate_lockres
        dlm_add_migration_mle
        dlm_mark_lockres_migrating
        dlm_get_mle_inuse
        <<<<<< Now the refcount of the mle is 2.
        dlm_send_one_lockres and wait for the target to become the
        new master.
        <<<<<< o2hb detect the target down and clean the migration
        mle. Now the refcount is 1.
      
      dlm_migrate_lockres woken, and put the mle twice when found the target
      goes down which trigger the BUG with the following message:
      
        "ERROR: bad mle: ".
      Signed-off-by: default avatarJiufei Xue <xuejiufei@huawei.com>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4fd204b4
    • Sergey Senozhatsky's avatar
      scripts/bloat-o-meter: fix python3 syntax error · e560fd7f
      Sergey Senozhatsky authored
      [ Upstream commit 72214a24 ]
      
      In Python3+ print is a function so the old syntax is not correct
      anymore:
      
        $ ./scripts/bloat-o-meter vmlinux.o vmlinux.o.old
          File "./scripts/bloat-o-meter", line 61
            print "add/remove: %s/%s grow/shrink: %s/%s up/down: %s/%s (%s)" % \
                                                                           ^
        SyntaxError: invalid syntax
      
      Fix by calling print as a function.
      
      Tested on python 2.7.11, 3.5.1
      Signed-off-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e560fd7f
    • Laura Abbott's avatar
      dma-debug: switch check from _text to _stext · 7c02e800
      Laura Abbott authored
      [ Upstream commit ea535e41 ]
      
      In include/asm-generic/sections.h:
      
        /*
         * Usage guidelines:
         * _text, _data: architecture specific, don't use them in
         * arch-independent code
         * [_stext, _etext]: contains .text.* sections, may also contain
         * .rodata.*
         *                   and/or .init.* sections
      
      _text is not guaranteed across architectures.  Architectures such as ARM
      may reuse parts which are not actually text and erroneously trigger a bug.
      Switch to using _stext which is guaranteed to contain text sections.
      
      Came out of https://lkml.kernel.org/g/<567B1176.4000106@redhat.com>
      Signed-off-by: default avatarLaura Abbott <labbott@fedoraproject.org>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7c02e800
    • Sudip Mukherjee's avatar
      m32r: fix m32104ut_defconfig build fail · 795ea8b4
      Sudip Mukherjee authored
      [ Upstream commit 601f1db6 ]
      
      The build of m32104ut_defconfig for m32r arch was failing for long long
      time with the error:
      
        ERROR: "memory_start" [fs/udf/udf.ko] undefined!
        ERROR: "memory_end" [fs/udf/udf.ko] undefined!
        ERROR: "memory_end" [drivers/scsi/sg.ko] undefined!
        ERROR: "memory_start" [drivers/scsi/sg.ko] undefined!
        ERROR: "memory_end" [drivers/i2c/i2c-dev.ko] undefined!
        ERROR: "memory_start" [drivers/i2c/i2c-dev.ko] undefined!
      
      As done in other architectures export the symbols to fix the error.
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarSudip Mukherjee <sudip@vectorindia.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      795ea8b4
    • Vasily Averin's avatar
      cifs_dbg() outputs an uninitialized buffer in cifs_readdir() · f04fed1a
      Vasily Averin authored
      [ Upstream commit 01b9b0b2 ]
      
      In some cases tmp_bug can be not filled in cifs_filldir and stay uninitialized,
      therefore its printk with "%s" modifier can leak content of kernelspace memory.
      If old content of this buffer does not contain '\0' access bejond end of
      allocated object can crash the host.
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarSteve French <sfrench@localhost.localdomain>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f04fed1a
    • Rabin Vincent's avatar
      cifs: fix race between call_async() and reconnect() · 6b05c541
      Rabin Vincent authored
      [ Upstream commit 820962dc ]
      
      cifs_call_async() queues the MID to the pending list and calls
      smb_send_rqst().  If smb_send_rqst() performs a partial send, it sets
      the tcpStatus to CifsNeedReconnect and returns an error code to
      cifs_call_async().  In this case, cifs_call_async() removes the MID
      from the list and returns to the caller.
      
      However, cifs_call_async() releases the server mutex _before_ removing
      the MID.  This means that a cifs_reconnect() can race with this function
      and manage to remove the MID from the list and delete the entry before
      cifs_call_async() calls cifs_delete_mid().  This leads to various
      crashes due to the use after free in cifs_delete_mid().
      
      Task1				Task2
      
      cifs_call_async():
       - rc = -EAGAIN
       - mutex_unlock(srv_mutex)
      
      				cifs_reconnect():
      				 - mutex_lock(srv_mutex)
      				 - mutex_unlock(srv_mutex)
      				 - list_delete(mid)
      				 - mid->callback()
      				 	cifs_writev_callback():
      				 		- mutex_lock(srv_mutex)
      						- delete(mid)
      				 		- mutex_unlock(srv_mutex)
      
       - cifs_delete_mid(mid) <---- use after free
      
      Fix this by removing the MID in cifs_call_async() before releasing the
      srv_mutex.  Also hold the srv_mutex in cifs_reconnect() until the MIDs
      are moved out of the pending list.
      Signed-off-by: default avatarRabin Vincent <rabin.vincent@axis.com>
      Acked-by: default avatarShirish Pargaonkar <shirishpargaonkar@gmail.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: default avatarSteve French <sfrench@localhost.localdomain>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      6b05c541
    • Jamie Bainbridge's avatar
      cifs: Ratelimit kernel log messages · d2c62a7f
      Jamie Bainbridge authored
      [ Upstream commit ec7147a9 ]
      
      Under some conditions, CIFS can repeatedly call the cifs_dbg() logging
      wrapper. If done rapidly enough, the console framebuffer can softlockup
      or "rcu_sched self-detected stall". Apply the built-in log ratelimiters
      to prevent such hangs.
      Signed-off-by: default avatarJamie Bainbridge <jamie.bainbridge@gmail.com>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d2c62a7f
    • Andy Shevchenko's avatar
      cifs: convert printk(LEVEL...) to pr_<level> · f8d91e02
      Andy Shevchenko authored
      [ Upstream commit 0b456f04 ]
      
      The useful macros embed message level in the name. Thus, it cleans up the code
      a bit. In cases when it was plain printk() the conversion was done to info
      level.
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarSteve French <steve.french@primarydata.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f8d91e02
    • Andy Shevchenko's avatar
      cifs: convert to print_hex_dump() instead of custom implementation · c4785348
      Andy Shevchenko authored
      [ Upstream commit 55d83e0d ]
      
      This patch converts custom dumper to use native print_hex_dump() instead. The
      cifs_dump_mem() will have an offsets per each line which differs it from the
      original code.
      
      In the dump_smb() we may use native print_hex_dump() as well. It will show
      slightly different output in ASCII part when character is unprintable,
      otherwise it keeps same structure.
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarSteve French <steve.french@primarydata.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      c4785348
    • Dmitry V. Levin's avatar
      sparc64: fix incorrect sign extension in sys_sparc64_personality · f0a1bc01
      Dmitry V. Levin authored
      [ Upstream commit 525fd5a9 ]
      
      The value returned by sys_personality has type "long int".
      It is saved to a variable of type "int", which is not a problem
      yet because the type of task_struct->pesonality is "unsigned int".
      The problem is the sign extension from "int" to "long int"
      that happens on return from sys_sparc64_personality.
      
      For example, a userspace call personality((unsigned) -EINVAL) will
      result to any subsequent personality call, including absolutely
      harmless read-only personality(0xffffffff) call, failing with
      errno set to EINVAL.
      Signed-off-by: default avatarDmitry V. Levin <ldv@altlinux.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f0a1bc01
    • Carlo Caione's avatar
      mmc: core: Enable tuning according to the actual timing · b5797fc9
      Carlo Caione authored
      [ Upstream commit e10c3219 ]
      
      While in sdhci_execute_tuning() the choice whether or not to enable the
      tuning is done on the actual timing, in the mmc_sdio_init_uhs_card() the
      check is done on the capability of the card.
      
      This difference is causing some issues with some SDIO cards in DDR50
      mode where the CDM19 is wrongly issued.
      
      With this patch we modify the check in both
      mmc_(sd|sdio)_init_uhs_card() functions to take the proper decision
      only according to the actual timing specification.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarCarlo Caione <carlo@endlessm.com>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b5797fc9
    • Weijun Yang's avatar
      mmc: core: enable CMD19 tuning for DDR50 mode · 36d2e814
      Weijun Yang authored
      [ Upstream commit 4324f6de ]
      
      As SD Specifications Part1 Physical Layer Specification Version
      3.01 says, CMD19 tuning is available for unlocked cards in transfer
      state of 1.8V signaling mode. The small difference between v3.00
      and 3.01 spec means that CMD19 tuning is also available for DDR50
      mode.
      Signed-off-by: default avatarWeijun Yang <york.yang@csr.com>
      Signed-off-by: default avatarBarry Song <Baohua.Song@csr.com>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      36d2e814
    • Adrian Hunter's avatar
      mmc: core: Simplify by adding mmc_execute_tuning() · ac26a195
      Adrian Hunter authored
      [ Upstream commit 63e415c6 ]
      
      For each MMC, SD and SDIO there is code that
      holds the clock, calls ops->execute_tuning, and
      releases the clock. Simplify the code a bit by
      providing a separate function to do that.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ac26a195
    • Andrew Gabbasov's avatar
      mmc: core: Fix error paths and messages in mmc_init_card · 025335d3
      Andrew Gabbasov authored
      [ Upstream commit 4b75bffc ]
      
      In mmc_init_card function some of the branches in error handling paths
      go to "err" label, which skips removing of newly allocated card structure,
      that will actually not be used. Fix that by using proper "free_card" label.
      
      Also, some messages in these branches are reported as warnings,
      although the operation processing is not continued. Change these
      messages to error level.
      Signed-off-by: default avatarAndrew Gabbasov <andrew_gabbasov@mentor.com>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      025335d3
    • Linus Walleij's avatar
      mmc: mmci: fix an ages old detection error · 164d37bd
      Linus Walleij authored
      [ Upstream commit 0bcb7efd ]
      
      commit 4956e109 ("ARM: 6244/1: mmci: add variant data and default
      MCICLOCK support") added variant data for ARM, U300 and Ux500 variants.
      The Nomadik NHK8815/8820 variant was erroneously labeled as a U300
      variant, and when the proper Nomadik variant was later introduced in
      commit 34fd4213 ("ARM: 7378/1: mmci: add support for the Nomadik MMCI
      variant") this was not fixes. Let's say this fixes the latter commit as
      there was no proper Nomadik support until then.
      
      Cc: stable@vger.kernel.org
      Fixes: 34fd4213 ("ARM: 7378/1: mmci: add support for the Nomadik...")
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      164d37bd
    • Mans Rullgard's avatar
      dmaengine: dw: fix cyclic transfer callbacks · e64af167
      Mans Rullgard authored
      [ Upstream commit 2895b2ca ]
      
      Cyclic transfer callbacks rely on block completion interrupts which were
      disabled in commit ff7b05f2 ("dmaengine/dw_dmac: Don't handle block
      interrupts").  This re-enables block interrupts so the cyclic callbacks
      can work.  Other transfer types are not affected as they set the INT_EN
      bit only on the last block.
      
      Fixes: ff7b05f2 ("dmaengine/dw_dmac: Don't handle block interrupts")
      Signed-off-by: default avatarMans Rullgard <mans@mansr.com>
      Reviewed-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e64af167
    • Mans Rullgard's avatar
      dmaengine: dw: fix cyclic transfer setup · 6e58c401
      Mans Rullgard authored
      [ Upstream commit df3bb8a0 ]
      
      Commit 61e183f8 ("dmaengine/dw_dmac: Reconfigure interrupt and
      chan_cfg register on resume") moved some channel initialisation to
      a new function which must be called before starting a transfer.
      
      This updates dw_dma_cyclic_start() to use dwc_dostart() like the other
      modes, thus ensuring dwc_initialize() gets called and removing some code
      duplication.
      
      Fixes: 61e183f8 ("dmaengine/dw_dmac: Reconfigure interrupt and chan_cfg register on resume")
      Signed-off-by: default avatarMans Rullgard <mans@mansr.com>
      Reviewed-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      6e58c401
    • Jarkko Nikula's avatar
      dmaengine: dw: Make error prints unique. Part #1 · 013ee093
      Jarkko Nikula authored
      [ Upstream commit 550da64b ]
      
      The same error message is printed from different functions. Add a function
      name to error message in order to make it easier to grep error from sources.
      Signed-off-by: default avatarJarkko Nikula <jarkko.nikula@linux.intel.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      013ee093
    • Greg Kurz's avatar
      KVM: PPC: Fix ONE_REG AltiVec support · 9e775319
      Greg Kurz authored
      [ Upstream commit b4d7f161 ]
      
      The get and set operations got exchanged by mistake when moving the
      code from book3s.c to powerpc.c.
      
      Fixes: 3840edc8
      Cc: stable@vger.kernel.org # 3.18+
      Signed-off-by: default avatarGreg Kurz <gkurz@linux.vnet.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      9e775319
    • Helge Deller's avatar
      parisc: Fix __ARCH_SI_PREAMBLE_SIZE · a68d6a47
      Helge Deller authored
      [ Upstream commit e60fc5aa ]
      
      On a 64bit kernel build the compiler aligns the _sifields union in the
      struct siginfo_t on a 64bit address. The __ARCH_SI_PREAMBLE_SIZE define
      compensates for this alignment and thus fixes the wait testcase of the
      strace package.
      
      The symptoms of a wrong __ARCH_SI_PREAMBLE_SIZE value is that
      _sigchld.si_stime variable is missed to be copied and thus after a
      copy_siginfo() will have uninitialized values.
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a68d6a47
    • Minchan Kim's avatar
      virtio_balloon: fix race between migration and ballooning · 43d64871
      Minchan Kim authored
      [ Upstream commit 21ea9fb6 ]
      
      In balloon_page_dequeue, pages_lock should cover the loop
      (ie, list_for_each_entry_safe). Otherwise, the cursor page could
      be isolated by compaction and then list_del by isolation could
      poison the page->lru.{prev,next} so the loop finally could
      access wrong address like this. This patch fixes the bug.
      
      general protection fault: 0000 [#1] SMP
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 2 PID: 82 Comm: vballoon Not tainted 4.4.0-rc5-mm1-access_bit+ #1906
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff8800a7ff0000 ti: ffff8800a7fec000 task.ti: ffff8800a7fec000
      RIP: 0010:[<ffffffff8115e754>]  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
      RSP: 0018:ffff8800a7fefdc0  EFLAGS: 00010246
      RAX: ffff88013fff9a70 RBX: ffffea000056fe00 RCX: 0000000000002b7d
      RDX: ffff88013fff9a70 RSI: ffffea000056fe00 RDI: ffff88013fff9a68
      RBP: ffff8800a7fefde8 R08: ffffea000056fda0 R09: 0000000000000000
      R10: ffff8800a7fefd90 R11: 0000000000000001 R12: dead0000000000e0
      R13: ffffea000056fe20 R14: ffff880138809070 R15: ffff880138809060
      FS:  0000000000000000(0000) GS:ffff88013fc40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00007f229c10e000 CR3: 00000000b8b53000 CR4: 00000000000006a0
      Stack:
       0000000000000100 ffff880138809088 ffff880138809000 ffff880138809060
       0000000000000046 ffff8800a7fefe28 ffffffff812c86d3 ffff880138809020
       ffff880138809000 fffffffffff91900 0000000000000100 ffff880138809060
      Call Trace:
       [<ffffffff812c86d3>] leak_balloon+0x93/0x1a0
       [<ffffffff812c8bc7>] balloon+0x217/0x2a0
       [<ffffffff8143739e>] ? __schedule+0x31e/0x8b0
       [<ffffffff81078160>] ? abort_exclusive_wait+0xb0/0xb0
       [<ffffffff812c89b0>] ? update_balloon_stats+0xf0/0xf0
       [<ffffffff8105b6e9>] kthread+0xc9/0xe0
       [<ffffffff8105b620>] ? kthread_park+0x60/0x60
       [<ffffffff8143b4af>] ret_from_fork+0x3f/0x70
       [<ffffffff8105b620>] ? kthread_park+0x60/0x60
      Code: 8d 60 e0 0f 84 af 00 00 00 48 8b 43 20 a8 01 75 3b 48 89 d8 f0 0f ba 28 00 72 10 48 8b 03 f6 c4 08 75 2f 48 89 df e8 8c 83 f9 ff <49> 8b 44 24 20 4d 8d 6c 24 20 48 83 e8 20 4d 39 f5 74 7a 4c 89
      RIP  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
       RSP <ffff8800a7fefdc0>
      ---[ end trace 43cf28060d708d5f ]---
      Kernel panic - not syncing: Fatal exception
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Kernel Offset: disabled
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Acked-by: default avatarRafael Aquini <aquini@redhat.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      43d64871
    • Benjamin Tissoires's avatar
      Input: elantech - mark protocols v2 and v3 as semi-mt · 419c98a6
      Benjamin Tissoires authored
      [ Upstream commit 6544a1df ]
      
      When using a protocol v2 or v3 hardware, elantech uses the function
      elantech_report_semi_mt_data() to report data. This devices are rather
      creepy because if num_finger is 3, (x2,y2) is (0,0). Yes, only one valid
      touch is reported.
      
      Anyway, userspace (libinput) is now confused by these (0,0) touches,
      and detect them as palm, and rejects them.
      
      Commit 3c0213d1 ("Input: elantech - fix semi-mt protocol for v3 HW")
      was sufficient enough for xf86-input-synaptics and libinput before it has
      palm rejection. Now we need to actually tell libinput that this device is
      a semi-mt one and it should not rely on the actual values of the 2 touches.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      419c98a6
    • Dave Chinner's avatar
      xfs: handle dquot buffer readahead in log recovery correctly · 178f7ba7
      Dave Chinner authored
      [ Upstream commit 7d6a13f0 ]
      
      When we do dquot readahead in log recovery, we do not use a verifier
      as the underlying buffer may not have dquots in it. e.g. the
      allocation operation hasn't yet been replayed. Hence we do not want
      to fail recovery because we detect an operation to be replayed has
      not been run yet. This problem was addressed for inodes in commit
      d8914002 ("xfs: inode buffers may not be valid during recovery
      readahead") but the problem was not recognised to exist for dquots
      and their buffers as the dquot readahead did not have a verifier.
      
      The result of not using a verifier is that when the buffer is then
      next read to replay a dquot modification, the dquot buffer verifier
      will only be attached to the buffer if *readahead is not complete*.
      Hence we can read the buffer, replay the dquot changes and then add
      it to the delwri submission list without it having a verifier
      attached to it. This then generates warnings in xfs_buf_ioapply(),
      which catches and warns about this case.
      
      Fix this and make it handle the same readahead verifier error cases
      as for inode buffers by adding a new readahead verifier that has a
      write operation as well as a read operation that marks the buffer as
      not done if any corruption is detected.  Also make sure we don't run
      readahead if the dquot buffer has been marked as cancelled by
      recovery.
      
      This will result in readahead either succeeding and the buffer
      having a valid write verifier, or readahead failing and the buffer
      state requiring the subsequent read to resubmit the IO with the new
      verifier.  In either case, this will result in the buffer always
      ending up with a valid write verifier on it.
      
      Note: we also need to fix the inode buffer readahead error handling
      to mark the buffer with EIO. Brian noticed the code I copied from
      there wrong during review, so fix it at the same time. Add comments
      linking the two functions that handle readahead verifier errors
      together so we don't forget this behavioural link in future.
      
      cc: <stable@vger.kernel.org> # 3.12 - current
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      178f7ba7
    • Dave Chinner's avatar
      xfs: inode recovery readahead can race with inode buffer creation · cc824d71
      Dave Chinner authored
      [ Upstream commit b79f4a1c ]
      
      When we do inode readahead in log recovery, we do can do the
      readahead before we've replayed the icreate transaction that stamps
      the buffer with inode cores. The inode readahead verifier catches
      this and marks the buffer as !done to indicate that it doesn't yet
      contain valid inodes.
      
      In adding buffer error notification  (i.e. setting b_error = -EIO at
      the same time as as we clear the done flag) to such a readahead
      verifier failure, we can then get subsequent inode recovery failing
      with this error:
      
      XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32
      
      This occurs when readahead completion races with icreate item replay
      such as:
      
      	inode readahead
      		find buffer
      		lock buffer
      		submit RA io
      	....
      	icreate recovery
      	    xfs_trans_get_buffer
      		find buffer
      		lock buffer
      		<blocks on RA completion>
      	.....
      	<ra completion>
      		fails verifier
      		clear XBF_DONE
      		set bp->b_error = -EIO
      		release and unlock buffer
      	<icreate gains lock>
      	icreate initialises buffer
      	marks buffer as done
      	adds buffer to delayed write queue
      	releases buffer
      
      At this point, we have an initialised inode buffer that is up to
      date but has an -EIO state registered against it. When we finally
      get to recovering an inode in that buffer:
      
      	inode item recovery
      	    xfs_trans_read_buffer
      		find buffer
      		lock buffer
      		sees XBF_DONE is set, returns buffer
      	    sees bp->b_error is set
      		fail log recovery!
      
      Essentially, we need xfs_trans_get_buf_map() to clear the error status of
      the buffer when doing a lookup. This function returns uninitialised
      buffers, so the buffer returned can not be in an error state and
      none of the code that uses this function expects b_error to be set
      on return. Indeed, there is an ASSERT(!bp->b_error); in the
      transaction case in xfs_trans_get_buf_map() that would have caught
      this if log recovery used transactions....
      
      This patch firstly changes the inode readahead failure to set -EIO
      on the buffer, and secondly changes xfs_buf_get_map() to never
      return a buffer with an error state set so this first change doesn't
      cause unexpected log recovery failures.
      
      cc: <stable@vger.kernel.org> # 3.12 - current
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      cc824d71