1. 29 Jun, 2015 19 commits
  2. 23 Jun, 2015 21 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.0.6 · a0ce8894
      Greg Kroah-Hartman authored
      a0ce8894
    • Chris Mason's avatar
      Btrfs: fix regression in raid level conversion · e8b62da4
      Chris Mason authored
      commit 153c35b6 upstream.
      
      Commit 2f081088 changed
      btrfs_set_block_group_ro to avoid trying to allocate new chunks with the
      new raid profile during conversion.  This fixed failures when there was
      no space on the drive to allocate a new chunk, but the metadata
      reserves were sufficient to continue the conversion.
      
      But this ended up causing a regression when the drive had plenty of
      space to allocate new chunks, mostly because reduce_alloc_profile isn't
      using the new raid profile.
      
      Fixing btrfs_reduce_alloc_profile is a bigger patch.  For now, do a
      partial revert of 2f081088, and don't error out if we hit ENOSPC.
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Tested-by: default avatarDave Sterba <dsterba@suse.cz>
      Reported-by: default avatarHolger Hoffstaette <holger.hoffstaette@googlemail.com>
      [adapted for stable kernel branch, v4.0.5]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e8b62da4
    • Chris Mason's avatar
      Btrfs: fix uninit variable in clone ioctl · 1e24644a
      Chris Mason authored
      commit de249e66 upstream.
      
      Commit 0d97a64e0 creates a new variable but doesn't always set it up.
      This puts it back to the original method (key.offset + 1) for the cases
      not covered by Filipe's new logic.
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e24644a
    • Filipe Manana's avatar
      Btrfs: fix range cloning when same inode used as source and destination · 64b02777
      Filipe Manana authored
      commit df858e76 upstream.
      
      While searching for extents to clone we might find one where we only use
      a part of it coming from its tail. If our destination inode is the same
      the source inode, we end up removing the tail part of the extent item and
      insert after a new one that point to the same extent with an adjusted
      key file offset and data offset. After this we search for the next extent
      item in the fs/subvol tree with a key that has an offset incremented by
      one. But this second search leaves us at the new extent item we inserted
      previously, and since that extent item has a non-zero data offset, it
      it can make us call btrfs_drop_extents with an empty range (start == end)
      which causes the following warning:
      
      [23978.537119] WARNING: CPU: 6 PID: 16251 at fs/btrfs/file.c:550 btrfs_drop_extent_cache+0x43/0x385 [btrfs]()
      (...)
      [23978.557266] Call Trace:
      [23978.557978]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
      [23978.559191]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
      [23978.560699]  [<ffffffffa047f0ea>] ? btrfs_drop_extent_cache+0x43/0x385 [btrfs]
      [23978.562389]  [<ffffffff8104544d>] warn_slowpath_null+0x1a/0x1c
      [23978.563613]  [<ffffffffa047f0ea>] btrfs_drop_extent_cache+0x43/0x385 [btrfs]
      [23978.565103]  [<ffffffff810e3a18>] ? time_hardirqs_off+0x15/0x28
      [23978.566294]  [<ffffffff81079ff8>] ? trace_hardirqs_off+0xd/0xf
      [23978.567438]  [<ffffffffa047f73d>] __btrfs_drop_extents+0x6b/0x9e1 [btrfs]
      [23978.568702]  [<ffffffff8107c03f>] ? trace_hardirqs_on+0xd/0xf
      [23978.569763]  [<ffffffff811441c0>] ? ____cache_alloc+0x69/0x2eb
      [23978.570817]  [<ffffffff81142269>] ? virt_to_head_page+0x9/0x36
      [23978.571872]  [<ffffffff81143c15>] ? cache_alloc_debugcheck_after.isra.42+0x16c/0x1cb
      [23978.573466]  [<ffffffff811420d5>] ? kmemleak_alloc_recursive.constprop.52+0x16/0x18
      [23978.574962]  [<ffffffffa0480d07>] btrfs_drop_extents+0x66/0x7f [btrfs]
      [23978.576179]  [<ffffffffa049aa35>] btrfs_clone+0x516/0xaf5 [btrfs]
      [23978.577311]  [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs]
      [23978.578520]  [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
      [23978.580282]  [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs]
      (...)
      [23978.591887] ---[ end trace 988ec2a653d03ed3 ]---
      
      Then we attempt to insert a new extent item with a key that already
      exists, which makes btrfs_insert_empty_item return -EEXIST resulting in
      abortion of the current transaction:
      
      [23978.594355] WARNING: CPU: 6 PID: 16251 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
      (...)
      [23978.622589] Call Trace:
      [23978.623181]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
      [23978.624359]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
      [23978.625573]  [<ffffffffa044ab6c>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
      [23978.626971]  [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
      [23978.628003]  [<ffffffff8108a6c8>] ? vprintk_default+0x1d/0x1f
      [23978.629138]  [<ffffffffa044ab6c>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
      [23978.630528]  [<ffffffffa049ad1b>] btrfs_clone+0x7fc/0xaf5 [btrfs]
      [23978.631635]  [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs]
      [23978.632886]  [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
      [23978.634119]  [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs]
      (...)
      [23978.647714] ---[ end trace 988ec2a653d03ed4 ]---
      
      This is wrong because we should not process the extent item that we just
      inserted previously, and instead process the extent item that follows it
      in the tree
      
      For example for the test case I wrote for fstests:
      
         bs=$((64 * 1024))
         mkfs.btrfs -f -l $bs -O ^no-holes /dev/sdc
         mount /dev/sdc /mnt
      
         xfs_io -f -c "pwrite -S 0xaa $(($bs * 2)) $(($bs * 2))" /mnt/foo
      
         $CLONER_PROG -s $((3 * $bs)) -d $((267 * $bs)) -l 0 /mnt/foo /mnt/foo
         $CLONER_PROG -s $((217 * $bs)) -d $((95 * $bs)) -l 0 /mnt/foo /mnt/foo
      
      The second clone call fails with -EEXIST, because when we process the
      first extent item (offset 262144), we drop part of it (counting from the
      end) and then insert a new extent item with a key greater then the key we
      found. The next time we search the tree we search for a key with offset
      262144 + 1, which leaves us at the new extent item we have just inserted
      but we think it refers to an extent that we need to clone.
      
      Fix this by ensuring the next search key uses an offset corresponding to
      the offset of the key we found previously plus the data length of the
      corresponding extent item. This ensures we skip new extent items that we
      inserted and works for the case of implicit holes too (NO_HOLES feature).
      
      A test case for fstests follows soon.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      64b02777
    • Jeff Mahoney's avatar
      btrfs: cleanup orphans while looking up default subvolume · 845f2c2f
      Jeff Mahoney authored
      commit 727b9784 upstream.
      
      Orphans in the fs tree are cleaned up via open_ctree and subvolume
      orphans are cleaned via btrfs_lookup_dentry -- except when a default
      subvolume is in use.  The name for the default subvolume uses a manual
      lookup that doesn't trigger orphan cleanup and needs to trigger it
      manually as well. This doesn't apply to the remount case since the
      subvolumes are cleaned up by walking the root radix tree.
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      845f2c2f
    • Chengyu Song's avatar
      btrfs: incorrect handling for fiemap_fill_next_extent return · 2c45304d
      Chengyu Song authored
      commit 26e726af upstream.
      
      fiemap_fill_next_extent returns 0 on success, -errno on error, 1 if this was
      the last extent that will fit in user array. If 1 is returned, the return
      value may eventually returned to user space, which should not happen, according
      to manpage of ioctl.
      Signed-off-by: default avatarChengyu Song <csong84@gatech.edu>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2c45304d
    • Filipe Manana's avatar
      Btrfs: send, don't leave without decrementing clone root's send_progress · 3a1e1ca1
      Filipe Manana authored
      commit 2f1f465a upstream.
      
      If the clone root was not readonly or the dead flag was set on it, we were
      leaving without decrementing the root's send_progress counter (and before
      we just incremented it). If a concurrent snapshot deletion was in progress
      and ended up being aborted, it would be impossible to later attempt to
      delete again the snapshot, since the root's send_in_progress counter could
      never go back to 0.
      
      We were also setting clone_sources_to_rollback to i + 1 too early - if we
      bailed out because the clone root we got is not readonly or flagged as dead
      we ended up later derreferencing a null pointer because we didn't assign
      the clone root to sctx->clone_roots[i].root:
      
      		for (i = 0; sctx && i < clone_sources_to_rollback; i++)
      			btrfs_root_dec_send_in_progress(
      					sctx->clone_roots[i].root);
      
      So just don't increment the send_in_progress counter if the root is readonly
      or flagged as dead.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a1e1ca1
    • Filipe Manana's avatar
      Btrfs: send, add missing check for dead clone root · 146a4565
      Filipe Manana authored
      commit 5cc2b17e upstream.
      
      After we locked the root's root item, a concurrent snapshot deletion
      call might have set the dead flag on it. So check if the dead flag
      is set and abort if it is, just like we do for the parent root.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      146a4565
    • Oleg Nesterov's avatar
      x86/vdso: Fix 'make bzImage' on older distros · d5709e64
      Oleg Nesterov authored
      commit ef7254a5 upstream.
      
      Change HOST_EXTRACFLAGS to include arch/x86/include/uapi along
      with include/uapi.
      
      This looks more consistent, and this fixes "make bzImage" on my
      old distro which doesn't have asm/bitsperlong.h in /usr/include/.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Acked-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 6f121e54 ("x86, vdso: Reimplement vdso.so preparation in build-time C")
      Link: http://lkml.kernel.org/r/1431332153-18566-6-git-send-email-bp@alien8.de
      Link: http://lkml.kernel.org/r/20150507165835.GB18652@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5709e64
    • Tommi Kyntola's avatar
      x86/vdso: Fix the x86 vdso2c tool includes · 5b4b2604
      Tommi Kyntola authored
      commit 0a4f59d6 upstream.
      
      The build-time tool arch/x86/vdso/vdso2c.c includes <linux/elf.h>,
      but cannot find it, unless the build host happens to provide it.
      
      It should be reading the uapi linux/elf.h
      
      This build regression came along with the vdso2c changes between
      v3.15 and v3.16.
      Signed-off-by: default avatarTommi Kyntola <tommi.kyntola@gmail.com>
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Link: http://lkml.kernel.org/r/1525002.3cJ7BySVpA@musta
      Link: http://lkml.kernel.org/r/efe1ec29eda830b1d0030882706f3dac99ce1f73.1427482099.git.luto@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5b4b2604
    • Axel Lin's avatar
      irqchip: sunxi-nmi: Fix off-by-one error in irq iterator · 80e2a770
      Axel Lin authored
      commit febe0696 upstream.
      
      Fixes: 6058bb36 'ARM: sun7i/sun6i: irqchip: Add irqchip driver for NMI controller'
      Signed-off-by: default avatarAxel Lin <axel.lin@ingics.com>
      Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
      Cc: Carlo Caione <carlo@caione.org>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Link: http://lkml.kernel.org/r/1433684009.9134.1.camel@ingics.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80e2a770
    • Johannes Berg's avatar
      cfg80211: wext: clear sinfo struct before calling driver · 48baed92
      Johannes Berg authored
      commit 9c5a18a3 upstream.
      
      Until recently, mac80211 overwrote all the statistics it could
      provide when getting called, but it now relies on the struct
      having been zeroed by the caller. This was always the case in
      nl80211, but wext used a static struct which could even cause
      values from one device leak to another.
      
      Using a static struct is OK (as even documented in a comment)
      since the whole usage of this function and its return value is
      always locked under RTNL. Not clearing the struct for calling
      the driver has always been wrong though, since drivers were
      free to only fill values they could report, so calling this
      for one device and then for another would always have leaked
      values from one to the other.
      
      Fix this by initializing the structure in question before the
      driver method call.
      
      This fixes https://bugzilla.kernel.org/show_bug.cgi?id=99691Reported-by: default avatarGerrit Renker <gerrit@erg.abdn.ac.uk>
      Reported-by: default avatarAlexander Kaltsas <alexkaltsas@gmail.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      48baed92
    • Ming Lei's avatar
      blk-mq: free hctx->ctxs in queue's release handler · 3804579f
      Ming Lei authored
      commit c3b4afca upstream.
      
      Now blk_cleanup_queue() can be called before calling
      del_gendisk()[1], inside which hctx->ctxs is touched
      from blk_mq_unregister_hctx(), but the variable has
      been freed by blk_cleanup_queue() at that time.
      
      So this patch moves freeing of hctx->ctxs into queue's
      release handler for fixing the oops reported by Stefan.
      
      [1], 6cd18e71 (block: destroy bdi before blockdev is
      unregistered)
      Reported-by: default avatarStefan Seyfried <stefan.seyfried@googlemail.com>
      Cc: NeilBrown <neilb@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMing Lei <tom.leiming@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3804579f
    • Mel Gorman's avatar
      sched, numa: do not hint for NUMA balancing on VM_MIXEDMAP mappings · 0ffafd0b
      Mel Gorman authored
      commit 8e76d4ee upstream.
      
      Jovi Zhangwei reported the following problem
      
        Below kernel vm bug can be triggered by tcpdump which mmaped a lot of pages
        with GFP_COMP flag.
      
        [Mon May 25 05:29:33 2015] page:ffffea0015414000 count:66 mapcount:1 mapping:          (null) index:0x0
        [Mon May 25 05:29:33 2015] flags: 0x20047580004000(head)
        [Mon May 25 05:29:33 2015] page dumped because: VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page))
        [Mon May 25 05:29:33 2015] ------------[ cut here ]------------
        [Mon May 25 05:29:33 2015] kernel BUG at mm/migrate.c:1661!
        [Mon May 25 05:29:33 2015] invalid opcode: 0000 [#1] SMP
      
      In this case it was triggered by running tcpdump but it's not necessary
      reproducible on all systems.
      
        sudo tcpdump -i bond0.100 'tcp port 4242' -c 100000000000 -w 4242.pcap
      
      Compound pages cannot be migrated and it was not expected that such pages
      be marked for NUMA balancing.  This did not take into account that drivers
      such as net/packet/af_packet.c may insert compound pages into userspace
      with vm_insert_page.  This patch tells the NUMA balancing protection
      scanner to skip all VM_MIXEDMAP mappings which avoids the possibility that
      compound pages are marked for migration.
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Reported-by: default avatarJovi Zhangwei <jovi@cloudflare.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0ffafd0b
    • NeilBrown's avatar
      md: don't return 0 from array_state_store · 4715a2ff
      NeilBrown authored
      commit c008f1d3 upstream.
      
      Returning zero from a 'store' function is bad.
      The return value should be either len length of the string
      or an error.
      
      So use 'len' if 'err' is zero.
      
      Fixes: 6791875e ("md: make reconfig_mutex optional for writes to md sysfs files.")
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4715a2ff
    • NeilBrown's avatar
      md: Close race when setting 'action' to 'idle'. · ad48fa9a
      NeilBrown authored
      commit 8e8e2518 upstream.
      
      Checking ->sync_thread without holding the mddev_lock()
      isn't really safe, even after flushing the workqueue which
      ensures md_start_sync() has been run.
      
      While this code is waiting for the lock, md_check_recovery could reap
      the thread itself, and then start another thread (e.g. recovery might
      finish, then reshape starts).  When this thread gets the lock
      md_start_sync() hasn't run so it doesn't get reaped, but
      MD_RECOVERY_RUNNING gets cleared.  This allows two threads to start
      which leads to confusion.
      
      So don't both if MD_RECOVERY_RUNNING isn't set, but if it is do
      the flush and the test and the reap all under the mddev_lock to
      avoid any race with md_check_recovery.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Fixes: 6791875e ("md: make reconfig_mutex optional for writes to md sysfs files.")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ad48fa9a
    • Gu Zheng's avatar
      mm/memory_hotplug.c: set zone->wait_table to null after freeing it · 5dbedee4
      Gu Zheng authored
      commit 85bd8399 upstream.
      
      Izumi found the following oops when hot re-adding a node:
      
          BUG: unable to handle kernel paging request at ffffc90008963690
          IP: __wake_up_bit+0x20/0x70
          Oops: 0000 [#1] SMP
          CPU: 68 PID: 1237 Comm: rs:main Q:Reg Not tainted 4.1.0-rc5 #80
          Hardware name: FUJITSU PRIMEQUEST2800E/SB, BIOS PRIMEQUEST 2000 Series BIOS Version 1.87 04/28/2015
          task: ffff880838df8000 ti: ffff880017b94000 task.ti: ffff880017b94000
          RIP: 0010:[<ffffffff810dff80>]  [<ffffffff810dff80>] __wake_up_bit+0x20/0x70
          RSP: 0018:ffff880017b97be8  EFLAGS: 00010246
          RAX: ffffc90008963690 RBX: 00000000003c0000 RCX: 000000000000a4c9
          RDX: 0000000000000000 RSI: ffffea101bffd500 RDI: ffffc90008963648
          RBP: ffff880017b97c08 R08: 0000000002000020 R09: 0000000000000000
          R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a0797c73800
          R13: ffffea101bffd500 R14: 0000000000000001 R15: 00000000003c0000
          FS:  00007fcc7ffff700(0000) GS:ffff880874800000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: ffffc90008963690 CR3: 0000000836761000 CR4: 00000000001407e0
          Call Trace:
            unlock_page+0x6d/0x70
            generic_write_end+0x53/0xb0
            xfs_vm_write_end+0x29/0x80 [xfs]
            generic_perform_write+0x10a/0x1e0
            xfs_file_buffered_aio_write+0x14d/0x3e0 [xfs]
            xfs_file_write_iter+0x79/0x120 [xfs]
            __vfs_write+0xd4/0x110
            vfs_write+0xac/0x1c0
            SyS_write+0x58/0xd0
            system_call_fastpath+0x12/0x76
          Code: 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 45 f8 31 c0 48 8d 47 48 <48> 39 47 48 48 c7 45 e8 00 00 00 00 48 c7 45 f0 00 00 00 00 48
          RIP  [<ffffffff810dff80>] __wake_up_bit+0x20/0x70
           RSP <ffff880017b97be8>
          CR2: ffffc90008963690
      
      Reproduce method (re-add a node)::
        Hot-add nodeA --> remove nodeA --> hot-add nodeA (panic)
      
      This seems an use-after-free problem, and the root cause is
      zone->wait_table was not set to *NULL* after free it in
      try_offline_node.
      
      When hot re-add a node, we will reuse the pgdat of it, so does the zone
      struct, and when add pages to the target zone, it will init the zone
      first (including the wait_table) if the zone is not initialized.  The
      judgement of zone initialized is based on zone->wait_table:
      
      	static inline bool zone_is_initialized(struct zone *zone)
      	{
      		return !!zone->wait_table;
      	}
      
      so if we do not set the zone->wait_table to *NULL* after free it, the
      memory hotplug routine will skip the init of new zone when hot re-add
      the node, and the wait_table still points to the freed memory, then we
      will access the invalid address when trying to wake up the waiting
      people after the i/o operation with the page is done, such as mentioned
      above.
      Signed-off-by: default avatarGu Zheng <guz.fnst@cn.fujitsu.com>
      Reported-by: default avatarTaku Izumi <izumi.taku@jp.fujitsu.com>
      Reviewed by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5dbedee4
    • Yingjoe Chen's avatar
      arm64: dts: mt8173-evb: fix model name · 134e0ffd
      Yingjoe Chen authored
      commit 692ef3ee upstream.
      
      Model name in mt8173-evb.dts doesn't follow dts convention (it should
      be human readable model name). Fix it.
      
      Fixes: b3a37248 ("arm64: dts: Add mediatek MT8173 SoC and evaluation board dts and Makefile")
      Signed-off-by: default avatarYingjoe Chen <yingjoe.chen@mediatek.com>
      Signed-off-by: default avatarMatthias Brugger <matthias.bgg@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      134e0ffd
    • Thomas Petazzoni's avatar
      Revert "bus: mvebu-mbus: make sure SDRAM CS for DMA don't overlap the MBus bridge window" · 16c1a8f3
      Thomas Petazzoni authored
      commit 885dbd15 upstream.
      
      This reverts commit 1737cac6 ("bus: mvebu-mbus: make sure SDRAM CS
      for DMA don't overlap the MBus bridge window"), because it breaks DMA
      on platforms having more than 2 GB of RAM.
      
      This commit changed the information reported to DMA masters device
      drivers through the mv_mbus_dram_info() function so that the returned
      DRAM ranges do not overlap with I/O windows.
      
      This was necessary as a preparation to support the new CESA Crypto
      Engine driver, which will use DMA for cryptographic operations. But
      since it does DMA with the SRAM which is mapped as an I/O window,
      having DRAM ranges overlapping with I/O windows was problematic.
      
      To solve this, the above mentioned commit changed the mvebu-mbus to
      adjust the DRAM ranges so that they don't overlap with the I/O
      windows. However, by doing this, we re-adjust the DRAM ranges in a way
      that makes them have a size that is no longer a power of two. While
      this is perfectly fine for the Crypto Engine, which supports DRAM
      ranges with a granularity of 64 KB, it breaks basically all other DMA
      masters, which expect power of two sizes for the DRAM ranges.
      
      Due to this, if the installed system memory is 4 GB, in two
      chip-selects of 2 GB, the second DRAM range will be reduced from 2 GB
      to a little bit less than 2 GB to not overlap with the I/O windows, in
      a way that results in a DRAM range that doesn't have a power of two
      size. This means that whenever you do a DMA transfer with an address
      located in the [ 2 GB ; 4 GB ] area, it will freeze the system. Any
      serious DMA activity like simply running:
      
        for i in $(seq 1 64) ; do dd if=/dev/urandom of=file$i bs=1M count=16 ; done
      
      in an ext3 partition mounted over a SATA drive will freeze the system.
      
      Since the new CESA crypto driver that uses DMA has not been merged
      yet, the easiest fix is to simply revert this commit. A follow-up
      commit will introduce a different solution for the CESA crypto driver.
      Signed-off-by: default avatarThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Fixes: 1737cac6 ("bus: mvebu-mbus: make sure SDRAM CS for DMA don't overlap the MBus bridge window")
      Signed-off-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      16c1a8f3
    • Nicolas Schichan's avatar
      bus: mvebu-mbus: do not set WIN_CTRL_SYNCBARRIER on non io-coherent platforms. · e2045b79
      Nicolas Schichan authored
      commit 8c9e06e6 upstream.
      
      Commit a0b5cd4a ("bus: mvebu-mbus: use automatic I/O
      synchronization barriers") enabled the usage of automatic I/O
      synchronization barriers by enabling bit WIN_CTRL_SYNCBARRIER in the
      control registers of MBus windows, but on non io-coherent platforms
      (orion5x, kirkwood and dove) the WIN_CTRL_SYNCBARRIER bit in
      the window control register is either reserved (all windows except 6
      and 7) or enables read-only protection (windows 6 and 7).
      Signed-off-by: default avatarNicolas Schichan <nschichan@freebox.fr>
      Reviewed-by: default avatarThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Fixes: a0b5cd4a ("bus: mvebu-mbus: use automatic I/O synchronization barriers")
      Signed-off-by: default avatarThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e2045b79
    • Nadav Haklai's avatar
      ata: ahci_mvebu: Fix wrongly set base address for the MBus window setting · 7dcb6917
      Nadav Haklai authored
      commit e96998fc upstream.
      
      According to the Armada 38x datasheet, the window base address
      registers value is set in bits [31:4] of the register and corresponds
      to the transaction address bits [47:20].
      
      Therefore, the 32bit base address value should be shifted right by
      20bits and left by 4bits, resulting in 16 bit shift right.
      
      The bug as not been noticed yet because if the memory available on
      the platform is less than 2GB, then the base address is zero.
      
      [gregory.clement@free-electrons.com: add extra-explanation]
      
      Fixes: a3464ed2 (ata: ahci_mvebu: new driver for Marvell Armada 380
      AHCI interfaces)
      Signed-off-by: default avatarNadav Haklai <nadavh@marvell.com>
      Reviewed-by: default avatarOmri Itach <omrii@marvell.com>
      Signed-off-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7dcb6917