1. 06 Aug, 2015 15 commits
    • Giuseppe Cavallaro's avatar
      drivers: clk: st: Fix flexgen lock init · 2f502e43
      Giuseppe Cavallaro authored
      commit 0f4f2afd upstream.
      
      While proving lock, the following warning happens
      and it is fixed after initializing lock in the setup
      function
      
      INFO: trying to register non-static key.
      the code is fine but needs lockdep annotation.
      turning off the locking correctness validator.
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.27-02861-g39df285-dirty #33
      [<c00154ac>] (unwind_backtrace+0x0/0xf4) from [<c0011b50>] (show_stack+0x10/0x14)
      [<c0011b50>] (show_stack+0x10/0x14) from [<c00689ac>] (__lock_acquire+0x900/0xb14)
      [<c00689ac>] (__lock_acquire+0x900/0xb14) from [<c0069394>] (lock_acquire+0x68/0x7c)
      [<c0069394>] (lock_acquire+0x68/0x7c) from [<c04958f8>] (_raw_spin_lock_irqsave+0x48/0x5c)
      [<c04958f8>] (_raw_spin_lock_irqsave+0x48/0x5c) from [<c0381e6c>] (clk_gate_endisable+0x28/0x88)
      [<c0381e6c>] (clk_gate_endisable+0x28/0x88) from [<c0381ee0>] (clk_gate_enable+0xc/0x14)
      [<c0381ee0>] (clk_gate_enable+0xc/0x14) from [<c0386c68>] (flexgen_enable+0x28/0x40)
      [<c0386c68>] (flexgen_enable+0x28/0x40) from [<c037f260>] (__clk_enable+0x5c/0x9c)
      [<c037f260>] (__clk_enable+0x5c/0x9c) from [<c037f558>] (clk_enable+0x18/0x2c)
      [<c037f558>] (clk_enable+0x18/0x2c) from [<c064a1dc>] (st_lpc_of_register+0xc0/0x248)
      [<c064a1dc>] (st_lpc_of_register+0xc0/0x248) from [<c0649e44>] (clocksource_of_init+0x34/0x58)
      [<c0649e44>] (clocksource_of_init+0x34/0x58) from [<c0637ddc>] (sti_timer_init+0x10/0x18)
      [<c0637ddc>] (sti_timer_init+0x10/0x18) from [<c06343f8>] (time_init+0x20/0x30)
      [<c06343f8>] (time_init+0x20/0x30) from [<c0632984>] (start_kernel+0x20c/0x2e8)
      [<c0632984>] (start_kernel+0x20c/0x2e8) from [<40008074>] (0x40008074)
      Signed-off-by: default avatarGiuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: default avatarGabriel Fernandez <gabriel.fernandez@linaro.org>
      Fixes: b1165170 ("clk: st: STiH407: Support for Flexgen Clocks")
      Signed-off-by: default avatarStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      2f502e43
    • Stephen Boyd's avatar
      ARM: 8393/1: smp: Fix suspicious RCU usage with ipi tracepoints · e266acbc
      Stephen Boyd authored
      commit 398f7456 upstream.
      
      John Stultz reports an RCU splat on boot with ARM ipi trace
      events enabled.
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      4.1.0-rc7-00033-gb5bed2f #153 Not tainted
      -------------------------------
      include/trace/events/ipi.h:68 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      RCU used illegally from idle CPU!
      rcu_scheduler_active = 1, debug_locks = 0
      RCU used illegally from extended quiescent state!
      no locks held by swapper/0/0.
      
      stack backtrace:
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-rc7-00033-gb5bed2f #153
      Hardware name: Qualcomm (Flattened Device Tree)
      [<c0216b08>] (unwind_backtrace) from [<c02136e8>] (show_stack+0x10/0x14)
      [<c02136e8>] (show_stack) from [<c075e678>] (dump_stack+0x70/0xbc)
      [<c075e678>] (dump_stack) from [<c0215a80>] (handle_IPI+0x428/0x604)
      [<c0215a80>] (handle_IPI) from [<c020942c>] (gic_handle_irq+0x54/0x5c)
      [<c020942c>] (gic_handle_irq) from [<c0766604>] (__irq_svc+0x44/0x7c)
      Exception stack(0xc09f3f48 to 0xc09f3f90)
      3f40:                   00000001 00000001 00000000 c09f73b8 c09f4528 c0a5de9c
      3f60: c076b4f0 00000000 00000000 c09ef108 c0a5cec1 00000001 00000000 c09f3f90
      3f80: c026bf60 c0210ab8 20000113 ffffffff
      [<c0766604>] (__irq_svc) from [<c0210ab8>] (arch_cpu_idle+0x20/0x3c)
      [<c0210ab8>] (arch_cpu_idle) from [<c02647f0>] (cpu_startup_entry+0x2c0/0x5dc)
      [<c02647f0>] (cpu_startup_entry) from [<c099bc1c>] (start_kernel+0x358/0x3c4)
      [<c099bc1c>] (start_kernel) from [<8020807c>] (0x8020807c)
      
      At this point in the IPI handling path we haven't called
      irq_enter() yet, so RCU doesn't know that we're about to exit
      idle and properly warns that we're using RCU from an idle CPU.
      Use trace_ipi_entry_rcuidle() instead of trace_ipi_entry() so
      that RCU is informed about our exit from idle.
      
      Fixes: 365ec7b1 ("ARM: add IPI tracepoints")
      Reported-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Tested-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e266acbc
    • Gabriel Fernandez's avatar
      drivers: clk: st: Fix mux bit-setting for Cortex A9 clocks · 8a92e0c5
      Gabriel Fernandez authored
      commit 3be6d8ce upstream.
      
      This patch fixes the mux bit-setting for ClockgenA9.
      Signed-off-by: default avatarGabriel Fernandez <gabriel.fernandez@linaro.org>
      Fixes: 13e6f2da ("clk: st: STiH407: Support for A9 MUX Clocks")
      Signed-off-by: default avatarStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8a92e0c5
    • Pankaj Dev's avatar
      drivers: clk: st: Incorrect register offset used for lock_status · ddac306d
      Pankaj Dev authored
      commit 56551da9 upstream.
      
      Incorrect register offset used for sthi407 clockgenC
      Signed-off-by: default avatarPankaj Dev <pankaj.dev@st.com>
      Signed-off-by: default avatarGabriel Fernandez <gabriel.fernandez@linaro.org>
      Fixes: 51306d56 ("clk: st: STiH407: Support for clockgenC0")
      Signed-off-by: default avatarStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ddac306d
    • Hai Li's avatar
      clk: qcom: Use parent rate when set rate to pixel RCG clock · 547dca92
      Hai Li authored
      commit 6d451367 upstream.
      
      Since the parent rate has been recalculated, pixel RCG clock
      should rely on it to find the correct M/N values during set_rate,
      instead of calling __clk_round_rate() to its parent again.
      Signed-off-by: default avatarHai Li <hali@codeaurora.org>
      Tested-by: default avatarArchit Taneja <architt@codeaurora.org>
      Fixes: 99cbd064 ("clk: qcom: Support display RCG clocks")
      [sboyd@codeaurora.org: Silenced unused parent variable warning]
      Signed-off-by: default avatarStephen Boyd <sboyd@codeaurora.org>
      [ kamal: backport to 3.19-stable: context ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      547dca92
    • Al Viro's avatar
      freeing unlinked file indefinitely delayed · f644ddac
      Al Viro authored
      commit 75a6f82a upstream.
      
      	Normally opening a file, unlinking it and then closing will have
      the inode freed upon close() (provided that it's not otherwise busy and
      has no remaining links, of course).  However, there's one case where that
      does *not* happen.  Namely, if you open it by fhandle with cold dcache,
      then unlink() and close().
      
      	In normal case you get d_delete() in unlink(2) notice that dentry
      is busy and unhash it; on the final dput() it will be forcibly evicted from
      dcache, triggering iput() and inode removal.  In this case, though, we end
      up with *two* dentries - disconnected (created by open-by-fhandle) and
      regular one (used by unlink()).  The latter will have its reference to inode
      dropped just fine, but the former will not - it's considered hashed (it
      is on the ->s_anon list), so it will stay around until the memory pressure
      will finally do it in.  As the result, we have the final iput() delayed
      indefinitely.  It's trivial to reproduce -
      
      void flush_dcache(void)
      {
              system("mount -o remount,rw /");
      }
      
      static char buf[20 * 1024 * 1024];
      
      main()
      {
              int fd;
              union {
                      struct file_handle f;
                      char buf[MAX_HANDLE_SZ];
              } x;
              int m;
      
              x.f.handle_bytes = sizeof(x);
              chdir("/root");
              mkdir("foo", 0700);
              fd = open("foo/bar", O_CREAT | O_RDWR, 0600);
              close(fd);
              name_to_handle_at(AT_FDCWD, "foo/bar", &x.f, &m, 0);
              flush_dcache();
              fd = open_by_handle_at(AT_FDCWD, &x.f, O_RDWR);
              unlink("foo/bar");
              write(fd, buf, sizeof(buf));
              system("df .");			/* 20Mb eaten */
              close(fd);
              system("df .");			/* should've freed those 20Mb */
              flush_dcache();
              system("df .");			/* should be the same as #2 */
      }
      
      will spit out something like
      Filesystem     1K-blocks   Used Available Use% Mounted on
      /dev/root         322023 303843      1131 100% /
      Filesystem     1K-blocks   Used Available Use% Mounted on
      /dev/root         322023 303843      1131 100% /
      Filesystem     1K-blocks   Used Available Use% Mounted on
      /dev/root         322023 283282     21692  93% /
      - inode gets freed only when dentry is finally evicted (here we trigger
      than by remount; normally it would've happened in response to memory
      pressure hell knows when).
      Acked-by: default avatarJ. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      [ kamal: backport to 3.19-stable: no fast_dput() ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f644ddac
    • Al Viro's avatar
      9p: don't leave a half-initialized inode sitting around · 43af4b20
      Al Viro authored
      commit 0a73d0a2 upstream.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      43af4b20
    • John David Anglin's avatar
      parisc: Fix some PTE/TLB race conditions and optimize __flush_tlb_range based on timing results · fbe677d6
      John David Anglin authored
      commit 01ab6057 upstream.
      
      The increased use of pdtlb/pitlb instructions seemed to increase the
      frequency of random segmentation faults building packages. Further, we
      had a number of cases where TLB inserts would repeatedly fail and all
      forward progress would stop. The Haskell ghc package caused a lot of
      trouble in this area. The final indication of a race in pte handling was
      this syslog entry on sibaris (C8000):
      
       swap_free: Unused swap offset entry 00000004
       BUG: Bad page map in process mysqld  pte:00000100 pmd:019bbec5
       addr:00000000ec464000 vm_flags:00100073 anon_vma:0000000221023828 mapping: (null) index:ec464
       CPU: 1 PID: 9176 Comm: mysqld Not tainted 4.0.0-2-parisc64-smp #1 Debian 4.0.5-1
       Backtrace:
        [<0000000040173eb0>] show_stack+0x20/0x38
        [<0000000040444424>] dump_stack+0x9c/0x110
        [<00000000402a0d38>] print_bad_pte+0x1a8/0x278
        [<00000000402a28b8>] unmap_single_vma+0x3d8/0x770
        [<00000000402a4090>] zap_page_range+0xf0/0x198
        [<00000000402ba2a4>] SyS_madvise+0x404/0x8c0
      
      Note that the pte value is 0 except for the accessed bit 0x100. This bit
      shouldn't be set without the present bit.
      
      It should be noted that the madvise system call is probably a trigger for many
      of the random segmentation faults.
      
      In looking at the kernel code, I found the following problems:
      
      1) The pte_clear define didn't take TLB lock when clearing a pte.
      2) We didn't test pte present bit inside lock in exception support.
      3) The pte and tlb locks needed to merged in order to ensure consistency
      between page table and TLB. This also has the effect of serializing TLB
      broadcasts on SMP systems.
      
      The attached change implements the above and a few other tweaks to try
      to improve performance. Based on the timing code, TLB purges are very
      slow (e.g., ~ 209 cycles per page on rp3440). Thus, I think it
      beneficial to test the split_tlb variable to avoid duplicate purges.
      Probably, all PA 2.0 machines have combined TLBs.
      
      I dropped using __flush_tlb_range in flush_tlb_mm as I realized all
      applications and most threads have a stack size that is too large to
      make this useful. I added some comments to this effect.
      
      Since implementing 1 through 3, I haven't had any random segmentation
      faults on mx3210 (rp3440) in about one week of building code and running
      as a Debian buildd.
      Signed-off-by: default avatarJohn David Anglin <dave.anglin@bell.net>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      fbe677d6
    • Daniel Axtens's avatar
      cxl: Check if afu is not null in cxl_slbia · 6005d1af
      Daniel Axtens authored
      commit 2c069a11 upstream.
      
      The pointer to an AFU in the adapter's list of AFUs can be null
      if we're in the process of removing AFUs. The afu_list_lock
      doesn't guard against this.
      
      Say we have 2 slices, and we're in the process of removing cxl.
       - We remove the AFUs in order (see cxl_remove). In cxl_remove_afu
         for AFU 0, we take the lock, set adapter->afu[0] = NULL, and
         release the lock.
       - Then we get an slbia. In cxl_slbia we take the lock, and set
         afu = adapter->afu[0], which is NULL.
       - Therefore our attempt to check afu->enabled will blow up.
      
      Therefore, check if afu is a null pointer before dereferencing it.
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Acked-by: default avatarMichael Neuling <mikey@neuling.org>
      Acked-by: default avatarIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      6005d1af
    • Joe Perches's avatar
      hpfs: hpfs_error: Remove static buffer, use vsprintf extension %pV instead · bb7728b8
      Joe Perches authored
      commit a28e4b2b upstream.
      
      Removing unnecessary static buffers is good.
      Use the vsprintf %pV extension instead.
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarMikulas Patocka <mikulas@twibright.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      bb7728b8
    • Sanidhya Kashyap's avatar
      hpfs: kstrdup() out of memory handling · 2fab688e
      Sanidhya Kashyap authored
      commit ce657611 upstream.
      
      There is a possibility of nothing being allocated to the new_opts in
      case of memory pressure, therefore return ENOMEM for such case.
      Signed-off-by: default avatarSanidhya Kashyap <sanidhya.gatech@gmail.com>
      Signed-off-by: default avatarMikulas Patocka <mikulas@twibright.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      2fab688e
    • Paul Moore's avatar
      selinux: don't waste ebitmap space when importing NetLabel categories · d80ad5a2
      Paul Moore authored
      commit 33246035 upstream.
      
      At present we don't create efficient ebitmaps when importing NetLabel
      category bitmaps.  This can present a problem when comparing ebitmaps
      since ebitmap_cmp() is very strict about these things and considers
      these wasteful ebitmaps not equal when compared to their more
      efficient counterparts, even if their values are the same.  This isn't
      likely to cause problems on 64-bit systems due to a bit of luck on
      how NetLabel/CIPSO works and the default ebitmap size, but it can be
      a problem on 32-bit systems.
      
      This patch fixes this problem by being a bit more intelligent when
      importing NetLabel category bitmaps by skipping over empty sections
      which should result in a nice, efficient ebitmap.
      Signed-off-by: default avatarPaul Moore <pmoore@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d80ad5a2
    • Kirill A. Shutemov's avatar
      mm: avoid setting up anonymous pages into file mapping · b2ec585b
      Kirill A. Shutemov authored
      commit 6b7339f4 upstream.
      
      Reading page fault handler code I've noticed that under right
      circumstances kernel would map anonymous pages into file mappings: if
      the VMA doesn't have vm_ops->fault() and the VMA wasn't fully populated
      on ->mmap(), kernel would handle page fault to not populated pte with
      do_anonymous_page().
      
      Let's change page fault handler to use do_anonymous_page() only on
      anonymous VMA (->vm_ops == NULL) and make sure that the VMA is not
      shared.
      
      For file mappings without vm_ops->fault() or shred VMA without vm_ops,
      page fault on pte_none() entry would lead to SIGBUS.
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [ kamal: backport to 3.19-stable: s/do_fault/do_linear_fault/ ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b2ec585b
    • Grigori Goronzy's avatar
      drm/radeon: unpin cursor BOs on suspend and pin them again on resume (v2) · e198d1d8
      Grigori Goronzy authored
      commit f3cbb17b upstream.
      
      Everything is evicted from VRAM before suspend, so we need to make
      sure all BOs are unpinned and re-pinned after resume. Fixes broken
      mouse cursor after resume introduced by commit b9729b17.
      
      [Michel Dänzer: Add pinning BOs on resume]
      
      v2:
      [Alex Deucher: merge cursor unpin into fb unpin loop]
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=100541
      Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
      Signed-off-by: default avatarGrigori Goronzy <greg@chown.ath.cx>
      Signed-off-by: default avatarMichel Dänzer <michel.daenzer@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e198d1d8
    • Michel Dänzer's avatar
      drm/radeon: Clean up reference counting and pinning of the cursor BOs · 997f42e2
      Michel Dänzer authored
      commit cd404af0 upstream.
      
      Take a GEM reference for and pin the new cursor BO, unpin and drop the
      GEM reference for the old cursor BO in radeon_crtc_cursor_set2, and use
      radeon_crtc->cursor_addr in radeon_set_cursor.
      
      This fixes radeon_cursor_reset accidentally incrementing the cursor BO
      pin count, and cleans up the code a little.
      Reviewed-by: default avatarGrigori Goronzy <greg@chown.ath.cx>
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarMichel Dänzer <michel.daenzer@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      997f42e2
  2. 05 Aug, 2015 14 commits
    • Grigori Goronzy's avatar
      drm/radeon: fix HDP flushing · 8b72decc
      Grigori Goronzy authored
      commit 54e03986 upstream.
      
      This was regressed by commit 39e7f6f8, although I don't know of any
      actual issues caused by it.
      
      The storage domain is read without TTM locking now, but the lock
      never helped to prevent any races.
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarGrigori Goronzy <greg@chown.ath.cx>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8b72decc
    • Ian Munsie's avatar
      cxl: Fix off by one error allowing subsequent mmap page to be accessed · 8c980662
      Ian Munsie authored
      commit 10a5894f upstream.
      
      It was discovered that if a process mmaped their problem state area they
      were able to access one page more than expected, potentially allowing
      them to access the problem state area of an unrelated process.
      
      This was due to a simple off by one error in the mmap fault handler
      introduced in 0712dc7e ("cxl: Fix issues
      when unmapping contexts"), which is fixed in this patch.
      
      Fixes: 0712dc7e ("cxl: Fix issues when unmapping contexts")
      Signed-off-by: default avatarIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8c980662
    • Shreyas B. Prabhu's avatar
      powerpc/powernv: Fix race in updating core_idle_state · 55fbd98c
      Shreyas B. Prabhu authored
      commit b32aadc1 upstream.
      
      core_idle_state is maintained for each core. It uses 0-7 bits to track
      whether a thread in the core has entered fastsleep or winkle. 8th bit is
      used as a lock bit.
      The lock bit is set in these 2 scenarios-
       - The thread is first in subcore to wakeup from sleep/winkle.
       - If its the last thread in the core about to enter sleep/winkle
      
      While the lock bit is set, if any other thread in the core wakes up, it
      loops until the lock bit is cleared before proceeding in the wakeup
      path. This helps prevent race conditions w.r.t fastsleep workaround and
      prevents threads from switching to process context before core/subcore
      resources are restored.
      
      But, in the path to sleep/winkle entry, we currently don't check for
      lock-bit. This exposes us to following race when running with subcore
      on-
      
      First thread in the subcorea		Another thread in the same
      waking up		   		core entering sleep/winkle
      
      lwarx   r15,0,r14
      ori     r15,r15,PNV_CORE_IDLE_LOCK_BIT
      stwcx.  r15,0,r14
      [Code to restore subcore state]
      
      						lwarx   r15,0,r14
      						[clear thread bit]
      						stwcx.  r15,0,r14
      
      andi.   r15,r15,PNV_CORE_IDLE_THREAD_BITS
      stw     r15,0(r14)
      
      Here, after the thread entering sleep clears its thread bit in
      core_idle_state, the value is overwritten by the thread waking up.
      In such cases when the core enters fastsleep, code mistakes an idle
      thread as running. Because of this, the first thread waking up from
      fastsleep which is supposed to resync timebase skips it. So we can
      end up having a core with stale timebase value.
      
      This patch fixes the above race by looping on the lock bit even while
      entering the idle states.
      Signed-off-by: default avatarShreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Fixes: 7b54e9f213f76 'powernv/powerpc: Add winkle support for offline cpus'
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      55fbd98c
    • Rafael J. Wysocki's avatar
      ACPI / PNP: Reserve ACPI resources at the fs_initcall_sync stage · 39893e3b
      Rafael J. Wysocki authored
      commit 0294112e upstream.
      
      This effectively reverts the following three commits:
      
       7bc10388 ACPI / resources: free memory on error in add_region_before()
       0f1b414d ACPI / PNP: Avoid conflicting resource reservations
       b9a5e5e1 ACPI / init: Fix the ordering of acpi_reserve_resources()
      
      (commit b9a5e5e1 introduced regressions some of which, but not
      all, were addressed by commit 0f1b414d and commit 7bc10388
      was a fixup on top of the latter) and causes ACPI fixed hardware
      resources to be reserved at the fs_initcall_sync stage of system
      initialization.
      
      The story is as follows.  First, a boot regression was reported due
      to an apparent resource reservation ordering change after a commit
      that shouldn't lead to such changes.  Investigation led to the
      conclusion that the problem happened because acpi_reserve_resources()
      was executed at the device_initcall() stage of system initialization
      which wasn't strictly ordered with respect to driver initialization
      (and with respect to the initialization of the pcieport driver in
      particular), so a random change causing the device initcalls to be
      run in a different order might break things.
      
      The response to that was to attempt to run acpi_reserve_resources()
      as soon as we knew that ACPI would be in use (commit b9a5e5e1).
      However, that turned out to be too early, because it caused resource
      reservations made by the PNP system driver to fail on at least one
      system and that failure was addressed by commit 0f1b414d.
      
      That fix still turned out to be insufficient, though, because
      calling acpi_reserve_resources() before the fs_initcall stage of
      system initialization caused a boot regression to happen on the
      eCAFE EC-800-H20G/S netbook.  That meant that we only could call
      acpi_reserve_resources() at the fs_initcall initialization stage
      or later, but then we might just as well call it after the PNP
      initalization in which case commit 0f1b414d wouldn't be
      necessary any more.
      
      For this reason, the changes made by commit 0f1b414d are reverted
      (along with a memory leak fixup on top of that commit), the changes
      made by commit b9a5e5e1 that went too far are reverted too and
      acpi_reserve_resources() is changed into fs_initcall_sync, which
      will cause it to be executed after the PNP subsystem initialization
      (which is an fs_initcall) and before device initcalls (including
      the pcieport driver initialization) which should avoid the initial
      issue.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=100581
      Link: http://marc.info/?t=143092384600002&r=1&w=2
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=99831
      Link: http://marc.info/?t=143389402600001&r=1&w=2
      Fixes: b9a5e5e1 "ACPI / init: Fix the ordering of acpi_reserve_resources()"
      Reported-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      39893e3b
    • Roger Quadros's avatar
      ARM: dts: am57xx-beagle-x15: Provide supply for usb2_phy2 · 934ab6eb
      Roger Quadros authored
      commit 9ab402ae upstream.
      
      Without this USB2 breaks if USB1 is disabled or USB1
      initializes after USB2 e.g. due to deferred probing.
      
      Fixes: 5a0f93c6 ("ARM: dts: Add am57xx-beagle-x15")
      Signed-off-by: default avatarRoger Quadros <rogerq@ti.com>
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      934ab6eb
    • Michal Hocko's avatar
      ext4: replace open coded nofail allocation in ext4_free_blocks() · ef505188
      Michal Hocko authored
      commit 7444a072 upstream.
      
      ext4_free_blocks is looping around the allocation request and mimics
      __GFP_NOFAIL behavior without any allocation fallback strategy. Let's
      remove the open coded loop and replace it with __GFP_NOFAIL. Without the
      flag the allocator has no way to find out never-fail requirement and
      cannot help in any way.
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ef505188
    • Eryu Guan's avatar
      ext4: correctly migrate a file with a hole at the beginning · 47eb4b36
      Eryu Guan authored
      commit 8974fec7 upstream.
      
      Currently ext4_ind_migrate() doesn't correctly handle a file which
      contains a hole at the beginning of the file.  This caused the migration
      to be done incorrectly, and then if there is a subsequent following
      delayed allocation write to the "hole", this would reclaim the same data
      blocks again and results in fs corruption.
      
        # assmuing 4k block size ext4, with delalloc enabled
        # skip the first block and write to the second block
        xfs_io -fc "pwrite 4k 4k" -c "fsync" /mnt/ext4/testfile
      
        # converting to indirect-mapped file, which would move the data blocks
        # to the beginning of the file, but extent status cache still marks
        # that region as a hole
        chattr -e /mnt/ext4/testfile
      
        # delayed allocation writes to the "hole", reclaim the same data block
        # again, results in i_blocks corruption
        xfs_io -c "pwrite 0 4k" /mnt/ext4/testfile
        umount /mnt/ext4
        e2fsck -nf /dev/sda6
        ...
        Inode 53, i_blocks is 16, should be 8.  Fix? no
        ...
      Signed-off-by: default avatarEryu Guan <guaneryu@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      47eb4b36
    • Eryu Guan's avatar
      ext4: be more strict when migrating to non-extent based file · ca842bc9
      Eryu Guan authored
      commit d6f123a9 upstream.
      
      Currently the check in ext4_ind_migrate() is not enough before doing the
      real conversion:
      
      a) delayed allocated extents could bypass the check on eh->eh_entries
         and eh->eh_depth
      
      This can be demonstrated by this script
      
        xfs_io -fc "pwrite 0 4k" -c "pwrite 8k 4k" /mnt/ext4/testfile
        chattr -e /mnt/ext4/testfile
      
      where testfile has two extents but still be converted to non-extent
      based file format.
      
      b) only extent length is checked but not the offset, which would result
         in data lose (delalloc) or fs corruption (nodelalloc), because
         non-extent based file only supports at most (12 + 2^10 + 2^20 + 2^30)
         blocks
      
      This can be demostrated by
      
        xfs_io -fc "pwrite 5T 4k" /mnt/ext4/testfile
        chattr -e /mnt/ext4/testfile
        sync
      
      If delalloc is enabled, dmesg prints
        EXT4-fs warning (device dm-4): ext4_block_to_path:105: block 1342177280 > max in inode 53
        EXT4-fs (dm-4): Delayed block allocation failed for inode 53 at logical offset 1342177280 with max blocks 1 with error 5
        EXT4-fs (dm-4): This should not happen!! Data will be lost
      
      If delalloc is disabled, e2fsck -nf shows corruption
        Inode 53, i_size is 5497558142976, should be 4096.  Fix? no
      
      Fix the two issues by
      
      a) forcing all delayed allocation blocks to be allocated before checking
         eh->eh_depth and eh->eh_entries
      b) limiting the last logical block of the extent is within direct map
      Signed-off-by: default avatarEryu Guan <guaneryu@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ca842bc9
    • Lukas Czerner's avatar
      ext4: fix reservation release on invalidatepage for delalloc fs · bbc80299
      Lukas Czerner authored
      commit 9705acd6 upstream.
      
      On delalloc enabled file system on invalidatepage operation
      in ext4_da_page_release_reservation() we want to clear the delayed
      buffer and remove the extent covering the delayed buffer from the extent
      status tree.
      
      However currently there is a bug where on the systems with page size >
      block size we will always remove extents from the start of the page
      regardless where the actual delayed buffers are positioned in the page.
      This leads to the errors like this:
      
      EXT4-fs warning (device loop0): ext4_da_release_space:1225:
      ext4_da_release_space: ino 13, to_free 1 with only 0 reserved data
      blocks
      
      This however can cause data loss on writeback time if the file system is
      in ENOSPC condition because we're releasing reservation for someones
      else delayed buffer.
      
      Fix this by only removing extents that corresponds to the part of the
      page we want to invalidate.
      
      This problem is reproducible by the following fio receipt (however I was
      only able to reproduce it with fio-2.1 or older.
      
      [global]
      bs=8k
      iodepth=1024
      iodepth_batch=60
      randrepeat=1
      size=1m
      directory=/mnt/test
      numjobs=20
      [job1]
      ioengine=sync
      bs=1k
      direct=1
      rw=randread
      filename=file1:file2
      [job2]
      ioengine=libaio
      rw=randwrite
      direct=1
      filename=file1:file2
      [job3]
      bs=1k
      ioengine=posixaio
      rw=randwrite
      direct=1
      filename=file1:file2
      [job5]
      bs=1k
      ioengine=sync
      rw=randread
      filename=file1:file2
      [job7]
      ioengine=libaio
      rw=randwrite
      filename=file1:file2
      [job8]
      ioengine=posixaio
      rw=randwrite
      filename=file1:file2
      [job10]
      ioengine=mmap
      rw=randwrite
      bs=1k
      filename=file1:file2
      [job11]
      ioengine=mmap
      rw=randwrite
      direct=1
      filename=file1:file2
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      bbc80299
    • Nikolay Borisov's avatar
      ext4: avoid deadlocks in the writeback path by using sb_getblk_gfp · 1fc56e88
      Nikolay Borisov authored
      commit c45653c3 upstream.
      
      Switch ext4 to using sb_getblk_gfp with GFP_NOFS added to fix possible
      deadlocks in the page writeback path.
      Signed-off-by: default avatarNikolay Borisov <kernel@kyup.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1fc56e88
    • Nikolay Borisov's avatar
      bufferhead: Add _gfp version for sb_getblk() · 52a0b6ce
      Nikolay Borisov authored
      commit bd7ade3c upstream.
      
      sb_getblk() is used during ext4 (and possibly other FSes) writeback
      paths. Sometimes such path require allocating memory and guaranteeing
      that such allocation won't block. Currently, however, there is no way
      to provide user flags for sb_getblk which could lead to deadlocks.
      
      This patch implements a sb_getblk_gfp with the only difference it can
      accept user-provided GFP flags.
      Signed-off-by: default avatarNikolay Borisov <kernel@kyup.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      52a0b6ce
    • Filipe Manana's avatar
      Btrfs: fix fsync data loss after append write · 895ee048
      Filipe Manana authored
      commit e4545de5 upstream.
      
      If we do an append write to a file (which increases its inode's i_size)
      that does not have the flag BTRFS_INODE_NEEDS_FULL_SYNC set in its inode,
      and the previous transaction added a new hard link to the file, which sets
      the flag BTRFS_INODE_COPY_EVERYTHING in the file's inode, and then fsync
      the file, the inode's new i_size isn't logged. This has the consequence
      that after the fsync log is replayed, the file size remains what it was
      before the append write operation, which means users/applications will
      not be able to read the data that was successsfully fsync'ed before.
      
      This happens because neither the inode item nor the delayed inode get
      their i_size updated when the append write is made - doing so would
      require starting a transaction in the buffered write path, something that
      we do not do intentionally for performance reasons.
      
      Fix this by making sure that when the flag BTRFS_INODE_COPY_EVERYTHING is
      set the inode is logged with its current i_size (log the in-memory inode
      into the log tree).
      
      This issue is not a recent regression and is easy to reproduce with the
      following test case for fstests:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
      
        here=`pwd`
        tmp=/tmp/$$
        status=1	# failure is the default!
      
        _cleanup()
        {
                _cleanup_flakey
                rm -f $tmp.*
        }
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
        . ./common/dmflakey
      
        # real QA test starts here
        _supported_fs generic
        _supported_os Linux
        _need_to_be_root
        _require_scratch
        _require_dm_flakey
        _require_metadata_journaling $SCRATCH_DEV
      
        _crash_and_mount()
        {
                # Simulate a crash/power loss.
                _load_flakey_table $FLAKEY_DROP_WRITES
                _unmount_flakey
                # Allow writes again and mount. This makes the fs replay its fsync log.
                _load_flakey_table $FLAKEY_ALLOW_WRITES
                _mount_flakey
        }
      
        rm -f $seqres.full
      
        _scratch_mkfs >> $seqres.full 2>&1
        _init_flakey
        _mount_flakey
      
        # Create the test file with some initial data and then fsync it.
        # The fsync here is only needed to trigger the issue in btrfs, as it causes the
        # the flag BTRFS_INODE_NEEDS_FULL_SYNC to be removed from the btrfs inode.
        $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 32k" \
                        -c "fsync" \
                        $SCRATCH_MNT/foo | _filter_xfs_io
        sync
      
        # Add a hard link to our file.
        # On btrfs this sets the flag BTRFS_INODE_COPY_EVERYTHING on the btrfs inode,
        # which is a necessary condition to trigger the issue.
        ln $SCRATCH_MNT/foo $SCRATCH_MNT/bar
      
        # Sync the filesystem to force a commit of the current btrfs transaction, this
        # is a necessary condition to trigger the bug on btrfs.
        sync
      
        # Now append more data to our file, increasing its size, and fsync the file.
        # In btrfs because the inode flag BTRFS_INODE_COPY_EVERYTHING was set and the
        # write path did not update the inode item in the btree nor the delayed inode
        # item (in memory struture) in the current transaction (created by the fsync
        # handler), the fsync did not record the inode's new i_size in the fsync
        # log/journal. This made the data unavailable after the fsync log/journal is
        # replayed.
        $XFS_IO_PROG -c "pwrite -S 0xbb 32K 32K" \
                     -c "fsync" \
                     $SCRATCH_MNT/foo | _filter_xfs_io
      
        echo "File content after fsync and before crash:"
        od -t x1 $SCRATCH_MNT/foo
      
        _crash_and_mount
      
        echo "File content after crash and log replay:"
        od -t x1 $SCRATCH_MNT/foo
      
        status=0
        exit
      
      The expected file output before and after the crash/power failure expects the
      appended data to be available, which is:
      
        0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
        *
        0100000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
        *
        0200000
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      895ee048
    • Filipe Manana's avatar
      Btrfs: fix race between caching kthread and returning inode to inode cache · ff30e038
      Filipe Manana authored
      commit ae9d8f17 upstream.
      
      While the inode cache caching kthread is calling btrfs_unpin_free_ino(),
      we could have a concurrent call to btrfs_return_ino() that adds a new
      entry to the root's free space cache of pinned inodes. This concurrent
      call does not acquire the fs_info->commit_root_sem before adding a new
      entry if the caching state is BTRFS_CACHE_FINISHED, which is a problem
      because the caching kthread calls btrfs_unpin_free_ino() after setting
      the caching state to BTRFS_CACHE_FINISHED and therefore races with
      the task calling btrfs_return_ino(), which is adding a new entry, while
      the former (caching kthread) is navigating the cache's rbtree, removing
      and freeing nodes from the cache's rbtree without acquiring the spinlock
      that protects the rbtree.
      
      This race resulted in memory corruption due to double free of struct
      btrfs_free_space objects because both tasks can end up doing freeing the
      same objects. Note that adding a new entry can result in merging it with
      other entries in the cache, in which case those entries are freed.
      This is particularly important as btrfs_free_space structures are also
      used for the block group free space caches.
      
      This memory corruption can be detected by a debugging kernel, which
      reports it with the following trace:
      
      [132408.501148] slab error in verify_redzone_free(): cache `btrfs_free_space': double free detected
      [132408.505075] CPU: 15 PID: 12248 Comm: btrfs-ino-cache Tainted: G        W       4.1.0-rc5-btrfs-next-10+ #1
      [132408.505075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
      [132408.505075]  ffff880023e7d320 ffff880163d73cd8 ffffffff8145eec7 ffffffff81095dce
      [132408.505075]  ffff880009735d40 ffff880163d73ce8 ffffffff81154e1e ffff880163d73d68
      [132408.505075]  ffffffff81155733 ffffffffa054a95a ffff8801b6099f00 ffffffffa0505b5f
      [132408.505075] Call Trace:
      [132408.505075]  [<ffffffff8145eec7>] dump_stack+0x4f/0x7b
      [132408.505075]  [<ffffffff81095dce>] ? console_unlock+0x356/0x3a2
      [132408.505075]  [<ffffffff81154e1e>] __slab_error.isra.28+0x25/0x36
      [132408.505075]  [<ffffffff81155733>] __cache_free+0xe2/0x4b6
      [132408.505075]  [<ffffffffa054a95a>] ? __btrfs_add_free_space+0x2f0/0x343 [btrfs]
      [132408.505075]  [<ffffffffa0505b5f>] ? btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
      [132408.505075]  [<ffffffff810f3b30>] ? time_hardirqs_off+0x15/0x28
      [132408.505075]  [<ffffffff81084d42>] ? trace_hardirqs_off+0xd/0xf
      [132408.505075]  [<ffffffff811563a1>] ? kfree+0xb6/0x14e
      [132408.505075]  [<ffffffff811563d0>] kfree+0xe5/0x14e
      [132408.505075]  [<ffffffffa0505b5f>] btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
      [132408.505075]  [<ffffffffa0505e08>] caching_kthread+0x29e/0x2d9 [btrfs]
      [132408.505075]  [<ffffffffa0505b6a>] ? btrfs_unpin_free_ino+0x99/0x99 [btrfs]
      [132408.505075]  [<ffffffff8106698f>] kthread+0xef/0xf7
      [132408.505075]  [<ffffffff810f3b08>] ? time_hardirqs_on+0x15/0x28
      [132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
      [132408.505075]  [<ffffffff814653d2>] ret_from_fork+0x42/0x70
      [132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
      [132408.505075] ffff880023e7d320: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
      [132409.501654] slab: double free detected in cache 'btrfs_free_space', objp ffff880023e7d320
      [132409.503355] ------------[ cut here ]------------
      [132409.504241] kernel BUG at mm/slab.c:2571!
      
      Therefore fix this by having btrfs_unpin_free_ino() acquire the lock
      that protects the rbtree while doing the searches and removing entries.
      
      Fixes: 1c70d8fb ("Btrfs: fix inode caching vs tree log")
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ff30e038
    • Filipe Manana's avatar
      Btrfs: use kmem_cache_free when freeing entry in inode cache · 4899b628
      Filipe Manana authored
      commit c3f4a168 upstream.
      
      The free space entries are allocated using kmem_cache_zalloc(),
      through __btrfs_add_free_space(), therefore we should use
      kmem_cache_free() and not kfree() to avoid any confusion and
      any potential problem. Looking at the kfree() definition at
      mm/slab.c it has the following comment:
      
        /*
         * (...)
         *
         * Don't free memory not originally allocated by kmalloc()
         * or you will run into trouble.
         */
      
      So better be safe and use kmem_cache_free().
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      4899b628
  3. 04 Aug, 2015 1 commit
  4. 30 Jul, 2015 9 commits
  5. 24 Jul, 2015 1 commit