1. 01 Aug, 2012 9 commits
    • mm: reduce the amount of work done when updating min_free_kbytes · 71a07f4c
      Mel Gorman authored
      commit 938929f1 upstream.
      
      Stable note: Fixes https://bugzilla.novell.com/show_bug.cgi?id=726210 .
              Large machines with 1TB or more of RAM take a long time to boot
              without this patch and may spew out soft lockup warnings.
      
      When min_free_kbytes is updated, some pageblocks are marked
      MIGRATE_RESERVE.  Ordinarily, this work is unnoticeable as it happens early
      in boot, but on large machines with 1TB of memory, this has been reported
      to delay boot times, probably due to the NUMA distances involved.
      
      The bulk of the work is due to calling pageblock_is_reserved() an
      unnecessary number of times and accessing far more struct page metadata
      than is necessary.  This patch significantly reduces the amount of work
      done by setup_zone_migrate_reserve(), improving boot times on 1TB machines.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: memory hotplug: Check if pages are correctly reserved on a per-section basis · 1126e709
      Mel Gorman authored
      commit 2bbcb878 upstream.
      
      Stable note: Fixes https://bugzilla.novell.com/show_bug.cgi?id=721039 .
              Without the patch, memory hot-add can fail for kernel configurations
              that do not set CONFIG_SPARSEMEM_VMEMMAP.
      
      (Resending as I am not seeing it in -next so maybe it got lost)
      
      It is expected that memory being brought online is PageReserved
      similar to what happens when the page allocator is being brought up.
      Memory is onlined in "memory blocks" which consist of one or more
      sections. Unfortunately, the code that verifies PageReserved is
      currently assuming that the memmap backing all these pages is virtually
      contiguous which is only the case when CONFIG_SPARSEMEM_VMEMMAP is set.
      As a result, memory hot-add is failing on those configurations with
      the message:
      
      kernel: section number XXX page number 256 not reserved, was it already online?
      
      This patch updates the PageReserved check to look up struct page once
      per section to guarantee the correct struct page is being checked.
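
The per-section lookup described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every `toy_` name, the section size, and the section count are hypothetical and are not kernel symbols. Each section's memmap is a separate array, so pointer arithmetic on a struct page is only valid inside one section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: every name and size here is hypothetical, not a kernel symbol. */
#define TOY_PAGES_PER_SECTION 4
#define TOY_NR_SECTIONS 3

struct toy_page {
    bool reserved;
};

/* Each section has its own memmap array, so the struct pages of
 * neighbouring sections are not guaranteed to be adjacent in memory. */
static struct toy_page toy_sec0[TOY_PAGES_PER_SECTION];
static struct toy_page toy_sec1[TOY_PAGES_PER_SECTION];
static struct toy_page toy_sec2[TOY_PAGES_PER_SECTION];
static struct toy_page *toy_section_memmap[TOY_NR_SECTIONS] = {
    toy_sec2, toy_sec0, toy_sec1, /* deliberately not in address order */
};

/* Translate a page frame number through its own section's memmap. */
static struct toy_page *toy_pfn_to_page(size_t pfn)
{
    return toy_section_memmap[pfn / TOY_PAGES_PER_SECTION] +
           pfn % TOY_PAGES_PER_SECTION;
}

/* Mark every page of the toy memory block as reserved or not reserved. */
static void toy_set_all_reserved(bool reserved)
{
    for (size_t pfn = 0; pfn < TOY_NR_SECTIONS * TOY_PAGES_PER_SECTION; pfn++)
        toy_pfn_to_page(pfn)->reserved = reserved;
}

/* Check a section-aligned range: look up struct page once per section and
 * only index from that pointer within the same section. */
static bool toy_pages_correctly_reserved(size_t start_pfn, size_t nr_pages)
{
    assert(start_pfn % TOY_PAGES_PER_SECTION == 0);
    assert(nr_pages % TOY_PAGES_PER_SECTION == 0);

    for (size_t i = 0; i < nr_pages; i += TOY_PAGES_PER_SECTION) {
        struct toy_page *page = toy_pfn_to_page(start_pfn + i);

        for (size_t j = 0; j < TOY_PAGES_PER_SECTION; j++) {
            if (!page[j].reserved)
                return false;
        }
    }
    return true;
}
```

Indexing from the first section's struct page across the whole block would read unrelated memory in this model, which mirrors the failure seen without CONFIG_SPARSEMEM_VMEMMAP.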
      
      [Check pages within sections properly: rientjes@google.com]
      [original patch by: nfont@linux.vnet.ibm.com]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Tested-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • mm/vmstat.c: cache align vm_stat · 9116bc4f
      Dimitri Sivanich authored
      commit a1cb2c60 upstream.
      
      Stable note: Not tracked on Bugzilla. This patch is known to make a big
              difference to tmpfs performance on larger machines.
      
      The lack of cache alignment for vm_stat was found to adversely affect
      tmpfs I/O performance.
      
      Tests run on a 640 cpu UV system.
      
      With 120 threads doing parallel writes, each to different tmpfs mounts:
      No patch:		~300 MB/sec
      With vm_stat alignment:	~430 MB/sec
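
As a rough user-space illustration of cache-aligning a shared counter array, the sketch below uses C11 `alignas`. The kernel uses its own cacheline annotation instead; the 64-byte line size, the item count, and all `toy_` names are assumptions made for this example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_CACHELINE_BYTES 64 /* assumed cache line size */
#define TOY_NR_STAT_ITEMS 8    /* hypothetical number of counters */

/* Start the counter array on a cache line boundary so that it does not
 * share its first line with whatever variable happens to precede it. */
static alignas(TOY_CACHELINE_BYTES) atomic_long toy_vm_stat[TOY_NR_STAT_ITEMS];

/* Atomically add delta to one counter. */
static void toy_stat_add(int item, long delta)
{
    assert(item >= 0 && item < TOY_NR_STAT_ITEMS);
    atomic_fetch_add(&toy_vm_stat[item], delta);
}

/* Atomically read one counter. */
static long toy_stat_read(int item)
{
    assert(item >= 0 && item < TOY_NR_STAT_ITEMS);
    return atomic_load(&toy_vm_stat[item]);
}

/* Report whether the array really starts on a cache line boundary. */
static int toy_vm_stat_is_aligned(void)
{
    return (uintptr_t)toy_vm_stat % TOY_CACHELINE_BYTES == 0;
}
```

Alignment alone does not stop different counters inside the array from sharing a line; it only keeps the array from straddling a line with unrelated neighbouring data.
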
      Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
      Acked-by: Christoph Lameter <cl@gentwo.org>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
    • dm raid1: fix crash with mirror recovery and discard · fbb41f55
      Mikulas Patocka authored
      commit 751f188d upstream.
      
      This patch fixes a crash when a discard request is sent during mirror
      recovery.
      
      Firstly, some background.  Generally, the following sequence happens during
      mirror synchronization:
      - function do_recovery is called
      - do_recovery calls dm_rh_recovery_prepare
      - dm_rh_recovery_prepare uses a semaphore to limit the number of
        simultaneously recovered regions (by default the semaphore value is 1,
        so only one region at a time is recovered)
      - dm_rh_recovery_prepare calls __rh_recovery_prepare.
        __rh_recovery_prepare asks the log driver for the next region to
        recover. Then, it sets the region state to DM_RH_RECOVERING. If there
        are no pending I/Os on this region, the region is added to the
        quiesced_regions list. If there are pending I/Os, the region is not
        added to any list. It is added to the quiesced_regions list later (by
        the dm_rh_dec function) when all I/Os finish.
      - when the region is on quiesced_regions list, there are no I/Os in
        flight on this region. The region is popped from the list in
        dm_rh_recovery_start function. Then, a kcopyd job is started in the
        recover function.
      - when the kcopyd job finishes, recovery_complete is called. It calls
        dm_rh_recovery_end. dm_rh_recovery_end adds the region to
        recovered_regions or failed_recovered_regions list (depending on
        whether the copy operation was successful or not).
      
      The above mechanism assumes that if the region is in DM_RH_RECOVERING
      state, no new I/Os are started on this region. When I/O is started,
      dm_rh_inc_pending is called, which increases reg->pending count. When
      I/O is finished, dm_rh_dec is called. It decreases reg->pending count.
      If the count is zero and the region was in DM_RH_RECOVERING state,
      dm_rh_dec adds it to the quiesced_regions list.
      
      Consequently, if we call dm_rh_inc_pending/dm_rh_dec while the region is
      in DM_RH_RECOVERING state, it could be added to quiesced_regions list
      multiple times or it could be added to this list when kcopyd is copying
      data (it is assumed that the region is not on any list while kcopyd does
      its jobs). This results in memory corruption and crash.
      
      There already exist bypasses for REQ_FLUSH requests: REQ_FLUSH requests
      do not belong to any region, so they are always added to the sync list
      in do_writes. dm_rh_inc_pending does not increase count for REQ_FLUSH
      requests. In mirror_end_io, dm_rh_dec is never called for REQ_FLUSH
      requests. These bypasses avoid the crash possibility described above.
      
      These bypasses were improperly implemented for REQ_DISCARD when
      the mirror target gained discard support in commit
      5fc2ffea (dm raid1: support discard).
      
      In do_writes, REQ_DISCARD requests are always added to the sync queue and
      immediately dispatched (even if the region is in DM_RH_RECOVERING).  However,
      dm_rh_inc and dm_rh_dec are called for REQ_DISCARD requests.  So it violates the
      rule that no I/Os are started on DM_RH_RECOVERING regions, and causes the list
      corruption described above.
      
      This patch changes it so that REQ_DISCARD requests follow the same path
      as REQ_FLUSH. This avoids the crash.
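
The bypass described above can be modelled in a few lines of user-space C. This is a simplified sketch, not the dm-raid1 code: the `toy_` types, flag values, and the `quiesced_adds` counter are hypothetical stand-ins for the region hash state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request flags standing in for REQ_FLUSH and REQ_DISCARD. */
#define TOY_REQ_FLUSH   (1u << 0)
#define TOY_REQ_DISCARD (1u << 1)

enum toy_state { TOY_RH_CLEAN, TOY_RH_RECOVERING };

struct toy_region {
    enum toy_state state;
    int pending;       /* I/Os counted against this region */
    int quiesced_adds; /* times the region was put on the quiesced list */
};

/* Flush and discard requests are not accounted to any region. */
static bool toy_bypasses_region(unsigned int flags)
{
    return (flags & (TOY_REQ_FLUSH | TOY_REQ_DISCARD)) != 0;
}

/* Called when an I/O starts: skip the pending count for bypassed requests. */
static void toy_rh_inc_pending(struct toy_region *reg, unsigned int flags)
{
    if (toy_bypasses_region(flags))
        return;
    reg->pending++;
}

/* Called when an I/O ends: bypassed requests never drop the count, so they
 * can never push a recovering region onto the quiesced list. */
static void toy_end_io(struct toy_region *reg, unsigned int flags)
{
    if (toy_bypasses_region(flags))
        return;
    assert(reg->pending > 0);
    if (--reg->pending == 0 && reg->state == TOY_RH_RECOVERING)
        reg->quiesced_adds++;
}
```

In this model a discard issued against a recovering region leaves both `pending` and `quiesced_adds` untouched, which is the property the patch restores.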
      
      Reference: https://bugzilla.redhat.com/837607
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • UBIFS: fix a bug in empty space fix-up · cd050f56
      Artem Bityutskiy authored
      commit c6727932 upstream.
      
      UBIFS has a feature called "empty space fix-up", a quirk to work around
      limitations of dumb flasher programs, namely those flashers that are unable
      to skip NAND pages full of 0xFFs while flashing, which leaves the empty space
      at the end of half-filled eraseblocks unusable for UBIFS. This feature is
      relatively new (introduced in v3.0).
      
      The fix-up routine (fixup_free_space()) is executed only once at the very first
      mount if the superblock has the 'space_fixup' flag set (can be done with -F
      option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and
      writes it back to the same LEB. The routine assumes the image is pristine and
      does not have anything in the journal.
      
      There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly.
      All but one LEB of the log of a pristine file-system are empty. And one
      contains just a commit start node. And 'fixup_free_space()' just unmapped this
      LEB, which resulted in wiping the commit start node. As a result, some users
      were unable to mount the file-system next time with the following symptom:
      
      UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node
      UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0
      
      The root cause of this bug was that 'fixup_free_space()' wrongly assumed
      that the beginning of empty space in the log head (c->lhead_offs) was known
      on mount. However, this is not the case: it was always 0. UBIFS does not store
      it in the master node and finds it out by scanning the log on every mount.
      
      The fix is simple - just pass commit start node size instead of 0 to
      'fixup_leb()'.
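
The effect of passing 0 versus the commit start node size can be shown with a toy model. This is a sketch under assumptions: the LEB size, the node size, the 0xC5 marker byte, and all `toy_` names are hypothetical, and the real fixup_leb() works on flash through UBI rather than on a memory buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_LEB_SIZE 16  /* hypothetical eraseblock size */
#define TOY_CS_NODE_SZ 4 /* hypothetical commit start node size */

struct toy_leb {
    unsigned char data[TOY_LEB_SIZE];
};

/* Build a pristine log head: a commit start node followed by empty space. */
static void toy_make_log_head(struct toy_leb *leb)
{
    memset(leb->data, 0xFF, TOY_LEB_SIZE);
    memset(leb->data, 0xC5, TOY_CS_NODE_SZ); /* stand-in for the CS node */
}

/* Fix up one LEB: a length of 0 means "nothing to keep", so the LEB is
 * unmapped (modelled as all 0xFF). Otherwise the first len bytes are read
 * and written back, and the rest is left as clean empty space. */
static void toy_fixup_leb(struct toy_leb *leb, size_t len)
{
    unsigned char buf[TOY_LEB_SIZE];

    assert(len <= TOY_LEB_SIZE);
    if (len == 0) {
        memset(leb->data, 0xFF, TOY_LEB_SIZE);
        return;
    }
    memcpy(buf, leb->data, len);
    memset(leb->data, 0xFF, TOY_LEB_SIZE);
    memcpy(leb->data, buf, len);
}
```

Calling `toy_fixup_leb(&leb, 0)` on the log head wipes the commit start node, while `toy_fixup_leb(&leb, TOY_CS_NODE_SZ)` preserves it.
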
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
      Reported-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
      Tested-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
      Reported-by: James Nute <newten82@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • MIPS: Properly align the .data..init_task section. · 689415c1
      David Daney authored
      commit 7b1c0d26 upstream.
      
      Improper alignment can lead to unbootable systems and/or random
      crashes.
      
      [ralf@linux-mips.org: This is a long-standing bug since
      6eb10bc9 (kernel.org) resp.
      c422a10917f75fd19fa7fe070aaaa23e384dae6f (lmo) [MIPS: Clean up linker script
      using new linker script macros.] so dates back to 2.6.32.]
      Signed-off-by: David Daney <david.daney@cavium.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/3881/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: fix lost kswapd wakeup in kswapd_stop() · 6d40de83
      Aaditya Kumar authored
      commit 1c7e7f6c upstream.
      
      Offlining memory may block forever, waiting for kswapd() to wake up
      because kswapd() does not check the event kthread->should_stop before
      sleeping.
      
      The proper pattern, from Documentation/memory-barriers.txt, is:
      
         ---  waker  ---
         event_indicated = 1;
         wake_up_process(event_daemon);
      
         ---  sleeper  ---
         for (;;) {
            set_current_state(TASK_UNINTERRUPTIBLE);
            if (event_indicated)
               break;
            schedule();
         }
      
         set_current_state() may be wrapped by:
            prepare_to_wait();
      
      In the kswapd() case, event_indicated is kthread->should_stop.
      
        === offlining memory (waker) ===
         kswapd_stop()
            kthread_stop()
               kthread->should_stop = 1
               wake_up_process()
               wait_for_completion()
      
        ===  kswapd_try_to_sleep (sleeper) ===
         kswapd_try_to_sleep()
            prepare_to_wait()
                 .
                 .
            schedule()
                 .
                 .
            finish_wait()
      
      The schedule() needs to be protected by a test of kthread->should_stop,
      which is wrapped by kthread_should_stop().
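
The guarded sleep can be sketched as a single-threaded user-space model. It only shows the ordering of the check relative to schedule(); the `toy_` names are hypothetical, and real task states, wait queues, and memory barriers are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

static bool toy_should_stop;   /* stands in for kthread->should_stop */
static int toy_schedule_calls; /* how often the sleeper really slept */

/* Model of schedule(): sleeping after a stop request is the lost wakeup. */
static void toy_schedule(void)
{
    assert(!toy_should_stop);
    toy_schedule_calls++;
}

/* Model of the waker side: record the event before waking the sleeper. */
static void toy_kthread_stop_request(void)
{
    toy_should_stop = true;
}

/* Model of the sleeper side: test the event after preparing to wait and
 * before calling schedule(), so a stop request is never slept through. */
static void toy_try_to_sleep(void)
{
    /* prepare_to_wait() would set the task state here */
    if (!toy_should_stop)
        toy_schedule();
    /* finish_wait() would restore the task state here */
}
```

Without the `toy_should_stop` test, a stop request that arrives before the sleep would leave the sleeper in schedule() with nobody left to wake it.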
      
      Reproducer:
         Do heavy file I/O in background.
         Do a memory offline/online in a tight loop
      Signed-off-by: Aaditya Kumar <aaditya.kumar@ap.sony.com>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ntp: Fix STA_INS/DEL clearing bug · dccecc64
      John Stultz authored
      commit 6b1859db upstream.
      
      In commit 6b43ae8a, I introduced a bug that kept the STA_INS or
      STA_DEL bit from being cleared from time_status via adjtimex()
      without forcing STA_PLL first.
      
      Usually once STA_INS is set, it isn't cleared until the leap
      second is applied, so it's unlikely this affected anyone. However,
      during testing I noticed it took some effort to cancel a leap
      second once STA_INS was set.
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Link: http://lkml.kernel.org/r/1342156917-25092-2-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cifs: always update the inode cache with the results from a FIND_* · adccea44
      Jeff Layton authored
      commit cd60042c upstream.
      
      When we get back a FIND_FIRST/NEXT result, we have some info about the
      dentry that we use to instantiate a new inode. We were ignoring and
      discarding that info when we had an existing dentry in the cache.
      
      Fix this by updating the inode in place when we find an existing dentry
      and the uniqueid is the same.
      Reported-and-Tested-by: Andrew Bartlett <abartlet@samba.org>
      Reported-by: Bill Robertson <bill_robertson@debortoli.com.au>
      Reported-by: Dion Edwards <dion_edwards@debortoli.com.au>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 19 Jul, 2012 24 commits
  3. 16 Jul, 2012 7 commits