1. 17 Jan, 2014 3 commits
    • Viresh Kumar's avatar
      cpufreq: stats: remove hotplug notifiers · 027cc2e4
      Viresh Kumar authored
      Either CPUs are hot-unplugged or suspend/resume occurs, cpufreq core
      will send notifications to cpufreq-stats and stats structure and sysfs
      entries would be correctly handled..
      
      And so we don't actually need hotcpu notifiers in cpufreq-stats anymore.
      We were only handling cpu hot-unplug events here and that are already
      taken care of by POLICY notifiers.
      Acked-by: default avatarNicolas Pitre <nico@linaro.org>
      Tested-by: default avatarNicolas Pitre <nico@linaro.org>
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      027cc2e4
    • Viresh Kumar's avatar
      cpufreq: stats: handle cpufreq_unregister_driver() and suspend/resume properly · fcd7af91
      Viresh Kumar authored
      There are several problems with cpufreq stats in the way it handles
      cpufreq_unregister_driver() and suspend/resume..
      
       - We must not lose data collected so far when suspend/resume happens
         and so stats directories must not be removed/allocated during these
         operations, which is done currently.
      
       - cpufreq_stat has registered notifiers with both cpufreq and hotplug.
         It adds sysfs stats directory with a cpufreq notifier: CPUFREQ_NOTIFY
         and removes this directory with a notifier from hotplug core.
      
         In case cpufreq_unregister_driver() is called (on rmmod cpufreq driver),
         stats directories per cpu aren't removed as CPUs are still online. The
         only call cpufreq_stats gets is cpufreq_stats_update_policy_cpu() for
         all CPUs except the last of each policy. And pointer to stat information
         is stored in the entry for last CPU in the per-cpu cpufreq_stats_table.
         But policy structure would be freed inside cpufreq core and so that will
         result in memory leak inside cpufreq stats (as we are never freeing
         memory for stats).
      
         Now if we again insert the module cpufreq_register_driver() will be
         called and we will again allocate stats data and put it on for first
         CPU of every policy.  In case we only have a single CPU per policy, we
         will return with a error from cpufreq_stats_create_table() due to this
         code:
      
      	if (per_cpu(cpufreq_stats_table, cpu))
      		return -EBUSY;
      
         And so probably cpufreq stats directory would not show up anymore (as
         it was added inside last policies->kobj which doesn't exist anymore).
         I haven't tested it, though. Also the values in stats files wouldn't
         be refreshed as we are using the earlier stats structure.
      
       - CPUFREQ_NOTIFY is called from cpufreq_set_policy() which is called for
         scenarios where we don't really want cpufreq_stat_notifier_policy() to get
         called. For example whenever we are changing anything related to a policy:
         min/max/current freq, etc. cpufreq_set_policy() is called and so cpufreq
         stats is notified. Where we don't do any useful stuff other than simply
         returning with -EBUSY from cpufreq_stats_create_table(). And so this
         isn't the right notifier that cpufreq stats..
      
       Due to all above reasons this patch does following changes:
       - Add new notifiers CPUFREQ_CREATE_POLICY and CPUFREQ_REMOVE_POLICY,
         which are only called when policy is created/destroyed. They aren't
         called for suspend/resume paths..
       - Use these notifiers in cpufreq_stat_notifier_policy() to create/destory
         stats sysfs entries. And so cpufreq_unregister_driver() or suspend/resume
         shouldn't be a problem for cpufreq_stats.
       - Return early from cpufreq_stat_cpu_callback() for suspend/resume sequence,
         so that we don't free stats structure.
      Acked-by: default avatarNicolas Pitre <nico@linaro.org>
      Tested-by: default avatarNicolas Pitre <nico@linaro.org>
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      fcd7af91
    • Paul Bolle's avatar
      cpufreq: speedstep: remove unused speedstep_get_state · 5650cef2
      Paul Bolle authored
      The only caller of speedstep_get_state() was removed in commit d4019f0a
      ("cpufreq: move freq change notifications to cpufreq core"). So building
      speedstep-smi.o now triggers a GCC warning:
          drivers/cpufreq/speedstep-smi.c:148:12: warning: 'speedstep_get_state' defined but not used [-Wunused-function]
      
      Remove this unused function.
      Signed-off-by: default avatarPaul Bolle <pebolle@tiscali.nl>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      5650cef2
  2. 14 Jan, 2014 1 commit
  3. 06 Jan, 2014 20 commits
  4. 05 Jan, 2014 1 commit
  5. 04 Jan, 2014 1 commit
  6. 03 Jan, 2014 2 commits
    • Linus Torvalds's avatar
      Merge tag 'for-v3.13-fixes' of git://git.infradead.org/battery-2.6 · 9a2f1aad
      Linus Torvalds authored
      Pull battery fixes from Anton Vorontsov:
       "Two fixes:
      
         - fix build error caused by max17042_battery conversion to the regmap
           API.
      
         - fix kernel oops when booting with wakeup_source_activate enabled"
      
      * tag 'for-v3.13-fixes' of git://git.infradead.org/battery-2.6:
        max17042_battery: Fix build errors caused by missing REGMAP_I2C config
        power_supply: Fix Oops from NULL pointer dereference from wakeup_source_activate
      9a2f1aad
    • Linus Torvalds's avatar
      Merge tag 'pm+acpi-3.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 23e8e590
      Linus Torvalds authored
      Pull ACPI and PM fixes and new device IDs from Rafael Wysocki:
       "These commits, except for one, are regression fixes and the remaining
        one fixes a divide error leading to a kernel panic.  The majority of
        the regressions fixed here were introduced during the 3.12 cycle, one
        of them is from this cycle and one is older.
      
        Specifics:
      
         - VGA switcheroo was broken for some users as a result of the
           ACPI-based PCI hotplug (ACPIPHP) changes in 3.12, because some
           previously ignored hotplug events started to be handled.  The fix
           causes them to be ignored again.
      
         - There are two more issues related to cpufreq's suspend/resume
           handling changes from the 3.12 cycle addressed by Viresh Kumar's
           fixes.
      
         - intel_pstate triggers a divide error in a timer function if the
           P-state information it needs is missing during initialization.
           This leads to kernel panics on nested KVM clients and is fixed by
           failing the initialization cleanly in those cases.
      
         - PCI initalization code changes during the 3.9 cycle uncovered BIOS
           issues related to ACPI wakeup notifications (some BIOSes send them
           for devices that aren't supposed to support ACPI wakeup).  Work
           around them by installing an ACPI wakeup notify handler for all PCI
           devices with ACPI support.
      
         - The Calxeda cpuilde driver's probe function is tagged as __init,
           which is incorrect and causes a section mismatch to occur during
           build.  Fix from Andre Przywara removes the __init tag from there.
      
         - During the 3.12 cycle ACPIPHP started to print warnings about
           missing _ADR for devices that legitimately don't have it.  Fix from
           Toshi Kani makes it only print the warnings where they make sense"
      
      * tag 'pm+acpi-3.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPIPHP / radeon / nouveau: Fix VGA switcheroo problem related to hotplug
        intel_pstate: Fail initialization if P-state information is missing
        ARM/cpuidle: remove __init tag from Calxeda cpuidle probe function
        PCI / ACPI: Install wakeup notify handlers for all PCI devs with ACPI
        cpufreq: preserve user_policy across suspend/resume
        cpufreq: Clean up after a failing light-weight initialization
        ACPI / PCI / hotplug: Avoid warning when _ADR not present
      23e8e590
  7. 02 Jan, 2014 12 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/virt/kvm/kvm · 7a262d2e
      Linus Torvalds authored
      Pull kvm bugfixes from Marcelo Tosatti.
      
      * git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: nVMX: Unconditionally uninit the MMU on nested vmexit
        KVM: x86: Fix APIC map calculation after re-enabling
      7a262d2e
    • Linus Torvalds's avatar
      Merge branch 'akpm' (incoming from Andrew) · 06f055f3
      Linus Torvalds authored
      Merge patches from Andrew Morton:
       "Ten fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        epoll: do not take the nested ep->mtx on EPOLL_CTL_DEL
        sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c
        drivers/dma/ioat/dma.c: check DMA mapping error in ioat_dma_self_test()
        mm/memory-failure.c: transfer page count from head page to tail page after split thp
        MAINTAINERS: set up proper record for Xilinx Zynq
        mm: remove bogus warning in copy_huge_pmd()
        memcg: fix memcg_size() calculation
        mm: fix use-after-free in sys_remap_file_pages
        mm: munlock: fix deadlock in __munlock_pagevec()
        mm: munlock: fix a bug where THP tail page is encountered
      06f055f3
    • Jason Baron's avatar
      epoll: do not take the nested ep->mtx on EPOLL_CTL_DEL · 4ff36ee9
      Jason Baron authored
      The EPOLL_CTL_DEL path of epoll contains a classic, ab-ba deadlock.
      That is, epoll_ctl(a, EPOLL_CTL_DEL, b, x), will deadlock with
      epoll_ctl(b, EPOLL_CTL_DEL, a, x).  The deadlock was introduced with
      commmit 67347fe4 ("epoll: do not take global 'epmutex' for simple
      topologies").
      
      The acquistion of the ep->mtx for the destination 'ep' was added such
      that a concurrent EPOLL_CTL_ADD operation would see the correct state of
      the ep (Specifically, the check for '!list_empty(&f.file->f_ep_links')
      
      However, by simply not acquiring the lock, we do not serialize behind
      the ep->mtx from the add path, and thus may perform a full path check
      when if we had waited a little longer it may not have been necessary.
      However, this is a transient state, and performing the full loop
      checking in this case is not harmful.
      
      The important point is that we wouldn't miss doing the full loop
      checking when required, since EPOLL_CTL_ADD always locks any 'ep's that
      its operating upon.  The reason we don't need to do lock ordering in the
      add path, is that we are already are holding the global 'epmutex'
      whenever we do the double lock.  Further, the original posting of this
      patch, which was tested for the intended performance gains, did not
      perform this additional locking.
      Signed-off-by: default avatarJason Baron <jbaron@akamai.com>
      Cc: Nathan Zimmer <nzimmer@sgi.com>
      Cc: Eric Wong <normalperson@yhbt.net>
      Cc: Nelson Elhage <nelhage@nelhage.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4ff36ee9
    • Nobuhiro Iwamatsu's avatar
      sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c · ad70b029
      Nobuhiro Iwamatsu authored
      Min_low_pfn and max_low_pfn were used in pfn_valid macro if defined
      CONFIG_FLATMEM.  When the functions that use the pfn_valid is used in
      driver module, max_low_pfn and min_low_pfn is to undefined, and fail to
      build.
      
        ERROR: "min_low_pfn" [drivers/block/aoe/aoe.ko] undefined!
        ERROR: "max_low_pfn" [drivers/block/aoe/aoe.ko] undefined!
        make[2]: *** [__modpost] Error 1
        make[1]: *** [modules] Error 2
      
      This patch fix this problem.
      Signed-off-by: default avatarNobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
      Cc: Kuninori Morimoto <kuninori.morimoto.gx@gmail.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ad70b029
    • Jiang Liu's avatar
      drivers/dma/ioat/dma.c: check DMA mapping error in ioat_dma_self_test() · 3532e566
      Jiang Liu authored
      Check DMA mapping return values in function ioat_dma_self_test() to get
      rid of following warning message.
      
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 1203 at lib/dma-debug.c:937 check_unmap+0x4c0/0x9a0()
        ioatdma 0000:00:04.0: DMA-API: device driver failed to check map error[device address=0x000000085191b000] [size=2000 bytes] [mapped as single]
        Modules linked in: ioatdma(+) mac_hid wmi acpi_pad lp parport hidd_generic usbhid hid ixgbe isci dca libsas ahci ptp libahci scsi_transport_sas meegaraid_sas pps_core mdio
        CPU: 0 PID: 1203 Comm: systemd-udevd Not tainted 3.13.0-rc4+ #8
        Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRIVTIIN1.86B.0044.L09.1311181644 11/18/2013
        Call Trace:
          dump_stack+0x4d/0x66
          warn_slowpath_common+0x7d/0xa0
          warn_slowpath_fmt+0x4c/0x50
          check_unmap+0x4c0/0x9a0
          debug_dma_unmap_page+0x81/0x90
          ioat_dma_self_test+0x3d2/0x680 [ioatdma]
          ioat3_dma_self_test+0x12/0x30 [ioatdma]
          ioat_probe+0xf4/0x110 [ioatdma]
          ioat3_dma_probe+0x268/0x410 [ioatdma]
          ioat_pci_probe+0x122/0x1b0 [ioatdma]
          local_pci_probe+0x45/0xa0
          pci_device_probe+0xd9/0x130
          driver_probe_device+0x171/0x490
          __driver_attach+0x93/0xa0
          bus_for_each_dev+0x6b/0xb0
          driver_attach+0x1e/0x20
          bus_add_driver+0x1f8/0x2b0
          driver_register+0x81/0x110
          __pci_register_driver+0x60/0x70
          ioat_init_module+0x89/0x1000 [ioatdma]
          do_one_initcall+0xe2/0x250
          load_module+0x2313/0x2a00
          SyS_init_module+0xd9/0x130
          system_call_fastpath+0x1a/0x1f
        ---[ end trace 990c591681d27c31 ]---
        Mapped at:
          debug_dma_map_page+0xbe/0x180
          ioat_dma_self_test+0x1ab/0x680 [ioatdma]
          ioat3_dma_self_test+0x12/0x30 [ioatdma]
          ioat_probe+0xf4/0x110 [ioatdma]
          ioat3_dma_probe+0x268/0x410 [ioatdma]
      Signed-off-by: default avatarJiang Liu <jiang.liu@linux.intel.com>
      Cc: Vinod Koul <vinod.koul@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3532e566
    • Naoya Horiguchi's avatar
      mm/memory-failure.c: transfer page count from head page to tail page after split thp · a3e0f9e4
      Naoya Horiguchi authored
      Memory failures on thp tail pages cause kernel panic like below:
      
         mce: [Hardware Error]: Machine check events logged
         MCE exception done on CPU 7
         BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
         IP: [<ffffffff811b7cd1>] dequeue_hwpoisoned_huge_page+0x131/0x1e0
         PGD bae42067 PUD ba47d067 PMD 0
         Oops: 0000 [#1] SMP
        ...
         CPU: 7 PID: 128 Comm: kworker/7:2 Tainted: G   M       O 3.13.0-rc4-131217-1558-00003-g83b7df08e462 #25
        ...
         Call Trace:
           me_huge_page+0x3e/0x50
           memory_failure+0x4bb/0xc20
           mce_process_work+0x3e/0x70
           process_one_work+0x171/0x420
           worker_thread+0x11b/0x3a0
           ? manage_workers.isra.25+0x2b0/0x2b0
           kthread+0xe4/0x100
           ? kthread_create_on_node+0x190/0x190
           ret_from_fork+0x7c/0xb0
           ? kthread_create_on_node+0x190/0x190
        ...
         RIP   dequeue_hwpoisoned_huge_page+0x131/0x1e0
         CR2: 0000000000000058
      
      The reasoning of this problem is shown below:
       - when we have a memory error on a thp tail page, the memory error
         handler grabs a refcount of the head page to keep the thp under us.
       - Before unmapping the error page from processes, we split the thp,
         where page refcounts of both of head/tail pages don't change.
       - Then we call try_to_unmap() over the error page (which was a tail
         page before). We didn't pin the error page to handle the memory error,
         this error page is freed and removed from LRU list.
       - We never have the error page on LRU list, so the first page state
         check returns "unknown page," then we move to the second check
         with the saved page flag.
       - The saved page flag have PG_tail set, so the second page state check
         returns "hugepage."
       - We call me_huge_page() for freed error page, then we hit the above panic.
      
      The root cause is that we didn't move refcount from the head page to the
      tail page after split thp.  So this patch suggests to do this.
      
      This panic was introduced by commit 524fca1e ("HWPOISON: fix
      misjudgement of page_action() for errors on mlocked pages").  Note that we
      did have the same refcount problem before this commit, but it was just
      ignored because we had only first page state check which returned "unknown
      page." The commit changed the refcount problem from "doesn't work" to
      "kernel panic."
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: default avatarWanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: <stable@vger.kernel.org>	[3.9+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a3e0f9e4
    • Michal Simek's avatar
      MAINTAINERS: set up proper record for Xilinx Zynq · c2fd4e38
      Michal Simek authored
      Setup correct zynq entry.
       - Add missing cadence_ttc_timer maintainership
       - Add zynq wildcard
       - Add xilinx wildcard
      Signed-off-by: default avatarMichal Simek <michal.simek@xilinx.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c2fd4e38
    • Mel Gorman's avatar
      mm: remove bogus warning in copy_huge_pmd() · d0319bd5
      Mel Gorman authored
      Sasha Levin reported the following warning being triggered
      
        WARNING: CPU: 28 PID: 35287 at mm/huge_memory.c:887 copy_huge_pmd+0x145/ 0x3a0()
        Call Trace:
          copy_huge_pmd+0x145/0x3a0
          copy_page_range+0x3f2/0x560
          dup_mmap+0x2c9/0x3d0
          dup_mm+0xad/0x150
          copy_process+0xa68/0x12e0
          do_fork+0x96/0x270
          SyS_clone+0x16/0x20
          stub_clone+0x69/0x90
      
      This warning was introduced by "mm: numa: Avoid unnecessary disruption
      of NUMA hinting during migration" for paranoia reasons but the warning
      is bogus.  I was thinking of parallel races between NUMA hinting faults
      and forks but this warning would also be triggered by a parallel reclaim
      splitting a THP during a fork.  Remote the bogus warning.
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d0319bd5
    • Vladimir Davydov's avatar
      memcg: fix memcg_size() calculation · 695c6083
      Vladimir Davydov authored
      The mem_cgroup structure contains nr_node_ids pointers to
      mem_cgroup_per_node objects, not the objects themselves.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@openvz.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      695c6083
    • Rik van Riel's avatar
      mm: fix use-after-free in sys_remap_file_pages · 4eb91982
      Rik van Riel authored
      remap_file_pages calls mmap_region, which may merge the VMA with other
      existing VMAs, and free "vma".  This can lead to a use-after-free bug.
      Avoid the bug by remembering vm_flags before calling mmap_region, and
      not trying to dereference vma later.
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4eb91982
    • Vlastimil Babka's avatar
      mm: munlock: fix deadlock in __munlock_pagevec() · 3b25df93
      Vlastimil Babka authored
      Commit 7225522b ("mm: munlock: batch non-THP page isolation and
      munlock+putback using pagevec" introduced __munlock_pagevec() to speed
      up munlock by holding lru_lock over multiple isolated pages.  Pages that
      fail to be isolated are put_page()d immediately, also within the lock.
      
      This can lead to deadlock when __munlock_pagevec() becomes the holder of
      the last page pin and put_page() leads to __page_cache_release() which
      also locks lru_lock.  The deadlock has been observed by Sasha Levin
      using trinity.
      
      This patch avoids the deadlock by deferring put_page() operations until
      lru_lock is released.  Another pagevec (which is also used by later
      phases of the function is reused to gather the pages for put_page()
      operation.
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3b25df93
    • Vlastimil Babka's avatar
      mm: munlock: fix a bug where THP tail page is encountered · c424be1c
      Vlastimil Babka authored
      Since commit ff6a6da6 ("mm: accelerate munlock() treatment of THP
      pages") munlock skips tail pages of a munlocked THP page.  However, when
      the head page already has PageMlocked unset, it will not skip the tail
      pages.
      
      Commit 7225522b ("mm: munlock: batch non-THP page isolation and
      munlock+putback using pagevec") has added a PageTransHuge() check which
      contains VM_BUG_ON(PageTail(page)).  Sasha Levin found this triggered
      using trinity, on the first tail page of a THP page without PageMlocked
      flag.
      
      This patch fixes the issue by skipping tail pages also in the case when
      PageMlocked flag is unset.  There is still a possibility of race with
      THP page split between clearing PageMlocked and determining how many
      pages to skip.  The race might result in former tail pages not being
      skipped, which is however no longer a bug, as during the skip the
      PageTail flags are cleared.
      
      However this race also affects correctness of NR_MLOCK accounting, which
      is to be fixed in a separate patch.
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c424be1c