1. 29 Mar, 2012 1 commit
    • x86: Preserve lazy irq disable semantics in fixup_irqs() · 99dd5497
      Liu, Chuansheng authored
      The default irq_disable() semantics are to mark the interrupt disabled,
      but keep it unmasked. If the interrupt is delivered while marked
      disabled, the low level interrupt handler masks it and marks it
      pending. This is important for detecting wakeup interrupts during
      suspend and for edge type interrupts to avoid losing interrupts.
      
      fixup_irqs() moves the interrupts away from an offlined cpu. For
      certain interrupt types it needs to mask the interrupt line before
      changing the affinity. After affinity has changed the interrupt line
      is unmasked again, but only if it is not marked disabled.
      
      This breaks the lazy irq disable semantics and causes problems in
      suspend as the interrupt can be lost or wakeup functionality is
      broken.
      
      Check irqd_irq_masked() instead of irqd_irq_disabled() because
      irqd_irq_masked() is only set when the core code actually masked the
      interrupt line. If it's not set, we unmask the interrupt and let the
      lazy irq disable logic deal with any interrupt that arrives later.
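      The difference between the two checks can be sketched with a small
      user-space model. This is only an illustration under simplifying
      assumptions: struct irq_model and the model_* functions are hypothetical
      names, not kernel APIs, and masked_flag merely stands in for what
      irqd_irq_masked() would report.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-interrupt state (not the kernel's irq_data). */
struct irq_model {
    bool disabled;     /* irq_disable() was called (software state)       */
    bool masked_flag;  /* core code masked the line (irqd_irq_masked())   */
    bool line_masked;  /* what the hardware line currently looks like     */
    bool pending;      /* an interrupt arrived while marked disabled      */
};

/* Lazy disable: only mark the interrupt disabled, keep the line unmasked. */
static void model_irq_disable(struct irq_model *irq)
{
    irq->disabled = true;
}

/* An interrupt fires. Returns true if the handler would run. */
static bool model_irq_arrives(struct irq_model *irq)
{
    if (irq->line_masked)
        return false;              /* never delivered: the event is lost */
    if (irq->disabled) {
        irq->line_masked = true;   /* mask on first delivery ...         */
        irq->masked_flag = true;
        irq->pending = true;       /* ... and remember it for later      */
        return false;
    }
    return true;
}

/* Migration step: mask, change affinity, conditionally unmask. */
static void model_fixup(struct irq_model *irq, bool use_masked_flag)
{
    bool keep_masked;

    irq->line_masked = true;       /* temporary mask around the move */
    /* ... the affinity change would happen here ... */

    /* Old check looks at "disabled", new check at "masked by core code". */
    keep_masked = use_masked_flag ? irq->masked_flag : irq->disabled;
    if (!keep_masked)
        irq->line_masked = false;
}
```

      In this model, the old check leaves a lazily disabled line masked after
      the migration, so a later interrupt is never recorded as pending; the
      new check unmasks it and the lazy path marks it pending on arrival.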
      
      [ tglx: Massaged changelog and added a comment ]
      Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A05DFB3@SHSMSX101.ccr.corp.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  2. 28 Mar, 2012 3 commits
  3. 27 Mar, 2012 1 commit
  4. 26 Mar, 2012 1 commit
  5. 24 Mar, 2012 2 commits
  6. 23 Mar, 2012 5 commits
  7. 22 Mar, 2012 27 commits
    • x86-32: Fix endless loop when processing signals for kernel tasks · 29a2e283
      Dmitry Adamushko authored
      The problem occurs on !CONFIG_VM86 kernels [1] when a kernel-mode task
      returns from a system call with a pending signal.
      
      A real-life scenario is a child of 'khelper' returning from a failed
      kernel_execve() in ____call_usermodehelper() [ kernel/kmod.c ].
      kernel_execve() fails due to a pending SIGKILL, which is the result of
      "kill -9 -1" (at least, busybox's init does it upon reboot).
      
      The loop is as follows:
      
      * syscall_exit_work:
       - work_pending:            // start_of_the_loop
       - work_notify_sig:
         - do_notify_resume()
           - do_signal()
             - if (!user_mode(regs)) return;
       - resume_userspace         // TIF_SIGPENDING is still set
       - work_pending             // so we call work_pending => goto
                                  // start_of_the_loop
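      The loop above can be sketched as a bounded user-space model. The names
      struct task_model, model_do_signal() and model_exit_work() are
      hypothetical; the pass limit exists only so that the sketch terminates,
      whereas the real loop has no such limit.

```c
#include <stdbool.h>

/* Hypothetical task state: just the two facts the loop depends on. */
struct task_model {
    bool sigpending;  /* stands in for TIF_SIGPENDING           */
    bool user_mode;   /* stands in for user_mode(regs)          */
};

/* Mirrors the early return: kernel-mode frames are not handled. */
static void model_do_signal(struct task_model *t)
{
    if (!t->user_mode)
        return;              /* flag is left set */
    t->sigpending = false;   /* signal delivered */
}

/*
 * Counts passes through "work_pending". Stops when the flag clears or
 * when max_passes is reached (the cap is an artefact of the model).
 */
static int model_exit_work(struct task_model *t, int max_passes)
{
    int passes = 0;

    while (t->sigpending && passes < max_passes) {
        passes++;
        model_do_signal(t);
    }
    return passes;
}
```

      A user-mode task leaves the loop after one pass; a kernel-mode task with
      a pending signal only stops because of the artificial cap.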
      
      More information can be found in another LKML thread:
      http://www.serverphorums.com/read.php?12,457826
      
      [1] the problem was also seen on MIPS.
      Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
      Link: http://lkml.kernel.org/r/1332448765.2299.68.camel@dimm
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, boot: Correct CFLAGS for hostprogs · 446e1c86
      H. Peter Anvin authored
      This is a partial revert of commit:
          d40f8336 "Restrict CFLAGS for hostprogs"
      
      The endian-manipulation macros in tools/include need <linux/types.h>,
      but the hostprogs in arch/x86/boot need several headers from the
      kernel build tree, which means we have to add the kernel headers to
      the include path.  This picks up <linux/types.h> from the kernel tree,
      which gives a warning.
      
      Since this use of <linux/types.h> is intentional, add
      -D__EXPORTED_HEADERS__ to the command line to silence the warning.
      
      A better way to fix this would be to always install the exported
      kernel headers into $(objtree)/usr/include as a standard part of the
      kernel build, but that is a lot more involved.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Matt Fleming <matt.fleming@intel.com>
      Link: http://lkml.kernel.org/r/1330436245-24875-5-git-send-email-matt@console-pimps.org
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86-32: Fix typo for mq_getsetattr in syscall table · 13354dc4
      Thierry Reding authored
      Syscall 282 was mistakenly named mq_getsetaddr instead of mq_getsetattr.
      When building uClibc against the Linux kernel this would result in a
      shared library that doesn't provide the mq_getattr() and mq_setattr()
      functions.
      Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
      Link: http://lkml.kernel.org/r/1332366608-2695-2-git-send-email-thierry.reding@avionic-design.de
      Cc: <stable@vger.kernel.org> v3.3
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 28f23d1f
      Linus Torvalds authored
      Pull x86 "urgent" leftovers from Ingo Molnar:
       "Pending x86/urgent bits that were not high prio enough to warrant
        -rc-less v3.3-final inclusion."
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86, efi: Fix pointer math issue in handle_ramdisks()
        x86/ioapic: Add register level checks to detect bogus io-apic entries
        x86, mce: Fix rcu splat in drain_mce_log_buffer()
        x86, memblock: Move mem_hole_size() to .init
    • Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 23904815
      Linus Torvalds authored
      Pull x86 platform changes from Ingo Molnar.
      
      Removes the Moorestown platform that nobody ever used.
      
      * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/platform: Move APIC ID validity check into platform APIC code
        x86/olpc/xo15/sci: Enable lid close wakeup control
        x86/geode/net5501: Add platform driver for Soekris Engineering net5501
        x86/geode/alix2: Supplement driver to include GPIO button support
        x86/mid/powerbtn: Use MSIC read/write instead of ipc_scu
        x86/mid/thermal: Turn off thermistor
        x86/mid/thermal: Add msic_thermal alias
        x86/mid/thermal: Convert to use Intel MSIC API
        x86/mid/scu_ipc: Remove Moorestown support
        x86/mid: Kill off Moorestown
        x86/mrst: Add msic_thermal platform support
        x86/config: Select MSIC MFD driver on Intel Medfield platform
        x86/mid: Remove Intel Moorestown
        x86/mrst: Set ISA bus type for fake MP IRQs
        x86/ioapic: Use legacy_pic to set correct gsi-irq mapping
    • Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 754b9800
      Linus Torvalds authored
      Pull MCE changes from Ingo Molnar.
      
      * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mce: Fix return value of mce_chrdev_read() when erst is disabled
        x86/mce: Convert static array of pointers to per-cpu variables
        x86/mce: Replace hard coded hex constants with symbolic defines
        x86/mce: Recognise machine check bank signature for data path error
        x86/mce: Handle "action required" errors
        x86/mce: Add mechanism to safely save information in MCE handler
        x86/mce: Create helper function to save addr/misc when needed
        HWPOISON: Add code to handle "action required" errors.
        HWPOISON: Clean up memory_failure() vs. __memory_failure()
    • Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 35cb8d9e
      Linus Torvalds authored
      Pull x86/fpu changes from Ingo Molnar.
      
      * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        i387: Split up <asm/i387.h> into exported and internal interfaces
        i387: Uninline the generic FP helpers that we expose to kernel modules
    • Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 02c50256
      Linus Torvalds authored
      Pull x86/build changes from Ingo Molnar.
      
      * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86, build: Fix portability issues when cross-building
        x86, tools: Remove unneeded header files from tools/build.c
        USB: ffs-test: Don't duplicate {get,put}_unaligned*() functions
        x86, efi: Fix endian issues and unaligned accesses
        x86, boot: Restrict CFLAGS for hostprogs
        x86, mkpiggy: Don't open code put_unaligned_le32()
        x86, relocs: Don't open code put_unaligned_le32()
        tools/include: Add byteshift headers for endian access
    • Merge branch 'x86-eficross-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f06fc0c0
      Linus Torvalds authored
      Pull x86/eficross (booting 32/64-bit kernel from 64/32-bit EFI) from Ingo Molnar
      
      * 'x86-eficross-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86, efi: Allow basic init with mixed 32/64-bit efi/kernel
        x86, efi: Add basic error handling
        x86, efi: Cleanup config table walking
        x86, efi: Convert printk to pr_*()
        x86, efi: Refactor efi_init() a bit
    • Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4c64616b
      Linus Torvalds authored
      Pull x86/debug changes from Ingo Molnar.
      
      * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Fix section warnings
        x86-64: Fix CFI data for common_interrupt()
        x86: Properly _init-annotate NMI selftest code
        x86/debug: Fix/improve the show_msr=<cpus> debug print out
    • Merge branches 'x86-cpu-for-linus', 'x86-boot-for-linus',... · c5c7fb8f
      Linus Torvalds authored
      Merge branches 'x86-cpu-for-linus', 'x86-boot-for-linus', 'x86-cpufeature-for-linus', 'x86-process-for-linus' and 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
      
      Pull trivial x86 branches from Ingo Molnar: small one-liners to fix up
      details.
      
      * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Remove some noise from boot log when starting cpus
      
      * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86, boot: Fix port argument to inl() function
      
      * 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86, cpufeature: Add CPU features from Intel document 319433-012A
      
      * 'x86-process-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86_64: Record stack pointer before task execution begins
      
      * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/UV: Lower UV rtc clocksource rating
    • Merge branch 'x86-atomic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1b674bf1
      Linus Torvalds authored
      Pull x86/atomic changes from Ingo Molnar.
      
      * 'x86-atomic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: atomic64 assembly improvements
        x86: Adjust asm constraints in atomic64 wrappers
    • Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e17fdf5c
      Linus Torvalds authored
      Pull x86/asm changes from Ingo Molnar
      
      * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Include probe_roms.h in probe_roms.c
        x86/32: Print control and debug registers for kerenel context
        x86: Tighten dependencies of CPU_SUP_*_32
        x86/numa: Improve internode cache alignment
        x86: Fix the NMI nesting comments
        x86-64: Improve insn scheduling in SAVE_ARGS_IRQ
        x86-64: Fix CFI annotations for NMI nesting code
        bitops: Add missing parentheses to new get_order macro
        bitops: Optimise get_order()
        bitops: Adjust the comment on get_order() to describe the size==0 case
        x86/spinlocks: Eliminate TICKET_MASK
        x86-64: Handle byte-wise tail copying in memcpy() without a loop
        x86-64: Fix memcpy() to support sizes of 4Gb and above
        x86-64: Fix memset() to support sizes of 4Gb and above
        x86-64: Slightly shorten copy_page()
    • Merge branch 'akpm' (Andrew's patch-bomb) · 95211279
      Linus Torvalds authored
      Merge first batch of patches from Andrew Morton:
       "A few misc things and all the MM queue"
      
      * emailed from Andrew Morton <akpm@linux-foundation.org>: (92 commits)
        memcg: avoid THP split in task migration
        thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE
        memcg: clean up existing move charge code
        mm/memcontrol.c: remove unnecessary 'break' in mem_cgroup_read()
        mm/memcontrol.c: remove redundant BUG_ON() in mem_cgroup_usage_unregister_event()
        mm/memcontrol.c: s/stealed/stolen/
        memcg: fix performance of mem_cgroup_begin_update_page_stat()
        memcg: remove PCG_FILE_MAPPED
        memcg: use new logic for page stat accounting
        memcg: remove PCG_MOVE_LOCK flag from page_cgroup
        memcg: simplify move_account() check
        memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat)
        memcg: kill dead prev_priority stubs
        memcg: remove PCG_CACHE page_cgroup flag
        memcg: let css_get_next() rely upon rcu_read_lock()
        cgroup: revert ss_id_lock to spinlock
        idr: make idr_get_next() good for rcu_read_lock()
        memcg: remove unnecessary thp check in page stat accounting
        memcg: remove redundant returns
        memcg: enum lru_list lru
        ...
    • Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · 5375871d
      Linus Torvalds authored
      Pull powerpc merge from Benjamin Herrenschmidt:
       "Here's the powerpc batch for this merge window.  It is going to be a
        bit more nasty than usual as in touching things outside of
        arch/powerpc mostly due to the big iSeriesectomy :-) We finally got
        rid of the bugger (legacy iSeries support) which was a PITA to
        maintain and that nobody really used anymore.
      
        Here are some of the highlights:
      
         - Legacy iSeries is gone.  Thanks Stephen ! There's still some bits
           and pieces remaining if you do a grep -ir series arch/powerpc but
           they are harmless and will be removed in the next few weeks
           hopefully.
      
         - The 'fadump' functionality (Firmware Assisted Dump) replaces the
           previous (equivalent) "pHyp assisted dump"...  it's a rewrite of a
           mechanism to get the hypervisor to do crash dumps on pSeries, the
           new implementation hopefully being much more reliable.  Thanks
           Mahesh Salgaonkar.
      
         - The "EEH" code (pSeries PCI error handling & recovery) got a big
           spring cleaning, motivated by the need to be able to implement a
           new backend for it on top of some new different type of firmware.
      
           The work isn't complete yet, but a good chunk of the cleanups is
           there.  Note that this adds a field to struct device_node which is
           not very nice and which Grant objects to.  I will have a patch soon
           that moves that to a powerpc private data structure (hopefully
           before rc1) and we'll improve things further later on (hopefully
           getting rid of the need for that pointer completely).  Thanks Gavin
           Shan.
      
         - I dug into our exception & interrupt handling code to improve the
           way we do lazy interrupt handling (and make it work properly with
           "edge" triggered interrupt sources), and while at it found & fixed
           a wagon of issues in those areas, including adding support for page
           fault retry & fatal signals on page faults.
      
         - Your usual random batch of small fixes & updates, including a bunch
           of new embedded boards, both Freescale and APM based ones, etc..."
      
      I fixed up some conflicts with the generalized irq-domain changes from
      Grant Likely, hopefully correctly.
      
      * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (141 commits)
        powerpc/ps3: Do not adjust the wrapper load address
        powerpc: Remove the rest of the legacy iSeries include files
        powerpc: Remove the remaining CONFIG_PPC_ISERIES pieces
        init: Remove CONFIG_PPC_ISERIES
        powerpc: Remove FW_FEATURE ISERIES from arch code
        tty/hvc_vio: FW_FEATURE_ISERIES is no longer selectable
        powerpc/spufs: Fix double unlocks
        powerpc/5200: convert mpc5200 to use of_platform_populate()
        powerpc/mpc5200: add options to mpc5200_defconfig
        powerpc/mpc52xx: add a4m072 board support
        powerpc/mpc5200: update mpc5200_defconfig to fit for charon board
        Documentation/powerpc/mpc52xx.txt: Checkpatch cleanup
        powerpc/44x: Add additional device support for APM821xx SoC and Bluestone board
        powerpc/44x: Add support PCI-E for APM821xx SoC and Bluestone board
        MAINTAINERS: Update PowerPC 4xx tree
        powerpc/44x: The bug fixed support for APM821xx SoC and Bluestone board
        powerpc: document the FSL MPIC message register binding
        powerpc: add support for MPIC message register API
        powerpc/fsl: Added aliased MSIIR register address to MSI node in dts
        powerpc/85xx: mpc8548cds - add 36-bit dts
        ...
    • Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · b57cb723
      Linus Torvalds authored
      Pull m68knommu arch updates from Greg Ungerer:
       "Includes a cleanup of the non-MMU linker script (it now almost
        exclusively uses the well defined linker script support macros and
        definitions).  Some more merging of MMU and non-MMU common files
        (specifically the arch process.c, ptrace and time.c).  And a big
        cleanup of the massively duplicated ColdFire device definition code.
      
        Overall we remove about 2000 lines of code, and end up with a single
        set of platform device definitions for the serial ports, ethernet
        ports and QSPI ports common in most ColdFire SoCs.
      
        I expect you will get a merge conflict on arch/m68k/kernel/process.c,
        in cpu_idle().  It should be relatively straightforward to fix up."
      
      And cpu_idle() conflict resolution was indeed trivial (merging the
      nommu/mmu versions of process.c trivially conflicting with the
      conversion to use the schedule_preempt_disabled() helper function)
      
      * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: (57 commits)
        m68knommu: factor more common ColdFire cpu reset code
        m68knommu: make 528x CPU reset register addressing consistent
        m68knommu: make 527x CPU reset register addressing consistent
        m68knommu: make 523x CPU reset register addressing consistent
        m68knommu: factor some common ColdFire cpu reset code
        m68knommu: move old ColdFire timers init from CPU init to timers code
        m68knommu: clean up init code in ColdFire 532x startup
        m68knommu: clean up init code in ColdFire 528x startup
        m68knommu: clean up init code in ColdFire 523x startup
        m68knommu: merge common ColdFire QSPI platform setup code
        m68knommu: make 532x QSPI platform addressing consistent
        m68knommu: make 528x QSPI platform addressing consistent
        m68knommu: make 527x QSPI platform addressing consistent
        m68knommu: make 5249 QSPI platform addressing consistent
        m68knommu: make 523x QSPI platform addressing consistent
        m68knommu: make 520x QSPI platform addressing consistent
        m68knommu: merge common ColdFire FEC platform setup code
        m68knommu: make 532x FEC platform addressing consistent
        m68knommu: make 528x FEC platform addressing consistent
        m68knommu: make 527x FEC platform addressing consistent
        ...
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw · ad12ab25
      Linus Torvalds authored
      Pull gfs2 changes from Steven Whitehouse.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
        GFS2: Change truncate page allocation to be GFP_NOFS
        GFS2: call gfs2_write_alloc_required for each chunk
        GFS2: Clean up log flush header writing
        GFS2: Remove a __GFP_NOFAIL allocation
        GFS2: Flush pending glock work when evicting an inode
        GFS2: make sure rgrps are up to date in func gfs2_blk2rgrpd
        GFS2: Eliminate sd_rindex_mutex
        GFS2: Unlock rindex mutex on glock error
        GFS2: Make bd_cmp() static
        GFS2: Sort the ordered write list
        GFS2: FITRIM ioctl support
        GFS2: Move two functions from log.c to lops.c
        GFS2: glock statistics gathering
    • memcg: avoid THP split in task migration · 12724850
      Naoya Horiguchi authored
      Currently we can't do task migration among memory cgroups without THP
      split, which means processes heavily using THP experience large overhead
      in task migration.  This patch introduces the code for moving charge of
      THP and makes THP more valuable.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Hillf Danton <dhillf@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE · d8c37c48
      Naoya Horiguchi authored
      These macros will be used in a later patch, where all usages are expected
      to be optimized away without #ifdef CONFIG_TRANSPARENT_HUGEPAGE.  But to
      detect unexpected usages, we convert the existing BUG() to BUILD_BUG().
      
      [akpm@linux-foundation.org: fix build in mm/pgtable-generic.c]
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Hillf Danton <dhillf@gmail.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: clean up existing move charge code · 8d32ff84
      Naoya Horiguchi authored
      - Replace lengthy function name is_target_pte_for_mc() with a shorter
        one in order to avoid ugly line breaks.
      
      - explicitly use MC_TARGET_* instead of simply using integers.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Jeff Liu's avatar
    • mm/memcontrol.c: remove redundant BUG_ON() in mem_cgroup_usage_unregister_event() · 45f3e385
      Anton Vorontsov authored
      In the following code:
      
      	if (type == _MEM)
      		thresholds = &memcg->thresholds;
      	else if (type == _MEMSWAP)
      		thresholds = &memcg->memsw_thresholds;
      	else
      		BUG();
      
      	BUG_ON(!thresholds);
      
      The BUG_ON() seems redundant.
      Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memcontrol.c: s/stealed/stolen/ · 13fd1dd9
      Andrew Morton authored
      A grammatical fix.
      
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix performance of mem_cgroup_begin_update_page_stat() · 4331f7d3
      KAMEZAWA Hiroyuki authored
      mem_cgroup_begin_update_page_stat() should be very fast because it's
      called very frequently.  Currently it needs to look up page_cgroup and
      its memcg, which is slow.
      
      This patch adds a global variable that records whether any memcg is
      being moved.  With this, the caller doesn't need to visit page_cgroup
      and memcg.
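      The idea of a global fast-path check can be sketched as follows. This is
      a user-space illustration only: the commit message does not name the
      variable, so model_moving_count and every model_* function here are
      hypothetical, and C11 atomics stand in for the kernel's primitives.

```c
#include <stdatomic.h>

/* Hypothetical global: number of "move" operations currently running. */
static atomic_int model_moving_count;

/* Counts how often the expensive lookup path was taken (for the demo). */
static int model_slow_path_calls;

static void model_start_move(void)
{
    atomic_fetch_add(&model_moving_count, 1);
}

static void model_end_move(void)
{
    atomic_fetch_sub(&model_moving_count, 1);
}

/* Stand-in for the slow path that looks up page_cgroup and its memcg. */
static void model_slow_begin_update(void)
{
    model_slow_path_calls++;
}

/* Fast path: a single global read decides whether any lookup is needed. */
static void model_begin_update_page_stat(void)
{
    if (atomic_load(&model_moving_count) == 0)
        return;                  /* nobody is moving: skip the lookup */
    model_slow_begin_update();
}
```

      With no move in flight, every call returns after one load; the slow
      path is entered only between model_start_move() and model_end_move().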
      
      Here is a test result.  A test program takes page faults on a
      MAP_SHARED file mapping, makes each page's page_mapcount(page) > 1,
      frees the range with madvise(), and faults the pages in again.  The
      program causes 26214400 page faults on a file (size was 1G) and shows
      the cost of mem_cgroup_begin_update_page_stat().
      
      Before this patch for mem_cgroup_begin_update_page_stat()
      
          [kamezawa@bluextal test]$ time ./mmap 1G
      
          real    0m21.765s
          user    0m5.999s
          sys     0m15.434s
      
          27.46%     mmap  mmap               [.] reader
          21.15%     mmap  [kernel.kallsyms]  [k] page_fault
           9.17%     mmap  [kernel.kallsyms]  [k] filemap_fault
           2.96%     mmap  [kernel.kallsyms]  [k] __do_fault
           2.83%     mmap  [kernel.kallsyms]  [k] __mem_cgroup_begin_update_page_stat
      
      After this patch
      
          [root@bluextal test]# time ./mmap 1G
      
          real    0m21.373s
          user    0m6.113s
          sys     0m15.016s
      
      In the usual path, calls to __mem_cgroup_begin_update_page_stat() go away.
      
      Note: we may be able to remove this optimization in the future if
            we can get a pointer to memcg directly from struct page.
      
      [akpm@linux-foundation.org: don't return a void]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Greg Thelen <gthelen@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: remove PCG_FILE_MAPPED · 2ff76f11
      KAMEZAWA Hiroyuki authored
      With the new lock scheme for updating memcg's page stat, we don't need
      the PCG_FILE_MAPPED flag, which duplicated the information in page_mapped().
      
      [hughd@google.com: cosmetic fix]
      [hughd@google.com: add comment to MEM_CGROUP_CHARGE_TYPE_MAPPED case in __mem_cgroup_uncharge_common()]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Greg Thelen <gthelen@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: use new logic for page stat accounting · 89c06bd5
      KAMEZAWA Hiroyuki authored
      Currently, per-memcg page stats are recorded in per-page_cgroup flags by
      duplicating the page's status into the flag.  The reason is that memcg
      can move a page from one group to another, and there is a race
      between "move" and "page stat accounting".
      
      Under current logic, assume CPU-A and CPU-B.  CPU-A does "move" and CPU-B
      does "page stat accounting".
      
      When CPU-A goes 1st,
      
                  CPU-A                           CPU-B
                                          update "struct page" info.
          move_lock_mem_cgroup(memcg)
          see pc->flags
          copy page stat to new group
          overwrite pc->mem_cgroup.
          move_unlock_mem_cgroup(memcg)
                                          move_lock_mem_cgroup(mem)
                                          set pc->flags
                                          update page stat accounting
                                          move_unlock_mem_cgroup(mem)
      
      stat accounting is guarded by move_lock_mem_cgroup() and "move" logic
      (CPU-A) doesn't see changes in "struct page" information.
      
      But it's costly to have the same information both in 'struct page' and
      'struct page_cgroup'.  And, there is a potential problem.
      
      For example, assume we have PG_dirty accounting in memcg.
      PG_xxx is a flag for struct page.
      PCG_xxx is a flag for struct page_cgroup.
      (This is just an example. The same problem can be found in any
       kind of page stat accounting.)
      
      	  CPU-A                               CPU-B
            TestSet PG_dirty
            (delay)                        TestClear PG_dirty
                                           if (TestClear(PCG_dirty))
                                                memcg->nr_dirty--
            if (TestSet(PCG_dirty))
                memcg->nr_dirty++
      
      Here, memcg->nr_dirty ends up at +1, which is wrong.  This race was
      reported by Greg Thelen <gthelen@google.com>.  Currently only FILE_MAPPED
      is supported and, fortunately, it is serialized by the page table lock,
      so this is not a real bug _now_.
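
      The interleaving above can be replayed deterministically in a single
      thread.  The following is a minimal sketch in C, not kernel code: the
      struct names and the test_set()/test_clear() helpers are hypothetical
      stand-ins, and the helpers are assumed to return true when the flag
      actually changed state, following the pseudo-code in this message rather
      than the kernel's own return convention.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical models for illustration, not the kernel definitions. */
struct page_flags { bool pg_dirty; };	/* PG_dirty lives in struct page */
struct pcg_flags  { bool pcg_dirty; };	/* PCG_dirty lives in struct page_cgroup */
struct memcg_stat { long nr_dirty; };

/* Assumed convention: return true if the flag changed state. */
static bool test_set(bool *flag)
{
	bool changed = !*flag;

	*flag = true;
	return changed;
}

static bool test_clear(bool *flag)
{
	bool changed = *flag;

	*flag = false;
	return changed;
}

/* Replay the CPU-A / CPU-B steps in the order shown in the diagram. */
long replay_duplicated_flag_race(void)
{
	struct page_flags page = { false };
	struct pcg_flags pc = { false };
	struct memcg_stat memcg = { 0 };

	test_set(&page.pg_dirty);	/* CPU-A: TestSet PG_dirty, then delayed */

	test_clear(&page.pg_dirty);	/* CPU-B: TestClear PG_dirty */
	if (test_clear(&pc.pcg_dirty))	/* CPU-B: PCG_dirty was never set */
		memcg.nr_dirty--;

	if (test_set(&pc.pcg_dirty))	/* CPU-A resumes */
		memcg.nr_dirty++;

	assert(!page.pg_dirty);		/* the page itself is clean ... */
	return memcg.nr_dirty;		/* ... yet the counter says one dirty page */
}
```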
      
      If this potential problem is caused by having duplicated information in
      struct page and struct page_cgroup, we may be able to fix this by using
      the original 'struct page' information.  But then we'll have a problem in
      "move account".
      
      Assume we use only PG_dirty.
      
               CPU-A                   CPU-B
          TestSet PG_dirty
          (delay)                    move_lock_mem_cgroup()
                                     if (PageDirty(page))
                                            new_memcg->nr_dirty++
                                     pc->mem_cgroup = new_memcg;
                                     move_unlock_mem_cgroup()
          move_lock_mem_cgroup()
          memcg = pc->mem_cgroup
          new_memcg->nr_dirty++
      
      Accounting information may be double-counted.  This was the original
      reason to have PCG_xxx flags, but it seems PCG_xxx has another problem.
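
      The double count can be reproduced with the same kind of single-threaded
      replay.  This is only a sketch under assumptions: the struct names are
      hypothetical, and the move lock is omitted because the steps already run
      in the serialized order shown in the diagram.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical models for illustration, not the kernel definitions. */
struct group_stat { long nr_dirty; };
struct page_state { bool pg_dirty; };
struct page_owner { struct group_stat *mem_cgroup; };

/* Replay the "only PG_dirty" interleaving from the diagram. */
long replay_move_double_count(void)
{
	struct group_stat old_memcg = { 0 }, new_memcg = { 0 };
	struct page_state page = { false };
	struct page_owner pc = { &old_memcg };
	struct group_stat *memcg;

	page.pg_dirty = true;		/* CPU-A: TestSet PG_dirty, then delayed */

	/* CPU-B: move the page (under the move lock in the diagram) */
	if (page.pg_dirty)
		new_memcg.nr_dirty++;
	pc.mem_cgroup = &new_memcg;

	/* CPU-A resumes and accounts against whoever owns the page now */
	memcg = pc.mem_cgroup;
	memcg->nr_dirty++;

	assert(old_memcg.nr_dirty == 0);	/* the old group never counted it */
	return new_memcg.nr_dirty;		/* one dirty page, counted twice */
}
```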
      
      I think we need a bigger lock, as in:
      
           move_lock_mem_cgroup(page)
           TestSetPageDirty(page)
           update page stats (without any checks)
           move_unlock_mem_cgroup(page)
      
      This fixes both problems, and we don't have to duplicate the page flag
      into page_cgroup.  Please note: move_lock_mem_cgroup() is held only when
      there is a possibility of "account move" in the system.  So, in most
      paths, the status update will go without atomic locks.
      
      This patch introduces mem_cgroup_begin_update_page_stat() and
      mem_cgroup_end_update_page_stat().  Both should be called when modifying
      'struct page' information that memcg takes care of, as in:
      
           mem_cgroup_begin_update_page_stat()
           modify page information
           mem_cgroup_update_page_stat()
           => never check any 'struct page' info, just update counters.
           mem_cgroup_end_update_page_stat().
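
      As a rough userspace illustration of the calling pattern above (not the
      patch itself), the sketch below wraps the flag change and the counter
      update in one critical section that the "move" path also takes.  All
      names are hypothetical, a pthread mutex stands in for the move lock, and
      a single global lock is assumed instead of a per-memcg one to keep the
      example short.  Unlike the patch, it always takes the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical models for illustration, not the kernel definitions. */
struct group { long nr_dirty; };
struct tracked_page { bool dirty; struct group *owner; };

/* Simplifying assumption: one global lock rather than a per-memcg lock. */
static pthread_mutex_t move_lock = PTHREAD_MUTEX_INITIALIZER;

static void begin_update_page_stat(void) { pthread_mutex_lock(&move_lock); }
static void end_update_page_stat(void)   { pthread_mutex_unlock(&move_lock); }

void set_page_dirty_accounted(struct tracked_page *p)
{
	begin_update_page_stat();
	if (!p->dirty) {		/* plays the role of TestSetPageDirty() */
		p->dirty = true;
		p->owner->nr_dirty++;	/* counter only, no duplicated flag */
	}
	end_update_page_stat();
}

void clear_page_dirty_accounted(struct tracked_page *p)
{
	begin_update_page_stat();
	if (p->dirty) {
		p->dirty = false;
		p->owner->nr_dirty--;
	}
	end_update_page_stat();
}

/* "move": transfer the page and its statistics to another group. */
void move_page(struct tracked_page *p, struct group *to)
{
	pthread_mutex_lock(&move_lock);
	if (p->dirty) {
		p->owner->nr_dirty--;
		to->nr_dirty++;
	}
	p->owner = to;
	pthread_mutex_unlock(&move_lock);
}
```

      Because the page flag and the counter change inside the same critical
      section as the owner switch, neither of the two races described earlier
      can slip in between them in this model.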
      
      This patch is slow because we need to call begin_update_page_stat()/
      end_update_page_stat() regardless of whether the accounted state will
      change or not.  A following patch adds an easy optimization and reduces
      the cost.
      
      [akpm@linux-foundation.org: s/lock/locked/]
      [hughd@google.com: fix deadlock by avoiding stat lock when anon]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Greg Thelen <gthelen@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      89c06bd5
    • KAMEZAWA Hiroyuki's avatar
      memcg: remove PCG_MOVE_LOCK flag from page_cgroup · 312734c0
      KAMEZAWA Hiroyuki authored
      PCG_MOVE_LOCK is used as a bit spinlock to avoid a race between
      overwriting pc->mem_cgroup and per-memcg page statistics accounting.
      This lock helps to avoid the race, but the race is very rare because
      moving tasks between cgroups is not a common operation.  So, using 1 bit
      per page seems too costly.
      
      This patch changes this lock to a per-memcg spinlock and removes
      PCG_MOVE_LOCK.
      
      If a smaller lock is required, we'll be able to add some hashes, but I'd
      like to start from this.
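
      To illustrate what "adding some hashes" could look like, here is a
      minimal userspace sketch of a hashed lock table.  It is not part of this
      patch: the table size, the hash, and every name below are assumptions
      made for the example, and pthread mutexes stand in for spinlocks.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NR_MOVE_LOCKS 64	/* hypothetical table size, a power of two */

static pthread_mutex_t move_locks[NR_MOVE_LOCKS];

/* Initialize every lock in the table once, before first use. */
void move_locks_init(void)
{
	for (size_t i = 0; i < NR_MOVE_LOCKS; i++)
		pthread_mutex_init(&move_locks[i], NULL);
}

/* Map a page address to a slot; the shift amounts are arbitrary choices. */
size_t move_lock_index(const void *page)
{
	uintptr_t v = (uintptr_t)page;

	v ^= v >> 12;
	return (size_t)((v >> 6) & (NR_MOVE_LOCKS - 1));
}

/* Pages that hash to the same slot share a lock; others do not contend. */
pthread_mutex_t *move_lock_for(const void *page)
{
	size_t idx = move_lock_index(page);

	assert(idx < NR_MOVE_LOCKS);
	return &move_locks[idx];
}
```

      The trade-off in such a design is between contention and memory: more
      slots mean fewer unrelated pages sharing a lock, at the cost of a larger
      table.
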
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      312734c0