1. 06 Nov, 2013 6 commits
    • Vineet Gupta's avatar
      ARC: cacheflush refactor #3: Unify the {d,i}cache flush leaf helpers · bd12976c
      Vineet Gupta authored
      With Line length being constant now, we can fold the 2 helpers into 1.
      This allows applying any optimizations (forthcoming) to single place.
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      bd12976c
    • Vineet Gupta's avatar
      ARC: cacheflush refactor #2: I and D caches lines to have same size · 63d2dfdb
      Vineet Gupta authored
      Having them be different seems an obscure configuration.
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      63d2dfdb
    • Vineet Gupta's avatar
      ARC: cacheflush refactor #1: push aux reg ascertaining into leaf routine · f3e4de32
      Vineet Gupta authored
      ARC dcache supports 3 ops - Inv, Flush, Flush-n-Inv.
      The programming model however provides 2 commands FLUSH, INV.
      INV will either discard or flush-n-discard (based on DT_CTRL bit)
      
      The leaf helper __dc_line_loop() used to take the AUX register
      (corresponding to the 2 commands). Now we push that to within the
      helper, paving way for code consolidations to follow.
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      f3e4de32
    • Vineet Gupta's avatar
      064a6269
    • Vineet Gupta's avatar
      ARC: Annotate some functions as static · 8e457d6a
      Vineet Gupta authored
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      8e457d6a
    • Christoph Lameter's avatar
      arc: Replace __get_cpu_var uses · 6855e95c
      Christoph Lameter authored
      __get_cpu_var() is used for multiple purposes in the kernel source. One of them is
      address calculation via the form &__get_cpu_var(x). This calculates the address for
      the instance of the percpu variable of the current processor based on an offset.
      
      Other use cases are for storing and retrieving data from the current processors percpu area.
      __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as :
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always only does an address determination. However, store and retrieve operations
      could use a segment prefix (or global register on other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use
      optimized assembly code to read and write per cpu variables.
      
      This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr()
      or into a use of this_cpu operations that use the offset. Thereby address calcualtions are avoided
      and less registers are used when code is generated.
      
      At the end of the patchset all uses of __get_cpu_var have been removed so the macro is removed too.
      
      The patchset includes passes over all arches as well. Once these operations are used throughout then
      specialized macros can be defined in non -x86 arches as well in order to optimize per cpu access by
      f.e. using a global register that may be set to the per cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processors instance of a per cpu variable.
      
      	DEFINE_PER_CPU(int, u);
      	int x = __get_cpu_var(y)
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(this_cpu_ptr(&x), y, sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y)
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	this_cpu_inc(y)
      Acked-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      6855e95c
  2. 03 Nov, 2013 3 commits
  3. 02 Nov, 2013 2 commits
  4. 01 Nov, 2013 20 commits
  5. 31 Oct, 2013 9 commits
    • Linus Torvalds's avatar
      Merge branch 'akpm' (fixes from Andrew Morton) · 4f794ee8
      Linus Torvalds authored
      Merge four more fixes from Andrew Morton.
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        lib/scatterlist.c: don't flush_kernel_dcache_page on slab page
        mm: memcg: fix test for child groups
        mm: memcg: lockdep annotation for memcg OOM lock
        mm: memcg: use proper memcg in limit bypass
      4f794ee8
    • Ming Lei's avatar
      lib/scatterlist.c: don't flush_kernel_dcache_page on slab page · 3d77b50c
      Ming Lei authored
      Commit b1adaf65 ("[SCSI] block: add sg buffer copy helper
      functions") introduces two sg buffer copy helpers, and calls
      flush_kernel_dcache_page() on pages in SG list after these pages are
      written to.
      
      Unfortunately, the commit may introduce a potential bug:
      
       - Before sending some SCSI commands, kmalloc() buffer may be passed to
         block layper, so flush_kernel_dcache_page() can see a slab page
         finally
      
       - According to cachetlb.txt, flush_kernel_dcache_page() is only called
         on "a user page", which surely can't be a slab page.
      
       - ARCH's implementation of flush_kernel_dcache_page() may use page
         mapping information to do optimization so page_mapping() will see the
         slab page, then VM_BUG_ON() is triggered.
      
      Aaro Koskinen reported the bug on ARM/kirkwood when DEBUG_VM is enabled,
      and this patch fixes the bug by adding test of '!PageSlab(miter->page)'
      before calling flush_kernel_dcache_page().
      Signed-off-by: default avatarMing Lei <ming.lei@canonical.com>
      Reported-by: default avatarAaro Koskinen <aaro.koskinen@iki.fi>
      Tested-by: default avatarSimon Baatz <gmbnomis@gmail.com>
      Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <stable@vger.kernel.org>	[3.2+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3d77b50c
    • Johannes Weiner's avatar
      mm: memcg: fix test for child groups · 696ac172
      Johannes Weiner authored
      When memcg code needs to know whether any given memcg has children, it
      uses the cgroup child iteration primitives and returns true/false
      depending on whether the iteration loop is executed at least once or
      not.
      
      Because a cgroup's list of children is RCU protected, these primitives
      require the RCU read-lock to be held, which is not the case for all
      memcg callers.  This results in the following splat when e.g.  enabling
      hierarchy mode:
      
        WARNING: CPU: 3 PID: 1 at kernel/cgroup.c:3043 css_next_child+0xa3/0x160()
        CPU: 3 PID: 1 Comm: systemd Not tainted 3.12.0-rc5-00117-g83f11a9c-dirty #18
        Hardware name: LENOVO 3680B56/3680B56, BIOS 6QET69WW (1.39 ) 04/26/2012
        Call Trace:
          dump_stack+0x54/0x74
          warn_slowpath_common+0x78/0xa0
          warn_slowpath_null+0x1a/0x20
          css_next_child+0xa3/0x160
          mem_cgroup_hierarchy_write+0x5b/0xa0
          cgroup_file_write+0x108/0x2a0
          vfs_write+0xbd/0x1e0
          SyS_write+0x4c/0xa0
          system_call_fastpath+0x16/0x1b
      
      In the memcg case, we only care about children when we are attempting to
      modify inheritable attributes interactively.  Racing with deletion could
      mean a spurious -EBUSY, no problem.  Racing with addition is handled
      just fine as well through the memcg_create_mutex: if the child group is
      not on the list after the mutex is acquired, it won't be initialized
      from the parent's attributes until after the unlock.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      696ac172
    • Johannes Weiner's avatar
      mm: memcg: lockdep annotation for memcg OOM lock · 0056f4e6
      Johannes Weiner authored
      The memcg OOM lock is a mutex-type lock that is open-coded due to
      memcg's special needs.  Add annotations for lockdep coverage.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0056f4e6
    • Johannes Weiner's avatar
      mm: memcg: use proper memcg in limit bypass · 3168ecbe
      Johannes Weiner authored
      Commit 84235de3 ("fs: buffer: move allocation failure loop into the
      allocator") allowed __GFP_NOFAIL allocations to bypass the limit if they
      fail to reclaim enough memory for the charge.  But because the main test
      case was on a 3.2-based system, the patch missed the fact that on newer
      kernels the charge function needs to return root_mem_cgroup when
      bypassing the limit, and not NULL.  This will corrupt whatever memory is
      at NULL + percpu pointer offset.  Fix this quickly before problems are
      reported.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3168ecbe
    • Linus Torvalds's avatar
      vfs: decrapify dput(), fix cache behavior under normal load · 358eec18
      Linus Torvalds authored
      We do not want to dirty the dentry->d_flags cacheline in dput() just to
      set the DCACHE_REFERENCED flag when it is already set in the common case
      anyway.  This way the first cacheline of the dentry (which contains the
      RCU lookup information etc) can stay shared among multiple CPU's.
      
      This finishes off some of the details of all the scalability patches
      merged during the merge window.
      
      Also don't mark dentry_kill() for inlining, since it's the uncommon path
      and inlining it just makes the common path slower due to extra function
      entry/exit overhead.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      358eec18
    • Linus Torvalds's avatar
      i915: fix compiler warning · 0baab4fd
      Linus Torvalds authored
      The last i915 drm update brought with it this annoying warning
      
        drivers/gpu/drm/i915/intel_crt.c: In function ‘intel_crt_get_config’:
        drivers/gpu/drm/i915/intel_crt.c:110:21: warning: unused variable ‘dev’ [-Wunused-variable]
          struct drm_device *dev = encoder->base.dev;
                             ^
      
      introduced by commit 7195a50b ("drm/i915: Add HSW CRT output readout
      support").
      
      Remove the offending pointless variable.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0baab4fd
    • Linus Torvalds's avatar
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 52469b4f
      Linus Torvalds authored
      Pull NUMA balancing memory corruption fixes from Ingo Molnar:
       "So these fixes are definitely not something I'd like to sit on, but as
        I said to Mel at the KS the timing is quite tight, with Linus planning
        v3.12-final within a week.
      
        Fedora-19 is affected:
      
         comet:~> grep NUMA_BALANCING /boot/config-3.11.3-201.fc19.x86_64
      
         CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
         CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
         CONFIG_NUMA_BALANCING=y
      
        AFAICS Ubuntu will be affected as well, once it updates the kernel:
      
         hubble:~> grep NUMA_BALANCING /boot/config-3.8.0-32-generic
      
         CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
         CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
         CONFIG_NUMA_BALANCING=y
      
        These 6 commits are a minimalized set of cherry-picks needed to fix
        the memory corruption bugs.  All commits are fixes, except "mm: numa:
        Sanitize task_numa_fault() callsites" which is a cleanup that made two
        followup fixes simpler.
      
        I've done targeted testing with just this SHA1 to try to make sure
        there are no cherry-picking artifacts.  The original non-cherry-picked
        set of fixes were exposed to linux-next for a couple of weeks"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        mm: Account for a THP NUMA hinting update as one PTE update
        mm: Close races between THP migration and PMD numa clearing
        mm: numa: Sanitize task_numa_fault() callsites
        mm: Prevent parallel splits during THP migration
        mm: Wait for THP migrations to complete during NUMA hinting faults
        mm: numa: Do not account for a hinting fault if we raced
      52469b4f
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 026f8f61
      Linus Torvalds authored
      Pull input updates from Dmitry Torokhov:
       "A bit later than I would want, but the changes are very minor - a few
        new device IDs for new hardware in existing drivers, fix for battery
        in Wacom devices not be considered system battery and cause emergency
        hibernations, and a couple of other bug fixes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: ALPS - add support for model found on Dell XT2
        Input: wacom - add support for ISDv4 0x10E sensor
        Input: wacom - add support for ISDv4 0x10F sensor
        Input: wacom - export battery scope
        Input: cm109 - convert high volume dev_err() to dev_err_ratelimited()
        Input: move name/timer init to input_alloc_dev()
        Input: i8042 - i8042_flush fix for a full 8042 buffer
        Input: pxa27x_keypad - fix NULL pointer dereference
      026f8f61