1. 02 Nov, 2013 1 commit
    • ARC: Incorrect mm reference used in vmalloc fault handler · 9c41f4ee
      Vineet Gupta authored
      A vmalloc fault needs to sync up PGD/PTE entry from init_mm to current
      task's "active_mm".  ARC vmalloc fault handler however was using mm.
      
      A vmalloc fault for non user task context (actually pre-userland, from
      init thread's open for /dev/console) caused the handler to deref NULL mm
      (for mm->pgd)
      
      The reasons it worked so far are amazing:
      
      1. By default (!SMP), vmalloc fault handler uses a cached value of PGD.
         In SMP that MMU register is repurposed hence need for mm pointer deref.
      
      2. In pre-3.12 SMP kernel, the problem triggering vmalloc didn't exist in
         pre-userland code path - it was introduced with commit 20bafb3d
         "n_tty: Move buffers into n_tty_data"
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Cc: Gilad Ben-Yossef <gilad@benyossef.com>
      Cc: Noam Camus <noamc@ezchip.com>
      Cc: stable@vger.kernel.org    #3.10 and 3.11
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 01 Nov, 2013 19 commits
  3. 31 Oct, 2013 13 commits
    • Merge branch 'akpm' (fixes from Andrew Morton) · 4f794ee8
      Linus Torvalds authored
      Merge four more fixes from Andrew Morton.
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        lib/scatterlist.c: don't flush_kernel_dcache_page on slab page
        mm: memcg: fix test for child groups
        mm: memcg: lockdep annotation for memcg OOM lock
        mm: memcg: use proper memcg in limit bypass
    • lib/scatterlist.c: don't flush_kernel_dcache_page on slab page · 3d77b50c
      Ming Lei authored
      Commit b1adaf65 ("[SCSI] block: add sg buffer copy helper
      functions") introduces two sg buffer copy helpers, and calls
      flush_kernel_dcache_page() on pages in SG list after these pages are
      written to.
      
      Unfortunately, the commit may introduce a potential bug:
      
       - Before sending some SCSI commands, a kmalloc() buffer may be passed to
         the block layer, so flush_kernel_dcache_page() can end up seeing a
         slab page
      
       - According to cachetlb.txt, flush_kernel_dcache_page() is only called
         on "a user page", which surely can't be a slab page.
      
       - ARCH's implementation of flush_kernel_dcache_page() may use page
         mapping information to do optimization so page_mapping() will see the
         slab page, then VM_BUG_ON() is triggered.
      
      Aaro Koskinen reported the bug on ARM/kirkwood when DEBUG_VM is enabled,
      and this patch fixes the bug by adding test of '!PageSlab(miter->page)'
      before calling flush_kernel_dcache_page().
      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Tested-by: Simon Baatz <gmbnomis@gmail.com>
      Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <stable@vger.kernel.org>	[3.2+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcg: fix test for child groups · 696ac172
      Johannes Weiner authored
      When memcg code needs to know whether any given memcg has children, it
      uses the cgroup child iteration primitives and returns true/false
      depending on whether the iteration loop is executed at least once or
      not.
      
      Because a cgroup's list of children is RCU protected, these primitives
      require the RCU read-lock to be held, which is not the case for all
      memcg callers.  This results in the following splat when e.g.  enabling
      hierarchy mode:
      
        WARNING: CPU: 3 PID: 1 at kernel/cgroup.c:3043 css_next_child+0xa3/0x160()
        CPU: 3 PID: 1 Comm: systemd Not tainted 3.12.0-rc5-00117-g83f11a9c-dirty #18
        Hardware name: LENOVO 3680B56/3680B56, BIOS 6QET69WW (1.39 ) 04/26/2012
        Call Trace:
          dump_stack+0x54/0x74
          warn_slowpath_common+0x78/0xa0
          warn_slowpath_null+0x1a/0x20
          css_next_child+0xa3/0x160
          mem_cgroup_hierarchy_write+0x5b/0xa0
          cgroup_file_write+0x108/0x2a0
          vfs_write+0xbd/0x1e0
          SyS_write+0x4c/0xa0
          system_call_fastpath+0x16/0x1b
      
      In the memcg case, we only care about children when we are attempting to
      modify inheritable attributes interactively.  Racing with deletion could
      mean a spurious -EBUSY, no problem.  Racing with addition is handled
      just fine as well through the memcg_create_mutex: if the child group is
      not on the list after the mutex is acquired, it won't be initialized
      from the parent's attributes until after the unlock.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcg: lockdep annotation for memcg OOM lock · 0056f4e6
      Johannes Weiner authored
      The memcg OOM lock is a mutex-type lock that is open-coded due to
      memcg's special needs.  Add annotations for lockdep coverage.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcg: use proper memcg in limit bypass · 3168ecbe
      Johannes Weiner authored
      Commit 84235de3 ("fs: buffer: move allocation failure loop into the
      allocator") allowed __GFP_NOFAIL allocations to bypass the limit if they
      fail to reclaim enough memory for the charge.  But because the main test
      case was on a 3.2-based system, the patch missed the fact that on newer
      kernels the charge function needs to return root_mem_cgroup when
      bypassing the limit, and not NULL.  This will corrupt whatever memory is
      at NULL + percpu pointer offset.  Fix this quickly before problems are
      reported.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfs: decrapify dput(), fix cache behavior under normal load · 358eec18
      Linus Torvalds authored
      We do not want to dirty the dentry->d_flags cacheline in dput() just to
      set the DCACHE_REFERENCED flag when it is already set in the common case
      anyway.  This way the first cacheline of the dentry (which contains the
      RCU lookup information etc) can stay shared among multiple CPUs.
      
      This finishes off some of the details of all the scalability patches
      merged during the merge window.
      
      Also don't mark dentry_kill() for inlining, since it's the uncommon path
      and inlining it just makes the common path slower due to extra function
      entry/exit overhead.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • i915: fix compiler warning · 0baab4fd
      Linus Torvalds authored
      The last i915 drm update brought with it this annoying warning
      
        drivers/gpu/drm/i915/intel_crt.c: In function ‘intel_crt_get_config’:
        drivers/gpu/drm/i915/intel_crt.c:110:21: warning: unused variable ‘dev’ [-Wunused-variable]
          struct drm_device *dev = encoder->base.dev;
                             ^
      
      introduced by commit 7195a50b ("drm/i915: Add HSW CRT output readout
      support").
      
      Remove the offending pointless variable.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 52469b4f
      Linus Torvalds authored
      Pull NUMA balancing memory corruption fixes from Ingo Molnar:
       "So these fixes are definitely not something I'd like to sit on, but as
        I said to Mel at the KS the timing is quite tight, with Linus planning
        v3.12-final within a week.
      
        Fedora-19 is affected:
      
         comet:~> grep NUMA_BALANCING /boot/config-3.11.3-201.fc19.x86_64
      
         CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
         CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
         CONFIG_NUMA_BALANCING=y
      
        AFAICS Ubuntu will be affected as well, once it updates the kernel:
      
         hubble:~> grep NUMA_BALANCING /boot/config-3.8.0-32-generic
      
         CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
         CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
         CONFIG_NUMA_BALANCING=y
      
        These 6 commits are a minimalized set of cherry-picks needed to fix
        the memory corruption bugs.  All commits are fixes, except "mm: numa:
        Sanitize task_numa_fault() callsites" which is a cleanup that made two
        followup fixes simpler.
      
        I've done targeted testing with just this SHA1 to try to make sure
        there are no cherry-picking artifacts.  The original non-cherry-picked
        set of fixes were exposed to linux-next for a couple of weeks"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        mm: Account for a THP NUMA hinting update as one PTE update
        mm: Close races between THP migration and PMD numa clearing
        mm: numa: Sanitize task_numa_fault() callsites
        mm: Prevent parallel splits during THP migration
        mm: Wait for THP migrations to complete during NUMA hinting faults
        mm: numa: Do not account for a hinting fault if we raced
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 026f8f61
      Linus Torvalds authored
      Pull input updates from Dmitry Torokhov:
       "A bit later than I would want, but the changes are very minor - a few
        new device IDs for new hardware in existing drivers, a fix so that the
        battery in Wacom devices is not considered a system battery and does
        not cause emergency hibernations, and a couple of other bug fixes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: ALPS - add support for model found on Dell XT2
        Input: wacom - add support for ISDv4 0x10E sensor
        Input: wacom - add support for ISDv4 0x10F sensor
        Input: wacom - export battery scope
        Input: cm109 - convert high volume dev_err() to dev_err_ratelimited()
        Input: move name/timer init to input_alloc_dev()
        Input: i8042 - i8042_flush fix for a full 8042 buffer
        Input: pxa27x_keypad - fix NULL pointer dereference
    • Merge tag 'pm+acpi-3.12-late' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · e7647027
      Linus Torvalds authored
      Pull ACPI and power management fixes from Rafael J Wysocki:
       "Last-minute ACPI and power management fixes for 3.12
      
         - Revert epoll and select commits related to the freezer, introduced
           during the 3.11 cycle, that cause mysterious user space breakage to
           occur during resume from suspend to RAM for multiple users of
           32-bit x86 systems.  Material for 3.11.y stable kernels.
      
         - Revert a recent ACPI-based PCI hotplug (ACPIPHP) commit that was
           part of boot problem fixes for one machine, but turns out to cause
           issues with hotplug on Thunderbolt chains with multiple devices.
           It also turns out to be unnecessary after another fix in the same
           area that went in later.  From Mika Westerberg"
      
      * tag 'pm+acpi-3.12-late' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        Revert "ACPI / hotplug / PCI: Avoid doing too much for spurious notifies"
        Revert "select: use freezable blocking call"
        Revert "epoll: use freezable blocking call"
    • ALSA: fix oops in snd_pcm_info() caused by ASoC DPCM · a4461f41
      Russell King authored
      Unable to handle kernel NULL pointer dereference at virtual address 00000008
      pgd = d5300000
      [00000008] *pgd=0d265831, *pte=00000000, *ppte=00000000
      Internal error: Oops: 17 [#1] PREEMPT ARM
      CPU: 0 PID: 2295 Comm: vlc Not tainted 3.11.0+ #755
      task: dee74800 ti: e213c000 task.ti: e213c000
      PC is at snd_pcm_info+0xc8/0xd8
      LR is at 0x30232065
      pc : [<c031b52c>]    lr : [<30232065>]    psr: a0070013
      sp : e213dea8  ip : d81cb0d0  fp : c05f7678
      r10: c05f7770  r9 : fffffdfd  r8 : 00000000
      r7 : d8a968a8  r6 : d8a96800  r5 : d8a96200  r4 : d81cb000
      r3 : 00000000  r2 : d81cb000  r1 : 00000001  r0 : d8a96200
      Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
      Control: 10c5387d  Table: 15300019  DAC: 00000015
      Process vlc (pid: 2295, stack limit = 0xe213c248)
      [<c031b52c>] (snd_pcm_info) from [<c031b570>] (snd_pcm_info_user+0x34/0x9c)
      [<c031b570>] (snd_pcm_info_user) from [<c03164a4>] (snd_pcm_control_ioctl+0x274/0x280)
      [<c03164a4>] (snd_pcm_control_ioctl) from [<c0311458>] (snd_ctl_ioctl+0xc0/0x55c)
      [<c0311458>] (snd_ctl_ioctl) from [<c00eca84>] (do_vfs_ioctl+0x80/0x31c)
      [<c00eca84>] (do_vfs_ioctl) from [<c00ecd5c>] (SyS_ioctl+0x3c/0x60)
      [<c00ecd5c>] (SyS_ioctl) from [<c000e500>] (ret_fast_syscall+0x0/0x48)
      Code: e1a00005 e59530dc e3a01001 e1a02004 (e5933008)
      ---[ end trace cb3d9bdb8dfefb3c ]---
      
      This is provoked when the ASoC front end is open along with its backend
      (which causes the backend to have a runtime assigned to it) and then
      SNDRV_CTL_IOCTL_PCM_INFO is requested for the (visible) backend device.
      
      Resolve this by ensuring that ASoC internal backend devices are not
      visible to userspace, just as the commentary for snd_pcm_new_internal()
      says they should be.
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Acked-by: Mark Brown <broonie@linaro.org>
      Cc: <stable@vger.kernel.org> [v3.4+]
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
    • Input: ALPS - add support for model found on Dell XT2 · 5beea882
      Yunkang Tang authored
      This patch adds support for the touchpad found on the Dell XT2. It's a dual
      device with device ID: 73, 00, 14, that complies with "ALPS_PROTO_V2".
      Signed-off-by: Yunkang Tang <yunkang.tang@cn.alps.com>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
    • Merge branch 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes · 74c85e13
      Dave Airlie authored
      Just a few small fixes for radeon (audio regression fix,
      stability fix, and an endian bug noticed by coverity).
      
      * 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux:
        drm/radeon/dpm: fix incompatible casting on big endian
        drm/radeon: disable bapm on KB
        drm/radeon: use sw CTS/N values for audio on DCE4+
  4. 30 Oct, 2013 7 commits
    • Merge branch 'akpm' (fixes from Andrew Morton) · 12aee278
      Linus Torvalds authored
      Merge three fixes from Andrew Morton.
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        memcg: use __this_cpu_sub() to dec stats to avoid incorrect subtrahend casting
        percpu: fix this_cpu_sub() subtrahend casting for unsigneds
        mm/pagewalk.c: fix walk_page_range() access of wrong PTEs
    • memcg: use __this_cpu_sub() to dec stats to avoid incorrect subtrahend casting · 5e8cfc3c
      Greg Thelen authored
      As of commit 3ea67d06 ("memcg: add per cgroup writeback pages
      accounting") memcg counter errors are possible when moving charged
      memory to a different memcg.  Charge movement occurs when processing
      writes to memory.force_empty, moving tasks to a memcg with
      memory.move_charge_at_immigrate=1, or memcg deletion.
      
      An example showing error after memory.force_empty:
      
        $ cd /sys/fs/cgroup/memory
        $ mkdir x
        $ rm /data/tmp/file
        $ (echo $BASHPID >> x/tasks && exec mmap_writer /data/tmp/file 1M) &
        [1] 13600
        $ grep ^mapped x/memory.stat
        mapped_file 1048576
        $ echo 13600 > tasks
        $ echo 1 > x/memory.force_empty
        $ grep ^mapped x/memory.stat
        mapped_file 4503599627370496
      
      mapped_file should end with 0.
        4503599627370496 == 0x10,0000,0000,0000 == 0x100,0000,0000 pages
        1048576          == 0x10,0000           == 0x100 pages
      
      This issue only affects the source memcg on 64 bit machines; the
      destination memcg counters are correct.  So the rmdir case is not too
      important because such counters are soon disappearing with the entire
      memcg.  But the memory.force_empty and memory.move_charge_at_immigrate=1
      cases are larger problems as the bogus counters are visible for the
      (possibly long) remaining life of the source memcg.
      
      The problem is due to memcg use of __this_cpu_add(.., -nr_pages), which
      is subtly wrong because it negates the unsigned int nr_pages (either
      1 or 512 for THP) before adding it to a signed long percpu counter.
      When nr_pages=1, -nr_pages=0xffffffff.  On 64 bit machines
      stat->count[idx] is signed 64 bit.  So memcg's attempt to simply
      decrement a count (e.g. from 1 to 0) boils down to:
      
        long count = 1
        unsigned int nr_pages = 1
        count += -nr_pages  /* -nr_pages == 0xffff,ffff */
        count is now 0x1,0000,0000 instead of 0
      
      The fix is to subtract the unsigned page count rather than adding its
      negation.  This only works once "percpu: fix this_cpu_sub() subtrahend
      casting for unsigneds" is applied to fix this_cpu_sub().
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • percpu: fix this_cpu_sub() subtrahend casting for unsigneds · bd09d9a3
      Greg Thelen authored
      this_cpu_sub() is implemented as negation and addition.
      
      This patch casts the adjustment to the counter type before negation to
      sign extend the adjustment.  This helps in cases where the counter type
      is wider than an unsigned adjustment.  An alternative to this patch is
      to declare such operations unsupported, but it seemed useful to avoid
      surprises.
      
      This patch specifically helps the following example:
        unsigned int delta = 1
        preempt_disable()
        this_cpu_write(long_counter, 0)
        this_cpu_sub(long_counter, delta)
        preempt_enable()
      
      Before this change long_counter on a 64 bit machine ends with value
      0xffffffff, rather than 0xffffffffffffffff.  This is because
      this_cpu_sub(pcp, delta) boils down to this_cpu_add(pcp, -delta),
      which is basically:
        long_counter = 0 + 0xffffffff
      
      Also apply the same cast to:
        __this_cpu_sub()
        __this_cpu_sub_return()
        this_cpu_sub_return()
      
      All percpu_test.ko tests pass, especially the following cases which
      previously failed:
      
        l -= ui_one;
        __this_cpu_sub(long_counter, ui_one);
        CHECK(l, long_counter, -1);
      
        l -= ui_one;
        this_cpu_sub(long_counter, ui_one);
        CHECK(l, long_counter, -1);
        CHECK(l, long_counter, 0xffffffffffffffff);
      
        ul -= ui_one;
        __this_cpu_sub(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, -1);
        CHECK(ul, ulong_counter, 0xffffffffffffffff);
      
        ul = this_cpu_sub_return(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, 2);
      
        ul = __this_cpu_sub_return(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, 1);
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/pagewalk.c: fix walk_page_range() access of wrong PTEs · 3017f079
      Chen LinX authored
      When walk_page_range() walks a memory map's page tables, it skips
      VM_PFNMAP areas and assigns vma->vm_end to the variable 'next', which
      may be larger than 'end'.  In the next loop iteration, 'addr' will be
      larger than 'next'.  Then, when /proc/XXXX/pagemap is read, 'addr'
      keeps growing forever in pagemap_pte_range() and
      pte_to_pagemap_entry() accesses the wrong pte.
      
        BUG: Bad page map in process procrank  pte:8437526f pmd:785de067
        addr:9108d000 vm_flags:00200073 anon_vma:f0d99020 mapping:  (null) index:9108d
        CPU: 1 PID: 4974 Comm: procrank Tainted: G    B   W  O 3.10.1+ #1
        Call Trace:
          dump_stack+0x16/0x18
          print_bad_pte+0x114/0x1b0
          vm_normal_page+0x56/0x60
          pagemap_pte_range+0x17a/0x1d0
          walk_page_range+0x19e/0x2c0
          pagemap_read+0x16e/0x200
          vfs_read+0x84/0x150
          SyS_read+0x4a/0x80
          syscall_call+0x7/0xb
      Signed-off-by: Liu ShuoX <shuox.liu@intel.com>
      Signed-off-by: Chen LinX <linx.z.chen@intel.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: <stable@vger.kernel.org>	[3.10.x+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: list_lru: fix almost infinite loop causing effective livelock · c56b097a
      Russell King authored
      I've seen a fair number of issues with kswapd and other processes
      appearing to get stuck in v3.12-rc.  Using sysrq-p many times seems to
      indicate that it gets stuck somewhere in list_lru_walk_node(), called
      from prune_icache_sb() and super_cache_scan().
      
      I never seem to be able to trigger a calltrace for functions above that
      point.
      
      So I decided to add the following to super_cache_scan():
      
          @@ -81,10 +81,14 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
                  inodes = list_lru_count_node(&sb->s_inode_lru, sc->nid);
                  dentries = list_lru_count_node(&sb->s_dentry_lru, sc->nid);
                  total_objects = dentries + inodes + fs_objects + 1;
          +printk("%s:%u: %s: dentries %lu inodes %lu total %lu\n", current->comm, current->pid, __func__, dentries, inodes, total_objects);
      
                  /* proportion the scan between the caches */
                  dentries = mult_frac(sc->nr_to_scan, dentries, total_objects);
                  inodes = mult_frac(sc->nr_to_scan, inodes, total_objects);
          +printk("%s:%u: %s: dentries %lu inodes %lu\n", current->comm, current->pid, __func__, dentries, inodes);
          +BUG_ON(dentries == 0);
          +BUG_ON(inodes == 0);
      
                  /*
                   * prune the dcache first as the icache is pinned by it, then
          @@ -99,7 +103,7 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
                          freed += sb->s_op->free_cached_objects(sb, fs_objects,
                                                                 sc->nid);
                  }
          -
          +printk("%s:%u: %s: dentries %lu inodes %lu freed %lu\n", current->comm, current->pid, __func__, dentries, inodes, freed);
                  drop_super(sb);
                  return freed;
           }
      
      and shortly thereafter, having applied some pressure, I got this:
      
          update-apt-xapi:1616: super_cache_scan: dentries 25632 inodes 2 total 25635
          update-apt-xapi:1616: super_cache_scan: dentries 1023 inodes 0
          ------------[ cut here ]------------
          Kernel BUG at c0101994 [verbose debug info unavailable]
          Internal error: Oops - BUG: 0 [#3] SMP ARM
          Modules linked in: fuse rfcomm bnep bluetooth hid_cypress
          CPU: 0 PID: 1616 Comm: update-apt-xapi Tainted: G      D      3.12.0-rc7+ #154
          task: daea1200 ti: c3bf8000 task.ti: c3bf8000
          PC is at super_cache_scan+0x1c0/0x278
          LR is at trace_hardirqs_on+0x14/0x18
          Process update-apt-xapi (pid: 1616, stack limit = 0xc3bf8240)
          ...
          Backtrace:
            (super_cache_scan) from [<c00cd69c>] (shrink_slab+0x254/0x4c8)
            (shrink_slab) from [<c00d09a0>] (try_to_free_pages+0x3a0/0x5e0)
            (try_to_free_pages) from [<c00c59cc>] (__alloc_pages_nodemask+0x5)
            (__alloc_pages_nodemask) from [<c00e07c0>] (__pte_alloc+0x2c/0x13)
            (__pte_alloc) from [<c00e3a70>] (handle_mm_fault+0x84c/0x914)
            (handle_mm_fault) from [<c001a4cc>] (do_page_fault+0x1f0/0x3bc)
            (do_page_fault) from [<c001a7b0>] (do_translation_fault+0xac/0xb8)
            (do_translation_fault) from [<c000840c>] (do_DataAbort+0x38/0xa0)
            (do_DataAbort) from [<c00133f8>] (__dabt_usr+0x38/0x40)
      
      Notice that we had a very low number of inodes, which were reduced to
      zero by mult_frac().
      
      Now, prune_icache_sb() calls list_lru_walk_node() passing that number of
      inodes (0) into that as the number of objects to scan:
      
          long prune_icache_sb(struct super_block *sb, unsigned long nr_to_scan,
                               int nid)
          {
                  LIST_HEAD(freeable);
                  long freed;
      
                  freed = list_lru_walk_node(&sb->s_inode_lru, nid, inode_lru_isolate,
                                                 &freeable, &nr_to_scan);
      
      which does:
      
          unsigned long
          list_lru_walk_node(struct list_lru *lru, int nid, list_lru_walk_cb isolate,
                             void *cb_arg, unsigned long *nr_to_walk)
          {
      
                  struct list_lru_node    *nlru = &lru->node[nid];
                  struct list_head *item, *n;
                  unsigned long isolated = 0;
      
                  spin_lock(&nlru->lock);
          restart:
                  list_for_each_safe(item, n, &nlru->list) {
                          enum lru_status ret;
      
                          /*
                           * decrement nr_to_walk first so that we don't livelock if we
                           * get stuck on large numbesr of LRU_RETRY items
                           */
                          if (--(*nr_to_walk) == 0)
                                  break;
      
      So, if *nr_to_walk was zero when this function was entered, that means
      we're wanting to operate on (~0UL)+1 objects - which might as well be
      infinite.
      
      Clearly this is not correct behaviour.  If we think about the behaviour
      of this function when *nr_to_walk is 1, then clearly it's wrong - we
      decrement first and then test for zero - which results in us doing
      nothing at all.  A post-decrement would give the desired behaviour -
      we'd try to walk one object and one object only if *nr_to_walk were one.
      
      It also gives the correct behaviour for zero - we exit at this point.
      
      Fixes: 5cedf721 ("list_lru: fix broken LRU_RETRY behaviour")
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      [ Modified to make sure we never underflow the count: this function gets
        called in a loop, so the 0 -> ~0ul transition is dangerous  - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'tty-3.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · ced5d6b5
      Linus Torvalds authored
      Pull serial fixes from Greg KH:
       "Here are 3 tiny fixes that are needed for 3.12-final for some serial
        drivers.
      
        One of them is a revert of a broken patch, and two others are fixes
        for reported bugs.  All of these have been in linux-next for a while,
        I forgot I had not sent them to you yet, my fault"
      
      (Actually, Greg, you _had_ sent two of the three, so this pulls in just
      one actual new fix)
      
      * tag 'tty-3.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        tty/serial: at91: fix uart/usart selection for older products
    • Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · b8cab706
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Mainly Intel regression fixes and quirks, along with a simple one
        liner to fix rendernodes ioctl access (off by default, but testers
        want to test it)"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        drm: allow DRM_IOCTL_VERSION on render-nodes
        drm/i915: Fix the PPT fdi lane bifurcate state handling on ivb
        drm/i915: No LVDS hardware on Intel D410PT and D425KT
        drm/i915/dp: workaround BIOS eDP bpp clamping issue
        drm/i915: Add HSW CRT output readout support
        drm/i915: Add support for pipe_bpp readout