1. 22 Jan, 2010 4 commits
    • Daisuke Nishimura's avatar
      memcg: ensure list is empty at rmdir · 4334ab76
      Daisuke Nishimura authored
      commit fce66477 upstream.
      
      Current mem_cgroup_force_empty() only ensures mem->res.usage == 0 on
      success.  But this doesn't guarantee memcg's LRU is really empty, because
      there are some cases in which !PageCgrupUsed pages exist on memcg's LRU.
      
      For example:
      - Pages can be uncharged by its owner process while they are on LRU.
      - race between mem_cgroup_add_lru_list() and __mem_cgroup_uncharge_common().
      
      So there can be a case in which the usage is zero but some of the LRUs are not empty.
      
      OTOH, mem_cgroup_del_lru_list(), which can be called asynchronously with
      rmdir, accesses the mem_cgroup, so this access can cause a problem if it
      races with rmdir because the mem_cgroup might have been freed by rmdir.
      
      Actually, I saw a bug which seems to be caused by this race.
      
      	[1530745.949906] BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
      	[1530745.950651] IP: [<ffffffff810fbc11>] mem_cgroup_del_lru_list+0x30/0x80
      	[1530745.950651] PGD 3863de067 PUD 3862c7067 PMD 0
      	[1530745.950651] Oops: 0002 [#1] SMP
      	[1530745.950651] last sysfs file: /sys/devices/system/cpu/cpu7/cache/index1/shared_cpu_map
      	[1530745.950651] CPU 3
      	[1530745.950651] Modules linked in: configs ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp nfsd nfs_acl auth_rpcgss exportfs autofs4 hidp rfcomm l2cap crc16 bluetooth lockd sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_multipath scsi_dh video output sbs sbshc battery ac lp kvm_intel kvm sg ide_cd_mod cdrom serio_raw tpm_tis tpm tpm_bios acpi_memhotplug button parport_pc parport rtc_cmos rtc_core rtc_lib e1000 i2c_i801 i2c_core pcspkr dm_region_hash dm_log dm_mod ata_piix libata shpchp megaraid_mbox sd_mod scsi_mod megaraid_mm ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: freq_table]
      	[1530745.950651] Pid: 19653, comm: shmem_test_02 Tainted: G   M       2.6.32-mm1-00701-g2b04386 #3 Express5800/140Rd-4 [N8100-1065]
      	[1530745.950651] RIP: 0010:[<ffffffff810fbc11>]  [<ffffffff810fbc11>] mem_cgroup_del_lru_list+0x30/0x80
      	[1530745.950651] RSP: 0018:ffff8803863ddcb8  EFLAGS: 00010002
      	[1530745.950651] RAX: 00000000000001e0 RBX: ffff8803abc02238 RCX: 00000000000001e0
      	[1530745.950651] RDX: 0000000000000000 RSI: ffff88038611a000 RDI: ffff8803abc02238
      	[1530745.950651] RBP: ffff8803863ddcc8 R08: 0000000000000002 R09: ffff8803a04c8643
      	[1530745.950651] R10: 0000000000000000 R11: ffffffff810c7333 R12: 0000000000000000
      	[1530745.950651] R13: ffff880000017f00 R14: 0000000000000092 R15: ffff8800179d0310
      	[1530745.950651] FS:  0000000000000000(0000) GS:ffff880017800000(0000) knlGS:0000000000000000
      	[1530745.950651] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      	[1530745.950651] CR2: 0000000000000230 CR3: 0000000379d87000 CR4: 00000000000006e0
      	[1530745.950651] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      	[1530745.950651] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      	[1530745.950651] Process shmem_test_02 (pid: 19653, threadinfo ffff8803863dc000, task ffff88038612a8a0)
      	[1530745.950651] Stack:
      	[1530745.950651]  ffffea00040c2fe8 0000000000000000 ffff8803863ddd98 ffffffff810c739a
      	[1530745.950651] <0> 00000000863ddd18 000000000000000c 0000000000000000 0000000000000000
      	[1530745.950651] <0> 0000000000000002 0000000000000000 ffff8803863ddd68 0000000000000046
      	[1530745.950651] Call Trace:
      	[1530745.950651]  [<ffffffff810c739a>] release_pages+0x142/0x1e7
      	[1530745.950651]  [<ffffffff810c778f>] ? pagevec_move_tail+0x6e/0x112
      	[1530745.950651]  [<ffffffff810c781e>] pagevec_move_tail+0xfd/0x112
      	[1530745.950651]  [<ffffffff810c78a9>] lru_add_drain+0x76/0x94
      	[1530745.950651]  [<ffffffff810dba0c>] exit_mmap+0x6e/0x145
      	[1530745.950651]  [<ffffffff8103f52d>] mmput+0x5e/0xcf
      	[1530745.950651]  [<ffffffff81043ea8>] exit_mm+0x11c/0x129
      	[1530745.950651]  [<ffffffff8108fb29>] ? audit_free+0x196/0x1c9
      	[1530745.950651]  [<ffffffff81045353>] do_exit+0x1f5/0x6b7
      	[1530745.950651]  [<ffffffff8106133f>] ? up_read+0x2b/0x2f
      	[1530745.950651]  [<ffffffff8137d187>] ? lockdep_sys_exit_thunk+0x35/0x67
      	[1530745.950651]  [<ffffffff81045898>] do_group_exit+0x83/0xb0
      	[1530745.950651]  [<ffffffff810458dc>] sys_exit_group+0x17/0x1b
      	[1530745.950651]  [<ffffffff81002c1b>] system_call_fastpath+0x16/0x1b
      	[1530745.950651] Code: 54 53 0f 1f 44 00 00 83 3d cc 29 7c 00 00 41 89 f4 75 63 eb 4e 48 83 7b 08 00 75 04 0f 0b eb fe 48 89 df e8 18 f3 ff ff 44 89 e2 <48> ff 4c d0 50 48 8b 05 2b 2d 7c 00 48 39 43 08 74 39 48 8b 4b
      	[1530745.950651] RIP  [<ffffffff810fbc11>] mem_cgroup_del_lru_list+0x30/0x80
      	[1530745.950651]  RSP <ffff8803863ddcb8>
      	[1530745.950651] CR2: 0000000000000230
      	[1530745.950651] ---[ end trace c3419c1bb8acc34f ]---
      	[1530745.950651] Fixing recursive fault but reboot is needed!
      
      The problem here is pages on LRU may contain pointer to stale memcg.  To
      make res->usage to be 0, all pages on memcg must be uncharged or moved to
      another(parent) memcg.  Moved page_cgroup have already removed from
      original LRU, but uncharged page_cgroup contains pointer to memcg withou
      PCG_USED bit.  (This asynchronous LRU work is for improving performance.)
      If PCG_USED bit is not set, page_cgroup will never be added to memcg's
      LRU.  So, about pages not on LRU, they never access stale pointer.  Then,
      what we have to take care of is page_cgroup _on_ LRU list.  This patch
      fixes this problem by making mem_cgroup_force_empty() visit all LRUs
      before exiting its loop and guarantee there are no pages on its LRU.
      Signed-off-by: default avatarDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      4334ab76
    • Mark Brown's avatar
      revert "drivers/video/s3c-fb.c: fix clock setting for Samsung SoC Framebuffer" · 70f800ff
      Mark Brown authored
      commit eb29a5cc upstream.
      
      Fix divide by zero and broken output.  Commit 600ce1a0 ("fix clock
      setting for Samsung SoC Framebuffer") introduced a mandatory refresh
      parameter to the platform data for the S3C framebuffer but did not
      introduce any validation code, causing existing platforms (none of which
      have refresh set) to divide by zero whenever the framebuffer is
      configured, generating warnings and unusable output.
      
      Ben Dooks noted several problems with the patch:
      
       - The platform data supplies the pixclk directly and should already
         have taken care of the refresh rate.
       - The addition of a window ID parameter doesn't help since only the
         root framebuffer can control the pixclk.
       - pixclk is specified in picoseconds (rather than Hz) as the patch
         assumed.
      
      and suggests reverting the commit so do that.  Without fixing this no
      mainline user of the driver will produce output.
      
      [akpm@linux-foundation.org: don't revert the correct bit]
      Signed-off-by: default avatarMark Brown <broonie@opensource.wolfsonmicro.com>
      Cc: InKi Dae <inki.dae@samsung.com>
      Cc: Kyungmin Park <kmpark@infradead.org>
      Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Ben Dooks <ben-linux@fluff.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      70f800ff
    • Eric Paris's avatar
      inotify: only warn once for inotify problems · 800c0283
      Eric Paris authored
      commit 976ae32b upstream.
      
      inotify will WARN() if it finds that the idr and the fsnotify internals
      somehow got out of sync.  It was only supposed to do this once but due
      to this stupid bug it would warn every single time a problem was
      detected.
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      800c0283
    • Eric Paris's avatar
      inotify: do not reuse watch descriptors · cec3ad60
      Eric Paris authored
      commit 9e572cc9 upstream.
      
      Since commit 7e790dd5 ("inotify: fix
      error paths in inotify_update_watch") inotify changed the manor in which
      it gave watch descriptors back to userspace.  Previous to this commit
      inotify acted like the following:
      
        inotify_add_watch(X, Y, Z) = 1
        inotify_rm_watch(X, 1);
        inotify_add_watch(X, Y, Z) = 2
      
      but after this patch inotify would return watch descriptors like so:
      
        inotify_add_watch(X, Y, Z) = 1
        inotify_rm_watch(X, 1);
        inotify_add_watch(X, Y, Z) = 1
      
      which I saw as equivalent to opening an fd where
      
        open(file) = 1;
        close(1);
        open(file) = 1;
      
      seemed perfectly reasonable.  The issue is that quite a bit of userspace
      apparently relies on the behavior in which watch descriptors will not be
      quickly reused.  KDE relies on it, I know some selinux packages rely on
      it, and I have heard complaints from other random sources such as debian
      bug 558981.
      
      Although the man page implies what we do is ok, we broke userspace so
      this patch almost reverts us to the old behavior.  It is still slightly
      racey and I have patches that would fix that, but they are rather large
      and this will fix it for all real world cases.  The race is as follows:
      
       - task1 creates a watch and blocks in idr_new_watch() before it updates
         the hint.
       - task2 creates a watch and updates the hint.
       - task1 updates the hint with it's older wd
       - task removes the watch created by task2
       - task adds a new watch and will reuse the wd originally given to task2
      
      it requires moving some locking around the hint (last_wd) but this should
      solve it for the real world and be -stable safe.
      
      As a side effect this patch papers over a bug in the lib/idr code which
      is causing a large number WARN's to pop on people's system and many
      reports in kerneloops.org.  I'm working on the root cause of that idr
      bug seperately but this should make inotify immune to that issue.
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      cec3ad60
  2. 18 Jan, 2010 36 commits