- 03 Jul, 2013 40 commits
-
-
Pavel Emelyanov authored
This is the implementation of the soft-dirty bit concept that should help keep track of changes in user memory, which in turn is very-very required by the checkpoint-restore project (http://criu.org). To create a dump of an application(s) we save all the information about it to files, and the biggest part of such dump is the contents of tasks' memory. However, there are usage scenarios where it's not required to get _all_ the task memory while creating a dump. For example, when doing periodical dumps, it's only required to take full memory dump only at the first step and then take incremental changes of memory. Another example is live migration. We copy all the memory to the destination node without stopping all tasks, then stop them, check for what pages has changed, dump it and the rest of the state, then copy it to the destination node. This decreases freeze time significantly. That said, some help from kernel to watch how processes modify the contents of their memory is required. The proposal is to track changes with the help of new soft-dirty bit this way: 1. First do "echo 4 > /proc/$pid/clear_refs". At that point kernel clears the soft dirty _and_ the writable bits from all ptes of process $pid. From now on every write to any page will result in #pf and the subsequent call to pte_mkdirty/pmd_mkdirty, which in turn will set the soft dirty flag. 2. Then read the /proc/$pid/pagemap2 and check the soft-dirty bit reported there (the 55'th one). If set, the respective pte was written to since last call to clear refs. The soft-dirty bit is the _PAGE_BIT_HIDDEN one. Although it's used by kmemcheck, the latter one marks kernel pages with it, while the former bit is put on user pages so they do not conflict to each other. This patch: A new clear-refs type will be added in the next patch, so prepare code for that. [akpm@linux-foundation.org: don't assume that sizeof(enum clear_refs_types) == sizeof(int)] Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kees Cook authored
The template lookup interface does not provide a way to use format strings, so make sure that the interface cannot be abused accidentally. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kees Cook authored
Disk names may contain arbitrary strings, so they must not be interpreted as format strings. It seems that only md allows arbitrary strings to be used for disk names, but this could allow for a local memory corruption from uid 0 into ring 0. CVE-2013-2851 Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jonathan Salwan authored
In drivers/cdrom/cdrom.c mmc_ioctl_cdrom_read_data() allocates a memory area with kmalloc in line 2885. 2885 cgc->buffer = kmalloc(blocksize, GFP_KERNEL); 2886 if (cgc->buffer == NULL) 2887 return -ENOMEM; In line 2908 we can find the copy_to_user function: 2908 if (!ret && copy_to_user(arg, cgc->buffer, blocksize)) The cgc->buffer is never cleaned and initialized before this function. If ret = 0 with the previous basic block, it's possible to display some memory bytes in kernel space from userspace. When we read a block from the disk it normally fills the ->buffer but if the drive is malfunctioning there is a chance that it would only be partially filled. The result is an leak information to userspace. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Cong Wang authored
There is a hole in struct hd_geometry, so we have to zero the struct on stack before copying it to user-space. Signed-off-by: Cong Wang <amwang@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Libo Chen authored
Without this patch, gdrom_major will leak when gd.cd_info alloc fails. Signed-off-by: Libo Chen <libo.chen@huawei.com> Cc: Jens Axboe <axboe@kernel.dk> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Xue jiufei authored
There may exist NULL pointer dereference in config_item_name() when one volume (say Volume A) unmounts while another (say Volume B) mounting. Volume A Volume B already Mounted. Unmounting, call o2hb_heartbeat_group_drop_item() -> config_item_put(item) set reg(A)->item.ci_name to NULL in function config_item_cleanup(). begin mounting, call o2hb_region_pin() and tranverse all regions. When reading reg(A)->item.ci_name, it causes NULL pointer dereference. call o2hb_region_release() and del reg(A) from list. So we should skip accessing regions that is going to release when tranverse o2hb_all_regions. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: joyce <xuejiufei@huawei.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jie Liu authored
Adjust switch..case syntax at o2net_state_change to meet the kernel coding standard. s/printk/pr_info/. [akpm@linux-foundation.org: revert pr_foo() change] Signed-off-by: Jie Liu <jeff.liu@oracle.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Gurudas Pai <gurudas.pai@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com> Cc: Srinivas Eeeda <srinivas.eeda@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: Tao Ma <tm@tao.ma> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jie Liu authored
Fix a comment typo in o2quo_hb_still_up() Signed-off-by: Jie Liu <jeff.liu@oracle.com> Cc: Gurudas Pai <gurudas.pai@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com> Cc: Srinivas Eeeda <srinivas.eeda@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: Tao Ma <tm@tao.ma> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jie Liu authored
s/o2hb_global_hearbeat_mode_set/o2hb_global_heartbeat_mode_set/ to make the signature of those routines in a consistent manner with others for heartbeating. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Acked-by: Sunil Mushran <sunil.mushran@gmail.com> Cc: Gurudas Pai <gurudas.pai@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com> Cc: Srinivas Eeeda <srinivas.eeda@oracle.com> Cc: Tao Ma <tm@tao.ma> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Noboru Iwamatsu authored
Under heavy I/O load, writing the disk heartbeat can be forced to wait for minutes, and this causes the node to be fenced. This patch tries to use WRITE_SYNC in submitting the heartbeat bio, so that writing the heartbeat will have a priority over other requests. Signed-off-by: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com> Acked-by: Tao Ma <tm@tao.ma> Acked-by: Sunil Mushran <sunil.mushran@gmail.com> Cc: Srinivas Eeeda <srinivas.eeda@oracle.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Tested-by: Gurudas Pai <gurudas.pai@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Junxiao Bi authored
Inlined xattr shared free space of inode block with inlined data or data extent record, so the size of the later two should be adjusted when inlined xattr is enabled. See ocfs2_xattr_ibody_init(). But this isn't done well when reflink. For inode with inlined data, its max inlined data size is adjusted in ocfs2_duplicate_inline_data(), no problem. But for inode with data extent record, its record count isn't adjusted. Fix it, or data extent record and inlined xattr may overwrite each other, then cause data corruption or xattr failure. One panic caused by this bug in our test environment is the following: kernel BUG at fs/ocfs2/xattr.c:1435! invalid opcode: 0000 [#1] SMP Pid: 10871, comm: multi_reflink_t Not tainted 2.6.39-300.17.1.el5uek #1 RIP: ocfs2_xa_offset_pointer+0x17/0x20 [ocfs2] RSP: e02b:ffff88007a587948 EFLAGS: 00010283 RAX: 0000000000000000 RBX: 0000000000000010 RCX: 00000000000051e4 RDX: ffff880057092060 RSI: 0000000000000f80 RDI: ffff88007a587a68 RBP: ffff88007a587948 R08: 00000000000062f4 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000010 R13: ffff88007a587a68 R14: 0000000000000001 R15: ffff88007a587c68 FS: 00007fccff7f06e0(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000015cf000 CR3: 000000007aa76000 CR4: 0000000000000660 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process multi_reflink_t Call Trace: ocfs2_xa_reuse_entry+0x60/0x280 [ocfs2] ocfs2_xa_prepare_entry+0x17e/0x2a0 [ocfs2] ocfs2_xa_set+0xcc/0x250 [ocfs2] ocfs2_xattr_ibody_set+0x98/0x230 [ocfs2] __ocfs2_xattr_set_handle+0x4f/0x700 [ocfs2] ocfs2_xattr_set+0x6c6/0x890 [ocfs2] ocfs2_xattr_user_set+0x46/0x50 [ocfs2] generic_setxattr+0x70/0x90 __vfs_setxattr_noperm+0x80/0x1a0 vfs_setxattr+0xa9/0xb0 setxattr+0xc3/0x120 sys_fsetxattr+0xa8/0xd0 system_call_fastpath+0x16/0x1b Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Younger Liu authored
While deleting a file with ocfs2_unlink(), there is a bug in this function. This bug will result in filesystem read-only. After calling ocfs2_orphan_add(), the file which will be deleted is added into orphan dir. If ocfs2_delete_entry() fails, the file still exists in the parent dir. And this scenario introduces a conflict of metadata. If a file is added into orphan dir, when we put inode of the file with iput(), the inode i_flags is setted (~OCFS2_VALID_FL) in ocfs2_remove_inode(), and then write back to disk. But as previously mentioned, the file still exists in the parent dir. On other nodes, the file can be still accessed. When first read the file with ocfs2_read_blocks() from disk, It will check and avalidate inode using ocfs2_validate_inode_block(). So File system will be readonly because the inode is invalid. In other words, the inode i_flags has been set (~OCFS2_VALID_FL). [akpm@linux-foundation.org: cleanups] [jeff.liu@oracle.com: s/inode_is_unlinkable/ocfs2_inode_is_unlinkable/] Signed-off-by: Younger Liu <younger.liu@huawei.com> Signed-off-by: Jensen <shencanquan@huawei.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrew Morton authored
Cc: Jie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: Younger Liu <younger.liu@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jie Liu authored
In ocfs2_relink_block_group(), we roll back all those changes if notify intent to modify buffers for metadata update failed even if the relevant buffer has not yet been modified/got dirty at that point, that are not quite right because of: - None buffer has been modified/dirty if failed to call ocfs2_journal_access_gd() against the previous block group buffer - Only the previous block group buffer has got dirty if failed to call ocfs2_journal_access_gd() against the block group buffer - There is no need to roll back the change for file entry buffer at all Those problems will not cause anything wrong but unnecessary. This patch fix them and kill the useless bg_ptr variable as well. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Cc: Younger Liu <younger.liu@huawei.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Younger Liu authored
While adding a file into orphan dir in ocfs2_orphan_add(), it calls __ocfs2_add_entry() before ocfs2_journal_access_di(). If ocfs2_journal_access_di() failed, the file is added into orphan dir, and orphan dir dinode updated, but file dinode has not been updated. Accordingly, the data is not consistent between file dinode and orphan dir. So, need to call ocfs2_journal_access_di() before __ocfs2_add_entry(), and if ocfs2_journal_access_di() failed, orphan_fe and orphan_dir_inode->i_nlink need rollback. This bug was added by 3939fda4 ("Ocfs2: Journaling i_flags and i_orphaned_slot when adding inode to orphan dir."). Signed-off-by: Younger Liu <younger.liu@huawei.com> Acked-by: Jeff Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Xue jiufei authored
dlmlock_master() returns DLM_RECOVERING/DLM_MIGRATING/ DLM_FORWAR after adding lock to blocked list if lockres has the state DLM_LOCK_RES_RECOVERING/DLM_LOCK_RES_MIGRATING/ DLM_LOCK_RES_IN_PROGRESS. so it will retry in dlmlock(). And this may cause dlm_thread fall into an infinite loop Thread1 dlm_thread calls dlm_lock->dlmlock_master, if lockresA is in state DLM_LOCK_RES_RECOVERING, calls __dlm_wait_on_lockres() and waits until others threads clear this state; If cannot grant this lock, adding lock to blocked list, and return DLM_RECOVERING; Grant this lock and move it to grant list; After a while, retry and calls list_add_tail(), adding lock to blocked list again. Granted and blocked list of this lockres will become the following conditions: lock_res->granted.next = dlm_lock->list_head; lock_res->blocked.next = dlm_lock->list_head; dlm_lock->list_head.next = dlm_lock_resource->blocked; When dlm_thread traverses the granted list, it will fall into an endless loop, checking dlm_lock.list_head, dlm_lock->list_head.next (i.e.lock_res->blocked), lock_res->blocked.next(i.e.dlm_lock.list_head again) ..... Signed-off-by: joyce <xuejiufei@huawei.com> Reviewed-by: jensen <shencanquan@huawei.com> Cc: Jeff Liu <jeff.liu@oracle.com> Acked-by: Sunil Mushran <sunil.mushran@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Junxiao Bi authored
Free space checking will be done in ocfs2_xattr_ibody_init(). So remove here. [akpm@linux-foundation.org: remove unused local] Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Younger Liu authored
There is a memory leak in sc_kref_release(). When free struct o2net_sock_container (sc), we should release sc->sc_page. Signed-off-by: Younger Liu <younger.liu@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Goldwyn Rodrigues authored
While adding extends to a file, the credits are calculated incorrectly and if the requested clusters is more than one (or more because we used a conservative limit) then we run out of journal credits and we hit an assert in journalling code. The function parameter bits_wanted variable was not used at all. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joseph Qi authored
In ocfs2_remove_btree_range, when calling ocfs2_lock_refcount_tree and ocfs2_prepare_refcount_change_for_del failed, it goes to out and then tries to call mutex_unlock without mutex_lock before. And when calling ocfs2_reserve_blocks_for_rec_trunc failed, it should free ref_tree before return. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Goldwyn Rodrigues authored
Code cleanup: needs_checkpoint is assigned to but never used. Delete the variable. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Jeff Liu <jeff.liu@oracle.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Xue jiufei authored
dlm_begin_reco_handler() returns without putting dlm when dlm recovery state is DLM_RECO_STATE_FINALIZE. Signed-off-by: joyce <xuejiufei@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joseph Qi authored
If we use le32_add_cpu to set ocfs2_dinode i_flags, it may lead to the corresponding flag corrupted. So we should change it to bitwise and/or operation. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: shencanquan <shencanquan@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joseph Qi authored
In dlm_request_all_locks, ret is type enum. But o2net_send_message returns a type int value. Then it will never run into the following error branch. So we should change the ret type from enum to int. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Sunil Mushran <sunil.mushran@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joseph Qi authored
Below 3 functions have already been declared in dlmcommon.h, so we have no need to declare them again in dlmrecovery.c: dlm_complete_recovery_thread dlm_launch_recovery_thread dlm_kick_recovery_thread Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Sunil Mushran <sunil.mushran@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Dan Carpenter authored
The difference between "count" and "len" is that "len" is capped at 4095. Changing it like this makes it match how sysfs_write_file() is implemented. This is a static analysis patch. I haven't found any store_attribute() functions where this change makes a difference. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrew Morton authored
spacr64 gcc-3.4.5 (at least) spits this back. Cc: Andrey Smirnov <andrey.smirnov@convergeddevices.net> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Bartlomiej Zolnierkiewicz authored
tasklet_kill() may sleep so call it before taking pch->lock. Fixes following lockup: BUG: scheduling while atomic: cat/2383/0x00000002 Modules linked in: unwind_backtrace+0x0/0xfc __schedule_bug+0x4c/0x58 __schedule+0x690/0x6e0 sys_sched_yield+0x70/0x78 tasklet_kill+0x34/0x8c pl330_free_chan_resources+0x24/0x88 dma_chan_put+0x4c/0x50 [...] BUG: spinlock lockup suspected on CPU#0, swapper/0/0 lock: 0xe52aa04c, .magic: dead4ead, .owner: cat/2383, .owner_cpu: 1 unwind_backtrace+0x0/0xfc do_raw_spin_lock+0x194/0x204 _raw_spin_lock_irqsave+0x20/0x28 pl330_tasklet+0x2c/0x5a8 tasklet_action+0xfc/0x114 __do_softirq+0xe4/0x19c irq_exit+0x98/0x9c handle_IPI+0x124/0x16c gic_handle_irq+0x64/0x68 __irq_svc+0x40/0x70 cpuidle_wrap_enter+0x4c/0xa0 cpuidle_enter_state+0x18/0x68 cpuidle_idle_call+0xac/0xe0 cpu_idle+0xac/0xf0 Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Jassi Brar <jassisinghbrar@gmail.com> Cc: Vinod Koul <vinod.koul@linux.intel.com> Cc: Tomasz Figa <t.figa@samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Chen Gang authored
Need include "asm/uaccess.h" to pass compiling. The related error (with allmodconfig): arch/c6x/mm/init.c: In function `paging_init': arch/c6x/mm/init.c:46:2: error: implicit declaration of function `set_fs' [-Werror=implicit-function-declaration] arch/c6x/mm/init.c:46:9: error: `KERNEL_DS' undeclared (first use in this function) arch/c6x/mm/init.c:46:9: note: each undeclared identifier is reported only once for each function it appears in Signed-off-by: Chen Gang <gang.chen@asianux.com> Cc: <stable@vger.kernel.org> [3.10.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrew Morton authored
Commit f21afc25 ("smp.h: Use local_irq_{save,restore}() in !SMP version of on_each_cpu()") converted on_each_cpu() to a C function. This required inclusion of irqflags.h, which broke ia64 and mn10300 (at least) due to header ordering hell. Switch on_each_cpu() back to a macro to fix this. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: David Daney <david.daney@cavium.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: <stable@vger.kernel.org> [3.10.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64Linus Torvalds authored
Pull ARM64 updates from Catalin Marinas: "Main features: - KVM and Xen ports to AArch64 - Hugetlbfs and transparent huge pages support for arm64 - Applied Micro X-Gene Kconfig entry and dts file - Cache flushing improvements For arm64 huge pages support, there are x86 changes moving part of arch/x86/mm/hugetlbpage.c into mm/hugetlb.c to be re-used by arm64" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64: (66 commits) arm64: Add initial DTS for APM X-Gene Storm SOC and APM Mustang board arm64: Add defines for APM ARMv8 implementation arm64: Enable APM X-Gene SOC family in the defconfig arm64: Add Kconfig option for APM X-Gene SOC family arm64/Makefile: provide vdso_install target ARM64: mm: THP support. ARM64: mm: Raise MAX_ORDER for 64KB pages and THP. ARM64: mm: HugeTLB support. ARM64: mm: Move PTE_PROT_NONE bit. ARM64: mm: Make PAGE_NONE pages read only and no-execute. ARM64: mm: Restore memblock limit when map_mem finished. mm: thp: Correct the HPAGE_PMD_ORDER check. x86: mm: Remove general hugetlb code from x86. mm: hugetlb: Copy general hugetlb code from x86 to mm. x86: mm: Remove x86 version of huge_pmd_share. mm: hugetlb: Copy huge_pmd_share from x86 to mm. arm64: KVM: document kernel object mappings in HYP arm64: KVM: MAINTAINERS update arm64: KVM: userspace API documentation arm64: KVM: enable initialization of a 32bit vcpu ...
-
git://git.linaro.org/people/rmk/linux-armLinus Torvalds authored
Pull ARM updates from Russell King: "This contains the usual updates from other people (listed below) and the usual random muddle of miscellaneous ARM updates which cover some low priority bug fixes and performance improvements. I've started to put the pull request wording into the merge commits, which are: - NoMMU stuff: This includes the following series sent earlier to the list: - nommu-fixes - R7 Support - MPU support I've left out the ARCH_MULTIPLATFORM/!MMU stuff that Arnd and I were discussing today until we've reached a conclusion/that's had some more review. This is rebased (and re-tested) on your devel-stable branch because otherwise there were going to be conflicts with Uwe's V7M work now that you've merged that. I've included the fix for limiting MPU to CPU_V7. - Huge page support These changes bring both HugeTLB support and Transparent HugePage (THP) support to ARM. Only long descriptors (LPAE) are supported in this series. The code has been tested on an Arndale board (Exynos 5250). - LPAE updates Please pull these miscellaneous LPAE fixes I've been collecting for a while now for 3.11. They've been tested and reviewed by quite a few people, and most of the patches are pretty trivial. -- Will Deacon. - arch_timer cleanups Please pull these arch_timer cleanups I've been holding onto for a while. They're the same as my last posting, but have been rebased to v3.10-rc3. - mpidr linearisation (multiprocessor id register - identifies which CPU number we are in the system) This patch series that implements MPIDR linearization through a simple hashing algorithm and updates current cpu_{suspend}/{resume} code to use the newly created hash structures to retrieve context pointers. It represents a stepping stone for the implementation of power management code on forthcoming multi-cluster ARM systems. It has been tested on TC2 (dual cluster A15xA7 system), iMX6q, OMAP4 and Tegra, with processors hitting low-power states requiring warm-boot resume through the cpu_resume code path" * 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (77 commits) ARM: 7775/1: mm: Remove do_sect_fault from LPAE code ARM: 7777/1: Avoid extra calls to the C compiler ARM: 7774/1: Fix dtb dependency to use order-only prerequisites ARM: 7770/1: remove residual ARMv2 support from decompressor ARM: 7769/1: Cortex-A15: fix erratum 798181 implementation ARM: 7768/1: prevent risks of out-of-bound access in ASID allocator ARM: 7767/1: let the ASID allocator handle suspended animation ARM: 7766/1: versatile: don't mark pen as __INIT ARM: 7765/1: perf: Record the user-mode PC in the call chain. ARM: 7735/2: Preserve the user r/w register TPIDRURW on context switch and fork ARM: kernel: implement stack pointer save array through MPIDR hashing ARM: kernel: build MPIDR hash function data structure ARM: mpu: Ensure that MPU depends on CPU_V7 ARM: mpu: protect the vectors page with an MPU region ARM: mpu: Allow enabling of the MPU via kconfig ARM: 7758/1: introduce config HAS_BANDGAP ARM: 7757/1: mm: don't flush icache in switch_mm with hardware broadcasting ARM: 7751/1: zImage: don't overwrite ourself with a page table ARM: 7749/1: spinlock: retry trylock operation if strex fails on free lock ARM: 7748/1: oabi: handle faults when loading swi instruction from userspace ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull second set of VFS changes from Al Viro: "Assorted f_pos race fixes, making do_splice_direct() safe to call with i_mutex on parent, O_TMPFILE support, Jeff's locks.c series, ->d_hash/->d_compare calling conventions changes from Linus, misc stuff all over the place." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) Document ->tmpfile() ext4: ->tmpfile() support vfs: export lseek_execute() to modules lseek_execute() doesn't need an inode passed to it block_dev: switch to fixed_size_llseek() cpqphp_sysfs: switch to fixed_size_llseek() tile-srom: switch to fixed_size_llseek() proc_powerpc: switch to fixed_size_llseek() ubi/cdev: switch to fixed_size_llseek() pci/proc: switch to fixed_size_llseek() isapnp: switch to fixed_size_llseek() lpfc: switch to fixed_size_llseek() locks: give the blocked_hash its own spinlock locks: add a new "lm_owner_key" lock operation locks: turn the blocked_list into a hashtable locks: convert fl_link to a hlist_node locks: avoid taking global lock if possible when waking up blocked waiters locks: protect most of the file_lock handling with i_lock locks: encapsulate the fl_link list handling locks: make "added" in __posix_lock_file a bool ...
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
very similar to ext3 counterpart... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Jie Liu authored
For those file systems(btrfs/ext4/ocfs2/tmpfs) that support SEEK_DATA/SEEK_HOLE functions, we end up handling the similar matter in lseek_execute() to update the current file offset to the desired offset if it is valid, ceph also does the simliar things at ceph_llseek(). To reduce the duplications, this patch make lseek_execute() public accessible so that we can call it directly from the underlying file systems. Thanks Dave Chinner for this suggestion. [AV: call it vfs_setpos(), don't bring the removed 'inode' argument back] v2->v1: - Add kernel-doc comments for lseek_execute() - Call lseek_execute() in ceph->llseek() Signed-off-by: Jie Liu <jeff.liu@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <andi@firstfloor.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Chris Mason <chris.mason@fusionio.com> Cc: Josef Bacik <jbacik@fusionio.com> Cc: Ben Myers <bpm@sgi.com> Cc: Ted Tso <tytso@mit.edu> Cc: Hugh Dickins <hughd@google.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Sage Weil <sage@inktank.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds authored
Pull cpuset changes from Tejun Heo: "cpuset has always been rather odd about its configurations - a cgroup right after creation didn't allow any task executions before configuration, changing configuration in the parent modifies the descendants irreversibly and so on. These behaviors are inherently nasty and almost hostile against sharing the hierarchy with other controllers making it very difficult to use in unified hierarchy. Li is currently in the process of updating the behaviors for __DEVEL__sane_behavior which is the bulk of changes in this pull request. It isn't complete yet and the behaviors will change further but all changes are gated behind sane_behavior. In the process, the rather hairy work-item punting which was used to work around the limitations of cgroup descendant iterator was simplified." * 'for-3.11-cpuset' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cpuset: rename @cont to @cgrp cpuset: fix to migrate mm correctly in a corner case cpuset: allow to move tasks to empty cpusets cpuset: allow to keep tasks in empty cpusets cpuset: introduce effective_{cpumask|nodemask}_cpuset() cpuset: record old_mems_allowed in struct cpuset cpuset: remove async hotplug propagation work cpuset: let hotplug propagation work wait for task attaching cpuset: re-structure update_cpumask() a bit cpuset: remove cpuset_test_cpumask() cpuset: remove unnecessary variable in cpuset_attach() cpuset: cleanup guarantee_online_{cpus|mems}() cpuset: remove redundant check in cpuset_cpus_allowed_fallback()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds authored
Pull cgroup changes from Tejun Heo: "This pull request contains the following changes. - cgroup_subsys_state (css) reference counting has been converted to percpu-ref. css is what each resource controller embeds into its own control structure and perform reference count against. It may be used in hot paths of various subsystems and is similar to module refcnt in that aspect. For example, block-cgroup's css refcnting was showing up a lot in Mikulaus's device-mapper scalability work and this should alleviate it. - cgroup subtree iterator has been updated so that RCU read lock can be released after grabbing reference. This allows simplifying its users which requires blocking which used to build iteration list under RCU read lock and then traverse it outside. This pull request contains simplification of cgroup core and device-cgroup. A separate pull request will update cpuset. - Fixes for various bugs including corner race conditions and RCU usage bugs. - A lot of cleanups and some prepartory work for the planned unified hierarchy support." * 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (48 commits) cgroup: CGRP_ROOT_SUBSYS_BOUND should also be ignored when mounting an existing hierarchy cgroup: CGRP_ROOT_SUBSYS_BOUND should be ignored when comparing mount options cgroup: fix deadlock on cgroup_mutex via drop_parsed_module_refcounts() cgroup: always use RCU accessors for protected accesses cgroup: fix RCU accesses around task->cgroups cgroup: fix RCU accesses to task->cgroups cgroup: grab cgroup_mutex in drop_parsed_module_refcounts() cgroup: fix cgroupfs_root early destruction path cgroup: reserve ID 0 for dummy_root and 1 for unified hierarchy cgroup: implement for_each_[builtin_]subsys() cgroup: move init_css_set initialization inside cgroup_mutex cgroup: s/for_each_subsys()/for_each_root_subsys()/ cgroup: clean up find_css_set() and friends cgroup: remove cgroup->actual_subsys_mask cgroup: prefix global variables with "cgroup_" cgroup: convert CFTYPE_* flags to enums cgroup: rename cont to cgrp cgroup: clean up cgroup_serial_nr_cursor cgroup: convert cgroup_cft_commit() to use cgroup_for_each_descendant_pre() cgroup: make serial_nr_cursor available throughout cgroup.c ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds authored
Pull workqueue changes from Tejun Heo: "Surprisingly, Lai and I didn't break too many things implementing custom pools and stuff last time around and there aren't any follow-up changes necessary at this point. The only change in this pull request is Viresh's patches to make some per-cpu workqueues to behave as unbound workqueues dependent on a boot param whose default can be configured via a config option. This leads to higher processing overhead / lower bandwidth as more work items are bounced across CPUs; however, it can lead to noticeable powersave in certain configurations - ~10% w/ idlish constant workload on a big.LITTLE configuration according to Viresh. This is because per-cpu workqueues interfere with how the scheduler perceives whether or not each CPU is idle by forcing pinned tasks on them, which makes the scheduler's power-aware scheduling decisions less effective. Its effectiveness is likely less pronounced on homogenous configurations and this type of optimization can probably be made automatic; however, the changes are pretty minimal and the affected workqueues are clearly marked, so it's an easy gain for some configurations for the time being with pretty unintrusive changes." * 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: fbcon: queue work on power efficient wq block: queue work on power efficient wq PHYLIB: queue work on system_power_efficient_wq workqueue: Add system wide power_efficient workqueues workqueues: Introduce new flag WQ_POWER_EFFICIENT for power oriented workqueues
-