- 26 Jun, 2012 1 commit
-
-
Tejun Heo authored
Currently, request_queue has one request_list to allocate requests from regardless of blkcg of the IO being issued. When the unified request pool is used up, cfq proportional IO limits become meaningless - whoever grabs the next request being freed wins the race regardless of the configured weights. This can be easily demonstrated by creating a blkio cgroup w/ very low weight, put a program which can issue a lot of random direct IOs there and running a sequential IO from a different cgroup. As soon as the request pool is used up, the sequential IO bandwidth crashes. This patch implements per-blkg request_list. Each blkg has its own request_list and any IO allocates its request from the matching blkg making blkcgs completely isolated in terms of request allocation. * Root blkcg uses the request_list embedded in each request_queue, which was renamed to @q->root_rl from @q->rq. While making blkcg rl handling a bit harier, this enables avoiding most overhead for root blkcg. * Queue fullness is properly per request_list but bdi isn't blkcg aware yet, so congestion state currently just follows the root blkcg. As writeback isn't aware of blkcg yet, this works okay for async congestion but readahead may get the wrong signals. It's better than blkcg completely collapsing with shared request_list but needs to be improved with future changes. * After this change, each block cgroup gets a full request pool making resource consumption of each cgroup higher. This makes allowing non-root users to create cgroups less desirable; however, note that allowing non-root users to directly manage cgroups is already severely broken regardless of this patch - each block cgroup consumes kernel memory and skews IO weight (IO weights are not hierarchical). v2: queue-sysfs.txt updated and patch description udpated as suggested by Vivek. v3: blk_get_rl() wasn't checking error return from blkg_lookup_create() and may cause oops on lookup failure. Fix it by falling back to root_rl on blkg lookup failures. This problem was spotted by Rakesh Iyer <rni@google.com>. v4: Updated to accomodate 458f27a9 "block: Avoid missed wakeup in request waitqueue". blk_drain_queue() now wakes up waiters on all blkg->rl on the target queue. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 25 Jun, 2012 9 commits
-
-
Tejun Heo authored
Request allocation is about to be made per-blkg meaning that there'll be multiple request lists. * Make queue full state per request_list. blk_*queue_full() functions are renamed to blk_*rl_full() and takes @rl instead of @q. * Rename blk_init_free_list() to blk_init_rl() and make it take @rl instead of @q. Also add @gfp_mask parameter. * Add blk_exit_rl() instead of destroying rl directly from blk_release_queue(). * Add request_list->q and make request alloc/free functions - blk_free_request(), [__]freed_request(), __get_request() - take @rl instead of @q. This patch doesn't introduce any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Tejun Heo authored
Add q->nr_rqs[] which currently behaves the same as q->rq.count[] and move q->rq.elvpriv to q->nr_rqs_elvpriv. blk_drain_queue() is updated to use q->nr_rqs[] instead of q->rq.count[]. These counters separates queue-wide request statistics from the request list and allow implementation of per-queue request allocation. While at it, properly indent fields of struct request_list. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Tejun Heo authored
Make bio_blkcg() and friends inline. They all are very simple and used only in few places. This patch is to prepare for further updates to request allocation path. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Tejun Heo authored
Block layer very lazy allocation of ioc. It waits until the moment ioc is absolutely necessary; unfortunately, that time could be inside queue lock and __get_request() performs unlock - try alloc - retry dancing. Just allocate it up-front on entry to block layer. We're not saving the rain forest by deferring it to the last possible moment and complicating things unnecessarily. This patch is to prepare for further updates to request allocation path. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Tejun Heo authored
Currently, there are two request allocation functions - get_request() and get_request_wait(). The former tries to allocate a request once and the latter keeps retrying until it succeeds. The latter wraps the former and keeps retrying until allocation succeeds. The combination of two functions deliver fallible non-wait allocation, fallible wait allocation and unfailing wait allocation. However, given that forward progress is guaranteed, fallible wait allocation isn't all that useful and in fact nobody uses it. This patch simplifies the interface as follows. * get_request() is renamed to __get_request() and is only used by the wrapper function. * get_request_wait() is renamed to get_request(). It now takes @gfp_mask and retries iff it contains %__GFP_WAIT. This patch doesn't introduce any functional change and is to prepare for further updates to request allocation path. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Tejun Heo authored
iscsi_remove_host() uses bsg_remove_queue() which implements custom queue draining. fc_bsg_remove() open-codes mostly identical logic. The draining logic isn't correct in that blk_stop_queue() doesn't prevent new requests from being queued - it just stops processing, so nothing prevents new requests to be queued after the logic determines that the queue is drained. blk_cleanup_queue() now implements proper queue draining and these custom draining logics aren't necessary. Drop them and use bsg_unregister_queue() + blk_cleanup_queue() instead. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: James Smart <james.smart@emulex.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Tejun Heo authored
mempool_create_node() currently assumes %GFP_KERNEL. Its only user, blk_init_free_list(), is about to be updated to use other allocation flags - add @gfp_mask argument to the function. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Tejun Heo authored
Currently, blkcg_activate_policy() depends on %GFP_ATOMIC allocation from __blkg_lookup_create() for root blkcg creation. This could make policy fail unnecessarily. Make blkg_alloc() take @gfp_mask, __blkg_lookup_create() take an optional @new_blkg for preallocated blkg, and blkcg_activate_policy() preload radix tree and preallocate blkg with %GFP_KERNEL before trying to create the root blkg. v2: __blkg_lookup_create() was returning %NULL on blkg alloc failure instead of ERR_PTR() value. Fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Tejun Heo authored
There's no point in calling radix_tree_preload() if preloading doesn't use more permissible GFP mask. Drop preloading from __blkg_lookup_create(). While at it, drop sparse locking annotation which no longer applies. v2: Vivek pointed out the odd preload usage. Instead of updating, just drop it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 15 Jun, 2012 4 commits
-
-
Jan Kara authored
Sometimes, warnings about ioctls to partition happen often enough that they form majority of the warnings in the kernel log and users complain. In some cases warnings are about ioctls such as SG_IO so it's not good to get rid of the warnings completely as they can ease debugging of userspace problems when ioctl is refused. Since I have seen warnings from lots of commands, including some proprietary userspace applications, I don't think disallowing the ioctls for processes with CAP_SYS_RAWIO will happen in the near future if ever. So lets just stop warning for processes with CAP_SYS_RAWIO for which ioctl is allowed. CC: Paolo Bonzini <pbonzini@redhat.com> CC: James Bottomley <JBottomley@parallels.com> CC: linux-scsi@vger.kernel.org Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Asias He authored
This function was only used by btrfs code in btrfs_abort_devices() (seems in a wrong way). It was removed in commit d07eb911, So, Let's remove the dead code to avoid any confusion. Changes in v2: update commit log, btrfs_abort_devices() was removed already. Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-kernel@vger.kernel.org Cc: Chris Mason <chris.mason@oracle.com> Cc: linux-btrfs@vger.kernel.org Cc: David Sterba <dave@jikos.cz> Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Asias He authored
Commit 777eb1bf disconnects externally supplied queue_lock before blk_drain_queue(). Switching the lock would introduce lock unbalance because theads which have taken the external lock might unlock the internal lock in the during the queue drain. This patch mitigate this by disconnecting the lock after the queue draining since queue draining makes a lot of request_queue users go away. However, please note, this patch only makes the problem less likely to happen. Anyone who still holds a ref might try to issue a new request on a dead queue after the blk_cleanup_queue() finishes draining, the lock unbalance might still happen in this case. ===================================== [ BUG: bad unlock balance detected! ] 3.4.0+ #288 Not tainted ------------------------------------- fio/17706 is trying to release lock (&(&q->__queue_lock)->rlock) at: [<ffffffff81329372>] blk_queue_bio+0x2a2/0x380 but there are no more locks to release! other info that might help us debug this: 1 lock held by fio/17706: #0: (&(&vblk->lock)->rlock){......}, at: [<ffffffff81327f1a>] get_request_wait+0x19a/0x250 stack backtrace: Pid: 17706, comm: fio Not tainted 3.4.0+ #288 Call Trace: [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380 [<ffffffff810dea49>] print_unlock_inbalance_bug+0xf9/0x100 [<ffffffff810dfe4f>] lock_release_non_nested+0x1df/0x330 [<ffffffff811dae24>] ? dio_bio_end_aio+0x34/0xc0 [<ffffffff811d6935>] ? bio_check_pages_dirty+0x85/0xe0 [<ffffffff811daea1>] ? dio_bio_end_aio+0xb1/0xc0 [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380 [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380 [<ffffffff810e0079>] lock_release+0xd9/0x250 [<ffffffff81a74553>] _raw_spin_unlock_irq+0x23/0x40 [<ffffffff81329372>] blk_queue_bio+0x2a2/0x380 [<ffffffff81328faa>] generic_make_request+0xca/0x100 [<ffffffff81329056>] submit_bio+0x76/0xf0 [<ffffffff8115470c>] ? set_page_dirty_lock+0x3c/0x60 [<ffffffff811d69e1>] ? bio_set_pages_dirty+0x51/0x70 [<ffffffff811dd1a8>] do_blockdev_direct_IO+0xbf8/0xee0 [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80 [<ffffffff811dd4e5>] __blockdev_direct_IO+0x55/0x60 [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80 [<ffffffff811d92e7>] blkdev_direct_IO+0x57/0x60 [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80 [<ffffffff8114c6ae>] generic_file_aio_read+0x70e/0x760 [<ffffffff810df7c5>] ? __lock_acquire+0x215/0x5a0 [<ffffffff811e9924>] ? aio_run_iocb+0x54/0x1a0 [<ffffffff8114bfa0>] ? grab_cache_page_nowait+0xc0/0xc0 [<ffffffff811e82cc>] aio_rw_vect_retry+0x7c/0x1e0 [<ffffffff811e8250>] ? aio_fsync+0x30/0x30 [<ffffffff811e9936>] aio_run_iocb+0x66/0x1a0 [<ffffffff811ea9b0>] do_io_submit+0x6f0/0xb80 [<ffffffff8134de2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff811eae50>] sys_io_submit+0x10/0x20 [<ffffffff81a7c9e9>] system_call_fastpath+0x16/0x1b Changes since v2: Update commit log to explain how the code is still broken even if we delay the lock switching after the drain. Changes since v1: Update commit log as Tejun suggested. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Asias He authored
After hot-unplug a stressed disk, I found that rl->wait[] is not empty while rl->count[] is empty and there are theads still sleeping on get_request after the queue cleanup. With simple debug code, I found there are exactly nr_sleep - nr_wakeup of theads in D state. So there are missed wakeup. $ dmesg | grep nr_sleep [ 52.917115] ---> nr_sleep=1046, nr_wakeup=873, delta=173 $ vmstat 1 1 173 0 712640 24292 96172 0 0 0 0 419 757 0 0 0 100 0 To quote Tejun: Ah, okay, freed_request() wakes up single waiter with the assumption that after the wakeup there will at least be one successful allocation which in turn will continue the wakeup chain until the wait list is empty - ie. waiter wakeup is dependent on successful request allocation happening after each wakeup. With queue marked dead, any woken up waiter fails the allocation path, so the wakeup chaining is lost and we're left with hung waiters. What we need is wake_up_all() after drain completion. This patch fixes the missed wakeup by waking up all the theads which are sleeping on wait queue after queue drain. Changes in v2: Drop waitqueue_active() optimization Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Asias He <asias@redhat.com> Fixed a bug by me, where stacked devices would oops on calling blk_drain_queue() since ->rq.wait[] do not get initialized unless it's a full queue setup. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 13 Jun, 2012 4 commits
-
-
git://git.drbd.org/linux-drbdJens Axboe authored
-
Jens Axboe authored
Merge branch 'stable/for-jens-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus
-
Tao Guo authored
Fix a regression introduced by 7eaceacc ("block: remove per-queue plugging"). In that patch, Jens removed the whole mm_unplug_device() function, which used to be the trigger to make umem start to work. We need to implement unplugging to make umem start to work, or I/O will never be triggered. Signed-off-by: Tao Guo <Tao.Guo@emc.com> Cc: Neil Brown <neilb@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Shaohua Li <shli@kernel.org> Cc: <stable@vger.kernel.org> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Eric Dumazet authored
Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered by splice_shrink_spd() called from vmsplice_to_pipe() commit 35f3d14d (pipe: add support for shrinking and growing pipes) added capability to adjust pipe->buffers. Problem is some paths don't hold pipe mutex and assume pipe->buffers doesn't change for their duration. Fix this by adding nr_pages_max field in struct splice_pipe_desc, and use it in place of pipe->buffers where appropriate. splice_shrink_spd() loses its struct pipe_inode_info argument. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Tom Herbert <therbert@google.com> Cc: stable <stable@vger.kernel.org> # 2.6.35 Tested-by: Dave Jones <davej@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 12 Jun, 2012 4 commits
-
-
Lars Ellenberg authored
We must not look at mdev->actlog, unless we have a get_ldev() reference. It also does not make much sense to try to disconnect or pull-ahead of the peer, if we don't have good local data. Only even consider congestion policies, if our local disk is D_UP_TO_DATE. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
If a read is aborted due to force-detach of a supposedly unresponsive local backing device, and retried on the peer, it can happen that the local request later still completes (hopefully with an error). As it may already have been completed to upper layers meanwhile, it must not be retried again now. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
BUG: unable to handle kernel NULL pointer dereference at (null) ... [<d1e17561>] ? _drbd_bm_set_bits+0x151/0x240 [drbd] [<d1e236f8>] ? receive_bitmap+0x4f8/0xbc0 [drbd] This fixes an off-by-one error in the receive_bitmap() path, if run-length encoded bitmap transfer is enabled. If the bitmap is an exact multiple of PAGE_SIZE, which means the visible capacity of the drbd device is an exact multiple of 128 MiB (for 4k page size), and bitmap compression (use-rle) is enabled (which became default with 8.4), and the very last bit is dirty and reported in an rle comressed bitmap packet, we ended up trying to kmap_atomic a page pointer that does not exist (bitmap->bm_pages[last index + 1]). bug introduced by: Date: Fri Jul 24 15:33:24 2009 +0200 set bits: optimize for complete last word, fix off-by-one-word corner case made effective by: Date: Thu Dec 16 00:32:38 2010 +0100 drbd: get rid of unused debug code Long time ago, we had paranoia code in the bitmap that allocated one extra word, assigned a magic value, and checked on every occasion that the magic value was still unchanged. That debug code is unused, the extra long word complicates code a bit. Get rid of it. No-one triggered this bug in the last few years, because a large subset of our userbase is unaffected: * typically the last few blocks of a device are not modified frequently, and remain unset * use-rle was disabled by default in drbd < 8.4 * those with slightly "odd" device sizes, or * drbd internal meta data (which will skew the device size slightly, thus makes it harder to have a bug relevant device size) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Konrad Rzeszutek Wilk authored
Part of the ring structure is the 'id' field which is under control of the frontend. The frontend stamps it with "some" value (this some in this implementation being a value less than BLK_RING_SIZE), and when it gets a response expects said value to be in the response structure. We have a check for the id field when spolling new requests but not when de-spolling responses. We also add an extra check in add_id_to_freelist to make sure that the 'struct request' was not NULL - as we cannot pass a NULL to __blk_end_request_all, otherwise that crashes (and all the operations that the response is dealing with end up with __blk_end_request_all). Lastly we also print the name of the operation that failed. [v1: s/BUG/WARN/ suggested by Stefano] [v2: Add extra check in add_id_to_freelist] [v3: Redid op_name per Jan's suggestion] [v4: add const * and add WARN on failure returns] Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 06 Jun, 2012 1 commit
-
-
Tejun Heo authored
blkg_destroy() caches @blkg->q in local variable @q. While there are two places which needs @blkg->q, only lockdep_assert_held() used the local variable leading to unused local variable warning if lockdep is configured out. Drop the local variable and just use @blkg->q directly. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Rakesh Iyer <rni@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 05 Jun, 2012 2 commits
-
-
Asai Thambi S P authored
On module load, creates a debugfs parent 'rssd' in debugfs root. Then for each device, create a new node with corresponding disk name. Under the new node, two entries 'registers' and 'flags' are created. NOTE: These entries were removed from sysfs in the previous patch Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Asai Thambi S P authored
This patch removes entries 'registers' and 'flags' from sysfs. Updated ABI file to reflect this change. Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 04 Jun, 2012 8 commits
-
-
Tejun Heo authored
When policy data allocation fails in the middle, blkg_alloc() invokes blkg_free() to destroy the half constructed blkg. This ends up calling pd_exit_fn() on policy datas which didn't go through pd_init_fn(). Fix it by making blkg_alloc() call pd_init_fn() immediately after each policy data allocation. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Tejun Heo authored
cfq may be built w/ or w/o blkcg support depending on CONFIG_CFQ_CGROUP_IOSCHED. If blkcg support is disabled, most of related code is ifdef'd out but some part is left dangling - blkcg_policy_cfq is left zero-filled and blkcg_policy_[un]register() calls are made on it. Feeding zero filled policy to blkcg_policy_register() is incorrect and triggers the following WARN_ON() if CONFIG_BLK_CGROUP && !CONFIG_CFQ_GROUP_IOSCHED. ------------[ cut here ]------------ WARNING: at block/blk-cgroup.c:867 Modules linked in: Modules linked in: CPU: 3 Not tainted 3.4.0-09547-gfb21affa #1 Process swapper/0 (pid: 1, task: 000000003ff80000, ksp: 000000003ff7f8b8) Krnl PSW : 0704100180000000 00000000003d76ca (blkcg_policy_register+0xca/0xe0) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3 Krnl GPRS: 0000000000000000 00000000014b85ec 00000000014b85b0 0000000000000000 000000000096fb60 0000000000000000 00000000009a8e78 0000000000000048 000000000099c070 0000000000b6f000 0000000000000000 000000000099c0b8 00000000014b85b0 0000000000667580 000000003ff7fd98 000000003ff7fd70 Krnl Code: 00000000003d76be: a7280001 lhi %r2,1 00000000003d76c2: a7f4ffdf brc 15,3d7680 #00000000003d76c6: a7f40001 brc 15,3d76c8 >00000000003d76ca: a7c8ffea lhi %r12,-22 00000000003d76ce: a7f4ffce brc 15,3d766a 00000000003d76d2: a7f40001 brc 15,3d76d4 00000000003d76d6: a7c80000 lhi %r12,0 00000000003d76da: a7f4ffc2 brc 15,3d765e Call Trace: ([<0000000000b6f000>] initcall_debug+0x0/0x4) [<0000000000989e8a>] cfq_init+0x62/0xd4 [<00000000001000ba>] do_one_initcall+0x3a/0x170 [<000000000096fb60>] kernel_init+0x214/0x2bc [<0000000000623202>] kernel_thread_starter+0x6/0xc [<00000000006231fc>] kernel_thread_starter+0x0/0xc no locks held by swapper/0/1. Last Breaking-Event-Address: [<00000000003d76c6>] blkcg_policy_register+0xc6/0xe0 ---[ end trace b8ef4903fcbf9dd3 ]--- This patch fixes the problem by ensuring all blkcg support code is inside CONFIG_CFQ_GROUP_IOSCHED. * blkcg_policy_cfq declaration and blkg_to_cfqg() definition are moved inside the first CONFIG_CFQ_GROUP_IOSCHED block. __maybe_unused is dropped from blkcg_policy_cfq decl. * blkcg_deactivate_poilcy() invocation is moved inside ifdef. This also makes the activation logic match cfq_init_queue(). * All blkcg_policy_[un]register() invocations are moved inside ifdef. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <20120601112954.GC3535@osiris.boeblingen.de.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Tejun Heo authored
cfq_init() would return zero after kmem cache creation failure. Fix so that it returns -ENOMEM. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Sachin Kamat authored
version.h header file inclusion is no longer required. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
-
Kukjin Kim authored
Should be 'exynos5_xxx' instead of 'exonys5_xxx'. It happened at the commit 30b84288 ("Merge tag 'soc2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc") during v3.5 merge window. Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> [ My bad - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull some left-over PM patches from Rafael J. Wysocki. * 'pm-acpi' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / PM: Make acpi_pm_device_sleep_state() follow the specification ACPI / PM: Make __acpi_bus_get_power() cover D3cold correctly ACPI / PM: Fix error messages in drivers/acpi/bus.c rtc-cmos / PM: report wakeup event on ACPI RTC alarm ACPI / PM: Generate wakeup events on fixed power button
-
Linus Torvalds authored
This reverts commit 5ceb9ce6. That commit seems to be the cause of the mm compation list corruption issues that Dave Jones reported. The locking (or rather, absense there-of) is dubious, as is the use of the 'page' variable once it has been found to be outside the pageblock range. So revert it for now, we can re-visit this for 3.6. If we even need to: as Minchan Kim says, "The patch wasn't a bug fix and even test workload was very theoretical". Reported-and-tested-by: Dave Jones <davej@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Hugh Dickins authored
New tmpfs use of !PageUptodate pages for fallocate() is triggering the WARNING: at mm/page-writeback.c:1990 when __set_page_dirty_nobuffers() is called from migrate_page_copy() for compaction. It is anomalous that migration should use __set_page_dirty_nobuffers() on an address_space that does not participate in dirty and writeback accounting; and this has also been observed to insert surprising dirty tags into a tmpfs radix_tree, despite tmpfs not using tags at all. We should probably give migrate_page_copy() a better way to preserve the tag and migrate accounting info, when mapping_cap_account_dirty(). But that needs some more work: so in the interim, avoid the warning by using a simple SetPageDirty on PageSwapBacked pages. Reported-and-tested-by: Dave Jones <davej@redhat.com> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 03 Jun, 2012 3 commits
-
-
Linus Torvalds authored
The comment above it says "Stat data, not accessed from path walking", but in fact some of inode fields we use for the common stat data was way down at the end of the inode, causing unnecessary cache misses for the common stat operations. The inode structure is pretty big, and this can change padding depending on field width, but at least on the common 64-bit configurations this doesn't change the size. Some of our inode layout has historically been to tro to avoid unnecessary padding fields, but cache locality is at least as important for layout, if not more. Noticed by looking at kernel profiles, and noticing that the "i_blkbits" access stood out like a sore thumb. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dmLinus Torvalds authored
Pull device-mapper updates from Alasdair G Kergon: "Improve multipath's retrying mechanism in some defined circumstances and provide a simple reserve/release mechanism for userspace tools to access thin provisioning metadata while the pool is in use." * tag 'dm-3.5-changes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: dm thin: provide userspace access to pool metadata dm thin: use slab mempools dm mpath: allow ioctls to trigger pg init dm mpath: delay retry of bypassed pg dm mpath: reduce size of struct multipath
-
- 02 Jun, 2012 4 commits
-
-
Joe Thornber authored
This patch implements two new messages that can be sent to the thin pool target allowing it to take a snapshot of the _metadata_. This, read-only snapshot can be accessed by userland, concurrently with the live target. Only one metadata snapshot can be held at a time. The pool's status line will give the block location for the current msnap. Since version 0.1.5 of the userland thin provisioning tools, the thin_dump program displays the msnap as follows: thin_dump -m <msnap root> <metadata dev> Available here: https://github.com/jthornber/thin-provisioning-tools Now that userland can access the metadata we can do various things that have traditionally been kernel side tasks: i) Incremental backups. By using metadata snapshots we can work out what blocks have changed over time. Combined with data snapshots we can ensure the data doesn't change while we back it up. A short proof of concept script can be found here: https://github.com/jthornber/thinp-test-suite/blob/master/incremental_backup_example.rb ii) Migration of thin devices from one pool to another. iii) Merging snapshots back into an external origin. iv) Asyncronous replication. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Mike Snitzer authored
Use dedicated caches prefixed with a "dm_" name rather than relying on kmalloc mempools backed by generic slab caches so the memory usage of thin provisioning (and any leaks) can be accounted for independently. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Mikulas Patocka authored
After the failure of a group of paths, any alternative paths that need initialising do not become available until further I/O is sent to the device. Until this has happened, ioctls return -EAGAIN. With this patch, new paths are made available in response to an ioctl too. The processing of the ioctl gets delayed until this has happened. Instead of returning an error, we submit a work item to kmultipathd (that will potentially activate the new path) and retry in ten milliseconds. Note that the patch doesn't retry an ioctl if the ioctl itself fails due to a path failure. Such retries should be handled intelligently by the code that generated the ioctl in the first place, noting that some SCSI commands should not be retried because they are not idempotent (XOR write commands). For commands that could be retried, there is a danger that if the device rejected the SCSI command, the path could be errorneously marked as failed, and the request would be retried on another path which might fail too. It can be determined if the failure happens on the device or on the SCSI controller, but there is no guarantee that all SCSI drivers set these flags correctly. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Mike Christie authored
If I/O needs retrying and only bypassed priority groups are available, set the pg_init_delay_retry flag to wait before retrying. If, for example, the reason for the bypass is that the controller is getting reset or there is a firmware upgrade happening, retrying right away would cause a flood of log messages and retries for what could be a few seconds or even several minutes. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-