Commit b0a1ea51 authored by Linus Torvalds

Merge branch 'for-4.3/blkcg' of git://git.kernel.dk/linux-block

Pull blk-cg updates from Jens Axboe:
 "A bit later in the cycle, but this has been in the block tree for a a
  while.  This is basically four patchsets from Tejun, that improve our
  buffered cgroup writeback.  It was dependent on the other cgroup
  changes, but they went in earlier in this cycle.

  Series 1 is a set of 5 patches with cgroup writeback updates:

   - bdi_writeback iteration fix which could lead to some wb's being
     skipped or repeated during e.g. sync under memory pressure.

   - Simplification of wb work wait mechanism.

   - Writeback tracepoints updated to report cgroup.
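
   The "wb work wait" simplification mentioned above replaces the old
   single_wait/single_done handshake with a refcounted completion.  A
   rough sketch of that pattern (simplified, made-up names; not the
   kernel's exact wb_completion code):

#include <linux/atomic.h>
#include <linux/wait.h>

/* One reference is held by the waiter, plus one per queued work item. */
struct sketch_completion {
	atomic_t		cnt;
	wait_queue_head_t	*waitq;		/* shared, e.g. bdi->wb_waitq */
};

static void sketch_completion_init(struct sketch_completion *done,
				   wait_queue_head_t *waitq)
{
	atomic_set(&done->cnt, 1);		/* the waiter's own reference */
	done->waitq = waitq;
}

static void sketch_work_queued(struct sketch_completion *done)
{
	atomic_inc(&done->cnt);			/* taken when a work item is queued */
}

static void sketch_work_finished(struct sketch_completion *done)
{
	if (atomic_dec_and_test(&done->cnt))	/* last item out wakes the waiter */
		wake_up_all(done->waitq);
}

static void sketch_wait_for_completion(struct sketch_completion *done)
{
	atomic_dec(&done->cnt);			/* drop the waiter's reference */
	wait_event(*done->waitq, !atomic_read(&done->cnt));
}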

  Series 2 is a set of updates for the CFQ cgroup writeback handling:

     cfq has always charged all async IOs to the root cgroup.  It didn't
     have much choice as writeback didn't know about cgroups and there
     was no way to tell who to blame for a given writeback IO.
     writeback finally grew support for cgroups and now tags each
     writeback IO with the appropriate cgroup to charge it against.

     This patchset updates cfq so that it follows the blkcg each bio is
     tagged with.  Async cfq_queues are now shared across cfq_group,
     which is per-cgroup, instead of per-request_queue cfq_data.  This
     makes all IOs follow the weight based IO resource distribution
     implemented by cfq.

     - Switched from GFP_ATOMIC to GFP_NOWAIT as suggested by Jeff.

     - Other misc review points addressed, acks added and rebased.
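
     A rough illustration of the Series 2 direction above (toy types,
     not cfq's real data structures): async queues hang off a per-cgroup
     group and the bio's blkcg tag decides which group gets charged,
     with untagged bios still falling back to the root group.

struct toy_queue {
	long			sectors_charged;
};

struct toy_group {				/* one per (cgroup, device) pair */
	struct toy_queue	async_queue;
	unsigned int		weight;
};

struct toy_bio {
	unsigned int		nr_sectors;
	struct toy_group	*blkcg_group;	/* set when the bio is tagged */
};

static struct toy_group toy_root_group = { .weight = 100 };

/*
 * Old cfq behaviour: all async IO was charged to the root group.
 * New behaviour: charge the group resolved from the bio's cgroup tag.
 */
static void toy_charge_async(struct toy_bio *bio)
{
	struct toy_group *grp = bio->blkcg_group ? bio->blkcg_group
						 : &toy_root_group;

	grp->async_queue.sectors_charged += bio->nr_sectors;
}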

  Series 3 is the blkcg policy cleanup patches:

     This patchset contains assorted cleanups for blkcg_policy methods
     and blk[c]g_policy_data handling.

    - alloc/free added for blkg_policy_data.  The exit method is dropped.

     - alloc/free added for blkcg_policy_data.

     - blk-throttle's async percpu allocation is replaced with direct
       allocation.

     - all methods now take blk[c]g_policy_data instead of blkcg_gq or
       blkcg.
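
     A rough sketch of the resulting method shape (hypothetical names,
     not the exact blkcg_policy definition): each policy supplies alloc
     and free hooks for its per-blkg and per-blkcg data, and the methods
     operate on that policy data rather than on blkcg_gq/blkcg directly.

#include <linux/gfp.h>
#include <linux/slab.h>

struct sketch_pd { int dummy; };	/* stands in for blkg_policy_data */
struct sketch_cpd { int dummy; };	/* stands in for blkcg_policy_data */

struct sketch_policy {
	struct sketch_pd *(*pd_alloc)(gfp_t gfp, int node);
	void (*pd_init)(struct sketch_pd *pd);
	void (*pd_free)(struct sketch_pd *pd);

	struct sketch_cpd *(*cpd_alloc)(gfp_t gfp);
	void (*cpd_free)(struct sketch_cpd *cpd);
};

static struct sketch_pd *example_pd_alloc(gfp_t gfp, int node)
{
	/* allocate the per-(cgroup, queue) policy data directly */
	return kzalloc_node(sizeof(struct sketch_pd), gfp, node);
}

static void example_pd_free(struct sketch_pd *pd)
{
	kfree(pd);
}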

  And finally, series 4 is a set of patches cleaning up the blkcg stats
  handling:

    blkcg's stats have always been somewhat of a mess.  This patchset
    tries to improve the situation a bit.

     - Patches were added to consolidate the blkcg entry point and blkg
       creation.  This in itself is an improvement and helps collect
       common stats on bio issue.

     - per-blkg stats are now accounted on bio issue rather than request
       completion so that bio-based and request-based drivers can behave
       the same way.  The issue was spotted by Vivek.

     - cfq-iosched implements custom recursive stats and blk-throttle
       implements custom per-cpu stats.  This patchset makes blkcg core
       support both by default.

     - cfq-iosched and blk-throttle keep track of the same stats
       multiple times.  Unify them"
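
A rough model of the stats direction described in series 4 (made-up
names, not the kernel's blkg_rwstat): the hot path bumps a per-cpu
counter at bio-issue time, and an auxiliary atomic carries transferred
counts (e.g. from children that have gone away), with readers summing
both.

#include <linux/atomic.h>
#include <linux/cpumask.h>
#include <linux/percpu.h>

struct sketch_stat {
	u64 __percpu	*cnt;		/* bumped locklessly on bio issue */
	atomic64_t	aux_cnt;	/* carried-over contributions */
};

static void sketch_stat_add(struct sketch_stat *s, u64 val)
{
	this_cpu_add(*s->cnt, val);
}

static u64 sketch_stat_read(struct sketch_stat *s)
{
	u64 sum = atomic64_read(&s->aux_cnt);
	int cpu;

	for_each_possible_cpu(cpu)
		sum += *per_cpu_ptr(s->cnt, cpu);
	return sum;
}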

* 'for-4.3/blkcg' of git://git.kernel.dk/linux-block: (45 commits)
  blkcg: use CGROUP_WEIGHT_* scale for io.weight on the unified hierarchy
  blkcg: s/CFQ_WEIGHT_*/CFQ_WEIGHT_LEGACY_*/
  blkcg: implement interface for the unified hierarchy
  blkcg: misc preparations for unified hierarchy interface
  blkcg: separate out tg_conf_updated() from tg_set_conf()
  blkcg: move body parsing from blkg_conf_prep() to its callers
  blkcg: mark existing cftypes as legacy
  blkcg: rename subsystem name from blkio to io
  blkcg: refine error codes returned during blkcg configuration
  blkcg: remove unnecessary NULL checks from __cfqg_set_weight_device()
  blkcg: reduce stack usage of blkg_rwstat_recursive_sum()
  blkcg: remove cfqg_stats->sectors
  blkcg: move io_service_bytes and io_serviced stats into blkcg_gq
  blkcg: make blkg_[rw]stat_recursive_sum() to be able to index into blkcg_gq
  blkcg: make blkcg_[rw]stat per-cpu
  blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it
  blkcg: consolidate blkg creation in blkcg_bio_issue_check()
  blk-throttle: improve queue bypass handling
  blkcg: move root blkg lookup optimization from throtl_lookup_tg() to __blkg_lookup()
  blkcg: inline [__]blkg_lookup()
  ...
parents 33e247c7 69d7fde5
@@ -201,7 +201,7 @@ Proportional weight policy files
 	  specifies the number of bytes.
 
 - blkio.io_serviced
-	- Number of IOs completed to/from the disk by the group. These
+	- Number of IOs (bio) issued to the disk by the group. These
 	  are further divided by the type of operation - read or write, sync
 	  or async. First two fields specify the major and minor number of the
 	  device, third field specifies the operation type and the fourth field
@@ -327,18 +327,11 @@ Note: If both BW and IOPS rules are specified for a device, then IO is
 subjected to both the constraints.
 
 - blkio.throttle.io_serviced
-	- Number of IOs (bio) completed to/from the disk by the group (as
-	  seen by throttling policy). These are further divided by the type
-	  of operation - read or write, sync or async. First two fields specify
-	  the major and minor number of the device, third field specifies the
-	  operation type and the fourth field specifies the number of IOs.
+	- Number of IOs (bio) issued to the disk by the group. These
+	  are further divided by the type of operation - read or write, sync
+	  or async. First two fields specify the major and minor number of the
+	  device, third field specifies the operation type and the fourth field
+	  specifies the number of IOs.
 
-	  blkio.io_serviced does accounting as seen by CFQ and counts are in
-	  number of requests (struct request). On the other hand,
-	  blkio.throttle.io_serviced counts number of IO in terms of number
-	  of bios as seen by throttling policy. These bios can later be
-	  merged by elevator and total number of requests completed can be
-	  lesser.
-
 - blkio.throttle.io_service_bytes
 	- Number of bytes transferred to/from the disk by the group. These
@@ -347,11 +340,6 @@ Note: If both BW and IOPS rules are specified for a device, then IO is
 	  device, third field specifies the operation type and the fourth field
 	  specifies the number of bytes.
 
-	  These numbers should roughly be same as blkio.io_service_bytes as
-	  updated by CFQ. The difference between two is that
-	  blkio.io_service_bytes will not be updated if CFQ is not operating
-	  on request queue.
-
 Common files among various policies
 -----------------------------------
 - blkio.reset_stats
......
...@@ -27,7 +27,7 @@ CONTENTS ...@@ -27,7 +27,7 @@ CONTENTS
5-3-1. Format 5-3-1. Format
5-3-2. Control Knobs 5-3-2. Control Knobs
5-4. Per-Controller Changes 5-4. Per-Controller Changes
5-4-1. blkio 5-4-1. io
5-4-2. cpuset 5-4-2. cpuset
5-4-3. memory 5-4-3. memory
6. Planned Changes 6. Planned Changes
...@@ -203,7 +203,7 @@ other issues. The mapping from nice level to weight isn't obvious or ...@@ -203,7 +203,7 @@ other issues. The mapping from nice level to weight isn't obvious or
universal, and there are various other knobs which simply aren't universal, and there are various other knobs which simply aren't
available for tasks. available for tasks.
The blkio controller implicitly creates a hidden leaf node for each The io controller implicitly creates a hidden leaf node for each
cgroup to host the tasks. The hidden leaf has its own copies of all cgroup to host the tasks. The hidden leaf has its own copies of all
the knobs with "leaf_" prefixed. While this allows equivalent control the knobs with "leaf_" prefixed. While this allows equivalent control
over internal tasks, it's with serious drawbacks. It always adds an over internal tasks, it's with serious drawbacks. It always adds an
...@@ -438,9 +438,62 @@ may be specified in any order and not all pairs have to be specified. ...@@ -438,9 +438,62 @@ may be specified in any order and not all pairs have to be specified.
5-4. Per-Controller Changes 5-4. Per-Controller Changes
5-4-1. blkio 5-4-1. io
- blk-throttle becomes properly hierarchical. - blkio is renamed to io. The interface is overhauled anyway. The
new name is more in line with the other two major controllers, cpu
and memory, and better suited given that it may be used for cgroup
writeback without involving block layer.
- Everything including stat is always hierarchical making separate
recursive stat files pointless and, as no internal node can have
tasks, leaf weights are meaningless. The operation model is
simplified and the interface is overhauled accordingly.
io.stat
The stat file. The reported stats are from the point where
bio's are issued to request_queue. The stats are counted
independent of which policies are enabled. Each line in the
file follows the following format. More fields may later be
added at the end.
$MAJ:$MIN rbytes=$RBYTES wbytes=$WBYTES rios=$RIOS wrios=$WIOS
io.weight
The weight setting, currently only available and effective if
cfq-iosched is in use for the target device. The weight is
between 1 and 10000 and defaults to 100. The first line
always contains the default weight in the following format to
use when per-device setting is missing.
default $WEIGHT
Subsequent lines list per-device weights of the following
format.
$MAJ:$MIN $WEIGHT
Writing "$WEIGHT" or "default $WEIGHT" changes the default
setting. Writing "$MAJ:$MIN $WEIGHT" sets per-device weight
while "$MAJ:$MIN default" clears it.
This file is available only on non-root cgroups.
io.max
The maximum bandwidth and/or iops setting, only available if
blk-throttle is enabled. The file is of the following format.
$MAJ:$MIN rbps=$RBPS wbps=$WBPS riops=$RIOPS wiops=$WIOPS
${R|W}BPS are read/write bytes per second and ${R|W}IOPS are
read/write IOs per second. "max" indicates no limit. Writing
to the file follows the same format but the individual
settings may be ommitted or specified in any order.
This file is available only on non-root cgroups.
5-4-2. cpuset 5-4-2. cpuset
......
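
As a quick usage illustration of the io.weight/io.max knobs documented
in the hunk above (the cgroup path and the 8:16 device below are
arbitrary example values, not taken from the patch), a userspace sketch:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *grp = "/sys/fs/cgroup/example";	/* hypothetical cgroup dir */
	char path[256], line[256];
	int fd;
	FILE *f;

	/* Throttle device 8:16 to 2 MiB/s reads and 120 write IOs/s. */
	snprintf(path, sizeof(path), "%s/io.max", grp);
	fd = open(path, O_WRONLY);
	if (fd >= 0) {
		const char *cfg = "8:16 rbps=2097152 wiops=120\n";

		if (write(fd, cfg, strlen(cfg)) != (ssize_t)strlen(cfg))
			perror("io.max");
		close(fd);
	}

	/* Dump the per-device counters accumulated at bio-issue time. */
	snprintf(path, sizeof(path), "%s/io.stat", grp);
	f = fopen(path, "r");
	if (f) {
		while (fgets(line, sizeof(line), f))
			fputs(line, stdout);
		fclose(f);
	}
	return 0;
}
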
@@ -1990,7 +1990,7 @@ int bio_associate_current(struct bio *bio)
 	get_io_context_active(ioc);
 	bio->bi_ioc = ioc;
-	bio->bi_css = task_get_css(current, blkio_cgrp_id);
+	bio->bi_css = task_get_css(current, io_cgrp_id);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(bio_associate_current);
......
@@ -1888,8 +1888,8 @@ generic_make_request_checks(struct bio *bio)
 	 */
 	create_io_context(GFP_ATOMIC, q->node);
 
-	if (blk_throtl_bio(q, bio))
-		return false;	/* throttled, will be resubmitted later */
+	if (!blkcg_bio_issue_check(q, bio))
+		return false;
 
 	trace_block_bio_queue(q, bio);
 	return true;
......
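
The hunk above is where the "consolidate blkcg entry point" work lands
in the submission path: one cgroup hook instead of the bare throttling
check.  A conceptual sketch of what such a combined hook does
(simplified names, not the kernel's blkcg_bio_issue_check()):

#include <stdbool.h>

struct sketch_queue;
struct sketch_bio;

bool sketch_throttle(struct sketch_queue *q, struct sketch_bio *bio);
void sketch_account_issue_stats(struct sketch_queue *q, struct sketch_bio *bio);

/*
 * Returns true if the bio may be issued now, false if it was throttled
 * and will be resubmitted by the throttler later.
 */
bool sketch_bio_issue_check(struct sketch_queue *q, struct sketch_bio *bio)
{
	if (sketch_throttle(q, bio))
		return false;

	/* common per-cgroup byte/IO counters are bumped here, at issue time */
	sketch_account_issue_stats(q, bio);
	return true;
}
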
...@@ -272,15 +272,10 @@ static inline struct io_context *create_io_context(gfp_t gfp_mask, int node) ...@@ -272,15 +272,10 @@ static inline struct io_context *create_io_context(gfp_t gfp_mask, int node)
* Internal throttling interface * Internal throttling interface
*/ */
#ifdef CONFIG_BLK_DEV_THROTTLING #ifdef CONFIG_BLK_DEV_THROTTLING
extern bool blk_throtl_bio(struct request_queue *q, struct bio *bio);
extern void blk_throtl_drain(struct request_queue *q); extern void blk_throtl_drain(struct request_queue *q);
extern int blk_throtl_init(struct request_queue *q); extern int blk_throtl_init(struct request_queue *q);
extern void blk_throtl_exit(struct request_queue *q); extern void blk_throtl_exit(struct request_queue *q);
#else /* CONFIG_BLK_DEV_THROTTLING */ #else /* CONFIG_BLK_DEV_THROTTLING */
static inline bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
{
return false;
}
static inline void blk_throtl_drain(struct request_queue *q) { } static inline void blk_throtl_drain(struct request_queue *q) { }
static inline int blk_throtl_init(struct request_queue *q) { return 0; } static inline int blk_throtl_init(struct request_queue *q) { return 0; }
static inline void blk_throtl_exit(struct request_queue *q) { } static inline void blk_throtl_exit(struct request_queue *q) { }
......
...@@ -53,8 +53,6 @@ struct wb_writeback_work { ...@@ -53,8 +53,6 @@ struct wb_writeback_work {
unsigned int for_background:1; unsigned int for_background:1;
unsigned int for_sync:1; /* sync(2) WB_SYNC_ALL writeback */ unsigned int for_sync:1; /* sync(2) WB_SYNC_ALL writeback */
unsigned int auto_free:1; /* free on completion */ unsigned int auto_free:1; /* free on completion */
unsigned int single_wait:1;
unsigned int single_done:1;
enum wb_reason reason; /* why was writeback initiated? */ enum wb_reason reason; /* why was writeback initiated? */
struct list_head list; /* pending work list */ struct list_head list; /* pending work list */
...@@ -178,14 +176,11 @@ static void wb_wakeup(struct bdi_writeback *wb) ...@@ -178,14 +176,11 @@ static void wb_wakeup(struct bdi_writeback *wb)
static void wb_queue_work(struct bdi_writeback *wb, static void wb_queue_work(struct bdi_writeback *wb,
struct wb_writeback_work *work) struct wb_writeback_work *work)
{ {
trace_writeback_queue(wb->bdi, work); trace_writeback_queue(wb, work);
spin_lock_bh(&wb->work_lock); spin_lock_bh(&wb->work_lock);
if (!test_bit(WB_registered, &wb->state)) { if (!test_bit(WB_registered, &wb->state))
if (work->single_wait)
work->single_done = 1;
goto out_unlock; goto out_unlock;
}
if (work->done) if (work->done)
atomic_inc(&work->done->cnt); atomic_inc(&work->done->cnt);
list_add_tail(&work->list, &wb->work_list); list_add_tail(&work->list, &wb->work_list);
...@@ -706,7 +701,7 @@ EXPORT_SYMBOL_GPL(wbc_account_io); ...@@ -706,7 +701,7 @@ EXPORT_SYMBOL_GPL(wbc_account_io);
/** /**
* inode_congested - test whether an inode is congested * inode_congested - test whether an inode is congested
* @inode: inode to test for congestion * @inode: inode to test for congestion (may be NULL)
* @cong_bits: mask of WB_[a]sync_congested bits to test * @cong_bits: mask of WB_[a]sync_congested bits to test
* *
* Tests whether @inode is congested. @cong_bits is the mask of congestion * Tests whether @inode is congested. @cong_bits is the mask of congestion
...@@ -716,6 +711,9 @@ EXPORT_SYMBOL_GPL(wbc_account_io); ...@@ -716,6 +711,9 @@ EXPORT_SYMBOL_GPL(wbc_account_io);
* determined by whether the cgwb (cgroup bdi_writeback) for the blkcg * determined by whether the cgwb (cgroup bdi_writeback) for the blkcg
* associated with @inode is congested; otherwise, the root wb's congestion * associated with @inode is congested; otherwise, the root wb's congestion
* state is used. * state is used.
*
* @inode is allowed to be NULL as this function is often called on
* mapping->host which is NULL for the swapper space.
*/ */
int inode_congested(struct inode *inode, int cong_bits) int inode_congested(struct inode *inode, int cong_bits)
{ {
...@@ -737,32 +735,6 @@ int inode_congested(struct inode *inode, int cong_bits) ...@@ -737,32 +735,6 @@ int inode_congested(struct inode *inode, int cong_bits)
} }
EXPORT_SYMBOL_GPL(inode_congested); EXPORT_SYMBOL_GPL(inode_congested);
/**
* wb_wait_for_single_work - wait for completion of a single bdi_writeback_work
* @bdi: bdi the work item was issued to
* @work: work item to wait for
*
* Wait for the completion of @work which was issued to one of @bdi's
* bdi_writeback's. The caller must have set @work->single_wait before
* issuing it. This wait operates independently fo
* wb_wait_for_completion() and also disables automatic freeing of @work.
*/
static void wb_wait_for_single_work(struct backing_dev_info *bdi,
struct wb_writeback_work *work)
{
if (WARN_ON_ONCE(!work->single_wait))
return;
wait_event(bdi->wb_waitq, work->single_done);
/*
* Paired with smp_wmb() in wb_do_writeback() and ensures that all
* modifications to @work prior to assertion of ->single_done is
* visible to the caller once this function returns.
*/
smp_rmb();
}
/** /**
* wb_split_bdi_pages - split nr_pages to write according to bandwidth * wb_split_bdi_pages - split nr_pages to write according to bandwidth
* @wb: target bdi_writeback to split @nr_pages to * @wb: target bdi_writeback to split @nr_pages to
...@@ -791,38 +763,6 @@ static long wb_split_bdi_pages(struct bdi_writeback *wb, long nr_pages) ...@@ -791,38 +763,6 @@ static long wb_split_bdi_pages(struct bdi_writeback *wb, long nr_pages)
return DIV_ROUND_UP_ULL((u64)nr_pages * this_bw, tot_bw); return DIV_ROUND_UP_ULL((u64)nr_pages * this_bw, tot_bw);
} }
/**
* wb_clone_and_queue_work - clone a wb_writeback_work and issue it to a wb
* @wb: target bdi_writeback
* @base_work: source wb_writeback_work
*
* Try to make a clone of @base_work and issue it to @wb. If cloning
* succeeds, %true is returned; otherwise, @base_work is issued directly
* and %false is returned. In the latter case, the caller is required to
* wait for @base_work's completion using wb_wait_for_single_work().
*
* A clone is auto-freed on completion. @base_work never is.
*/
static bool wb_clone_and_queue_work(struct bdi_writeback *wb,
struct wb_writeback_work *base_work)
{
struct wb_writeback_work *work;
work = kmalloc(sizeof(*work), GFP_ATOMIC);
if (work) {
*work = *base_work;
work->auto_free = 1;
work->single_wait = 0;
} else {
work = base_work;
work->auto_free = 0;
work->single_wait = 1;
}
work->single_done = 0;
wb_queue_work(wb, work);
return work != base_work;
}
/** /**
* bdi_split_work_to_wbs - split a wb_writeback_work to all wb's of a bdi * bdi_split_work_to_wbs - split a wb_writeback_work to all wb's of a bdi
* @bdi: target backing_dev_info * @bdi: target backing_dev_info
...@@ -838,15 +778,19 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi, ...@@ -838,15 +778,19 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
struct wb_writeback_work *base_work, struct wb_writeback_work *base_work,
bool skip_if_busy) bool skip_if_busy)
{ {
long nr_pages = base_work->nr_pages; int next_memcg_id = 0;
int next_blkcg_id = 0;
struct bdi_writeback *wb; struct bdi_writeback *wb;
struct wb_iter iter; struct wb_iter iter;
might_sleep(); might_sleep();
restart: restart:
rcu_read_lock(); rcu_read_lock();
bdi_for_each_wb(wb, bdi, &iter, next_blkcg_id) { bdi_for_each_wb(wb, bdi, &iter, next_memcg_id) {
DEFINE_WB_COMPLETION_ONSTACK(fallback_work_done);
struct wb_writeback_work fallback_work;
struct wb_writeback_work *work;
long nr_pages;
/* SYNC_ALL writes out I_DIRTY_TIME too */ /* SYNC_ALL writes out I_DIRTY_TIME too */
if (!wb_has_dirty_io(wb) && if (!wb_has_dirty_io(wb) &&
(base_work->sync_mode == WB_SYNC_NONE || (base_work->sync_mode == WB_SYNC_NONE ||
...@@ -855,14 +799,31 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi, ...@@ -855,14 +799,31 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
if (skip_if_busy && writeback_in_progress(wb)) if (skip_if_busy && writeback_in_progress(wb))
continue; continue;
base_work->nr_pages = wb_split_bdi_pages(wb, nr_pages); nr_pages = wb_split_bdi_pages(wb, base_work->nr_pages);
if (!wb_clone_and_queue_work(wb, base_work)) {
next_blkcg_id = wb->blkcg_css->id + 1; work = kmalloc(sizeof(*work), GFP_ATOMIC);
if (work) {
*work = *base_work;
work->nr_pages = nr_pages;
work->auto_free = 1;
wb_queue_work(wb, work);
continue;
}
/* alloc failed, execute synchronously using on-stack fallback */
work = &fallback_work;
*work = *base_work;
work->nr_pages = nr_pages;
work->auto_free = 0;
work->done = &fallback_work_done;
wb_queue_work(wb, work);
next_memcg_id = wb->memcg_css->id + 1;
rcu_read_unlock(); rcu_read_unlock();
wb_wait_for_single_work(bdi, base_work); wb_wait_for_completion(bdi, &fallback_work_done);
goto restart; goto restart;
} }
}
rcu_read_unlock(); rcu_read_unlock();
} }
...@@ -902,8 +863,6 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi, ...@@ -902,8 +863,6 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
if (!skip_if_busy || !writeback_in_progress(&bdi->wb)) { if (!skip_if_busy || !writeback_in_progress(&bdi->wb)) {
base_work->auto_free = 0; base_work->auto_free = 0;
base_work->single_wait = 0;
base_work->single_done = 0;
wb_queue_work(&bdi->wb, base_work); wb_queue_work(&bdi->wb, base_work);
} }
} }
...@@ -924,7 +883,7 @@ void wb_start_writeback(struct bdi_writeback *wb, long nr_pages, ...@@ -924,7 +883,7 @@ void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
*/ */
work = kzalloc(sizeof(*work), GFP_ATOMIC); work = kzalloc(sizeof(*work), GFP_ATOMIC);
if (!work) { if (!work) {
trace_writeback_nowork(wb->bdi); trace_writeback_nowork(wb);
wb_wakeup(wb); wb_wakeup(wb);
return; return;
} }
...@@ -954,7 +913,7 @@ void wb_start_background_writeback(struct bdi_writeback *wb) ...@@ -954,7 +913,7 @@ void wb_start_background_writeback(struct bdi_writeback *wb)
* We just wake up the flusher thread. It will perform background * We just wake up the flusher thread. It will perform background
* writeback as soon as there is no other work to do. * writeback as soon as there is no other work to do.
*/ */
trace_writeback_wake_background(wb->bdi); trace_writeback_wake_background(wb);
wb_wakeup(wb); wb_wakeup(wb);
} }
...@@ -1660,14 +1619,14 @@ static long wb_writeback(struct bdi_writeback *wb, ...@@ -1660,14 +1619,14 @@ static long wb_writeback(struct bdi_writeback *wb,
} else if (work->for_background) } else if (work->for_background)
oldest_jif = jiffies; oldest_jif = jiffies;
trace_writeback_start(wb->bdi, work); trace_writeback_start(wb, work);
if (list_empty(&wb->b_io)) if (list_empty(&wb->b_io))
queue_io(wb, work); queue_io(wb, work);
if (work->sb) if (work->sb)
progress = writeback_sb_inodes(work->sb, wb, work); progress = writeback_sb_inodes(work->sb, wb, work);
else else
progress = __writeback_inodes_wb(wb, work); progress = __writeback_inodes_wb(wb, work);
trace_writeback_written(wb->bdi, work); trace_writeback_written(wb, work);
wb_update_bandwidth(wb, wb_start); wb_update_bandwidth(wb, wb_start);
...@@ -1692,7 +1651,7 @@ static long wb_writeback(struct bdi_writeback *wb, ...@@ -1692,7 +1651,7 @@ static long wb_writeback(struct bdi_writeback *wb,
* we'll just busyloop. * we'll just busyloop.
*/ */
if (!list_empty(&wb->b_more_io)) { if (!list_empty(&wb->b_more_io)) {
trace_writeback_wait(wb->bdi, work); trace_writeback_wait(wb, work);
inode = wb_inode(wb->b_more_io.prev); inode = wb_inode(wb->b_more_io.prev);
spin_lock(&inode->i_lock); spin_lock(&inode->i_lock);
spin_unlock(&wb->list_lock); spin_unlock(&wb->list_lock);
...@@ -1797,26 +1756,14 @@ static long wb_do_writeback(struct bdi_writeback *wb) ...@@ -1797,26 +1756,14 @@ static long wb_do_writeback(struct bdi_writeback *wb)
set_bit(WB_writeback_running, &wb->state); set_bit(WB_writeback_running, &wb->state);
while ((work = get_next_work_item(wb)) != NULL) { while ((work = get_next_work_item(wb)) != NULL) {
struct wb_completion *done = work->done; struct wb_completion *done = work->done;
bool need_wake_up = false;
trace_writeback_exec(wb->bdi, work); trace_writeback_exec(wb, work);
wrote += wb_writeback(wb, work); wrote += wb_writeback(wb, work);
if (work->single_wait) { if (work->auto_free)
WARN_ON_ONCE(work->auto_free);
/* paired w/ rmb in wb_wait_for_single_work() */
smp_wmb();
work->single_done = 1;
need_wake_up = true;
} else if (work->auto_free) {
kfree(work); kfree(work);
}
if (done && atomic_dec_and_test(&done->cnt)) if (done && atomic_dec_and_test(&done->cnt))
need_wake_up = true;
if (need_wake_up)
wake_up_all(&wb->bdi->wb_waitq); wake_up_all(&wb->bdi->wb_waitq);
} }
......
...@@ -91,6 +91,29 @@ int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen) ...@@ -91,6 +91,29 @@ int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
return ret; return ret;
} }
/**
* kernfs_path_len - determine the length of the full path of a given node
* @kn: kernfs_node of interest
*
* The returned length doesn't include the space for the terminating '\0'.
*/
size_t kernfs_path_len(struct kernfs_node *kn)
{
size_t len = 0;
unsigned long flags;
spin_lock_irqsave(&kernfs_rename_lock, flags);
do {
len += strlen(kn->name) + 1;
kn = kn->parent;
} while (kn && kn->parent);
spin_unlock_irqrestore(&kernfs_rename_lock, flags);
return len;
}
/** /**
* kernfs_path - build full path of a given node * kernfs_path - build full path of a given node
* @kn: kernfs_node of interest * @kn: kernfs_node of interest
......
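
A likely caller-side pairing for the new helper added above (kernel
context, sketch only; it assumes the caller already holds a valid
kernfs_node): size the buffer with kernfs_path_len(), then fill it with
kernfs_path().

#include <linux/kernfs.h>
#include <linux/printk.h>
#include <linux/slab.h>

static void sketch_print_kernfs_path(struct kernfs_node *kn)
{
	size_t len = kernfs_path_len(kn) + 1;	/* +1 for the terminating '\0' */
	char *buf = kmalloc(len, GFP_KERNEL);
	char *path;

	if (!buf)
		return;

	/* returns NULL if the path no longer fits, e.g. after a racing rename */
	path = kernfs_path(kn, buf, len);
	if (path)
		pr_info("kernfs path: %s\n", path);

	kfree(buf);
}
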
...@@ -286,7 +286,7 @@ static inline struct bdi_writeback *wb_find_current(struct backing_dev_info *bdi ...@@ -286,7 +286,7 @@ static inline struct bdi_writeback *wb_find_current(struct backing_dev_info *bdi
* %current's blkcg equals the effective blkcg of its memcg. No * %current's blkcg equals the effective blkcg of its memcg. No
* need to use the relatively expensive cgroup_get_e_css(). * need to use the relatively expensive cgroup_get_e_css().
*/ */
if (likely(wb && wb->blkcg_css == task_css(current, blkio_cgrp_id))) if (likely(wb && wb->blkcg_css == task_css(current, io_cgrp_id)))
return wb; return wb;
return NULL; return NULL;
} }
...@@ -402,7 +402,7 @@ static inline void unlocked_inode_to_wb_end(struct inode *inode, bool locked) ...@@ -402,7 +402,7 @@ static inline void unlocked_inode_to_wb_end(struct inode *inode, bool locked)
} }
struct wb_iter { struct wb_iter {
int start_blkcg_id; int start_memcg_id;
struct radix_tree_iter tree_iter; struct radix_tree_iter tree_iter;
void **slot; void **slot;
}; };
...@@ -414,9 +414,9 @@ static inline struct bdi_writeback *__wb_iter_next(struct wb_iter *iter, ...@@ -414,9 +414,9 @@ static inline struct bdi_writeback *__wb_iter_next(struct wb_iter *iter,
WARN_ON_ONCE(!rcu_read_lock_held()); WARN_ON_ONCE(!rcu_read_lock_held());
if (iter->start_blkcg_id >= 0) { if (iter->start_memcg_id >= 0) {
iter->slot = radix_tree_iter_init(titer, iter->start_blkcg_id); iter->slot = radix_tree_iter_init(titer, iter->start_memcg_id);
iter->start_blkcg_id = -1; iter->start_memcg_id = -1;
} else { } else {
iter->slot = radix_tree_next_slot(iter->slot, titer, 0); iter->slot = radix_tree_next_slot(iter->slot, titer, 0);
} }
...@@ -430,30 +430,30 @@ static inline struct bdi_writeback *__wb_iter_next(struct wb_iter *iter, ...@@ -430,30 +430,30 @@ static inline struct bdi_writeback *__wb_iter_next(struct wb_iter *iter,
static inline struct bdi_writeback *__wb_iter_init(struct wb_iter *iter, static inline struct bdi_writeback *__wb_iter_init(struct wb_iter *iter,
struct backing_dev_info *bdi, struct backing_dev_info *bdi,
int start_blkcg_id) int start_memcg_id)
{ {
iter->start_blkcg_id = start_blkcg_id; iter->start_memcg_id = start_memcg_id;
if (start_blkcg_id) if (start_memcg_id)
return __wb_iter_next(iter, bdi); return __wb_iter_next(iter, bdi);
else else
return &bdi->wb; return &bdi->wb;
} }
/** /**
* bdi_for_each_wb - walk all wb's of a bdi in ascending blkcg ID order * bdi_for_each_wb - walk all wb's of a bdi in ascending memcg ID order
* @wb_cur: cursor struct bdi_writeback pointer * @wb_cur: cursor struct bdi_writeback pointer
* @bdi: bdi to walk wb's of * @bdi: bdi to walk wb's of
* @iter: pointer to struct wb_iter to be used as iteration buffer * @iter: pointer to struct wb_iter to be used as iteration buffer
* @start_blkcg_id: blkcg ID to start iteration from * @start_memcg_id: memcg ID to start iteration from
* *
* Iterate @wb_cur through the wb's (bdi_writeback's) of @bdi in ascending * Iterate @wb_cur through the wb's (bdi_writeback's) of @bdi in ascending
* blkcg ID order starting from @start_blkcg_id. @iter is struct wb_iter * memcg ID order starting from @start_memcg_id. @iter is struct wb_iter
* to be used as temp storage during iteration. rcu_read_lock() must be * to be used as temp storage during iteration. rcu_read_lock() must be
* held throughout iteration. * held throughout iteration.
*/ */
#define bdi_for_each_wb(wb_cur, bdi, iter, start_blkcg_id) \ #define bdi_for_each_wb(wb_cur, bdi, iter, start_memcg_id) \
for ((wb_cur) = __wb_iter_init(iter, bdi, start_blkcg_id); \ for ((wb_cur) = __wb_iter_init(iter, bdi, start_memcg_id); \
(wb_cur); (wb_cur) = __wb_iter_next(iter, bdi)) (wb_cur); (wb_cur) = __wb_iter_next(iter, bdi))
#else /* CONFIG_CGROUP_WRITEBACK */ #else /* CONFIG_CGROUP_WRITEBACK */
......
@@ -27,7 +27,7 @@ SUBSYS(cpuacct)
 #endif
 
 #if IS_ENABLED(CONFIG_BLK_CGROUP)
-SUBSYS(blkio)
+SUBSYS(io)
 #endif
 
 #if IS_ENABLED(CONFIG_MEMCG)
......
...@@ -266,6 +266,7 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn) ...@@ -266,6 +266,7 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
} }
int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen); int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
size_t kernfs_path_len(struct kernfs_node *kn);
char * __must_check kernfs_path(struct kernfs_node *kn, char *buf, char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
size_t buflen); size_t buflen);
void pr_cont_kernfs_name(struct kernfs_node *kn); void pr_cont_kernfs_name(struct kernfs_node *kn);
...@@ -332,6 +333,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn) ...@@ -332,6 +333,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
static inline int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen) static inline int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
{ return -ENOSYS; } { return -ENOSYS; }
static inline size_t kernfs_path_len(struct kernfs_node *kn)
{ return 0; }
static inline char * __must_check kernfs_path(struct kernfs_node *kn, char *buf, static inline char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
size_t buflen) size_t buflen)
{ return NULL; } { return NULL; }
......
...@@ -523,7 +523,7 @@ static int cgwb_create(struct backing_dev_info *bdi, ...@@ -523,7 +523,7 @@ static int cgwb_create(struct backing_dev_info *bdi,
int ret = 0; int ret = 0;
memcg = mem_cgroup_from_css(memcg_css); memcg = mem_cgroup_from_css(memcg_css);
blkcg_css = cgroup_get_e_css(memcg_css->cgroup, &blkio_cgrp_subsys); blkcg_css = cgroup_get_e_css(memcg_css->cgroup, &io_cgrp_subsys);
blkcg = css_to_blkcg(blkcg_css); blkcg = css_to_blkcg(blkcg_css);
memcg_cgwb_list = mem_cgroup_cgwb_list(memcg); memcg_cgwb_list = mem_cgroup_cgwb_list(memcg);
blkcg_cgwb_list = &blkcg->cgwb_list; blkcg_cgwb_list = &blkcg->cgwb_list;
...@@ -645,7 +645,7 @@ struct bdi_writeback *wb_get_create(struct backing_dev_info *bdi, ...@@ -645,7 +645,7 @@ struct bdi_writeback *wb_get_create(struct backing_dev_info *bdi,
/* see whether the blkcg association has changed */ /* see whether the blkcg association has changed */
blkcg_css = cgroup_get_e_css(memcg_css->cgroup, blkcg_css = cgroup_get_e_css(memcg_css->cgroup,
&blkio_cgrp_subsys); &io_cgrp_subsys);
if (unlikely(wb->blkcg_css != blkcg_css || if (unlikely(wb->blkcg_css != blkcg_css ||
!wb_tryget(wb))) !wb_tryget(wb)))
wb = NULL; wb = NULL;
......
...@@ -1289,7 +1289,7 @@ static void wb_update_dirty_ratelimit(struct dirty_throttle_control *dtc, ...@@ -1289,7 +1289,7 @@ static void wb_update_dirty_ratelimit(struct dirty_throttle_control *dtc,
wb->dirty_ratelimit = max(dirty_ratelimit, 1UL); wb->dirty_ratelimit = max(dirty_ratelimit, 1UL);
wb->balanced_dirty_ratelimit = balanced_dirty_ratelimit; wb->balanced_dirty_ratelimit = balanced_dirty_ratelimit;
trace_bdi_dirty_ratelimit(wb->bdi, dirty_rate, task_ratelimit); trace_bdi_dirty_ratelimit(wb, dirty_rate, task_ratelimit);
} }
static void __wb_update_bandwidth(struct dirty_throttle_control *gdtc, static void __wb_update_bandwidth(struct dirty_throttle_control *gdtc,
...@@ -1683,7 +1683,7 @@ static void balance_dirty_pages(struct address_space *mapping, ...@@ -1683,7 +1683,7 @@ static void balance_dirty_pages(struct address_space *mapping,
* do a reset, as it may be a light dirtier. * do a reset, as it may be a light dirtier.
*/ */
if (pause < min_pause) { if (pause < min_pause) {
trace_balance_dirty_pages(bdi, trace_balance_dirty_pages(wb,
sdtc->thresh, sdtc->thresh,
sdtc->bg_thresh, sdtc->bg_thresh,
sdtc->dirty, sdtc->dirty,
...@@ -1712,7 +1712,7 @@ static void balance_dirty_pages(struct address_space *mapping, ...@@ -1712,7 +1712,7 @@ static void balance_dirty_pages(struct address_space *mapping,
} }
pause: pause:
trace_balance_dirty_pages(bdi, trace_balance_dirty_pages(wb,
sdtc->thresh, sdtc->thresh,
sdtc->bg_thresh, sdtc->bg_thresh,
sdtc->dirty, sdtc->dirty,
......