Commits · 99d540018caa920b7a54e2d3048f1dff530b294b · Kirill Smelkov / linux

05 Aug, 2014 1 commit

Merge branch 'for-jens' of http://evilpiepirate.org/git/linux-bcache into for-3.17/drivers · 99d54001

Jens Axboe authored Aug 05, 2014

Kent writes:

Hey Jens, here's the pull request for 3.17 - typically late, but lots of
tasty fixes in this one.

04 Aug, 2014 22 commits

bcache: Drop unneeded blk_sync_queue() calls · 0781c874

Kent Overstreet authored Jul 07, 2014

This is only needed for the queue/block device we created (and there it is
already done by blk_cleanup_queue(), which we do call); calling it for the
block devices we merely opened is pointless.

Change-Id: I53dfded14ed15b9581d10ca8399d5e1b3abbf9f2

bcache: add mutex lock for bch_is_open · 789d21db

Jianjian Huo authored Jul 13, 2014

Since bch_is_open() iterates the linked lists bch_cache_sets and
uncached_devices, it needs to hold bch_register_lock.
Signed-off-by: Jianjian Huo <samuel.huo@gmail.com>
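
A userspace sketch of the rule this commit enforces: a walk over a shared list must hold the lock that writers take when they modify it. A pthread mutex stands in for bch_register_lock, and the list type, `register_dev()` and `dev_is_open()` are hypothetical; the real bch_is_open() walks bch_cache_sets and uncached_devices.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: register_lock plays the role of
 * bch_register_lock, registered_devs the role of the bcache lists. */
struct dev_node {
	unsigned long dev_id;      /* hypothetical device identifier */
	struct dev_node *next;
};

static pthread_mutex_t register_lock = PTHREAD_MUTEX_INITIALIZER;
static struct dev_node *registered_devs;   /* protected by register_lock */

/* Writers modify the list under the lock... */
void register_dev(struct dev_node *d)
{
	assert(d != NULL);
	pthread_mutex_lock(&register_lock);
	d->next = registered_devs;
	registered_devs = d;
	pthread_mutex_unlock(&register_lock);
}

/* ...so readers must hold it too, or a concurrent register or
 * unregister could change the list in the middle of the walk. */
bool dev_is_open(unsigned long dev_id)
{
	bool found = false;

	pthread_mutex_lock(&register_lock);
	for (struct dev_node *d = registered_devs; d; d = d->next) {
		if (d->dev_id == dev_id) {
			found = true;
			break;
		}
	}
	pthread_mutex_unlock(&register_lock);
	return found;
}
```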

bcache: Correct printing of btree_gc_max_duration_ms · 5b25abad

Surbhi Palande authored Apr 17, 2014

time_stats::btree_gc_max_duration_ms is not bit-shifted by 8, so it must not
be shifted when printed.

Fixes BUG #138

Change-Id: I44fc6e1d0579674016acc533f1a546b080e5371a
Signed-off-by: Surbhi Palande <sap@daterainc.com>

bcache: try to set b->parent properly · 2452cc89

Slava Pestov authored Jul 12, 2014

Before this fix, bcache_flash_dev.ktest reliably crashed with 8k and 16k
bucket sizes; now it passes.

Change-Id: Ib542232235e39298c3a7548fe52b645cabb823d1

bcache: fix memory corruption in init error path · c9a78332

Slava Pestov authored Jun 19, 2014

If register_cache_set() failed, we would touch ca->set after
it had already been freed. Also, fix an assertion to catch
this.

Change-Id: I748e5f5b223e2d9b2602075dec2f997cced2394d

bcache: fix crash with incomplete cache set · bf0c55c9

Slava Pestov authored Jul 11, 2014

Change-Id: I6abde52afe917633480caaf4e2518f42a816d886

bcache: Fix more early shutdown bugs · d83353b3

Kent Overstreet authored Jun 11, 2014

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: fix use-after-free in btree_gc_coalesce() · 400ffaa2

Slava Pestov authored Jul 12, 2014

If we goto out_nocoalesce after we free new_nodes[0], we end up freeing
new_nodes[0] again. This was generating a lockdep warning. The fix is
to set new_nodes[0] to NULL, since the out_nocoalesce path safely
ignores NULL entries in the new_nodes array.

This regression was introduced in 2d7f9531.

Change-Id: I76564d7257800583214376b4bacf236cda90c89c
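
A userspace sketch of the pattern this fix applies: after freeing an entry that a shared error path may free again, set the slot to NULL so the cleanup loop skips it. `coalesce_sketch()`, `node_free()`, the array size and the `frees` counter are hypothetical; only the label name mirrors the commit message.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_NODES 4   /* hypothetical array size */

static int frees;    /* hypothetical counter of real frees, for illustration */

/* Like the out_nocoalesce path, this ignores NULL entries. */
static void node_free(void *p)
{
	if (p) {
		free(p);
		frees++;
	}
}

/* Returns 0 on success, -1 if it bailed out via the cleanup path. */
int coalesce_sketch(int fail_after_first_free)
{
	void *new_nodes[NR_NODES] = { NULL };

	for (int i = 0; i < NR_NODES; i++) {
		new_nodes[i] = malloc(16);
		if (!new_nodes[i])
			goto out_nocoalesce;
	}

	/* Free new_nodes[0] early, then clear the slot so the error
	 * path below cannot free it a second time. */
	node_free(new_nodes[0]);
	new_nodes[0] = NULL;

	if (fail_after_first_free)
		goto out_nocoalesce;

	for (int i = 1; i < NR_NODES; i++)
		node_free(new_nodes[i]);
	return 0;

out_nocoalesce:
	for (int i = 0; i < NR_NODES; i++)
		node_free(new_nodes[i]);   /* NULL slots are skipped */
	return -1;
}
```

Without the `new_nodes[0] = NULL;` line, the failure path would free the first node twice.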

bcache: Fix an infinite loop in journal replay · 6b708de6

Kent Overstreet authored Jun 02, 2014

When running with multiple cache devices, if one of the devices had a completely
empty journal but we had already found some journal entries on a previous device,
we would go into an infinite loop.

Change-Id: I1dcdc0d738192746de28f40e8b08825b0dea5e2b
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: fix crash in bcache_btree_node_alloc_fail tracepoint · 913dc33f

Slava Pestov authored May 23, 2014

'b' was NULL.

Change-Id: Icac0fd04afa2d23f213d96d51afd53374e6dd0c0

bcache: bcache_write tracepoint was crashing · 60ae81ee

Slava Pestov authored May 22, 2014

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: fix typo in bch_bkey_equal_header · 8e094808

Slava Pestov authored Jun 30, 2014

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Allocate bounce buffers with GFP_NOWAIT · 501d52a9

Kent Overstreet authored May 19, 2014

There's no point in blocking on these allocations, since our fallback paths will
probably go faster than blocking.

Change-Id: I733ca202c25cb36bde02607a0a60552229a4241c
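
A single-threaded userspace sketch of "try without blocking, else take the fallback path": a small pool of bounce buffers is checked without waiting, and when none is free the caller proceeds without bouncing. The pool, `bounce_try_alloc()` and `submit_io()` are hypothetical stand-ins for a GFP_NOWAIT allocation; they do not model the kernel allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_BOUNCE   2      /* hypothetical pool size */
#define BOUNCE_SIZE 4096

static char bounce_pool[NR_BOUNCE][BOUNCE_SIZE];
static bool bounce_used[NR_BOUNCE];

/* Never waits: returns NULL at once when the pool is empty. */
void *bounce_try_alloc(void)
{
	for (int i = 0; i < NR_BOUNCE; i++) {
		if (!bounce_used[i]) {
			bounce_used[i] = true;
			return bounce_pool[i];
		}
	}
	return NULL;
}

void bounce_free(void *p)
{
	for (int i = 0; i < NR_BOUNCE; i++) {
		if (p == bounce_pool[i]) {
			assert(bounce_used[i]);
			bounce_used[i] = false;
			return;
		}
	}
	assert(!"pointer not from bounce pool");
}

/* Returns true if the bounce path was used, false if we fell back. */
bool submit_io(void **bounce_out)
{
	void *buf = bounce_try_alloc();

	*bounce_out = buf;
	if (buf)
		return true;    /* copy through the bounce buffer */
	return false;       /* fallback: do the IO without bouncing */
}
```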

bcache: Make sure to pass GFP_WAIT to mempool_alloc() · bcf090e0

Kent Overstreet authored May 19, 2014

This was very wrong: mempool_alloc() only guarantees success with GFP_WAIT.
bcache uses GFP_NOWAIT in various other places where we have a fallback;
circuits must have gotten crossed when this code was written.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
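
A userspace sketch of the guarantee this commit relies on: a pool keeps a reserve of preallocated elements, and a blocking allocation waits for one to be returned instead of failing, while a non-blocking allocation may return NULL. The `pool_sketch` type and its functions are hypothetical and only mimic the semantics described for mempool_alloc(); unlike the real mempool, this sketch never tries a backing allocator first.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_MIN 2   /* hypothetical reserve size */

struct pool_sketch {
	pthread_mutex_t lock;
	pthread_cond_t  freed;
	void *elems[POOL_MIN];
	int   nr_free;
};

void pool_init(struct pool_sketch *p, void *storage[POOL_MIN])
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->freed, NULL);
	for (int i = 0; i < POOL_MIN; i++)
		p->elems[i] = storage[i];
	p->nr_free = POOL_MIN;
}

/* can_wait == true mimics GFP_WAIT: sleep until an element is returned,
 * so the call cannot fail. can_wait == false mimics GFP_NOWAIT: it may
 * return NULL, and the caller needs its own fallback. */
void *pool_alloc(struct pool_sketch *p, bool can_wait)
{
	void *e = NULL;

	pthread_mutex_lock(&p->lock);
	while (p->nr_free == 0 && can_wait)
		pthread_cond_wait(&p->freed, &p->lock);
	if (p->nr_free > 0)
		e = p->elems[--p->nr_free];
	pthread_mutex_unlock(&p->lock);
	return e;
}

void pool_free(struct pool_sketch *p, void *e)
{
	pthread_mutex_lock(&p->lock);
	assert(p->nr_free < POOL_MIN);
	p->elems[p->nr_free++] = e;
	pthread_cond_signal(&p->freed);
	pthread_mutex_unlock(&p->lock);
}
```

A blocking caller only makes progress if some other holder eventually calls `pool_free()`; that is why the reserve must be sized for the worst case in-flight usage.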

bcache: fix uninterruptible sleep in writeback thread · 9e5c3535

Slava Pestov authored May 01, 2014

There were two issues here:

- writeback thread did not start until the device first became dirty
- writeback thread used uninterruptible sleep once running

Without this patch I see kernel warnings printed and a load average of
1.52 after booting my test VM. With this patch the warnings are gone and
the load average is near 0.00 as expected.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: wait for buckets when allocating new btree root · c5aa4a31

Slava Pestov authored Apr 21, 2014

Tested:
- sometimes bcache_tier test would hang on startup with a failure
  to allocate the btree root -- no longer seeing this
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: fix crash on shutdown in passthrough mode · a664d0f0

Slava Pestov authored May 20, 2014

We never started the writeback thread in this case, so don't stop it.

bcache: fix lockdep warnings on shutdown · e5112201

Slava Pestov authored Apr 29, 2014

bcache allocator: send discards with correct size · 8b326d3a

Slava Pestov authored Apr 21, 2014

bcache: Fix to remove the rcu_sched stalls. · dbd810ab

Surbhi Palande authored Apr 10, 2014

A while loop was executing infinitely; this fix makes the loop terminate
gracefully.
Signed-off-by: Surbhi Palande <sap@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Fix a journal replay bug · 9aa61a99

Kent Overstreet authored Apr 10, 2014

Journal replay wasn't validating pointers with bch_extent_invalid() before
dereferencing them; fixed.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Fix a bug when detaching · 5b1016e6

Kent Overstreet authored Mar 19, 2014

After detaching a backing device from a cache set, a flag bit wasn't being
reset, which meant that a second detach wouldn't work correctly.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
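
A sketch of the bug class fixed here, using C11 atomics: a "detaching" bit is set with an atomic exchange so only one detach runs at a time, and it must be cleared when the detach finishes, or every later detach is refused. The struct and all function names are hypothetical; the commit message does not name the bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct backing_dev_sketch {
	atomic_bool detaching;   /* hypothetical "detach in progress" bit */
	bool attached;
};

void dev_init(struct backing_dev_sketch *dc)
{
	atomic_init(&dc->detaching, false);
	dc->attached = true;
}

void dev_attach(struct backing_dev_sketch *dc)
{
	dc->attached = true;
}

/* Start a detach; atomic_exchange returns the previous value, so
 * true means another detach is already in progress. */
bool dev_detach_start(struct backing_dev_sketch *dc)
{
	if (atomic_exchange(&dc->detaching, true))
		return false;
	return true;
}

void dev_detach_finish(struct backing_dev_sketch *dc)
{
	assert(dc->attached);
	dc->attached = false;
	/* Resetting the bit is what allows a second detach; omitting
	 * this store reproduces the bug described above. */
	atomic_store(&dc->detaching, false);
}
```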

10 Jul, 2014 17 commits

drbd: silence underflow warning in read_in_block() · bf0d6e4a

Dan Carpenter authored May 06, 2014

My static checker warns that "data_size" could be negative and underflow
the limit check.  The code looks suspicious but I don't know if it is a
real bug.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
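
A small illustration of the pattern such a checker flags: a signed size that can go negative passes an upper-bound-only check. The limit and function names are hypothetical and this is not the read_in_block() code; as the message says, whether the DRBD case is a real bug was not established.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DATA_SIZE 4096   /* hypothetical limit */

/* Flagged pattern: any negative data_size is <= the limit, so it passes. */
bool size_ok_unchecked(int data_size)
{
	return data_size <= MAX_DATA_SIZE;
}

/* Rejecting negative values explicitly closes the hole. */
bool size_ok_checked(int data_size)
{
	return data_size >= 0 && data_size <= MAX_DATA_SIZE;
}
```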

drbd: implicitly truncate cpu-mask · 1e39152f

Lars Ellenberg authored May 19, 2014

Don't error out with misleading "out of memory"
if the cpu-mask has more bits set than there are CPUs.
Just truncate to nr_cpu_ids implicitly.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
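
A sketch of "truncate instead of failing", assuming a cpu mask that fits in one 64-bit word and a hypothetical `nr_cpus` parameter in place of nr_cpu_ids: bits at or above `nr_cpus` are cleared rather than reported as an error.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only bits for CPUs that exist (0 .. nr_cpus-1). */
uint64_t truncate_cpu_mask(uint64_t requested, unsigned int nr_cpus)
{
	assert(nr_cpus > 0);
	if (nr_cpus >= 64)
		return requested;   /* every bit names a valid CPU */
	return requested & ((UINT64_C(1) << nr_cpus) - 1);
}
```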

drbd: drop spurious parameters from _drbd_md_sync_page_io · 193cb00c

Lars Ellenberg authored Apr 02, 2014

size is always 4096,
page is always device->md_io.page.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: resync should only lock out specific ranges · f5b90b6b

Lars Ellenberg authored May 07, 2014

During resync, if we need to block some specific incoming write because
of active resync requests to that same range, we potentially caused
*all* new application writes (to "cold" activity log extents) to block
until this one request has been processed.

Improve the do_submit() logic to
 * grab all incoming requests to some "incoming" list
 * process this list
   - move aside requests that are blocked by resync
   - prepare activity log transactions,
   - commit transactions and submit corresponding requests
   - if there are remaining requests that only wait for
     activity log extents to become free, stop the fast path
     (mark activity log as "starving")
   - iterate until no more requests are waiting for the activity log,
     but all potentially remaining requests are only blocked by resync
 * only then grab new incoming requests

That way, very busy IO on currently "hot" activity log extents cannot
starve scattered IO to "cold" extents. And blocked-by-resync requests
are processed once resync traffic on the affected region has ceased,
without blocking anything else.

The only blocking mode left is when we cannot start requests to "cold"
extents because all currently "hot" extents are actually used.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: debugfs: add per device data_gen_id · cc356f85

Lars Ellenberg authored May 14, 2014

The data generation identifiers used to be exposed via sysfs
at /sys/block/drbdX/drbd/meta_data/data_gen_id (out-of-tree),
for advanced policy scripting.
Bring that information over to debugfs.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: debugfs: add per connection oldest requests · 3d299f48

Lars Ellenberg authored May 14, 2014

The information formerly in /sys/block/drbdX/drbd/oldest_requests
is already available, in greater detail, in these files:
 debugfs/drbd/resource/$name/in_flight_summary,
 debugfs/drbd/resource/$name/volumes/$vnr/oldest_requests

This patch adds
 debugfs/drbd/resource/$name/connections/peer/oldest_requests
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: debugfs: add version tag to debugfs files · b44e1184

Lars Ellenberg authored May 06, 2014

Make the first line of debugfs files a version number,
starting now with "v: 0".

If we change the content or presentation, we will bump that number.
Monitoring or diagnostic scripts that parse these files
can then easily tell when they need to be reviewed.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
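
A sketch of how a consumer might use the version line, written in C to match the other examples here: read the first line, require the `v: N` form, and refuse to parse an unknown version. `debugfs_file_version()`, `parser_can_handle()` and `SUPPORTED_VERSION` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define SUPPORTED_VERSION 0   /* hypothetical: format this parser understands */

/* Returns the version from the first line of `contents`,
 * or -1 if the first line is not exactly "v: N". */
int debugfs_file_version(const char *contents)
{
	int version;
	int consumed = 0;

	assert(contents != NULL);
	if (sscanf(contents, "v: %d%n", &version, &consumed) != 1)
		return -1;
	/* Nothing else may follow the number on the first line. */
	if (contents[consumed] != '\n' && contents[consumed] != '\0')
		return -1;
	return version;
}

int parser_can_handle(const char *contents)
{
	return debugfs_file_version(contents) == SUPPORTED_VERSION;
}
```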

drbd: debugfs: add per volume oldest_requests · 54e6fc38

Lars Ellenberg authored May 08, 2014

Show oldest requests
 * pending master bio completion and,
 * if different, local disk bio completion.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: debugfs: add callback_history · 944410e9

Lars Ellenberg authored May 06, 2014

Add a per-connection worker thread callback_history
with timing details, call site and callback function.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: debugfs: Add in_flight_summary · f418815f

Lars Ellenberg authored May 05, 2014

* Add details about pending meta data operations to in_flight_summary.

* Report number of requests waiting for activity log transactions.

* Add timing details of peer_requests to in_flight_summary.

* FLUSH details
  DRBD divides the incoming request stream into "epochs",
  in which peers are allowed to re-order writes independently.

  These epochs are separated by P_BARRIER on the replication link.
  Such barrier packets, depending on configuration, may cause
  the receiving side to drain the lower level device request queues
  and call blkdev_issue_flush().

  This is known to be another major source of latency in DRBD.

  Track timing details of calls to blkdev_issue_flush(),
  and add them to in_flight_summary.

* data socket stats
  To be able to diagnose bottlenecks and root causes of "slow" IO on DRBD,
  it is useful to see network buffer stats along with the timing details of
  requests, peer requests, and meta data IO.

* Add pending bitmap IO timing details to in_flight_summary.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: debugfs: deal with destructor racing with open of debugfs file · 4a521cca

Lars Ellenberg authored May 05, 2014

Try to close the race between open() and debugfs_remove_recursive()
from inside an object destructor.
Once open succeeds, the object should stay around.
Open should not succeed if the object has already reached its destructor.

This may be overkill, but to make that happen, we check for existence of
a parent directory, "stale-ness" of "this" dentry, and serialize
kref_get_unless_zero() on the outermost object relevant for this file
with d_delete() on this dentry (using the parent's i_mutex).
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
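
A userspace sketch of the `kref_get_unless_zero()` idea the open path relies on, using C11 atomics: take a reference only if the count has not already dropped to zero, so an object already headed for its destructor cannot be revived by a late open(). The type and functions are hypothetical, and the serialization against d_delete() under the parent's i_mutex is not modelled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct obj_sketch {
	atomic_int refcount;
};

void obj_init(struct obj_sketch *o)
{
	atomic_init(&o->refcount, 1);
}

/* Increment the refcount unless it is zero; returns true on success. */
bool obj_get_unless_zero(struct obj_sketch *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, compare_exchange reloads `old`; retry. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last
 * one and should run the destructor. */
bool obj_put(struct obj_sketch *o)
{
	int old = atomic_fetch_sub(&o->refcount, 1);

	assert(old > 0);
	return old == 1;
}
```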

drbd: debugfs: add in_flight_summary data · db1866ff

Lars Ellenberg authored May 02, 2014

To help diagnose "high latency" or "hung" IO situations on DRBD,
present, per DRBD resource group, a summary of operations currently in progress.

First item is a list of oldest drbd_request objects
waiting for various things:
 * still being prepared
 * waiting for activity log transaction
 * waiting for local disk
 * waiting to be sent
 * waiting for peer acknowledgement ("receive ack", "write ack")
 * waiting for peer epoch acknowledgement ("barrier ack")
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: debugfs: add basic hierarchy · 4d3d5aa8

Lars Ellenberg authored May 02, 2014

Add new debugfs hierarchy /sys/kernel/debug/
  drbd/
    resources/
      $resource_name/connections/peer/$volume_number/
      $resource_name/volumes/$volume_number/
    minors/$minor_number -> ../resources/$resource_name/volumes/$volume_number/

Followup commits will populate this hierarchy with files containing
statistics, diagnostic information and some attribute data.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: track details of bitmap IO · 4ce49266

Lars Ellenberg authored May 06, 2014

Track start and submit time of bitmap operations, and
add pending bitmap IO contexts to a new pending_bitmap_io list.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: register peer requests on read_ee early · c5a2c150

Lars Ellenberg authored May 08, 2014

Initialize peer_request with timestamp and proper empty list head.
Add peer_request to list early, so debugfs can find this request and
report it as "preparing", even if we sleep before we actually submit it.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: track timing details of peer_requests · 21ae5d7f

Lars Ellenberg authored May 05, 2014

To be able to present timing details in debugfs,
we need to track preparation/submit times of peer requests.

Track peer request flags early,
before they are put on the epoch_entry lists.

Waiting for activity log transactions may be a major latency factor.
We want to be able to present the peer_request state accurately in
debugfs, and what it is waiting for.

Consistently mark/unmark peer requests with EE_CALL_AL_COMPLETE_IO.
Set it only *after* calling drbd_al_begin_io(),
clear it as soon as we call drbd_al_complete_io().
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: improve throttling decisions of background resynchronisation · ad3fee79

Lars Ellenberg authored Dec 20, 2013

Background resynchronisation does some "side-stepping", or throttles
itself, if it detects application IO activity and the current resync
rate estimate is above the configured "c-min-rate".

What was not detected was the case where there appears to be no
application IO because it is blocked on activity log transactions.

Introduce a new atomic_t ap_actlog_cnt, tracking such blocked requests,
and count non-zero as application IO activity.
This counter is exposed at proc_details level 2 and above.

Also make sure to release the currently locked resync extent
if we side-step due to such voluntary throttling.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
