- 24 Jul, 2015 3 commits
-
-
Bob Liu authored
The BUG_ON() in purge_persistent_gnt() will be triggered when the previous purge work hasn't finished. There is a work_pending() check before this BUG_ON(), but it doesn't cover the case where the work is still running. CC: stable@vger.kernel.org Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
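One way to express the stronger check, sketched here on the assumption that the purge work item is the blkif's persistent_purge_work: work_busy() reports both pending and running work, while work_pending() only reports the former.

    /* Illustrative only, not the actual patch. */
    #include <linux/workqueue.h>

    static bool purge_in_flight(struct xen_blkif *blkif)
    {
            /*
             * work_busy() returns WORK_BUSY_PENDING and/or WORK_BUSY_RUNNING,
             * so a handler that is mid-execution is also detected, unlike
             * with work_pending().
             */
            return work_busy(&blkif->persistent_purge_work) != 0;
    }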
-
Bob Liu authored
We should consider info->feature_persistent when adding an indirect page to the info->indirect_pages list, otherwise the BUG_ON() in blkif_free() will be triggered. When we are using persistent grants, the indirect_pages list should always be empty because blkfront has pre-allocated enough persistent pages to fill all requests on the ring. CC: stable@vger.kernel.org Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
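A minimal sketch of the guarded list add, with blkfront-style field names assumed:

    /* Illustrative only: recycle an indirect page onto the per-device list
     * only when persistent grants are NOT negotiated, so that
     * BUG_ON(!list_empty(&info->indirect_pages)) in blkif_free() holds. */
    static void recycle_indirect_page(struct blkfront_info *info,
                                      struct page *indirect_page)
    {
            /*
             * With persistent grants, blkfront has pre-allocated enough
             * persistent pages for the whole ring, so indirect_pages must
             * stay empty.
             */
            if (info->feature_persistent)
                    return;

            list_add(&indirect_page->lru, &info->indirect_pages);
    }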
-
Bob Liu authored
There is a bug when migrating from a !feature-persistent host to a feature-persistent host: domU still thinks the new host/backend doesn't support persistent grants. Dmesg shows messages like:
backed has not unmapped grant: 839
backed has not unmapped grant: 773
backed has not unmapped grant: 773
backed has not unmapped grant: 773
backed has not unmapped grant: 839
The fix is to recheck feature-persistent of the new backend in blkif_recover(). See: https://lkml.org/lkml/2015/5/25/469 As Roger suggested, we can split the part of blkfront_connect() that checks for optional features (persistent grants, indirect descriptors and the flush/barrier features) into a separate function and call it from both blkfront_connect() and blkif_recover(). Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
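A sketch of the suggested split, with an illustrative function name; the real driver would probe the other optional features the same way:

    /* Illustrative: probe the backend's optional features once, callable
     * from both blkfront_connect() and blkif_recover(). */
    static void blkfront_gather_backend_features(struct blkfront_info *info)
    {
            unsigned int persistent = 0;

            if (xenbus_scanf(XBT_NIL, info->xbdev->otherend,
                             "feature-persistent", "%u", &persistent) <= 0)
                    persistent = 0;
            info->feature_persistent = persistent;

            /* The real driver would also re-read feature-flush-cache,
             * feature-barrier and the indirect-descriptor limit here. */
    }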
-
- 22 Jun, 2015 1 commit
-
-
Bob Liu authored
Patch 69b91ede "drivers: xen-blkback: delay pending_req allocation to connect_ring" exposed a problem that Xen blkfront has. There is a race with XenStored and the drivers such that we can see two state changes to XenbusStateInitWait ('2'):
vbd vbd-268440320: blkfront:blkback_changed to state 2.
vbd vbd-268440320: blkfront:blkback_changed to state 2.
vbd vbd-268440320: blkfront:blkback_changed to state 4.
The end result is that blkback_changed() receives two notifications and calls setup_blkring() twice, while the backend driver may only see the first setup_blkring(), which is wrong, and reads out-dated ring-ref values (or reads them as they are being updated). The net effect is that the ring ends up being set up incorrectly. The other drivers in the tree already have such checks in place. Reported-and-Tested-by: Robert Butera <robert.butera@oracle.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
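A sketch of the kind of guard the other Xen frontends carry (structure illustrative): only the first transition into InitWait triggers ring setup.

    static void blkback_changed(struct xenbus_device *dev,
                                enum xenbus_state backend_state)
    {
            struct blkfront_info *info = dev_get_drvdata(&dev->dev);

            switch (backend_state) {
            case XenbusStateInitWait:
                    /* Ignore a duplicate notification so setup_blkring()
                     * only ever runs once for this connection attempt. */
                    if (dev->state != XenbusStateInitialising)
                            break;
                    talk_to_blkback(dev, info);     /* allocates the ring */
                    break;
            default:
                    break;
            }
    }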
-
- 06 Jun, 2015 3 commits
-
-
Bob Liu authored
Extend xen/block to support multi-page rings, so that more requests can be issued by using more than one page as the request ring between blkfront and the backend. As a result, performance can improve significantly. We got some impressive improvements on our high-end iSCSI storage cluster backend: using 64 pages as the ring, IOPS increased about 15 times in the throughput test and more than doubled in the latency test. The reason is that the limit on outstanding requests is 32 when using a one-page ring, but in our case the iSCSI LUN was spread across about 100 physical drives, and 32 was not enough to keep them busy.
Changes in v2:
- Rebased to 4.0-rc6.
- Document how the multi-page ring feature works in linux io/blkif.h.
Changes in v3:
- Remove changes to linux io/blkif.h and follow the protocol defined in io/blkif.h of the XEN tree.
- Rebased to 4.1-rc3.
Changes in v4:
- Switch to 'ring-page-order' and 'max-ring-page-order'.
- A few comments from Roger.
Changes in v5:
- Clarify the 4k granularity in a comment.
- Address more comments from Roger.
Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
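A sketch of the xenstore negotiation this enables, roughly as it could appear in talk_to_blkback(); variable names and the module limit xen_blkif_max_ring_order are assumptions, not the exact driver code:

    /* Backend advertises max-ring-page-order; frontend picks an order no
     * larger than that (or than its own limit) and publishes it back. */
    unsigned int backend_max_order = 0, order;
    int err;

    if (xenbus_scanf(XBT_NIL, info->xbdev->otherend,
                     "max-ring-page-order", "%u", &backend_max_order) <= 0)
            backend_max_order = 0;          /* legacy single-page backend */

    order = min(xen_blkif_max_ring_order, backend_max_order);
    info->nr_ring_pages = 1 << order;

    err = xenbus_printf(XBT_NIL, dev->nodename,
                        "ring-page-order", "%u", order);
    /* ...then write one ring-refN entry per ring page... */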
-
Bob Liu authored
The major responsibility of talk_to_blkback() is to allocate and initialize the request ring and write the ring info to xenstore. But this work should be done after the backend has entered 'XenbusStateInitWait', as defined in the protocol file. See xen/include/public/io/blkif.h in the XEN git tree:

          Front                                Back
=================================  =====================================
XenbusStateInitialising            XenbusStateInitialising
 o Query virtual device             o Query backend device identification
   properties.                        data.
 o Setup OS device instance.        o Open and validate backend device.
                                    o Publish backend features and
                                      transport parameters.
                                                 |
                                                 |
                                                 V
                                   XenbusStateInitWait
 o Query backend features and
   transport parameters.
 o Allocate and initialize the
   request ring.

There is no problem with this yet, but it is a violation of the design, and furthermore it would not allow the frontend and backend to negotiate the 'multi-page' and 'multi-queue' features.
Changes in v2:
- Re-write the commit message to be more clear.
Signed-off-by: Bob Liu <bob.liu@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
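For illustration, a probe path that defers the ring setup entirely to the state-change handler (as sketched for the previous entry) might look like this; names assumed and error handling trimmed:

    static int blkfront_probe(struct xenbus_device *dev,
                              const struct xenbus_device_id *id)
    {
            struct blkfront_info *info;

            info = kzalloc(sizeof(*info), GFP_KERNEL);
            if (!info)
                    return -ENOMEM;
            info->xbdev = dev;
            dev_set_drvdata(&dev->dev, info);

            /* talk_to_blkback() is deferred to blkback_changed(), once the
             * backend has reached XenbusStateInitWait. */
            return 0;
    }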
-
Bob Liu authored
This is a preparation patch for the multi-page ring feature. In connect_ring we know exactly how many pages are used for the shared ring, so delay the pending_req allocation until then so that we won't waste too much memory. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
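An illustrative sizing helper, with the per-page request constant and field names assumed:

    #define SKETCH_REQS_PER_PAGE 32         /* assumed, for illustration */

    static int alloc_pending_reqs(struct xen_blkif *blkif,
                                  unsigned int nr_ring_pages)
    {
            unsigned int nr = SKETCH_REQS_PER_PAGE * nr_ring_pages;

            /* Sized to the negotiated ring instead of the maximum. */
            blkif->pending_reqs = kcalloc(nr, sizeof(*blkif->pending_reqs),
                                          GFP_KERNEL);
            return blkif->pending_reqs ? 0 : -ENOMEM;
    }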
-
- 05 Jun, 2015 5 commits
-
-
Keith Busch authored
Namespaces may be dynamically allocated and deleted, or attached and detached. This makes the driver rescan the device for namespace changes after each device reset or namespace-change asynchronous event. There could potentially be many detached namespaces that we don't want polluting /dev/ with unusable block handles, so this deletes the disk if the namespace is not active, as indicated by the Identify Namespace response. This also skips adding the disk if no capacity is provisioned to the namespace in the first place. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jens Axboe authored
-
Jens Axboe authored
We export this function and NVMe wants to use it, but for some reason it was never added to the block header. Do that. Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jon Derrick authored
Protects against reordering and/or preemption, which would allow the kthread to access the queue descriptor before it is fully set up. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
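A generic publish/consume sketch of the ordering requirement (not the actual patch): all stores that initialise the descriptor must be visible before the flag the kthread checks.

    struct my_queue {
            bool online;
            /* ...descriptor fields... */
    };

    static void my_queue_publish(struct my_queue *q)
    {
            /* ...initialise q's descriptor fields first... */
            smp_wmb();                      /* order init stores before flag */
            WRITE_ONCE(q->online, true);
    }

    static bool my_queue_ready(struct my_queue *q)      /* kthread side */
    {
            if (!READ_ONCE(q->online))
                    return false;
            smp_rmb();                      /* pairs with smp_wmb() above */
            return true;
    }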
-
Keith Busch authored
We need the ability to perform an NVMe controller reset, as discussed in the mailing list thread: http://lists.infradead.org/pipermail/linux-nvme/2015-March/001585.html This adds a sysfs entry that, when written to, will perform an NVMe controller reset if the controller was successfully initialized in the first place. This also adds locking around resetting the device in the async probe method so the driver can't schedule two resets. Signed-off-by: Keith Busch <keith.busch@intel.com> Cc: Brandon Schultz <brandon.schulz@hgst.com> Cc: David Sariel <david.sariel@pmcs.com>
Updated by Jens to:
1) Merge this with the ioctl reset patch from David Sariel. The ioctl path now shares the reset code from the sysfs path.
2) Don't flush work if we fail issuing the reset.
Signed-off-by: Jens Axboe <axboe@fb.com>
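A minimal sketch of what such a write-only sysfs reset hook could look like; the attribute name follows the description, and nvme_reset() is an assumed helper for the shared reset path:

    static ssize_t reset_controller_store(struct device *dev,
                                          struct device_attribute *attr,
                                          const char *buf, size_t count)
    {
            struct nvme_dev *ndev = dev_get_drvdata(dev);
            int ret = nvme_reset(ndev);     /* assumed shared reset helper */

            return ret ? ret : count;
    }
    static DEVICE_ATTR_WO(reset_controller);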
-
- 02 Jun, 2015 2 commits
-
-
Akinobu Mita authored
When irqmode=2 (IRQ completion handler is timer) and queue_mode=1 (block interface to use is rq), the completion handler should restart request handling for any pending requests on a queue, because request processing stops once more commands are queued than hw_queue_depth allows (null_rq_prep_fn returns BLKPREP_DEFER). Without this change, the following command cannot finish:
# modprobe null_blk irqmode=2 queue_mode=1 hw_queue_depth=1
# fio --name=t --rw=read --size=1g --direct=1 \
    --ioengine=libaio --iodepth=64 --filename=/dev/nullb0
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
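A rough sketch of the restart step (locking simplified, names assumed, not the exact patch): after a timer-driven completion in rq mode, kick the queue so requests deferred with BLKPREP_DEFER are processed again.

    static void null_restart_rq_queue(struct request *rq)
    {
            struct request_queue *q = rq->q;
            unsigned long flags;

            spin_lock_irqsave(q->queue_lock, flags);
            blk_run_queue_async(q);     /* re-run ->request_fn from kblockd */
            spin_unlock_irqrestore(q->queue_lock, flags);
    }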
-
Akinobu Mita authored
When irqmode=2 (IRQ completion handler is timer), the timer handler should be called on the same CPU where the timer was started. Since completion_queues are per-cpu and the completion handler only touches the completion_queue for the local CPU, we need to prevent the handler from running on a CPU other than the one where the timer was started. Otherwise, the IO cannot be completed until another completion handler happens to execute on that CPU. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
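A sketch of the pinning idea, using null_blk-style names for illustration: the hrtimer is started in pinned mode so its handler fires on the CPU whose per-cpu completion_queue was just filled.

    static void null_cmd_end_timer(struct nullb_cmd *cmd)
    {
            ktime_t kt = ktime_set(0, completion_nsec);
            struct completion_queue *cq =
                    &per_cpu(completion_queues, smp_processor_id());

            llist_add(&cmd->ll_list, &cq->list);

            /* REL_PINNED keeps the timer, and thus the handler, on this CPU. */
            if (!hrtimer_active(&cq->timer))
                    hrtimer_start(&cq->timer, kt, HRTIMER_MODE_REL_PINNED);
    }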
-
- 01 Jun, 2015 3 commits
-
-
Keith Busch authored
The driver needs to track shared tags to support multiple namespaces that may be dynamically allocated or deleted. Relying on the first request_queue's hctxs is not appropriate, as we cannot clear outstanding tags for all namespaces using that handle, nor can the driver easily track every request_queue's hctx as namespaces are attached/detached. Instead, this patch uses the nvme_dev's tagset to get at the shared tag resources, rather than going through a request_queue hctx. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jens Axboe authored
-
Keith Busch authored
Storage controllers may expose multiple block devices that share hardware resources managed by blk-mq. This patch enhances the shared tags so a low-level driver can access the shared resources not tied to the unshared h/w contexts. This way the LLD can dynamically add and delete disks and request queues without having to track all the request_queue hctxs in order to iterate outstanding tags. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 29 May, 2015 4 commits
-
-
Jens Axboe authored
We don't need to honor chunk sizes for IO that doesn't carry any data. Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jens Axboe authored
We can safely merge anything that won't generate an SG list entry, so if the bio is data-less (discard), don't look at potential SG gaps. Signed-off-by: Jens Axboe <axboe@fb.com>
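A sketch of the shape of the check (helper name illustrative, not the actual block-layer function): data-less bios short-circuit the gap test because they contribute no SG entries.

    static bool blk_bios_sg_gap_ok(struct bio *prev, struct bio *next)
    {
            /* A data-less bio (e.g. a discard) generates no SG list entry,
             * so it can never create a gap. */
            if (!bio_has_data(prev) || !bio_has_data(next))
                    return true;

            /* ...otherwise fall through to the usual bvec gap comparison
             * against the device's SG gap constraint (placeholder). */
            return false;
    }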
-
Keith Busch authored
Do not retry failed sync commands so the original status may be seen without issuing unnecessary retries. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Keith Busch authored
Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 26 May, 2015 1 commit
-
-
Julia Lawall authored
Remove unneeded variable used to store return value. Generated by: scripts/coccinelle/misc/returnvar.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 22 May, 2015 12 commits
-
-
Keith Busch authored
Replace the use of req->sense_len, which is not owned by the LLD, with req->special to hold the command result for driver-created commands, and set the result unconditionally on completion. Signed-off-by: Keith Busch <keith.busch@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@fb.com> Fixes: d29ec824 ("nvme: submit internal commands through the block layer") Signed-off-by: Jens Axboe <axboe@fb.com>
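Illustrative completion-side handling under this scheme (not the exact driver code): driver-private requests carry their NVMe result in req->special.

    static void nvme_finish_internal_cmd(struct request *req, u32 result)
    {
            if (req->cmd_type == REQ_TYPE_DRV_PRIV)
                    req->special = (void *)(uintptr_t)result;   /* stash result */
            blk_mq_complete_request(req);
    }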
-
Christoph Hellwig authored
Currently dm-multipath has to clone the bios for every request sent to the lower devices, which wastes CPU cycles and ties down memory. This patch instead adds a new REQ_CLONE flag that instructs req_bio_endio to not complete bios attached to a request, which we set on clone requests similarly to bios in a flush sequence. With this change, I/O errors on a path failure only get propagated to dm-multipath, which can then either resubmit the I/O or complete the bios on the original request. I've done some basic testing of this on a Linux target with ALUA support, and it survives path failures during I/O nicely. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Snitzer authored
Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains") regressed all existing callers that followed this pattern:
1) saving a bio's original bi_end_io
2) wiring up an intermediate bi_end_io
3) restoring the original bi_end_io from intermediate bi_end_io
4) calling bio_endio() to execute the restored original bi_end_io
The regression was due to BIO_CHAIN only ever getting set if bio_inc_remaining() is called. For the above pattern it isn't set until step 3 above (step 2 would've needed to establish BIO_CHAIN). As such the first bio_endio(), in step 2 above, never decremented __bi_remaining before calling the intermediate bi_end_io -- leaving __bi_remaining with the value 1 instead of 0. When bio_inc_remaining() occurred during step 3 it brought it to a value of 2. When the second bio_endio() was called, in step 4 above, it should've called the original bi_end_io but it didn't because there was an extra reference that wasn't dropped (due to atomic operations being optimized away since BIO_CHAIN wasn't set upfront).
Fix this issue by removing the __bi_remaining management complexity for all callers that use the above pattern -- bio_chain() is the only interface that _needs_ to be concerned with __bi_remaining. For the above pattern callers just expect the bi_end_io they set to get called! Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls that aren't associated with the bio_chain() interface. Also, the bio_inc_remaining() interface has been moved local to bio.c.
Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
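The pattern in question, sketched with illustrative names and the two-argument bio_end_io_t signature of this era; after the fix it must work with no __bi_remaining bookkeeping at all:

    struct saved_endio {
            bio_end_io_t *orig_end_io;
            void *orig_private;
    };

    static void intermediate_end_io(struct bio *bio, int err)
    {
            struct saved_endio *s = bio->bi_private;

            bio->bi_end_io = s->orig_end_io;        /* step 3: restore */
            bio->bi_private = s->orig_private;
            kfree(s);
            bio_endio(bio, err);                    /* step 4: original runs */
    }

    static void hook_end_io(struct bio *bio, struct saved_endio *s)
    {
            s->orig_end_io = bio->bi_end_io;        /* step 1: save */
            s->orig_private = bio->bi_private;
            bio->bi_private = s;
            bio->bi_end_io = intermediate_end_io;   /* step 2: wire up */
    }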
-
Christoph Hellwig authored
Use block layer queues with an internal cmd_type to submit internally generated NVMe commands. This both simplifies the code a lot and allows for a better structure. For example, now the LightNVM code can construct commands without knowing the details of the underlying I/O descriptors, or a future NVMe-over-network target could inject commands, and the SCSI translation and ioctl code could be reused for such a beast as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
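A rough sketch of the general shape (helper name, flags and error handling are illustrative, not the exact driver code): allocate a request, attach the NVMe command as a driver-private payload, and wait for completion through the normal blk-mq path.

    static int submit_internal_cmd(struct request_queue *q,
                                   struct nvme_command *cmd)
    {
            struct request *req;

            req = blk_mq_alloc_request(q, WRITE, GFP_KERNEL, false);
            if (IS_ERR(req))
                    return PTR_ERR(req);

            req->cmd_type = REQ_TYPE_DRV_PRIV;  /* internally generated */
            req->special = cmd;                 /* payload for ->queue_rq() */

            blk_execute_rq(q, NULL, req, 0);    /* synchronous submission */
            blk_mq_free_request(req);
            return 0;
    }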
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Christoph Hellwig authored
NVMe devices always support the FUA bit, and the SCSI translation accepts the DPO bit, which doesn't have much of a meaning for us. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Christoph Hellwig authored
Error handling for the SCSI translation was completely broken, as there were two different positive error number spaces overlapping. Fix this up by removing one of them, and centralizing the generation of the other positive values in a single place. Also fix up a few places that didn't handle the NVMe error codes properly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Christoph Hellwig authored
This function handles two totally different opcodes, so split it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Christoph Hellwig authored
Most users want the generic device, so store that in struct nvme_dev instead of the pci_dev. This also happens to be a nice step towards making some code reusable for non-PCI transports. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Christoph Hellwig authored
Note that we keep the unused timeout argument, but allow callers to pass 0 instead of a timeout if they want the default. This will allow adding a timeout to the pass through path later on. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 20 May, 2015 6 commits
-
-
Jens Axboe authored
gcc, rightfully, complains:
drivers/block/loop.c:1369:1: warning: label 'out' defined but not used [-Wunused-label]
Kill it. Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jarod Wilson authored
With the mutex_trylock bit gone from blkdev_reread_part(), the retry logic in dasd_scan_partitions() shouldn't be necessary. CC: Christoph Hellwig <hch@infradead.org> CC: Jens Axboe <axboe@kernel.dk> CC: Tejun Heo <tj@kernel.org> CC: Alexander Viro <viro@zeniv.linux.org.uk> CC: Markus Pargmann <mpa@pengutronix.de> CC: Stefan Weinhuber <wein@de.ibm.com> CC: Stefan Haberland <stefan.haberland@de.ibm.com> CC: Sebastian Ott <sebott@linux.vnet.ibm.com> CC: Fabian Frederick <fabf@skynet.be> CC: Ming Lei <ming.lei@canonical.com> CC: David Herrmann <dh.herrmann@gmail.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Peter Zijlstra <peterz@infradead.org> CC: nbd-general@lists.sourceforge.net CC: linux-s390@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lei authored
Also remove the obsolete comment. Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jarod Wilson <jarod@redhat.com> Acked-by: Jarod Wilson <jarod@redhat.com> Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lei authored
Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jarod Wilson <jarod@redhat.com> Acked-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lei authored
loop_clr_fd() can run piggybacked on lo_release(), and in that situation rereading the partition table may always fail because bd_mutex is already held. This patch detects the situation via the reference count and calls __blkdev_reread_part() to avoid acquiring the lock again. In the meantime, this patch switches to the new kernel APIs blkdev_reread_part() and __blkdev_reread_part(). Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jarod Wilson <jarod@redhat.com> Acked-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
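A minimal sketch of the selection described above (condition and helper names illustrative): when clr_fd runs from the final release, bd_mutex is already held, so the lock-free variant must be used.

    static int loop_reread_partitions(struct block_device *bdev,
                                      bool in_release)
    {
            /* __blkdev_reread_part() expects bd_mutex to be held by the
             * caller; blkdev_reread_part() takes it itself. */
            return in_release ? __blkdev_reread_part(bdev)
                              : blkdev_reread_part(bdev);
    }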
-
Ming Lei authored
The lo_ctl_mutex is held while running all ioctl handlers, and in some ioctl handlers ioctl_by_bdev(BLKRRPART) is called to reread partitions, which requires bd_mutex. So it is easy to trigger a failure, because trylock(bd_mutex) may fail inside blkdev_reread_part(), given the following lock context:
blkid or other application:
  ->open()
    ->mutex_lock(bd_mutex)
      ->lo_open()
        ->mutex_lock(lo_ctl_mutex)
losetup (set fd ioctl):
  ->mutex_lock(lo_ctl_mutex)
    ->ioctl_by_bdev(BLKRRPART)
      ->trylock(bd_mutex)
This patch tries to eliminate the ABBA lock dependency by removing lo_ctl_mutex from lo_open() with the following approach:
1) make lo_refcnt an atomic_t and avoid acquiring lo_ctl_mutex in lo_open():
- for open vs. add/del loop, there is no problem because of loop_index_mutex
- freeze the request queue during clr_fd, so I/O can't come in until clearing the fd is completed, giving the same effect as holding lo_ctl_mutex in lo_open
- both open() and release() are already serialized by bd_mutex
2) don't hold lo_ctl_mutex for decreasing/checking lo_refcnt in lo_release(); then lo_ctl_mutex is only required for the last release.
Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jarod Wilson <jarod@redhat.com> Acked-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
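A sketch of the resulting open/release paths (field and helper names assumed; loop_index_mutex and autoclear details omitted):

    static int lo_open(struct block_device *bdev, fmode_t mode)
    {
            struct loop_device *lo = bdev->bd_disk->private_data;

            atomic_inc(&lo->lo_refcnt);     /* no lo_ctl_mutex needed here */
            return 0;
    }

    static void lo_release(struct gendisk *disk, fmode_t mode)
    {
            struct loop_device *lo = disk->private_data;

            if (!atomic_dec_and_test(&lo->lo_refcnt))
                    return;                 /* not the last reference */

            mutex_lock(&lo->lo_ctl_mutex);  /* only the final release */
            /* ...autoclear / flush handling for the last close... */
            mutex_unlock(&lo->lo_ctl_mutex);
    }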
-