1. 24 Apr, 2009 5 commits
    • Jens Axboe's avatar
      cfq-iosched: clear ->prio_trees[] on cfqd alloc · 26a2ac00
      Jens Axboe authored
      Not strictly needed, but we should make it clear that we init the
      rbtree roots here.
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      26a2ac00
    • Hannes Reinecke's avatar
      block: fix intermittent dm timeout based oops · 17d5c8ca
      Hannes Reinecke authored
      Very rarely under stress testing of dm, oopses are occuring as
      something tampers with an old stack frame.  This has been traced back
      to blk_abort_queue() leaving a timeout_list pointing to the stack.
      The reason is that sometimes blk_abort_request() won't delete the
      timer (if the request is marked as complete but before the timer has
      been removed, a small race window).  Fix this by splicing back from
      the ususally empty list to the q->timeout_list.
      Signed-off-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      17d5c8ca
    • Sage Weil's avatar
      umem: fix request_queue lock warning · f3c737de
      Sage Weil authored
      The umem driver issues two warnings on boot, due to blk_plug_device() and
      blk_remove_plug() being called without q->queue_lock held.  Starting with
      e48ec690 (block: extend queue_flag bitops), the queue_flag_* functions
      warn if q->queue_lock doesn't appear to be locked.  In fact, q->queue_lock
      is NULL (though that apparently isn't otherwise a problem as the driver is
      using card->lock for everything).
      
      Although blk_init_queue() with take a request_fn_proc and spinlock_t*,
      there isn't a corresponding init helper that takes a make_request_fn.
      Setting queue_lock to &card->lock explicitly seems to work fine for me.
      The warning goes away and the device appears to behave.
      
      [    1.531881] v2.3 : Micro Memory(tm) PCI memory board block driver
      [    1.538136] umem 0000:02:01.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
      [    1.545018] umem 0000:02:01.0: Micro Memory(tm) controller found (PCI Mem Module (Battery Backup))
      [    1.554176] umem 0000:02:01.0: CSR 0xfc9ffc00 -> 0xffffc200013d0c00 (0x100)
      [    1.561279] umem 0000:02:01.0: Size 1048576 KB, Battery 1 Disabled (FAILURE), Battery 2 Disabled (FAILURE)
      [    1.571114] umem 0000:02:01.0: Window size 16777216 bytes, IRQ 20
      [    1.577304] umem 0000:02:01.0: memory NOT initialized. Consider over-writing whole device.
      [    1.585989]  umema:<4>------------[ cut here ]------------
      [    1.591775] WARNING: at include/linux/blkdev.h:492 blk_plug_device+0x6d/0x106()
      [    1.592025] Hardware name: H8SSL
      [    1.592025] Modules linked in:
      [    1.592025] Pid: 1, comm: swapper Not tainted 2.6.29 #8
      [    1.592025] Call Trace:
      [    1.592025]  [<ffffffff8023c994>] warn_slowpath+0xd3/0xf2
      [    1.592025]  [<ffffffff8025a5b5>] ? save_trace+0x3f/0x9b
      [    1.592025]  [<ffffffff8025a68b>] ? add_lock_to_list+0x7a/0xba
      [    1.592025]  [<ffffffff8025e609>] ? validate_chain+0xb3b/0xce8
      [    1.592025]  [<ffffffff80441556>] ? mm_make_request+0x27/0x59
      [    1.592025]  [<ffffffff80441556>] ? mm_make_request+0x27/0x59
      [    1.592025]  [<ffffffff8025ef04>] ? __lock_acquire+0x74e/0x7b9
      [    1.592025]  [<ffffffff8025a70e>] ? get_lock_stats+0x34/0x5e
      [    1.592025]  [<ffffffff8025a746>] ? put_lock_stats+0xe/0x27
      [    1.592025]  [<ffffffff80441556>] ? mm_make_request+0x27/0x59
      [    1.592025]  [<ffffffff803ad165>] blk_plug_device+0x6d/0x106
      [    1.592025]  [<ffffffff80441575>] mm_make_request+0x46/0x59
      [    1.592025]  [<ffffffff803ac2d9>] generic_make_request+0x335/0x3cf
      [    1.592025]  [<ffffffff8027fcc7>] ? mempool_alloc_slab+0x11/0x13
      [    1.592025]  [<ffffffff8027fdce>] ? mempool_alloc+0x45/0x101
      [    1.592025]  [<ffffffff8025a746>] ? put_lock_stats+0xe/0x27
      [    1.592025]  [<ffffffff803adda5>] submit_bio+0x10a/0x119
      [    1.592025]  [<ffffffff802c8d00>] submit_bh+0xe5/0x109
      [    1.592025]  [<ffffffff802cbf43>] block_read_full_page+0x2aa/0x2cb
      [    1.592025]  [<ffffffff802cf4c4>] ? blkdev_get_block+0x0/0x4c
      [    1.592025]  [<ffffffff805c90a8>] ? _spin_unlock_irq+0x36/0x51
      [    1.592025]  [<ffffffff80286836>] ? __lru_cache_add+0x92/0xb2
      [    1.592025]  [<ffffffff802cf008>] blkdev_readpage+0x13/0x15
      [    1.592025]  [<ffffffff8027de06>] read_cache_page_async+0x90/0x134
      [    1.592025]  [<ffffffff802ceff5>] ? blkdev_readpage+0x0/0x15
      [    1.592025]  [<ffffffff802f5f1c>] ? adfspart_check_ICS+0x0/0x16c
      [    1.592025]  [<ffffffff8027deb8>] read_cache_page+0xe/0x45
      [    1.592025]  [<ffffffff802f5170>] read_dev_sector+0x2e/0x93
      [    1.592025]  [<ffffffff802f5f44>] adfspart_check_ICS+0x28/0x16c
      [    1.592025]  [<ffffffff8025d427>] ? trace_hardirqs_on+0xd/0xf
      [    1.592025]  [<ffffffff802f5f1c>] ? adfspart_check_ICS+0x0/0x16c
      [    1.592025]  [<ffffffff802f59c5>] rescan_partitions+0x168/0x2fb
      [    1.592025]  [<ffffffff802ceae9>] __blkdev_get+0x259/0x336
      [    1.592025]  [<ffffffff803ca1e2>] ? kobject_put+0x47/0x4b
      [    1.592025]  [<ffffffff802cebd1>] blkdev_get+0xb/0xd
      [    1.592025]  [<ffffffff802f5773>] register_disk+0xc4/0x12b
      [    1.592025]  [<ffffffff803b2a7b>] add_disk+0xc3/0x12d
      [    1.592025]  [<ffffffff808a1d4a>] ? mm_init+0x0/0x1a5
      [    1.592025]  [<ffffffff808a1e73>] mm_init+0x129/0x1a5
      [    1.592025]  [<ffffffff808a1d4a>] ? mm_init+0x0/0x1a5
      [    1.592025]  [<ffffffff80209056>] _stext+0x56/0x130
      [    1.592025]  [<ffffffff80274932>] ? register_irq_proc+0xae/0xca
      [    1.592025]  [<ffffffff802f0000>] ? proc_pid_lookup+0xb4/0x18b
      [    1.592025]  [<ffffffff8087f975>] kernel_init+0x132/0x18b
      [    1.592025]  [<ffffffff8020d17a>] child_rip+0xa/0x20
      [    1.592025]  [<ffffffff8020cb40>] ? restore_args+0x0/0x30
      [    1.592025]  [<ffffffff8087f843>] ? kernel_init+0x0/0x18b
      [    1.592025]  [<ffffffff8020d170>] ? child_rip+0x0/0x20
      [    1.592025] ---[ end trace 7150b3b86da74e1e ]---
      [    1.889858] ------------[ cut here ]------------[ve_plug+0x5f/0x91()
      [    1.893848] Hardware name: H8SSL
      [    1.893848] Modules linked in:
      [    1.893848] Pid: 1, comm: swapper Tainted: G        W  2.6.29 #8
      [    1.893848] Call Trace:
      [    1.893848]  [<ffffffff8023c994>] warn_slowpath+0xd3/0xf2
      [    1.893848]  [<ffffffff805c8411>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [    1.893848]  [<ffffffff8020cb40>] ? restore_args+0x0/0x30
      [    1.893848]  [<ffffffff80254245>] ? __atomic_notifier_call_chain+0x0/0xb2
      [    1.893848]  [<ffffffff805c90a3>] ? _spin_unlock_irq+0x31/0x51
      [    1.893848]  [<ffffffff805c90bf>] ? _spin_unlock_irq+0x4d/0x51
      [    1.893848]  [<ffffffff8044157d>] ? mm_make_request+0x4e/0x59
      [    1.893848]  [<ffffffff8025a70e>] ? get_lock_stats+0x34/0x5e
      [    1.893848]  [<ffffffff8025a75d>] ? put_lock_stats+0x25/0x27
      [    1.893848]  [<ffffffff80441504>] ? mm_unplug_device+0x25/0x50
      [    1.893848]  [<ffffffff803acf23>] blk_remove_plug+0x5f/0x91
      [    1.893848]  [<ffffffff8044150f>] mm_unplug_device+0x30/0x50
      [    1.893848]  [<ffffffff803ab74a>] blk_unplug+0x78/0x7d
      [    1.893848]  [<ffffffff803ab75c>] blk_backing_dev_unplug+0xd/0xf
      [    1.893848]  [<ffffffff802c853c>] block_sync_page+0x4a/0x4c
      [    1.893848]  [<ffffffff8027da1c>] sync_page+0x44/0x4d
      [    1.893848]  [<ffffffff805c66fd>] __wait_on_bit_lock+0x42/0x8a
      [    1.893848]  [<ffffffff8027d9d8>] ? sync_page+0x0/0x4d
      [    1.893848]  [<ffffffff8027d9c4>] __lock_page+0x64/0x6b
      [    1.893848]  [<ffffffff802508db>] ? wake_bit_function+0x0/0x2a
      [    1.893848]  [<ffffffff8027de4a>] read_cache_page_async+0xd4/0x134
      [    1.893848]  [<ffffffff802ceff5>] ? blkdev_readpage+0x0/0x15
      [    1.893848]  [<ffffffff802f5f1c>] ? adfspart_check_ICS+0x0/0x16c
      [    1.893848]  [<ffffffff8027deb8>] read_cache_page+0xe/0x45
      [    1.893848]  [<ffffffff802f5170>] read_dev_sector+0x2e/0x93
      [    1.893848]  [<ffffffff802f5f44>] adfspart_check_ICS+0x28/0x16c
      [    1.893848]  [<ffffffff8025d427>] ? trace_hardirqs_on+0xd/0xf
      [    1.893848]  [<ffffffff802f5f1c>] ? adfspart_check_ICS+0x0/0x16c
      [    1.893848]  [<ffffffff802f59c5>] rescan_partitions+0x168/0x2fb
      [    1.893848]  [<ffffffff802ceae9>] __blkdev_get+0x259/0x336
      [    1.893848]  [<ffffffff803ca1e2>] ? kobject_put+0x47/0x4b
      [    1.893848]  [<ffffffff802cebd1>] blkdev_get+0xb/0xd
      [    1.893848]  [<ffffffff802f5773>] register_disk+0xc4/0x12b
      [    1.893848]  [<ffffffff803b2a7b>] add_disk+0xc3/0x12d
      [    1.893848]  [<ffffffff808a1d4a>] ? mm_init+0x0/0x1a5
      [    1.893848]  [<ffffffff808a1e73>] mm_init+0x129/0x1a5
      [    1.893848]  [<ffffffff808a1d4a>] ? mm_init+0x0/0x1a5
      [    1.893848]  [<ffffffff80209056>] _stext+0x56/0x130
      [    1.893848]  [<ffffffff80274932>] ? register_irq_proc+0xae/0xca
      [    1.893848]  [<ffffffff802f0000>] ? proc_pid_lookup+0xb4/0x18b
      [    1.893848]  [<ffffffff8087f975>] kernel_init+0x132/0x18b
      [    1.893848]  [<ffffffff8020d17a>] child_rip+0xa/0x20
      [    1.893848]  [<ffffffff8020cb40>] ? restore_args+0x0/0x30
      [    1.893848]  [<ffffffff8087f843>] ? kernel_init+0x0/0x18b
      [    1.893848]  [<ffffffff8020d170>] ? child_rip+0x0/0x20
      [    1.893848] ---[ end trace 7150b3b86da74e1f ]---
      Signed-off-by: default avatarSage Weil <sage@newdream.net>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      f3c737de
    • Jerome Marchand's avatar
      block: simplify I/O stat accounting · 42dad764
      Jerome Marchand authored
      This simplifies I/O stat accounting switching code and separates it
      completely from I/O scheduler switch code.
      
      Requests are accounted according to the state of their request queue
      at the time of the request allocation. There is no need anymore to
      flush the request queue when switching I/O accounting state.
      Signed-off-by: default avatarJerome Marchand <jmarchan@redhat.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      42dad764
    • Alexander Beregalov's avatar
      pktcdvd.h should include mempool.h · 097102c2
      Alexander Beregalov authored
      Fix this build error:
      In file included from fs/compat_ioctl.c:104:
      include/linux/pktcdvd.h:285: error: expected specifier-qualifier-list before 'mempool_t'
      Signed-off-by: default avatarAlexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      097102c2
  2. 22 Apr, 2009 14 commits
    • Jeff Moyer's avatar
      cfq-iosched: use the default seek distance when there aren't enough seek samples · 04dc6e71
      Jeff Moyer authored
      If the cfq io context doesn't have enough samples yet to provide a mean
      seek distance, then use the default threshold we have for seeky IO instead
      of defaulting to 0.
      Signed-off-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      04dc6e71
    • Jeff Moyer's avatar
      cfq-iosched: make seek_mean converge more quickly · 4d00aa47
      Jeff Moyer authored
      Right now, depending on the first sector to which a process issues I/O,
      the seek time may start out way out of whack. So make sure we start
      with 0 sectors in seek, instead of the offset of the first request
      issued.
      Signed-off-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      4d00aa47
    • Jens Axboe's avatar
      block: make blk_abort_queue() ignore non-request based devices · b7591134
      Jens Axboe authored
      There's nothing to do for those devices, since the timeout handling is
      based on requests.
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      b7591134
    • Tejun Heo's avatar
      block: include empty disks in /proc/diskstats · 71982a40
      Tejun Heo authored
      /proc/diskstats used to show stats for all disks whether they're
      zero-sized or not and their non-zero partitions.  Commit
      074a7aca accidentally changed the
      behavior such that it doesn't print out zero sized disks.  This patch
      implements DISK_PITER_INCL_EMPTY_PART0 flag to partition iterator and
      uses it in diskstats_show() such that empty part0 is shown in
      /proc/diskstats.
      
      Reported and bisectd by Dianel Collins.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarDaniel Collins <solemnwarning@solemnwarning.no-ip.org>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      71982a40
    • Tejun Heo's avatar
      bio: use bio_kmalloc() in copy/map functions · a9e9dc24
      Tejun Heo authored
      Impact: remove possible deadlock condition
      
      There is no reason to use mempool backed allocation for map functions.
      Also, because kern mapping is used inside LLDs (e.g. for EH), using
      mempool backed allocation can lead to deadlock under extreme
      conditions (mempool already consumed by the time a request reached EH
      and requests are blocked on EH).
      
      Switch copy/map functions to bio_kmalloc().
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      a9e9dc24
    • Tejun Heo's avatar
      bio: fix bio_kmalloc() · 451a9ebf
      Tejun Heo authored
      Impact: fix bio_kmalloc() and its destruction path
      
      bio_kmalloc() was broken in two ways.
      
      * bvec_alloc_bs() first allocates bvec using kmalloc() and then
        ignores it and allocates again like non-kmalloc bvecs.
      
      * bio_kmalloc_destructor() didn't check for and free bio integrity
        data.
      
      This patch fixes the above problems.  kmalloc patch is separated out
      from bio_alloc_bioset() and allocates the requested number of bvecs as
      inline bvecs.
      
      * bio_alloc_bioset() no longer takes NULL @bs.  None other than
        bio_kmalloc() used it and outside users can't know how it was
        allocated anyway.
      
      * Define and use BIO_POOL_NONE so that pool index check in
        bvec_free_bs() triggers if inline or kmalloc allocated bvec gets
        there.
      
      * Relocate destructors on top of each allocation function so that how
        they're used is more clear.
      
      Jens Axboe suggested allocating bvecs inline.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      451a9ebf
    • Tejun Heo's avatar
      block: fix queue bounce limit setting · cd0aca2d
      Tejun Heo authored
      Impact: don't set GFP_DMA in q->bounce_gfp unnecessarily
      
      All DMA address limits are expressed in terms of the last addressable
      unit (byte or page) instead of one plus that.  However, when
      determining bounce_gfp for 64bit machines in blk_queue_bounce_limit(),
      it compares the specified limit against 0x100000000UL to determine
      whether it's below 4G ending up falsely setting GFP_DMA in
      q->bounce_gfp.
      
      As DMA zone is very small on x86_64, this makes larger SG_IO transfers
      very eager to trigger OOM killer.  Fix it.  While at it, rename the
      parameter to @dma_mask for clarity and convert comment to proper
      winged style.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      cd0aca2d
    • Tejun Heo's avatar
      block: fix SG_IO vector request data length handling · 25636e28
      Tejun Heo authored
      Impact: fix SG_IO behavior such that it matches the documentation
      
      SG_IO howto says that if ->dxfer_len and sum of iovec disagress, the
      shorter one wins.  However, the current implementation returns -EINVAL
      for such cases.  Trim iovc if it's longer than ->dxfer_len.
      
      This patch uses iov_*() helpers which take struct iovec * by casting
      struct sg_iovec * to it.  sg_iovec is always identical to iovec and
      this will be further cleaned up with later patches.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      25636e28
    • Tejun Heo's avatar
      scatterlist: make sure sg_miter_next() doesn't return 0 sized mappings · 23c560a9
      Tejun Heo authored
      Impact: fix not-so-critical but annoying bug
      
      sg_miter_next() returns 0 sized mapping if there is an zero sized sg
      entry in the list or at the end of each iteration.  As the users
      always check the ->length field, this bug shouldn't be critical other
      than causing unnecessary iteration.
      
      Fix it.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      23c560a9
    • Linus Torvalds's avatar
      Linux 2.6.30-rc3 · 09106974
      Linus Torvalds authored
      09106974
    • Arjan van de Ven's avatar
      driver synchronization: make scsi_wait_scan more advanced · d4d5291c
      Arjan van de Ven authored
      There is currently only one way for userspace to say "wait for my storage
      device to get ready for the modules I just loaded": to load the
      scsi_wait_scan module. Expectations of userspace are that once this
      module is loaded, all the (storage) devices for which the drivers
      were loaded before the module load are present.
      
      Now, there are some issues with the implementation, and the async
      stuff got caught in the middle of this: The existing code only
      waits for the scsy async probing to finish, but it did not take
      into account at all that probing might not have begun yet.
      (Russell ran into this problem on his computer and the fix works for him)
      
      This patch fixes this more thoroughly than the previous "fix", which
      had some bad side effects (namely, for kernel code that wanted to wait for
      the scsi scan it would also do an async sync, which would deadlock if you did
      it from async context already.. there's a report about that on lkml):
      The patch makes the module first wait for all device driver probes, and then it
      will wait for the scsi parallel scan to finish.
      Signed-off-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Tested-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d4d5291c
    • Jonathan Corbet's avatar
      Trivial: fix a typo in slow-work.h · 5dd559f0
      Jonathan Corbet authored
      Fix a comment typo in slow-work.h
      
      ...a trivial mistake, but it will mess up kerneldoc if nothing else.
      Signed-off-by: default avatarJonathan Corbet <corbet@lwn.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5dd559f0
    • David Howells's avatar
      PERCPU: Collect the DECLARE/DEFINE declarations together · 5028eaa9
      David Howells authored
      Collect the DECLARE/DEFINE declarations together in linux/percpu-defs.h so
      that they're in one place, and give them descriptive comments, particularly
      the SHARED_ALIGNED variant.
      
      It would be nice to collect these in linux/percpu.h, but that's not possible
      without sorting out the severe #include recursion between the x86 arch headers
      and the general headers (and possibly other arches too).
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5028eaa9
    • David Howells's avatar
      FRV: Fix the section attribute on UP DECLARE_PER_CPU() · 9b8de747
      David Howells authored
      In non-SMP mode, the variable section attribute specified by DECLARE_PER_CPU()
      does not agree with that specified by DEFINE_PER_CPU().  This means that
      architectures that have a small data section references relative to a base
      register may throw up linkage errors due to too great a displacement between
      where the base register points and the per-CPU variable.
      
      On FRV, the .h declaration says that the variable is in the .sdata section, but
      the .c definition says it's actually in the .data section.  The linker throws
      up the following errors:
      
      kernel/built-in.o: In function `release_task':
      kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o
      kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o
      
      To fix this, DECLARE_PER_CPU() should simply apply the same section attribute
      as does DEFINE_PER_CPU().  However, this is made slightly more complex by
      virtue of the fact that there are several variants on DEFINE, so these need to
      be matched by variants on DECLARE.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9b8de747
  3. 21 Apr, 2009 21 commits