Commits · a1e4ccb990447df0fe83d164d9a7bc2e6c4b7db7 · Kirill Smelkov / linux

18 Jun, 2012 2 commits

KVM: Introduce __KVM_HAVE_IRQ_LINE · a1e4ccb9

Christoffer Dall authored Jun 15, 2012

This is a preparatory patch for the KVM/ARM implementation. KVM/ARM will use
the KVM_IRQ_LINE ioctl, which is currently conditional on
__KVM_HAVE_IOAPIC, but ARM obviously doesn't have any IOAPIC support and we
need a separate define.
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
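
The idea of a separate define can be sketched in a few lines of C. This is only an illustration: the toy dispatcher, the `SKETCH_*` constants, and the way the macro is defined locally are assumptions, not the kernel's actual ioctl code.

```c
/* Hypothetical per-architecture header: an architecture with no IOAPIC can
 * still opt in to the IRQ line ioctl by defining only this macro. */
#define __KVM_HAVE_IRQ_LINE
/* #define __KVM_HAVE_IOAPIC */   /* deliberately left undefined */

enum { SKETCH_IRQ_LINE = 1, SKETCH_UNSUPPORTED = -1 };

/* Toy dispatcher standing in for the ioctl switch. */
static int handle_ioctl(int cmd)
{
    switch (cmd) {
#ifdef __KVM_HAVE_IRQ_LINE
    case SKETCH_IRQ_LINE:
        return 0;               /* handled without any IOAPIC support */
#endif
    default:
        return SKETCH_UNSUPPORTED;
    }
}
```

With the guard keyed on its own macro, the handler no longer depends on IOAPIC support being present.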

KVM: use KVM_CAP_IRQ_ROUTING to protect the routing related code · 9900b4b4

Marc Zyngier authored Jun 15, 2012

The KVM code sometimes uses CONFIG_HAVE_KVM_IRQCHIP to protect
code that is related to IRQ routing, which not all in-kernel
irqchips may support.

Use KVM_CAP_IRQ_ROUTING instead.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

13 Jun, 2012 4 commits

KVM: trace events: update list of exit reasons · dcce0489

Cornelia Huck authored Jun 11, 2012

The list of exit reasons for the kvm_userspace_exit event was
missing recent additions; bring it into sync again.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: s390: Perform early event mask processing during boot · cd183459

Heinz Graalfs authored Jun 11, 2012

For processing under KVM it is required to detect
the actual SCLP console type in order to set it as
preferred console.
Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: s390: Set CPU in stopped state on initial cpu reset · 61bde82c

Christian Borntraeger authored Jun 11, 2012

The initial cpu reset sets the cpu in the stopped state.
Several places check for the cpu state (e.g. sigp set prefix) and
not setting the STOPPED state triggered errors with newer guest
kernels after reboot.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: change PT_FIRST_AVAIL_BITS_SHIFT to avoid conflict with EPT Dirty bit · 00763e41

Xudong Hao authored Jun 07, 2012

The EPT Dirty bit uses bit 9, as defined in the Intel SDM; to avoid a
conflict, change PT_FIRST_AVAIL_BITS_SHIFT to 10.
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
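
The conflict can be shown with a minimal sketch. The macro and function names below are defined locally for illustration (`SW_AVAIL_BIT` and `sw_bit_conflicts` are hypothetical, not kernel symbols); only the bit positions come from the commit message.

```c
#include <stdint.h>

/* Bit 9 of an EPT entry is the hardware Dirty flag (per the commit message). */
#define EPT_DIRTY_BIT (UINT64_C(1) << 9)

/* Hypothetical helper: the first software-available bit for a given shift. */
#define SW_AVAIL_BIT(shift) (UINT64_C(1) << (shift))

/* Returns nonzero when the software bit would alias the hardware Dirty bit. */
static int sw_bit_conflicts(unsigned shift)
{
    return (SW_AVAIL_BIT(shift) & EPT_DIRTY_BIT) != 0;
}
```

A shift of 9 aliases the Dirty bit, while a shift of 10 does not.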

12 Jun, 2012 1 commit

KVM: MMU: Remove unused parameter from mmu_memory_cache_alloc() · 80feb89a

Takuya Yoshikawa authored May 29, 2012

The size is not needed to return one of the pre-allocated objects.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
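
A rough sketch of why the size argument is redundant, assuming a cache that is topped up in advance with objects of the right size. The struct, its capacity, and the function name are illustrative, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_CAPACITY 4  /* illustrative; not the kernel's value */

struct obj_cache {
    int nobjs;                       /* number of objects still available */
    void *objects[CACHE_CAPACITY];   /* filled in advance by a top-up step */
};

/* Pop one pre-allocated object. No size argument is needed: every object
 * in the cache was already allocated with the right size. */
static void *cache_alloc(struct obj_cache *mc)
{
    assert(mc->nobjs > 0);           /* caller must have topped up the cache */
    return mc->objects[--mc->nobjs];
}
```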

06 Jun, 2012 4 commits

Merge branch 'for-upstream' of git://github.com/agraf/linux-2.6 into next · 25e531a9

Avi Kivity authored Jun 06, 2012

Alex says:

"Changes this time include:

  - Generalize KVM_GUEST support to overall ePAPR code
  - Fix reset for Book3S HV
  - Fix machine check deferral when CONFIG_KVM_GUEST=y
  - Add support for BookE register DECAR"

* 'for-upstream' of git://github.com/agraf/linux-2.6:
  KVM: PPC: Not optimizing MSR_CE and MSR_ME with paravirt.
  KVM: PPC: booke: Added DECAR support
  KVM: PPC: Book3S HV: Make the guest hash table size configurable
  KVM: PPC: Factor out guest epapr initialization
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: disable uninitialized var warning · 79f702a6

Michael S. Tsirkin authored Jun 03, 2012

I see this in 3.5-rc1:

arch/x86/kvm/mmu.c: In function ‘kvm_test_age_rmapp’:
arch/x86/kvm/mmu.c:1271: warning: ‘iter.desc’ may be used uninitialized in this function

The line in question was introduced by commit
1e3f42f0

 static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
                              unsigned long data)
 {
-       u64 *spte;
+       u64 *sptep;
+       struct rmap_iterator iter;   <- line 1271
        int young = 0;

        /*

The reason, I think, is that the compiler assumes that
the rmap value could be 0, so

static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator
*iter)
{
        if (!rmap)
                return NULL;

        if (!(rmap & 1)) {
                iter->desc = NULL;
                return (u64 *)rmap;
        }

        iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
        iter->pos = 0;
        return iter->desc->sptes[iter->pos];
}

will not initialize iter.desc, but the compiler isn't
smart enough to see that

        for (sptep = rmap_get_first(*rmapp, &iter); sptep;
             sptep = rmap_get_next(&iter)) {

will immediately exit in this case.
I checked by adding
        if (!*rmapp)
                goto out;
on top which is clearly equivalent but disables the warning.

This patch uses uninitialized_var to disable the warning without
increasing code size.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
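
The control flow behind the false positive can be reproduced outside the kernel. The sketch below uses a hypothetical iterator over a zero-terminated array instead of the rmap structures, and it does not use uninitialized_var itself; it only shows that the iterator is never read when the "first" call returns NULL.

```c
#include <stddef.h>

/* Hypothetical iterator over a zero-terminated int array. */
struct iter {
    const int *cur;
};

/* Like rmap_get_first(): on an empty list, return NULL and leave *it alone. */
static const int *iter_first(const int *list, struct iter *it)
{
    if (!list || *list == 0)
        return NULL;
    it->cur = list;
    return it->cur;
}

/* Advance; only ever called after iter_first() returned non-NULL. */
static const int *iter_next(struct iter *it)
{
    it->cur++;
    return *it->cur ? it->cur : NULL;
}

static int sum_list(const int *list)
{
    struct iter it;   /* deliberately not initialized */
    const int *p;
    int sum = 0;

    /* If iter_first() returns NULL the loop body and iter_next() never run,
     * so the uninitialized iterator is never read. */
    for (p = iter_first(list, &it); p; p = iter_next(&it))
        sum += *p;
    return sum;
}
```

A compiler that cannot connect the NULL return to the loop exit may still warn about `it`, which is the situation the commit describes.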

KVM: Cleanup the kvm_print functions and introduce pr_XX wrappers · a737f256

Christoffer Dall authored Jun 03, 2012

Introduces a couple of print functions, which are essentially wrappers
around standard printk functions, with a KVM: prefix.

Functions introduced or modified are:
 - kvm_err(fmt, ...)
 - kvm_info(fmt, ...)
 - kvm_debug(fmt, ...)
 - kvm_pr_unimpl(fmt, ...)
 - pr_unimpl(vcpu, fmt, ...) -> vcpu_unimpl(vcpu, fmt, ...)
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
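
As a userspace illustration of prefixing wrappers, the sketch below formats into a caller-supplied buffer instead of calling printk. The `kvm_fmt` helper, the buffer arguments, and the level strings are assumptions made for the example; the kernel macros take only a format and its arguments. It relies on the GNU `##__VA_ARGS__` extension, which GCC and Clang accept.

```c
#include <stdio.h>

/* Userspace stand-in for printk-style wrappers: format into a caller buffer
 * so the prefix is easy to see. The kvm_fmt() name is hypothetical. */
#define kvm_fmt(buf, len, level, fmt, ...) \
    snprintf((buf), (len), "%s: KVM: " fmt, (level), ##__VA_ARGS__)

/* Thin wrappers that only fix the level and reuse the common prefix. */
#define kvm_err(buf, len, fmt, ...) \
    kvm_fmt(buf, len, "err", fmt, ##__VA_ARGS__)
#define kvm_info(buf, len, fmt, ...) \
    kvm_fmt(buf, len, "info", fmt, ##__VA_ARGS__)
```

Keeping the prefix in one macro means every message gets it without each call site repeating the string.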

KVM: s390: Change maintainer · 4ae57b6c

Christian Borntraeger authored Jun 05, 2012

Since Carsten is now working on a different project, Cornelia will
work as the 2nd s390/kvm maintainer.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: Carsten Otte <cotte@de.ibm.com>
CC: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

05 Jun, 2012 9 commits

KVM: VMX: Fix KVM_SET_SREGS with big real mode segments · b246dd5d

Orit Wasserman authored May 31, 2012

For example, migration between Westmere and Nehalem hosts, with the guest caught in big real mode.

The code that fixes the segments for real mode guest was moved from enter_rmode
to vmx_set_segments. enter_rmode calls vmx_set_segments for each segment.
Signed-off-by: Orit Wasserman <owasserm@rehdat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: do not iterate over all VMs in mmu_shrink() · 19526396

Gleb Natapov authored Jun 04, 2012

mmu_shrink() needlessly iterates over all VMs even though it will not
attempt to free mmu pages from more than one of them. Fix that and also
check used mmu pages count outside of VM lock to skip inactive VMs faster.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
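
The intended loop shape might look roughly like this. The `struct vm` fields and `shrink_one` are hypothetical stand-ins, and the locking is reduced to comments.

```c
#include <stddef.h>

/* Hypothetical per-VM state. */
struct vm {
    int used_mmu_pages;
    int freed;
};

/* Walk the VM list, skip VMs with nothing to reclaim (checked before any
 * per-VM lock would be taken), free from the first candidate and stop.
 * Returns the index of the VM that was shrunk, or -1 if none qualified. */
static int shrink_one(struct vm *vms, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (vms[i].used_mmu_pages == 0)
            continue;            /* cheap check, no lock needed */
        /* lock(vm) would go here */
        vms[i].used_mmu_pages--;
        vms[i].freed++;
        /* unlock(vm) would go here */
        return (int)i;           /* done: never touch more than one VM */
    }
    return -1;
}
```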

KVM: ia64: Mark ia64 KVM as BROKEN · a6bb7929

Avi Kivity authored May 17, 2012

Practically all patches to ia64 KVM are build fixes; numerous warnings remain;
the last patch from the maintainer was committed more than three years ago.  It
is clear that no one is using this thing.

Mark as BROKEN to ensure people don't get hit by pointless build problems.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Use EPT Access bit in response to memory notifiers · 3f6d8c8a

Xudong Hao authored May 22, 2012

Signed-off-by: Haitao Shan <haitao.shan@intel.com>
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Enable EPT A/D bits if supported by turning on relevant bit in EPTP · b38f9934

Xudong Hao authored May 28, 2012

Enable the EPT A/D bits for EPT paging-structure entries if the processor supports them.
Signed-off-by: Haitao Shan <haitao.shan@intel.com>
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Add parameter to control A/D bits support, default is on · 83c3a331

Xudong Hao authored May 28, 2012

Add a kernel parameter to control A/D bits support; it is on by default.
Signed-off-by: Haitao Shan <haitao.shan@intel.com>
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Add EPT A/D bits definitions · aaf07bc2

Xudong Hao authored May 28, 2012

Signed-off-by: Haitao Shan <haitao.shan@intel.com>
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Avoid wasting pages for small lpage_info arrays · c1a7b32a

Takuya Yoshikawa authored May 20, 2012

lpage_info is created for each large level even when the memory slot is
not for RAM. This means that when we add one slot for a PCI device, we
end up allocating at least KVM_NR_PAGE_SIZES - 1 pages by vmalloc().

To make things worse, there is an increasing number of devices which
would result in more pages being wasted this way.

This patch mitigates this problem by using kvm_kvzalloc().
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
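
The saving can be sketched under one assumption that the commit message does not spell out: that kvm_kvzalloc() picks a slab-style allocation for requests of at most one page and a vmalloc-style allocation otherwise. The names and the 4096-byte page size below are illustrative only.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for illustration */

enum backend { USE_KZALLOC, USE_VZALLOC };

/* Sizes up to one page go to the slab allocator, which can pack several
 * small arrays into a page; larger ones use vmalloc-style page mappings. */
static enum backend pick_backend(size_t size)
{
    return size > SKETCH_PAGE_SIZE ? USE_VZALLOC : USE_KZALLOC;
}

/* Bytes a page-granular allocator would consume for this request. */
static size_t page_rounded(size_t size)
{
    return (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE;
}
```

For a 64-byte array, a page-granular allocator would consume a full 4096-byte page, which is the waste the commit describes for small lpage_info arrays.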

KVM: Separate out dirty_bitmap allocation code as kvm_kvzalloc() · 92eca8fa

Takuya Yoshikawa authored May 20, 2012

Will be used for lpage_info allocation later.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

04 Jun, 2012 12 commits

Pull 'for-linus' branches of git://git.kernel.org/pub/scm/linux/kernel/git/viro/{signal,vfs} · 99becf13

Linus Torvalds authored Jun 04, 2012

Pull signal and vfs compile breakage fixes from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  fixups for signal breakage

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  nommu: fix compilation of nommu.c

Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 · bf2785a8

Linus Torvalds authored Jun 04, 2012

Pull cifs fixes from Steve French.

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Move get_next_mid to ops struct
  CIFS: Make accessing is_valid_oplock/dump_detail ops struct field safe
  CIFS: Improve identation in cifs_unlock_range
  CIFS: Fix possible wrong memory allocation

fixups for signal breakage · 03240b27

Al Viro authored Jun 04, 2012

Obvious brainos spotted by Geert.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

nommu: fix compilation of nommu.c · ad1ed293

Greg Ungerer authored Jun 04, 2012

Compiling 3.5-rc1 for nommu targets gives:

  CC      mm/nommu.o
mm/nommu.c: In function ‘sys_mmap_pgoff’:
mm/nommu.c:1489:2: error: ‘ret’ undeclared (first use in this function)
mm/nommu.c:1489:2: note: each undeclared identifier is reported only once for each function it appears in

It is trivially fixed by replacing 'ret' with the local variable that is
already defined for the return value 'retval'.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge tag 'stable/frontswap.v16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm · a3fe778c

Linus Torvalds authored Jun 04, 2012

Pull frontswap feature from Konrad Rzeszutek Wilk:
 "Frontswap provides a "transcendent memory" interface for swap pages.
  In some environments, dramatic performance savings may be obtained
  because swapped pages are saved in RAM (or a RAM-like device) instead
  of a swap disk.  This tag provides the basic infrastructure along with
  some changes to the existing backends."

Fix up trivial conflict in mm/Makefile due to removal of swap token code
changing a line next to the new frontswap entry.

This pull request came in before the merge window even opened, it got
delayed to after the merge window by me just wanting to make sure it had
actual users.  Apparently IBM is using this on their embedded side, and
Jan Beulich says that it's already made available for SLES and OpenSUSE
users.

Also acked by Rik van Riel, and Konrad points to other people liking it
too.  So in it goes.

By Dan Magenheimer (4) and Konrad Rzeszutek Wilk (2)
via Konrad Rzeszutek Wilk
* tag 'stable/frontswap.v16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm:
  frontswap: s/put_page/store/g s/get_page/load
  MAINTAINER: Add myself for the frontswap API
  mm: frontswap: config and doc files
  mm: frontswap: core frontswap functionality
  mm: frontswap: core swap subsystem hooks and headers
  mm: frontswap: add frontswap header file

Merge branches 'irq-urgent-for-linus' and 'smp-hotplug-for-linus' of... · 9171c670

Linus Torvalds authored Jun 04, 2012

Merge branches 'irq-urgent-for-linus' and 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq and smpboot updates from Thomas Gleixner:
 "Just cleanup patches with no functional change and a fix for suspend
  issues."

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Introduce irq_do_set_affinity() to reduce duplicated code
  genirq: Add IRQS_PENDING for nested and simple irq

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  smpboot, idle: Fix comment mismatch over idle_threads_init()
  smpboot, idle: Optimize calls to smp_processor_id() in idle_threads_init()

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c22072bd

Linus Torvalds authored Jun 04, 2012

Pull timer updates from Thomas Gleixner:
 "The clocksource driver is pure hardware enablement and the skew option
  is default off, well tested and non dangerous."

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick: Move skew_tick option into the HIGH_RES_TIMER section
  clocksource: em_sti: Add DT support
  clocksource: em_sti: Emma Mobile STI driver
  clockevents: Make clockevents_config() a global symbol
  tick: Add tick skew boot option

vfs: Fix /proc/<tid>/fdinfo/<fd> file handling · 0640113b

Linus Torvalds authored Jun 04, 2012

Cyrill Gorcunov reports that I broke the fdinfo files with commit
30a08bf2 ("proc: move fd symlink i_mode calculations into
tid_fd_revalidate()"), and he's quite right.

The tid_fd_revalidate() function is not just used for the <tid>/fd
symlinks, it's also used for the <tid>/fdinfo/<fd> files, and the
permission model for those is different.

So do the dynamic symlink permission handling just for symlinks, making
the fdinfo files once more appear as the proper regular files they are.

Of course, Al Viro argued (probably correctly) that we shouldn't do the
symlink permission games at all, and make the symlinks always just be
the normal 'lrwxrwxrwx'. That would have avoided this issue too, but
since somebody noticed that the permissions had changed (which was the
reason for that original commit 30a08bf2 in the first place), people
do apparently use this feature.

[ Basically, you can use the symlink permission data as a cheap "fdinfo"
replacement, since you see whether the file is open for reading and/or
writing by just looking at st_mode of the symlink. So the feature
does make sense, even if the pain it has caused means we probably
shouldn't have done it to begin with. ]
Reported-and-tested-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
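
The "cheap fdinfo replacement" mentioned in the bracketed note could be used from userspace roughly as follows. This is a sketch that assumes a Linux system with /proc mounted; the function name and the returned bitmask encoding are made up for the example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns a bitmask (1 = readable, 2 = writable) derived from the st_mode of
 * the /proc/self/fd/<fd> symlink, or -1 if lstat() fails (e.g. no /proc). */
static int fd_access_from_symlink(int fd)
{
    char path[64];
    struct stat st;

    snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
    if (lstat(path, &st) != 0)      /* lstat: inspect the symlink itself */
        return -1;
    return ((st.st_mode & S_IRUSR) ? 1 : 0) |
           ((st.st_mode & S_IWUSR) ? 2 : 0);
}
```

Note that lstat(), not stat(), is required here, since the information lives in the mode of the symlink rather than in its target.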

gpio/samsung: fix the typo 'exynos5_xxx' instead of 'exonys5_xxx' · 5041caa4

Kukjin Kim authored Jun 04, 2012

Should be 'exynos5_xxx' instead of 'exonys5_xxx'.

It happened at the commit 30b84288 ("Merge tag 'soc2' of
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc")
during v3.5 merge window.
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
[ My bad  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'pm-acpi' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 4d578573

Linus Torvalds authored Jun 03, 2012

Pull some left-over PM patches from Rafael J. Wysocki.

* 'pm-acpi' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / PM: Make acpi_pm_device_sleep_state() follow the specification
  ACPI / PM: Make __acpi_bus_get_power() cover D3cold correctly
  ACPI / PM: Fix error messages in drivers/acpi/bus.c
  rtc-cmos / PM: report wakeup event on ACPI RTC alarm
  ACPI / PM: Generate wakeup events on fixed power button

Revert "mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks" · 68e3e926

Linus Torvalds authored Jun 03, 2012

This reverts commit 5ceb9ce6.

That commit seems to be the cause of the mm compaction list corruption
issues that Dave Jones reported.  The locking (or rather, absence
thereof) is dubious, as is the use of the 'page' variable once it has
been found to be outside the pageblock range.

So revert it for now, we can re-visit this for 3.6.  If we even need to:
as Minchan Kim says, "The patch wasn't a bug fix and even test workload
was very theoretical".
Reported-and-tested-by: Dave Jones <davej@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix warning in __set_page_dirty_nobuffers · 752dc185

Hugh Dickins authored Jun 02, 2012

New tmpfs use of !PageUptodate pages for fallocate() is triggering the
WARNING: at mm/page-writeback.c:1990 when __set_page_dirty_nobuffers()
is called from migrate_page_copy() for compaction.

It is anomalous that migration should use __set_page_dirty_nobuffers()
on an address_space that does not participate in dirty and writeback
accounting; and this has also been observed to insert surprising dirty
tags into a tmpfs radix_tree, despite tmpfs not using tags at all.

We should probably give migrate_page_copy() a better way to preserve the
tag and migrate accounting info, when mapping_cap_account_dirty(). But
that needs some more work: so in the interim, avoid the warning by using
a simple SetPageDirty on PageSwapBacked pages.
Reported-and-tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

03 Jun, 2012 3 commits

vfs: move inode stat information closer together · 2f9d3df8

Linus Torvalds authored Jun 03, 2012

The comment above it says "Stat data, not accessed from path walking",
but in fact some of the inode fields we use for the common stat data were way
down at the end of the inode, causing unnecessary cache misses for the
common stat operations.

The inode structure is pretty big, and this can change padding depending
on field width, but at least on the common 64-bit configurations this
doesn't change the size.  Some of our inode layout has historically been
to try to avoid unnecessary padding fields, but cache locality is at
least as important for layout, if not more.

Noticed by looking at kernel profiles, and noticing that the "i_blkbits"
access stood out like a sore thumb.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
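
The locality argument can be illustrated with two toy layouts. The structs below are hypothetical and only loosely borrow field names from struct inode; the 64-byte cache line and the assumption that each struct starts on a cache-line boundary are also illustrative.

```c
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache-line size */

/* Hypothetical layouts; field names only loosely follow struct inode. */
struct scattered {
    unsigned long i_ino;
    char other[200];         /* unrelated fields in between */
    unsigned int i_blkbits;  /* stat data stranded far from i_ino */
};

struct grouped {
    unsigned long i_ino;
    unsigned int i_blkbits;  /* now adjacent to the other stat data */
    char other[200];
};

/* Index of the cache line that holds a given byte offset, assuming the
 * struct itself starts on a cache-line boundary. */
static size_t line_of(size_t offset)
{
    return offset / CACHE_LINE;
}
```

In the scattered layout a stat-style reader touches two different cache lines for the two fields; in the grouped layout both fall in the first line.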

Linux 3.5-rc1 · f8f5701b

Linus Torvalds authored Jun 02, 2012

Merge tag 'dm-3.5-changes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm · 912afc36

Linus Torvalds authored Jun 02, 2012

Pull device-mapper updates from Alasdair G Kergon:
 "Improve multipath's retrying mechanism in some defined circumstances
  and provide a simple reserve/release mechanism for userspace tools to
  access thin provisioning metadata while the pool is in use."

* tag 'dm-3.5-changes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
  dm thin: provide userspace access to pool metadata
  dm thin: use slab mempools
  dm mpath: allow ioctls to trigger pg init
  dm mpath: delay retry of bypassed pg
  dm mpath: reduce size of struct multipath

02 Jun, 2012 5 commits

dm thin: provide userspace access to pool metadata · cc8394d8

Joe Thornber authored Jun 03, 2012

This patch implements two new messages that can be sent to the thin
pool target allowing it to take a snapshot of the _metadata_.  This
read-only snapshot can be accessed by userland, concurrently with the
live target.

Only one metadata snapshot can be held at a time.  The pool's status
line will give the block location for the current msnap.

Since version 0.1.5 of the userland thin provisioning tools, the
thin_dump program displays the msnap as follows:

    thin_dump -m <msnap root> <metadata dev>

Available here: https://github.com/jthornber/thin-provisioning-tools

Now that userland can access the metadata we can do various things
that have traditionally been kernel side tasks:

     i) Incremental backups.

     By using metadata snapshots we can work out what blocks have
     changed over time.  Combined with data snapshots we can ensure
     the data doesn't change while we back it up.

     A short proof of concept script can be found here:

     https://github.com/jthornber/thinp-test-suite/blob/master/incremental_backup_example.rb

     ii) Migration of thin devices from one pool to another.

     iii) Merging snapshots back into an external origin.

     iv) Asynchronous replication.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm thin: use slab mempools · a24c2569

Mike Snitzer authored Jun 03, 2012

Use dedicated caches prefixed with a "dm_" name rather than relying on
kmalloc mempools backed by generic slab caches so the memory usage of
thin provisioning (and any leaks) can be accounted for independently.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm mpath: allow ioctls to trigger pg init · 35991652

Mikulas Patocka authored Jun 03, 2012

After the failure of a group of paths, any alternative paths that
need initialising do not become available until further I/O is sent to
the device.  Until this has happened, ioctls return -EAGAIN.

With this patch, new paths are made available in response to an ioctl
too.  The processing of the ioctl gets delayed until this has happened.

Instead of returning an error, we submit a work item to kmultipathd
(that will potentially activate the new path) and retry in ten
milliseconds.

Note that the patch doesn't retry an ioctl if the ioctl itself fails due
to a path failure.  Such retries should be handled intelligently by the
code that generated the ioctl in the first place, noting that some SCSI
commands should not be retried because they are not idempotent (XOR write
commands).  For commands that could be retried, there is a danger that
if the device rejected the SCSI command, the path could be erroneously
marked as failed, and the request would be retried on another path which
might fail too.  It can be determined if the failure happens on the
device or on the SCSI controller, but there is no guarantee that all
SCSI drivers set these flags correctly.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
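
The retry behaviour can be sketched in userspace. Everything here is an assumption made for illustration: the callback type, the `max_tries` bound (the description above does not mention one), and the `ready_after` stand-in for a path that becomes available after a few polls. Only the -EAGAIN condition and the ten-millisecond delay come from the commit message.

```c
#include <errno.h>
#include <stddef.h>
#include <time.h>

typedef int (*ioctl_fn)(void *ctx);

/* Retry while the operation reports -EAGAIN, sleeping 10 ms between tries.
 * Any other result, including a real failure, is returned immediately. */
static int retry_while_eagain(ioctl_fn fn, void *ctx, int max_tries)
{
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    int r = -EAGAIN;

    for (int i = 0; i < max_tries && r == -EAGAIN; i++) {
        if (i > 0)
            nanosleep(&delay, NULL);
        r = fn(ctx);
    }
    return r;
}

/* Hypothetical stand-in for an ioctl: fails with -EAGAIN until the counter
 * in *ctx reaches zero, modelling a path that is still being initialised. */
static int ready_after(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -EAGAIN;
    }
    return 0;
}
```

Errors other than -EAGAIN are passed straight through, matching the note that failures of the ioctl itself are not retried.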

dm mpath: delay retry of bypassed pg · f220fd4e

Mike Christie authored Jun 03, 2012

If I/O needs retrying and only bypassed priority groups are available,
set the pg_init_delay_retry flag to wait before retrying.

If, for example, the reason for the bypass is that the controller is
getting reset or there is a firmware upgrade happening, retrying right
away would cause a flood of log messages and retries for what could be a
few seconds or even several minutes.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm mpath: reduce size of struct multipath · 1fbdd2b3

Mike Snitzer authored Jun 03, 2012

Move multipath structure's 'lock' and 'queue_size' members to eliminate
two 4-byte holes.  Also use a bit within a single unsigned int for each
existing flag (saves 8 bytes).  This allows future flags to be added
without each consuming an unsigned int.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
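
The two techniques, reordering members to close holes and packing flags into bits, can be seen in a small sketch. Both structs are hypothetical and much smaller than the real struct multipath; the member names are illustrative and the exact sizes depend on the ABI.

```c
#include <stddef.h>

/* Hypothetical "before" layout: a 4-byte member ahead of each pointer leaves
 * a 4-byte hole on typical 64-bit ABIs, and each flag takes a whole word. */
struct mp_before {
    int lock;
    void *dev;
    unsigned queue_size;
    void *pg;
    unsigned queue_io;
    unsigned queue_if_no_path;
    unsigned saved_queue_if_no_path;
};

/* "After": 4-byte members paired up, flags packed into single bits. */
struct mp_after {
    void *dev;
    void *pg;
    int lock;
    unsigned queue_size;
    unsigned queue_io:1;
    unsigned queue_if_no_path:1;
    unsigned saved_queue_if_no_path:1;
};
```

Comparing `sizeof(struct mp_before)` with `sizeof(struct mp_after)` shows the reduction, and further one-bit flags fit into the same unsigned int without growing the struct.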
