Commits · 17bd158006a33615270f9dba15c62f49bd447435 · nexedi / linux

24 Oct, 2010 40 commits

KVM: PPC: Implement Level interrupts on Book3S · 17bd1580

Alexander Graf authored Aug 30, 2010

The current interrupt logic is just completely broken. We get a notification
from user space, telling us that an interrupt is there. But then user space
expects us to simply acknowledge the interrupt once we deliver it to the
guest.

This is not how real hardware works though. On real hardware, the interrupt
controller pulls the external interrupt line until it gets notified that the
interrupt was received.

So in reality we have two events: pulling and letting go of the interrupt line.

To maintain backwards compatibility, I added a new request for the pulling
part. The letting go part was implemented earlier already.

With this in place, we can now finally start guests that do not stall or
stop working at random times.

This patch implements the above logic for Book3S.
Signed-off-by: Alexander Graf <agraf@suse.de>

17bd1580
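
The two events described in the commit above (pulling and letting go of the interrupt line) can be sketched as a tiny state machine. This is only an illustration of level-triggered semantics under stated assumptions, not the kernel implementation; `struct ext_irq_line` and the `ext_irq_*` helpers are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical model of one external interrupt line; not kernel code. */
struct ext_irq_line {
    bool asserted; /* true while the interrupt controller holds the line */
};

/* "Pulling" event: the controller asserts the line. */
static void ext_irq_raise(struct ext_irq_line *line)
{
    line->asserted = true;
}

/* "Letting go" event: the controller releases the line. */
static void ext_irq_lower(struct ext_irq_line *line)
{
    line->asserted = false;
}

/*
 * Decide whether the guest takes an external interrupt right now.
 * Delivery does not clear the line: a level interrupt stays pending
 * until ext_irq_lower() is called.
 */
static bool ext_irq_deliver(const struct ext_irq_line *line,
                            bool guest_irqs_enabled)
{
    return line->asserted && guest_irqs_enabled;
}
```

In this model, calling `ext_irq_deliver()` twice in a row after `ext_irq_raise()` reports the interrupt both times; only `ext_irq_lower()` makes it go away.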

KVM: PPC: Enable napping only for Book3s_64 · 591bd8e7

Alexander Graf authored Aug 17, 2010

Previously I incorrectly enabled napping for BookE as well, which resulted in
needless dcache flushes. Since we only need to force enable napping on
Book3s_64 because it doesn't go into MSR_POW otherwise, we can just #ifdef
that code to this particular platform.
Reported-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

591bd8e7

KVM: PPC: allow ppc440gp to pass the compatibility check · ebc65874

Hollis Blanchard authored Aug 07, 2010

Match only the first part of cur_cpu_spec->platform.

440GP (the first 440 processor) is identified by the string "ppc440gp", while
all later 440 processors use simply "ppc440".
Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

ebc65874
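
Matching only the first part of the platform string, as the commit above describes, amounts to a prefix comparison. A minimal sketch follows; `platform_is_440` is a hypothetical helper name, and the real check operates on `cur_cpu_spec->platform`.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: accept any platform string that starts with
 * "ppc440", so both "ppc440" and "ppc440gp" pass.
 */
static bool platform_is_440(const char *platform)
{
    return strncmp(platform, "ppc440", strlen("ppc440")) == 0;
}
```

A full-string comparison such as `strcmp(platform, "ppc440") == 0` would reject "ppc440gp", which is the case the commit addresses.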

KVM: PPC: fix compilation of "dump tlbs" debug function · 0b3bafc8

Hollis Blanchard authored Aug 07, 2010

Missing local variable.
Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

0b3bafc8

KVM: PPC: initialize IVORs in addition to IVPR · 082decf2

Hollis Blanchard authored Aug 07, 2010

Developers can now tell at a glance the exact type of the premature interrupt,
instead of just knowing that there was some premature interrupt.
Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

082decf2

KVM: PPC: Don't put MSR_POW in MSR · 296c19d0

Alexander Graf authored Aug 15, 2010

On Book3S a mtmsr with the MSR_POW bit set indicates that the OS is in
idle and only needs to be woken up on the next interrupt.

Now, unfortunately we let that bit slip into the stored MSR value which
is not what the real CPU does, so that we ended up executing code like
this:

	r = mfmsr();
	/* r contains MSR_POW */
	mtmsr(r | MSR_EE);

This obviously breaks, as we're going into idle mode in code sections that
don't expect to be idling.

This patch masks MSR_POW out of the stored MSR value on wakeup, making
guests happy again.
Signed-off-by: Alexander Graf <agraf@suse.de>

296c19d0
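
The masking described in the commit above can be sketched as follows. This is a simplified model, not the kernel code: `struct toy_msr_vcpu` and `toy_mtmsr` are hypothetical, the bit positions are assumptions made for the illustration, and blocking until the next interrupt is reduced to a counter.

```c
#include <stdint.h>

/* Bit positions are assumptions made for this illustration. */
#define TOY_MSR_EE  (UINT64_C(1) << 15)
#define TOY_MSR_POW (UINT64_C(1) << 18)

/* Hypothetical, simplified vcpu state; not the kernel's struct. */
struct toy_msr_vcpu {
    uint64_t msr;        /* value the guest reads back via mfmsr */
    unsigned idle_count; /* how often the vcpu went idle */
};

/* Emulate a guest mtmsr. */
static void toy_mtmsr(struct toy_msr_vcpu *vcpu, uint64_t val)
{
    if (val & TOY_MSR_POW) {
        vcpu->idle_count++;  /* stand-in for blocking until an interrupt */
        val &= ~TOY_MSR_POW; /* after wakeup, drop the bit before storing */
    }
    vcpu->msr = val;
}
```

With the bit masked out, a later `mtmsr(mfmsr() | MSR_EE)` sequence no longer carries MSR_POW along, so the vcpu does not go idle a second time.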

KVM: PPC: Implement correct SID mapping on Book3s_32 · 8b6db3bc

Alexander Graf authored Aug 15, 2010

Up until now we were doing segment mappings wrong on Book3s_32. For Book3s_64
we were using a trick where we know that a single mmu_context gives us 16 bits
of context ids.

The mm system on Book3s_32 instead uses a clever algorithm to distribute VSIDs
across the available range, so a context id really only gives us 16 available
VSIDs.

To keep at least a few guest processes in the SID shadow, let's map a number of
contexts that we can use as VSID pool. This makes the code actually correct
and shouldn't hurt performance too much.
Signed-off-by: Alexander Graf <agraf@suse.de>

8b6db3bc

KVM: PPC: Force enable nap on KVM · ad087376

Alexander Graf authored Aug 17, 2010

There are some heuristics in the PPC power management code that try to find
out if the particular hardware we're running on supports proper power management
or just hangs the machine when going into nap mode.

Since we know that KVM is safe with nap, let's force enable it in the PV code
once we're certain that we are on a KVM VM.
Signed-off-by: Alexander Graf <agraf@suse.de>

ad087376

KVM: PPC: Make PV mtmsrd L=1 work with r30 and r31 · df08bd10

Alexander Graf authored Aug 05, 2010

We had an arbitrary limitation in mtmsrd L=1 that kept us from using r30 and
r31 as input registers. Let's get rid of that and get more potential speedups!
Signed-off-by: Alexander Graf <agraf@suse.de>

df08bd10

KVM: PPC: Update int_pending also on dequeue · 9ee18b1e

Alexander Graf authored Aug 05, 2010

When having a decrementer interrupt pending, the dequeuing happens manually
through an mtdec instruction. This instruction simply calls dequeue on that
interrupt, so the int_pending hint doesn't get updated.

This patch enables updating the int_pending hint also on dequeue, thus
correctly enabling guests to stay in guest contexts more often.
Signed-off-by: Alexander Graf <agraf@suse.de>

9ee18b1e
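
One way to picture the fix above is to recompute the hint from the pending set on both queue and dequeue. The sketch below is a simplification under that assumption; `struct toy_irq_state` and its helpers are hypothetical and do not mirror the kernel's data structures.

```c
#include <stdint.h>

/* Hypothetical, simplified interrupt state; not the kernel's layout. */
struct toy_irq_state {
    uint32_t pending;     /* one bit per queued interrupt priority */
    uint32_t int_pending; /* hint the guest can read cheaply */
};

/* Recompute the hint from the pending bitmap. */
static void toy_update_int_pending(struct toy_irq_state *s)
{
    s->int_pending = (s->pending != 0);
}

static void toy_queue_irq(struct toy_irq_state *s, unsigned prio)
{
    s->pending |= UINT32_C(1) << prio;
    toy_update_int_pending(s);
}

/* The hint is refreshed on dequeue too, so it cannot go stale. */
static void toy_dequeue_irq(struct toy_irq_state *s, unsigned prio)
{
    s->pending &= ~(UINT32_C(1) << prio);
    toy_update_int_pending(s);
}
```

Without the refresh in `toy_dequeue_irq()`, the hint would stay set after the last interrupt was dequeued, and the guest would leave guest context more often than necessary.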

KVM: PPC: Make PV mtmsr work with r30 and r31 · 512ba59e

Alexander Graf authored Aug 05, 2010

So far we've been restricting ourselves to r0-r29 as registers an mtmsr
instruction could use. This was bad, as there are some code paths in
Linux actually using r30.

So let's instead handle all registers gracefully and get rid of that
stupid limitation.
Signed-off-by: Alexander Graf <agraf@suse.de>

512ba59e
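
For context, the register an mtmsr instruction uses is encoded in a 5-bit field, so r30 and r31 are ordinary values of that field. The sketch below only shows how such an instruction could be recognized and its register extracted; the encoding constants reflect my reading of the PowerPC instruction format and should be treated as assumptions, and the `toy_*` names are hypothetical. It does not show how the patching code handles the registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encodings, for illustration only. */
#define TOY_INST_MTMSR 0x7c000124u /* mtmsr with the register field zeroed */
#define TOY_MASK_RS    0x03e00000u /* 5-bit source register field */

/* Extract the source register number (0..31) from the instruction. */
static unsigned toy_insn_rs(uint32_t insn)
{
    return (insn & TOY_MASK_RS) >> 21;
}

/* Recognize mtmsr regardless of which register it uses. */
static bool toy_is_mtmsr(uint32_t insn)
{
    return (insn & ~TOY_MASK_RS) == TOY_INST_MTMSR;
}
```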

KVM: PPC: Add mtsrin PV code · cbe487fa

Alexander Graf authored Aug 03, 2010

This is the guest side of the mtsr acceleration. Using this a guest can now
call mtsrin with almost no overhead as long as it ensures that it only uses
it with (MSR_IR|MSR_DR) == 0. Linux does that, so we're good.
Signed-off-by: Alexander Graf <agraf@suse.de>

cbe487fa

KVM: PPC: Put segment registers in shared page · df1bfa25

Alexander Graf authored Aug 03, 2010

Now that the actual mtsr doesn't do anything anymore, we can move the sr
contents over to the shared page, so a guest can directly read and write
its sr contents from guest context.
Signed-off-by: Alexander Graf <agraf@suse.de>

df1bfa25

KVM: PPC: Interpret SR registers on demand · 8e865178

Alexander Graf authored Aug 03, 2010

Right now we're examining the contents of Book3s_32's segment registers when
the register is written and putting the interpreted contents into a struct.

There are two reasons this is bad. For starters, the struct has worse real-time
performance, as it occupies more ram. But the more important part is that with
segment registers being interpreted from their raw values, we can put them in
the shared page, allowing guests to mess with them directly.

This patch makes the internal representation of SRs be u32s.
Signed-off-by: Alexander Graf <agraf@suse.de>

8e865178
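
Interpreting a segment register on demand means keeping only the raw 32-bit value and decoding fields when they are needed. A minimal sketch follows; the masks reflect my reading of the 32-bit segment register layout and are assumptions for illustration, and the `toy_sr_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed field masks for a 32-bit segment register. */
#define TOY_SR_KS        0x40000000u /* supervisor key */
#define TOY_SR_KP        0x20000000u /* user key */
#define TOY_SR_NX        0x10000000u /* no-execute */
#define TOY_SR_VSID_MASK 0x00ffffffu /* virtual segment id */

/* Each accessor decodes one field from the raw value when asked. */
static inline uint32_t toy_sr_vsid(uint32_t sr_raw)
{
    return sr_raw & TOY_SR_VSID_MASK;
}

static inline bool toy_sr_ks(uint32_t sr_raw)
{
    return (sr_raw & TOY_SR_KS) != 0;
}

static inline bool toy_sr_kp(uint32_t sr_raw)
{
    return (sr_raw & TOY_SR_KP) != 0;
}

static inline bool toy_sr_nx(uint32_t sr_raw)
{
    return (sr_raw & TOY_SR_NX) != 0;
}
```

Because the stored representation is just a `uint32_t`, the same value can live in a page the guest writes directly, and the host decodes whatever it finds there.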

KVM: PPC: Move BAT handling code into spr handler · c1c88e2f

Alexander Graf authored Aug 02, 2010

The current approach duplicates the spr->bat finding logic and makes it harder
to reuse the actually used variables. So let's move everything down to the spr
handler.
Signed-off-by: Alexander Graf <agraf@suse.de>

c1c88e2f

KVM: PPC: Add feature bitmap for magic page · 7508e16c

Alexander Graf authored Aug 03, 2010

We will soon add SR PV support to the shared page, so we need some
infrastructure that allows the guest to query for features KVM exports.

This patch adds a second return value to the magic mapping that
indicates to the guest which features are available.
Signed-off-by: Alexander Graf <agraf@suse.de>

7508e16c
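
A feature bitmap returned alongside the status lets a guest test for a capability before relying on it. The sketch below illustrates that pattern only; the struct, the function names, and the `TOY_MAGIC_FEAT_SR` bit are hypothetical and are not the actual hypercall interface.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAGIC_FEAT_SR (UINT32_C(1) << 0) /* hypothetical feature bit */

/* Hypothetical pair of return values from the magic page mapping. */
struct toy_magic_map_result {
    int status;        /* 0 on success */
    uint32_t features; /* bitmap of features the host exports */
};

/* Stand-in for the host side: report success plus its feature bitmap. */
static struct toy_magic_map_result toy_map_magic_page(uint32_t host_features)
{
    struct toy_magic_map_result res = { 0, host_features };
    return res;
}

/* Guest side: use a feature only if mapping worked and the bit is set. */
static bool toy_guest_can_use(const struct toy_magic_map_result *res,
                              uint32_t feat)
{
    return res->status == 0 && (res->features & feat) == feat;
}
```

A host that predates a feature simply reports a bitmap without that bit, and the guest falls back to the unaccelerated path.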

KVM: PPC: Remove unused define · cb24c508

Alexander Graf authored Aug 02, 2010

The define VSID_ALL is unused. Let's remove it.
Signed-off-by: Alexander Graf <agraf@suse.de>

cb24c508

KVM: PPC: Revert "KVM: PPC: Use kernel hash function" · b9877ce2

Alexander Graf authored Aug 02, 2010

It turns out the in-kernel hash function is sub-optimal for our subtle
hash inputs where every bit is significant. So let's revert to the original
hash functions.

This reverts commit 05340ab4f9a6626f7a2e8f9fe5397c61d494f445.
Signed-off-by: Alexander Graf <agraf@suse.de>

b9877ce2

KVM: PPC: Move slb debugging to tracepoints · 928d78be

Alexander Graf authored Aug 02, 2010

This patch moves debugging printks for shadow SLB debugging over to tracepoints.
Signed-off-by: Alexander Graf <agraf@suse.de>

928d78be

KVM: PPC: Make invalidation code more reliable · e7c1d14e

Alexander Graf authored Aug 02, 2010

There is a race condition in the pte invalidation code path where we can't
be sure if a pte was invalidated already. So let's move the spin lock around
to get rid of the race.
Signed-off-by: Alexander Graf <agraf@suse.de>

e7c1d14e

KVM: PPC: Don't flush PTEs on NX/RO hit · 2e602847

Alexander Graf authored Aug 02, 2010

When hitting a no-execute or read-only data/inst storage interrupt we were
flushing the respective PTE so we're sure it gets properly overwritten next.

According to the spec, this is unnecessary though. The guest issues a tlbie
anyways, so we're safe to just keep the PTE around and have it manually removed
from the guest, saving us a flush.
Signed-off-by: Alexander Graf <agraf@suse.de>

2e602847

KVM: PPC: Preload magic page when in kernel mode · 4cb6b7ea

Alexander Graf authored Aug 02, 2010

When the guest jumps into kernel mode and has the magic page mapped, there's a
very high chance that it will also use it. So let's detect that scenario and
map the segment accordingly.
Signed-off-by: Alexander Graf <agraf@suse.de>

4cb6b7ea

KVM: PPC: Add tracepoints for generic spte flushes · c60b4cf7

Alexander Graf authored Aug 02, 2010

The different ways of flushing shadow ptes have their own debug prints which use
stupid old printk.

Let's move them to tracepoints, making them more easily available, faster and
possible to activate on demand.
Signed-off-by: Alexander Graf <agraf@suse.de>

c60b4cf7

KVM: PPC: Fix sid map search after flush · c22c3196

Alexander Graf authored Aug 02, 2010

After a flush the sid map contained lots of entries with 0 for their gvsid and
hvsid value. Unfortunately, 0 can be a real value the guest searches for when
looking up a vsid so it would incorrectly find the host's 0 hvsid mapping which
doesn't belong to our sid space.

So let's also check for the valid bit that indicates that the sid we're
looking at actually contains useful data.
Signed-off-by: Alexander Graf <agraf@suse.de>

c22c3196
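
The lookup problem described above can be reproduced with a small table in which flushed entries are zeroed. This is a sketch under simplifying assumptions (a linear search over a hypothetical `struct toy_sid_map`), not the kernel's hashed lookup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical map entry; a flushed entry has all fields zero. */
struct toy_sid_map {
    uint64_t guest_vsid;
    uint64_t host_vsid;
    bool valid;
};

/*
 * Find the entry for a guest VSID. Checking 'valid' first keeps a
 * flushed (all-zero) entry from matching a search for guest VSID 0.
 */
static struct toy_sid_map *toy_find_sid(struct toy_sid_map *map, size_t n,
                                        uint64_t gvsid)
{
    for (size_t i = 0; i < n; i++) {
        if (map[i].valid && map[i].guest_vsid == gvsid)
            return &map[i];
    }
    return NULL;
}
```

Without the `valid` test, a search for guest VSID 0 in a freshly flushed table would return the first entry and hand back host VSID 0.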

KVM: PPC: Move pte invalidate debug code to tracepoint · 8696ee43

Alexander Graf authored Aug 02, 2010

This patch moves the SPTE flush debug printk over to tracepoints.
Signed-off-by: Alexander Graf <agraf@suse.de>

8696ee43

KVM: PPC: Add tracepoint for generic mmu map · 4c4eea77

Alexander Graf authored Aug 02, 2010

This patch moves the generic mmu map debugging over to tracepoints.
Signed-off-by: Alexander Graf <agraf@suse.de>

4c4eea77

KVM: PPC: Move book3s_64 mmu map debug print to trace point · 82fdee7b

Alexander Graf authored Aug 02, 2010

This patch moves Book3s MMU debugging over to tracepoints.
Signed-off-by: Alexander Graf <agraf@suse.de>

82fdee7b

KVM: PPC: Move EXIT_DEBUG partially to tracepoints · bed1ed98

Alexander Graf authored Aug 02, 2010

We have a debug printk on every exit that is usually #ifdef'ed out. Using
tracepoints makes a lot more sense here though, as they can be dynamically
enabled.

This patch converts the most commonly used debug printks of EXIT_DEBUG to
tracepoints.
Signed-off-by: Alexander Graf <agraf@suse.de>

bed1ed98

KVM: ia64: define kvm_lapic_enabled() to fix a compile error · 55438cc7

Takuya Yoshikawa authored Sep 02, 2010

The following patch

  commit 57ce1659316f4ca298919649f9b1b55862ac3826
  KVM: x86: In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's

ignored the fact that kvm_irq_delivery_to_apic() was also used by ia64.

We define kvm_lapic_enabled() to fix a compile error caused by this.
This will have the same effect as reverting the problematic patch for ia64.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

55438cc7

KVM: MMU: lower the audit frequency · 30644b90

Xiao Guangrong authored Aug 30, 2010

The audit has very high overhead, so we need to lower the frequency to ensure
the guest keeps running.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

30644b90

KVM: MMU: improve spte audit · eb259186

Xiao Guangrong authored Aug 30, 2010

Both audit_mappings() and audit_sptes_have_rmaps() need to walk the vcpu's page
table, so we can do both checks in a single spte walk.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

eb259186
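
The idea of running several checks in one traversal can be sketched with a walker that takes a single visitor callback. Everything here is hypothetical and heavily simplified: the flat array stands in for the page table, a zero entry stands in for a non-present spte, and the checks only count their invocations.

```c
#include <stddef.h>
#include <stdint.h>

/* Counters standing in for the real audit checks. */
struct toy_audit_stats {
    unsigned mapping_checks;
    unsigned rmap_checks;
};

typedef void (*toy_spte_visitor)(uint64_t spte, struct toy_audit_stats *stats);

static void toy_check_mapping(uint64_t spte, struct toy_audit_stats *stats)
{
    (void)spte;
    stats->mapping_checks++;
}

static void toy_check_rmap(uint64_t spte, struct toy_audit_stats *stats)
{
    (void)spte;
    stats->rmap_checks++;
}

/* One visitor that runs both checks for each spte. */
static void toy_audit_spte(uint64_t spte, struct toy_audit_stats *stats)
{
    toy_check_mapping(spte, stats);
    toy_check_rmap(spte, stats);
}

/* Walk the table once, skipping empty entries; returns entries visited. */
static unsigned toy_walk_sptes(const uint64_t *sptes, size_t n,
                               toy_spte_visitor fn,
                               struct toy_audit_stats *stats)
{
    unsigned visited = 0;

    for (size_t i = 0; i < n; i++) {
        if (sptes[i] == 0)
            continue;
        fn(sptes[i], stats);
        visited++;
    }
    return visited;
}
```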

KVM: MMU: improve active sp audit · 49edf878

Xiao Guangrong authored Aug 30, 2010

Both audit_rmap() and audit_write_protection() need to walk all active sps, so
we can do both checks in a single sp walk.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

49edf878

KVM: MMU: move audit to a separate file · 2f4f3372

Xiao Guangrong authored Aug 30, 2010

Move the audit code from arch/x86/kvm/mmu.c to arch/x86/kvm/mmu_audit.c
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2f4f3372

KVM: MMU: support disable/enable mmu audit dynamically · 8b1fe17c

Xiao Guangrong authored Aug 30, 2010

Add a r/w module parameter named 'mmu_audit' that controls whether the audit
is enabled or disabled:

enable:
  echo 1 > /sys/module/kvm/parameters/mmu_audit

disable:
  echo 0 > /sys/module/kvm/parameters/mmu_audit

This patch does not change the logic.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

8b1fe17c

KVM: Fix guest kernel crash on MSR_K7_CLK_CTL · 84e0cefa

Jes Sorensen authored Sep 01, 2010

MSR_K7_CLK_CTL is a no longer documented MSR, which is only relevant
on old AMD CPU models. This change returns the value the Linux kernel
expects, so that it does not write the MSR back, and it ignores all
writes to the MSR.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

84e0cefa

KVM: i8259: Make ICW1 conform to spec · 9ed049c3

Avi Kivity authored Aug 30, 2010

ICW1 is not a full reset, instead it resets a limited number of registers
in the PIC.  Change ICW1 emulation to only reset those registers.
Signed-off-by: Avi Kivity <avi@redhat.com>

9ed049c3

KVM: x86 emulator: clean up control flow in x86_emulate_insn() · 7d9ddaed

Avi Kivity authored Aug 30, 2010

x86_emulate_insn() is full of things like

    if (rc != X86EMUL_CONTINUE)
        goto done;
    break;

Consolidate all of those at the end of the switch statement.
Signed-off-by: Avi Kivity <avi@redhat.com>

7d9ddaed
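
The cleanup described above can be illustrated with a toy switch in which each case only sets `rc` and the check happens once after the switch. This is a sketch of the control-flow pattern, not the emulator code; all `toy_*` names and constants are hypothetical.

```c
enum { TOY_CONTINUE = 0, TOY_FAULT = 1 };

/* Stand-in for an emulation helper that may fail. */
static int toy_op(int fail)
{
    return fail ? TOY_FAULT : TOY_CONTINUE;
}

static int toy_emulate(int opcode, int fail, int *writeback_done)
{
    int rc = TOY_CONTINUE;

    switch (opcode) {
    case 0:
        rc = toy_op(fail); /* no per-case "if (rc != ...) goto done;" */
        break;
    case 1:
        rc = toy_op(fail);
        break;
    default:
        break;
    }

    /* Single consolidated check for every case above. */
    if (rc != TOY_CONTINUE)
        goto done;

    *writeback_done = 1; /* stand-in for the writeback stage */
done:
    return rc;
}
```

Each case shrinks to the operation plus `break`, and a new case cannot forget the error check.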

KVM: x86 emulator: fix group 11 decoding for reg != 0 · a4d4a7c1

Avi Kivity authored Aug 03, 2010

These are all undefined.
Signed-off-by: Avi Kivity <avi@redhat.com>

a4d4a7c1

KVM: x86 emulator: use single stage decoding for mov instructions · b9eac5f4

Avi Kivity authored Aug 03, 2010

Signed-off-by: Avi Kivity <avi@redhat.com>

b9eac5f4

KVM: Don't save/restore MSR_IA32_PERF_STATUS · e90aa41e

Avi Kivity authored Sep 01, 2010

It is read-only; restoring it only results in annoying messages.
Signed-off-by: Avi Kivity <avi@redhat.com>

e90aa41e