- 30 Jan, 2008 40 commits
-
-
Avi Kivity authored
By forcing clean huge pages to be read-only, we have separate roles for the shadow of a clean large page and the shadow of a dirty large page. This is necessary because different ptes will be instantiated for the two cases, even for read faults. Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Avi Kivity authored
We must set the bit before the shift, otherwise the wrong bit gets set. Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Avi Kivity authored
This is more consistent with the accessed bit management, and makes the dirty bit available earlier for other purposes. Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Anthony Liguori authored
This time, the biggest change is gpa_to_hpa. The translation of GPA to HPA does not depend on the VCPU state unlike GVA to GPA so there's no need to pass in the kvm_vcpu. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Anthony Liguori authored
Some of the MMU functions take a struct kvm_vcpu even though they affect all VCPUs. This patch cleans up some of them to instead take a struct kvm. This makes things a bit more clear. The main thing that was confusing me was whether certain functions need to be called on all VCPUs. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Carsten Otte authored
Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Izik Eidus authored
Instead of having the kernel allocate memory to the guest, let userspace allocate it and pass the address to the kernel. This is required for s390 support, but also enables features like memory sharing and using hugetlbfs backed memory. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Mike Day authored
Signed-off-by: Mike D. Day <ncmike@ncultra.org> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Rusty Russell authored
Since vcpu->apic is of the correct type, there's not need to cast. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Rusty Russell authored
Move kvm_create_lapic() into kvm_vcpu_init(), rather than having svm and vmx do it. And make it return the error rather than a fairly random -ENOMEM. This also solves the problem that neither svm.c nor vmx.c actually handles the error path properly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Rusty Russell authored
Instead of the asymetry of kvm_free_apic, implement kvm_free_lapic(). And guess what? I found a minor bug: we don't need to hrtimer_cancel() from kvm_main.c, because we do that in kvm_free_apic(). Also: 1) kvm_vcpu_uninit should be the reverse order from kvm_vcpu_init. 2) Don't set apic->regs_page to zero before freeing apic. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Izik Eidus authored
The user is now able to set how many mmu pages will be allocated to the guest. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Izik Eidus authored
Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Izik Eidus authored
When kvm uses user-allocated pages in the future for the guest, we won't be able to use page->private for rmap, since page->rmap is reserved for the filesystem. So we move the rmap base pointers to the memory slot. A side effect of this is that we need to store the gfn of each gpte in the shadow pages, since the memory slot is addressed by gfn, instead of hfn like struct page. Signed-off-by: Izik Eidus <izik@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Avi Kivity authored
Now that smp_call_function_single() knows how to call a function on the current cpu, there's no need to check explicitly. Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Avi Kivity authored
Noted by Eddie Dong. Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Laurent Vivier authored
This patch modifies the management of REX prefix according behavior I saw in Xen 3.1. In Xen, this modification has been introduced by Jan Beulich. http://lists.xensource.com/archives/html/xen-changelog/2007-01/msg00081.htmlSigned-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Laurent Vivier authored
The only valid case is on protected page access, other cases are errors. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Qing He authored
Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Laurent Vivier authored
Remove no_wb, use dst.type = OP_NONE instead, idea stollen from xen-3.1 Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Laurent Vivier authored
Remove _eflags and use directly ctxt->eflags. Caching eflags is not needed as it is restored to vcpu by kvm_main.c:emulate_instruction() from ctxt->eflags only if emulation doesn't fail. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Laurent Vivier authored
To improve readability, move push, writeback, and grp 1a/2/3/4/5/9 emulation parts into functions. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Ryan Harper authored
This patch removes the fault injected when the guest attempts to set reserved bits in cr3. X86 hardware doesn't generate a fault when setting reserved bits. The result of this patch is that vmware-server, running within a kvm guest, boots and runs memtest from an iso. Signed-off-by: Ryan Harper <ryanh@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Avi Kivity authored
When we allow guest page faults to reach the guests directly, we lose the fault tracking which allows us to detect demand paging. So we provide an alternate mechnism by clearing the accessed bit when we set a pte, and checking it later to see if the guest actually used it. Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Avi Kivity authored
There are two classes of page faults trapped by kvm: - host page faults, where the fault is needed to allow kvm to install the shadow pte or update the guest accessed and dirty bits - guest page faults, where the guest has faulted and kvm simply injects the fault back into the guest to handle The second class, guest page faults, is pure overhead. We can eliminate some of it on vmx using the following evil trick: - when we set up a shadow page table entry, if the corresponding guest pte is not present, set up the shadow pte as not present - if the guest pte _is_ present, mark the shadow pte as present but also set one of the reserved bits in the shadow pte - tell the vmx hardware not to trap faults which have the present bit clear With this, normal page-not-present faults go directly to the guest, bypassing kvm entirely. Unfortunately, this trick only works on Intel hardware, as AMD lacks a way to discriminate among page faults based on error code. It is also a little risky since it uses reserved bits which might become unreserved in the future, so a module parameter is provided to disable it. Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Avi Kivity authored
KVM avoids reloading the efer msr when the difference between the guest and host values consist of the long mode bits (which are switched by hardware) and the NX bit (which is emulated by the KVM MMU). This patch also allows KVM to ignore SCE (syscall enable) when the guest is running in 32-bit mode. This is because the syscall instruction is not available in 32-bit mode on Intel processors, so the SCE bit is effectively meaningless. Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Laurent Vivier authored
Move emulate_ctxt to kvm_vcpu to keep emulate context when we exit from kvm module. Call x86_decode_insn() only when needed. Modify x86_emulate_insn() to not modify the context if it must be re-entered. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Laurent Vivier authored
emulate_instruction() calls now x86_decode_insn() and x86_emulate_insn(). x86_emulate_insn() is x86_emulate_memop() without the decoding part. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Laurent Vivier authored
Split the decoding process into a new function x86_decode_insn(). Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Laurent Vivier authored
Move all x86_emulate_memop() common variables between decode and execute to a structure decode_cache. This will help in later separating decode and emulate. struct decode_cache { u8 twobyte; u8 b; u8 lock_prefix; u8 rep_prefix; u8 op_bytes; u8 ad_bytes; struct operand src; struct operand dst; unsigned long *override_base; unsigned int d; unsigned long regs[NR_VCPU_REGS]; unsigned long eip; /* modrm */ u8 modrm; u8 modrm_mod; u8 modrm_reg; u8 modrm_rm; u8 use_modrm_ea; unsigned long modrm_ea; unsigned long modrm_val; }; Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Laurent Vivier authored
Remove #ifdef functions never used Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Anthony Liguori authored
This patch refactors the current hypercall infrastructure to better support live migration and SMP. It eliminates the hypercall page by trapping the UD exception that would occur if you used the wrong hypercall instruction for the underlying architecture and replacing it with the right one lazily. A fall-out of this patch is that the unhandled hypercalls no longer trap to userspace. There is very little reason though to use a hypercall to communicate with userspace as PIO or MMIO can be used. There is no code in tree that uses userspace hypercalls. [avi: fix #ud injection on vmx] Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Anthony Liguori authored
Add vmmcall/vmcall to x86_emulate. Future patch will implement functionality for these instructions. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86Linus Torvalds authored
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: (890 commits) x86: fix nodemap_size according to nodeid bits x86: fix overlap between pagetable with bss section x86: add PCI IDs to k8topology_64.c x86: fix early_ioremap pagetable ops x86: use the same pgd_list for PAE and 64-bit x86: defer cr3 reload when doing pud_clear() x86: early boot debugging via FireWire (ohci1394_dma=early) x86: don't special-case pmd allocations as much x86: shrink some ifdefs in fault.c x86: ignore spurious faults x86: remove nx_enabled from fault.c x86: unify fault_32|64.c x86: unify fault_32|64.c with ifdefs x86: unify fault_32|64.c by ifdef'd function bodies x86: arch/x86/mm/init_32.c printk fixes x86: arch/x86/mm/init_32.c cleanup x86: arch/x86/mm/init_64.c printk fixes x86: unify ioremap x86: fixes some bugs about EFI memory map handling x86: use reboot_type on EFI 32 ...
-
Linus Torvalds authored
Both the old e1000 driver and the new e1000e driver can drive some PCI-Express e1000 cards, and we should avoid ambiguity about which driver will pick up the support for those cards when both drivers are enabled. This solves the problem by having the old driver support those cards if the new driver isn't configured, but otherwise ceding support for PCI Express versions of the e1000 chipset to the newer driver. Thus allowing both legacy configurations where only the old driver is active (and handles all chips it knows about) and the new configuration with the new driver handling the more modern PCIE variants. Acked-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
We want IPV6HEADER matching for the non-advanced default netfilter configuration, since it's part of the standard netfilter setup of at least some distributions (eg Fedora). Otherwise NETFILTER_ADVANCED loses much of its point, since even non-advanced users would have to enable all the advanced options just to get a working IPv6 netfilter setup. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yinghai Lu authored
memnode.map is s16 array because of nodeid is 16 bit now. so need to increase the nodemap_size according to that bits. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Yinghai Lu authored
one early crash on one 8 node 256g machine: Command line: console=uart8250,io,0x3f8,115200n8 initrd=kernel.org/mydisk11_x86_64.gz rw root=/dev/ram0 debug initcall_debug apic=debug acpi.debug_level=0x0000000f pci=routeirq ip=dhcp load_ramdisk=1 ramdisk_size=131072 BOOT_IMAGE=kernel.org/bzImage_2.6.25_k8.1 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009bc00 (usable) BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved) BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 00000000dffe0000 (usable) BIOS-e820: 00000000dffe0000 - 00000000dffee000 (ACPI data) BIOS-e820: 00000000dffee000 - 00000000dffff050 (ACPI NVS) BIOS-e820: 00000000dffff050 - 00000000e0000000 (reserved) BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved) BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved) BIOS-e820: 00000000ff700000 - 0000000100000000 (reserved) BIOS-e820: 0000000100000000 - 0000004020000000 (usable) Early serial console at I/O port 0x3f8 (options '115200n8') console [uart0] enabled end_pfn_map = 67239936 Kernel panic - not syncing: Duplicated early reservation d40000-e42000 Pid: 0, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #3 Call Trace: [<ffffffff80221545>] lapic_get_maxlvt+0x0/0x10 [<ffffffff80221657>] clear_local_APIC+0x5/0xcf [<ffffffff80221726>] disable_local_APIC+0x5/0x17 [<ffffffff8021fe16>] smp_send_stop+0x46/0x4c [<ffffffff80235293>] panic+0x94/0x13e [<ffffffff80bc3b03>] sctp_eps_proc_init+0x12/0x34 [<ffffffff80b9f1c5>] reserve_early+0x30/0x6c [<ffffffff80803925>] init_memory_mapping+0x2cd/0x2dc [<ffffffff80b9dc01>] setup_arch+0x21f/0x44e [<ffffffff80b978be>] start_kernel+0x6f/0x2c7 [<ffffffff80b971cc>] _sinittext+0x1cc/0x1d3 it turns out there is overlap between pgtable and bss... in System.map we have ffffffff80d40420 b rsi_table ffffffff80d40620 B krb5_seq_lock ffffffff80d40628 b i.20437 ffffffff80d40630 b xprt_rdma_inline_write_padding ffffffff80d40638 b sunrpc_table_header ffffffff80d40640 b zero ffffffff80d40644 b min_memreg ffffffff80d40648 b rpcrdma_tk_lock_g ffffffff80d40650 B sctp_assocs_id_lock ffffffff80d40658 B proc_net_sctp ffffffff80d40660 B sctp_assocs_id ffffffff80d40680 B sysctl_sctp_mem ffffffff80d40690 B sysctl_sctp_rmem ffffffff80d406a0 B sysctl_sctp_wmem ffffffff80d406b0 b sctp_ctl_socket ffffffff80d406b8 b sctp_pf_inet6_specific ffffffff80d406c0 b sctp_pf_inet_specific ffffffff80d406c8 b sctp_af_v4_specific ffffffff80d406d0 b sctp_af_v6_specific ffffffff80d406d8 b sctp_rand.33270 ffffffff80d406dc b sctp_memory_pressure ffffffff80d406e0 b sctp_sockets_allocated ffffffff80d406e4 b sctp_memory_allocated ffffffff80d406e8 b sctp_sysctl_header ffffffff80d406f0 b zero ffffffff80d406f4 A __bss_stop ffffffff80d406f4 A _end need to round up table_start to PAGE_SIZE. also make the panic more informative. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Joachim Deguara authored
This just adds the PCI IDs of AMD's family 10h and 11h CPU's northbridges to k8topology discovery. Signed-off-by: Joachim Deguara <joachim.deguara@amd.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Jeremy Fitzhardinge authored
Put appropriate pagetable update hooks in so that paravirt knows what's going on in there. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-