    xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM · 2e2fb754
    Konrad Rzeszutek Wilk authored
    When the Xen hypervisor boots a PV kernel it hands it two pieces
    of information: nr_pages and a made up E820 entry.
    
    The nr_pages value defines the range of PFNs, from zero to nr_pages,
    which have a valid Machine Frame Number (MFN) underneath them. The
    E820 mirrors that (with the VGA hole):
    BIOS-provided physical RAM map:
     Xen: 0000000000000000 - 00000000000a0000 (usable)
     Xen: 00000000000a0000 - 0000000000100000 (reserved)
     Xen: 0000000000100000 - 0000000080800000 (usable)
    
    The fun comes when a PV guest is run with a machine E820 (that can
    be either the initial domain or a PCI PV guest), where the E820
    looks like the normal thing:
    
    BIOS-provided physical RAM map:
     Xen: 0000000000000000 - 000000000009e000 (usable)
     Xen: 000000000009ec00 - 0000000000100000 (reserved)
     Xen: 0000000000100000 - 0000000020000000 (usable)
     Xen: 0000000020000000 - 0000000020200000 (reserved)
     Xen: 0000000020200000 - 0000000040000000 (usable)
     Xen: 0000000040000000 - 0000000040200000 (reserved)
     Xen: 0000000040200000 - 00000000bad80000 (usable)
     Xen: 00000000bad80000 - 00000000badc9000 (ACPI NVS)
    ..
    With that, overlaying the nr_pages directly on the E820 does not
    work, as there are gaps and non-RAM regions that won't be used
    by the memory allocator. The 'xen_release_chunk' helps with that
    by punching holes in the P2M (PFN to MFN lookup tree) for those
    regions, and tells us that:
    
    Freeing  20000-20200 pfn range: 512 pages freed
    Freeing  40000-40200 pfn range: 512 pages freed
    Freeing  bad80-badf4 pfn range: 116 pages freed
    Freeing  badf6-bae7f pfn range: 137 pages freed
    Freeing  bb000-100000 pfn range: 282624 pages freed
    Released 283999 pages of unused memory
    
    Those 283999 pages are subtracted from the nr_pages and are returned
    to the hypervisor. The end result is that the initial domain
    boots with 1GB less memory, as the nr_pages has been reduced by
    the number of pages residing within the PCI hole. It can balloon up
    to that if desired using 'xl mem-set 0 8092', but the balloon driver
    is not always compiled in for the initial domain.
    
    This patch implements the populate hypercall (XENMEM_populate_physmap),
    which increases the domain by the same number of pages that
    were released.
    
    The other solution (that did not work) was to transplant the MFNs in
    the P2M tree - the ones that were going to be freed were put in
    the E820_RAM regions past the nr_pages. But the modifications to the
    M2P array (the other side of creating PTEs) were not carried out,
    as the hypervisor is the only one capable of modifying that. The
    only two hypercalls that would do this are update_va_mapping
    (which won't work, as during initial bootup only PFNs up to nr_pages
    are mapped in the guest) and the populate hypercall.
    
    The end result is that the kernel can now boot with the
    nr_pages without having to subtract the 283999 pages.
    
    On an 8GB machine, with various dom0_mem= parameters, this is what we get:
    
    no dom0_mem
    -Memory: 6485264k/9435136k available (5817k kernel code, 1136060k absent, 1813812k reserved, 2899k data, 696k init)
    +Memory: 7619036k/9435136k available (5817k kernel code, 1136060k absent, 680040k reserved, 2899k data, 696k init)
    
    dom0_mem=3G
    -Memory: 2616536k/9435136k available (5817k kernel code, 1136060k absent, 5682540k reserved, 2899k data, 696k init)
    +Memory: 2703776k/9435136k available (5817k kernel code, 1136060k absent, 5595300k reserved, 2899k data, 696k init)
    
    dom0_mem=max:3G
    -Memory: 2696732k/4281724k available (5817k kernel code, 1136060k absent, 448932k reserved, 2899k data, 696k init)
    +Memory: 2702204k/4281724k available (5817k kernel code, 1136060k absent, 443460k reserved, 2899k data, 696k init)
    
    And 'xm list' or 'xl list' now reflects what the dom0_mem=
    argument is.
    Acked-by: David Vrabel <david.vrabel@citrix.com>
    [v2: Use populate hypercall]
    [v3: Remove debug printks]
    [v4: Simplify code]
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
setup.c 14 KB