On Tue, Oct 13, 2020 at 01:33:28AM -0400, harry harry wrote:
> > Do you mean that GPAs are different from their corresponding HVAs when
> > KVM does the walks (as you said above) in software?
>
> What do you mean by "different"? GPAs and HVAs are two completely
> different address spaces.
Let me give one concrete example to explain what I mean by ``different''.
Suppose a program is running in a single-vCPU VM. The program allocates and
references one page (e.g., array[1024*4]). Assume that allocating and
referencing the page in the guest OS triggers a page fault, and the host OS
allocates a machine page to back it.
Assume that the GVA of array[0] is 0x000000000021 and its corresponding GPA is
0x0000000000000081. I think array[0]'s corresponding HVA should also be
0x0000000000000081, i.e., the same as array[0]'s GPA. If array[0]'s HVA
is not 0x0000000000000081, then array[0]'s GPA is *different* from its
corresponding HVA.
Now, let's assume array[0]'s GPA is different from its corresponding HVA. I
think there might be an issue like this: the MMU's hardware logic that
translates ``GPA -> [extended/nested page tables] -> HPA''[1] should be the
same as ``VA -> [page tables] -> PA''[2]; if so, how does KVM find the
correct HPA from a different HVA (e.g., when array[0]'s HVA is not
0x0000000000000081) when there are EPT violations?
This is where memslots come in. Think of memslots as a one-level page table
that translates GPAs to HVAs. A memslot, set by userspace, tells KVM the
corresponding HVA for a given GPA.
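The lookup can be sketched roughly like this (toy names; KVM's real
struct kvm_memory_slot works in frame numbers via gfn/npages/userspace_addr,
but the arithmetic is the same idea):

```c
#include <stdint.h>
#include <stddef.h>

/* Toy memslot: a contiguous GPA range backed by a contiguous HVA range. */
struct toy_memslot {
	uint64_t gpa_base;	/* first guest-physical address covered */
	uint64_t size;		/* bytes covered by this slot */
	uint64_t hva_base;	/* host virtual address backing gpa_base */
};

/* One-level "page table walk": translate a GPA to its HVA.
 * Returns 0 if no slot covers the GPA. */
static uint64_t gpa_to_hva(const struct toy_memslot *slots, size_t n,
			   uint64_t gpa)
{
	for (size_t i = 0; i < n; i++) {
		if (gpa >= slots[i].gpa_base &&
		    gpa < slots[i].gpa_base + slots[i].size)
			return slots[i].hva_base + (gpa - slots[i].gpa_base);
	}
	return 0; /* no memslot -> KVM exits to userspace (see [*] below) */
}
```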
Before the guest is running (assuming host userspace isn't broken), the
userspace VMM will first allocate virtual memory (HVA) for all physical
memory it wants to map into the guest (GPA). It then tells KVM how to
translate a given GPA to its HVA by creating a memslot.
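Concretely, the VMM mmap()s anonymous memory for the guest RAM and then
registers it with ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region). A
hedged sketch of the userspace side, with toy_memory_region standing in
for struct kvm_userspace_memory_region from <linux/kvm.h> (same fields):

```c
#include <stdint.h>

/* Local mirror of struct kvm_userspace_memory_region from <linux/kvm.h>,
 * redeclared here only so the sketch is self-contained. */
struct toy_memory_region {
	uint32_t slot;
	uint32_t flags;
	uint64_t guest_phys_addr;	/* GPA base of the slot */
	uint64_t memory_size;		/* bytes */
	uint64_t userspace_addr;	/* HVA base, e.g. from mmap() */
};

/* Build the region the VMM would pass to
 * ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &r). */
static struct toy_memory_region make_region(uint32_t slot, uint64_t gpa,
					    uint64_t size, uint64_t hva)
{
	struct toy_memory_region r = {
		.slot = slot,
		.flags = 0,
		.guest_phys_addr = gpa,
		.memory_size = size,
		.userspace_addr = hva,
	};
	return r;
}
```

In real code the hva argument would be the return value of an mmap() call;
the struct is what tells KVM "GPA X is backed by HVA Y for Z bytes".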
To avoid getting lost in a tangent about page offsets, let's assume array[0]'s
GPA = 0xa000. For KVM to create a GPA->HPA mapping for the guest, there _must_
be a memslot that translates GPA 0xa000 to an HVA[*]. Let's say HVA = 0xb000.
On an EPT violation, KVM does a memslot lookup to translate the GPA (0xa000) to
its HVA (0xb000), and then walks the host page tables to translate the HVA into
an HPA (let's say that ends up being 0xc000). KVM then stuffs 0xc000 into the
EPT tables, which yields:
  GPA    ->  HVA     (KVM memslots)
  0xa000     0xb000

  HVA    ->  HPA     (host page tables)
  0xb000     0xc000

  GPA    ->  HPA     (extended page tables)
  0xa000     0xc000
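In other words, the EPT entry is the composition of the two translations
above. A minimal sketch with one-entry "tables" hardcoded to the example
addresses (real KVM of course walks memslots and host page tables):

```c
#include <stdint.h>

/* Memslot lookup for the example: GPA 0xa000 -> HVA 0xb000. */
static uint64_t memslot_gpa_to_hva(uint64_t gpa)
{
	return gpa == 0xa000 ? 0xb000 : 0;
}

/* Host page-table walk for the example: HVA 0xb000 -> HPA 0xc000. */
static uint64_t host_pt_hva_to_hpa(uint64_t hva)
{
	return hva == 0xb000 ? 0xc000 : 0;
}

/* What KVM installs in the EPT on the violation: the composition,
 * GPA -> HVA -> HPA. */
static uint64_t ept_gpa_to_hpa(uint64_t gpa)
{
	return host_pt_hva_to_hpa(memslot_gpa_to_hva(gpa));
}
```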
To keep the EPT tables synchronized with the host page tables, if HVA->HPA
changes, e.g. HVA 0xb000 is remapped to HPA 0xd000, then KVM will get notified
by the host kernel that the HVA has been unmapped, and will find (again via
memslots) and unmap the corresponding GPA->HPA translations.
Ditto for the case where userspace moves a memslot, e.g. if HVA is changed
to 0xe000, KVM will first unmap all old GPA->HPA translations so that accesses
to GPA 0xa000 from the guest will take an EPT violation and see the new HVA
(and presumably a new HPA).
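The zap-and-refault flow in both cases can be sketched as follows (toy
globals and names; in KVM the notification arrives via the kernel's MMU
notifiers and the EPT is a real multi-level page table):

```c
#include <stdint.h>

#define TOY_PAGES 16 /* toy guest: GPAs 0x0000 - 0xf000 */

/* Toy stand-in for the EPT: toy_ept[gpa >> 12] = HPA, 0 = not mapped. */
static uint64_t toy_ept[TOY_PAGES];

/* What the host page tables currently say HVA 0xb000 maps to. */
static uint64_t toy_hpa_for_hva = 0xc000;

/* EPT-violation handler: memslot lookup (GPA 0xa000 -> HVA 0xb000),
 * then the host page-table walk, then install the GPA->HPA entry. */
static void toy_ept_fault(uint64_t gpa)
{
	toy_ept[gpa >> 12] = toy_hpa_for_hva;
}

/* Host notification: an HVA was unmapped/remapped. Zap the GPA it backs
 * (found via the reverse memslot lookup) so the next guest access takes
 * an EPT violation and refaults through the new mapping. */
static void toy_hva_unmapped(uint64_t hva)
{
	uint64_t gpa = 0xa000 + (hva - 0xb000);
	toy_ept[gpa >> 12] = 0;
}
```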
[*] If there is no memslot, KVM will exit to userspace on the EPT violation,
with some information about what GPA the guest was accessing. This is how
emulated MMIO is implemented, e.g. userspace intentionally doesn't back a
GPA with a memslot so that it can trap guest accesses to said GPA for the
purpose of emulating a device.
[1] Please note that this hardware walk is the last step, which only
translates the guest physical address to the host physical address through
the four-level nested page table.
[2] Please note that this hardware walk translates a VA to a PA with no
virtualization involved.
Please note that the above addresses are not real and are used only for
illustration.
Thanks,
Harry