On Tue, Oct 13, 2020 at 01:33:28AM -0400, harry harry wrote:
> > Do you mean that GPAs are different from their corresponding HVAs when
> > KVM does the walks (as you said above) in software?
>
> What do you mean by "different"? GPAs and HVAs are two completely
> different address spaces.
Let me give one concrete example to explain what I mean by ``different''.
Suppose a program is running in a single-vCPU VM. The program allocates and
references one page (e.g., array[1024*4]). Assume that allocating and
referencing the page in the guest OS triggers a page fault, and the host OS
allocates a machine page to back it.
Assume that the GVA of array[0] is 0x000000000021 and its corresponding GPA is
0x0000000000000081. I think array[0]'s corresponding HVA should also be
0x0000000000000081, i.e., the same as array[0]'s GPA. If array[0]'s HVA
is not 0x0000000000000081, then array[0]'s GPA is *different* from its
corresponding HVA.
Now, let's assume array[0]'s GPA is different from its corresponding HVA. I
think there might be an issue like this: the MMU's hardware logic that
translates ``GPA -> [extended/nested page tables] -> HPA''[1] should be the
same as ``VA -> [page tables] -> PA''[2]; if so, how does KVM find the
correct HPA from a different HVA (e.g., when array[0]'s HVA is not
0x0000000000000081) when there are EPT violations?
This is where memslots come in. Think of memslots as a one-level page table
that translates GPAs to HVAs. A memslot, set by userspace, tells KVM the
corresponding HVA for a given GPA.
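The lookup can be sketched roughly like this (toy names; KVM's real
struct kvm_memory_slot works in frame numbers via gfn/npages/userspace_addr,
but the arithmetic is the same idea):

```c
#include <stdint.h>
#include <stddef.h>

/* Toy memslot: a contiguous GPA range backed by a contiguous HVA range. */
struct toy_memslot {
	uint64_t gpa_base;	/* first guest-physical address covered */
	uint64_t size;		/* bytes covered by this slot */
	uint64_t hva_base;	/* host virtual address backing gpa_base */
};

/* One-level "page table walk": translate a GPA to its HVA.
 * Returns 0 if no slot covers the GPA. */
static uint64_t gpa_to_hva(const struct toy_memslot *slots, size_t n,
			   uint64_t gpa)
{
	for (size_t i = 0; i < n; i++) {
		if (gpa >= slots[i].gpa_base &&
		    gpa < slots[i].gpa_base + slots[i].size)
			return slots[i].hva_base + (gpa - slots[i].gpa_base);
	}
	return 0; /* no memslot -> KVM exits to userspace (see [*] below) */
}
```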
Before the guest is running (assuming host userspace isn't broken), the
userspace VMM will first allocate virtual memory (HVA) for all physical
memory it wants to map into the guest (GPA). It then tells KVM how to
translate a given GPA to its HVA by creating a memslot.
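Concretely, the VMM mmap()s anonymous memory for the guest RAM and then
registers it with ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region). A
hedged sketch of the userspace side, with toy_memory_region standing in
for struct kvm_userspace_memory_region from <linux/kvm.h> (same fields):

```c
#include <stdint.h>

/* Local mirror of struct kvm_userspace_memory_region from <linux/kvm.h>,
 * redeclared here only so the sketch is self-contained. */
struct toy_memory_region {
	uint32_t slot;
	uint32_t flags;
	uint64_t guest_phys_addr;	/* GPA base of the slot */
	uint64_t memory_size;		/* bytes */
	uint64_t userspace_addr;	/* HVA base, e.g. from mmap() */
};

/* Build the region the VMM would pass to
 * ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &r). */
static struct toy_memory_region make_region(uint32_t slot, uint64_t gpa,
					    uint64_t size, uint64_t hva)
{
	struct toy_memory_region r = {
		.slot = slot,
		.flags = 0,
		.guest_phys_addr = gpa,
		.memory_size = size,
		.userspace_addr = hva,
	};
	return r;
}
```

In real code the hva argument would be the return value of an mmap() call;
the struct is what tells KVM "GPA X is backed by HVA Y for Z bytes".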
To avoid getting lost in a tangent about page offsets, let's assume array[0]'s
GPA = 0xa000. For KVM to create a GPA->HPA mapping for the guest, there _must_
be a memslot that translates GPA 0xa000 to an HVA[*]. Let's say HVA = 0xb000.
On an EPT violation, KVM does a memslot lookup to translate the GPA (0xa000) to
its HVA (0xb000), and then walks the host page tables to translate the HVA into
an HPA (let's say that ends up being 0xc000). KVM then stuffs 0xc000 into the
EPT tables, which yields:
  GPA    ->  HVA     (KVM memslots)
  0xa000     0xb000

  HVA    ->  HPA     (host page tables)
  0xb000     0xc000

  GPA    ->  HPA     (extended page tables)
  0xa000     0xc000
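In other words, the EPT entry is the composition of the two translations
above. A minimal sketch with one-entry "tables" hardcoded to the example
addresses (real KVM of course walks memslots and host page tables):

```c
#include <stdint.h>

/* Memslot lookup for the example: GPA 0xa000 -> HVA 0xb000. */
static uint64_t memslot_gpa_to_hva(uint64_t gpa)
{
	return gpa == 0xa000 ? 0xb000 : 0;
}

/* Host page-table walk for the example: HVA 0xb000 -> HPA 0xc000. */
static uint64_t host_pt_hva_to_hpa(uint64_t hva)
{
	return hva == 0xb000 ? 0xc000 : 0;
}

/* What KVM installs in the EPT on the violation: the composition,
 * GPA -> HVA -> HPA. */
static uint64_t ept_gpa_to_hpa(uint64_t gpa)
{
	return host_pt_hva_to_hpa(memslot_gpa_to_hva(gpa));
}
```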
To keep the EPT tables synchronized with the host page tables, if HVA->HPA
changes, e.g. HVA 0xb000 is remapped to HPA 0xd000, then KVM will get notified
by the host kernel that the HVA has been unmapped, and will find (again via
memslots) and unmap the corresponding GPA->HPA translations.
Ditto for the case where userspace moves a memslot, e.g. if HVA is changed
to 0xe000, KVM will first unmap all old GPA->HPA translations so that accesses
to GPA 0xa000 from the guest will take an EPT violation and see the new HVA
(and presumably a new HPA).
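The zap-and-refault flow in both cases can be sketched as follows (toy
globals and names; in KVM the notification arrives via the kernel's MMU
notifiers and the EPT is a real multi-level page table):

```c
#include <stdint.h>

#define TOY_PAGES 16 /* toy guest: GPAs 0x0000 - 0xf000 */

/* Toy stand-in for the EPT: toy_ept[gpa >> 12] = HPA, 0 = not mapped. */
static uint64_t toy_ept[TOY_PAGES];

/* What the host page tables currently say HVA 0xb000 maps to. */
static uint64_t toy_hpa_for_hva = 0xc000;

/* EPT-violation handler: memslot lookup (GPA 0xa000 -> HVA 0xb000),
 * then the host page-table walk, then install the GPA->HPA entry. */
static void toy_ept_fault(uint64_t gpa)
{
	toy_ept[gpa >> 12] = toy_hpa_for_hva;
}

/* Host notification: an HVA was unmapped/remapped. Zap the GPA it backs
 * (found via the reverse memslot lookup) so the next guest access takes
 * an EPT violation and refaults through the new mapping. */
static void toy_hva_unmapped(uint64_t hva)
{
	uint64_t gpa = 0xa000 + (hva - 0xb000);
	toy_ept[gpa >> 12] = 0;
}
```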
[*] If there is no memslot, KVM will exit to userspace on the EPT violation,
with some information about what GPA the guest was accessing. This is how
emulated MMIO is implemented, e.g. userspace intentionally doesn't back a
GPA with a memslot so that it can trap guest accesses to said GPA for the
purpose of emulating a device.
[1] Please note that this hardware walk is the last step, which only
translates the guest physical address to the host physical address through
the four-level nested page table.
[2] Please note that this hardware walk translates a VA to a PA with no
virtualization involved.
Please note that the above addresses are not real and are used only for
illustration.
Thanks,
Harry