On Thu, 13 Sep 2018 15:36:38 +0100
"Dr. David Alan Gilbert" <dgilbert(a)redhat.com> wrote:
* Igor Mammedov (imammedo(a)redhat.com) wrote:
> On Tue, 11 Sep 2018 17:30:31 +0400
> Marc-André Lureau <marcandre.lureau(a)redhat.com> wrote:
>
> > Hi
> >
> > On Tue, Sep 11, 2018 at 5:14 PM, Igor Mammedov <imammedo(a)redhat.com> wrote:
> > > On Tue, 11 Sep 2018 12:49:12 +0100
> > > "Dr. David Alan Gilbert" <dgilbert(a)redhat.com> wrote:
> > >
> > >> * Marc-André Lureau (marcandre.lureau(a)redhat.com) wrote:
> > >> > Hi
> > >> >
> > >> > On Tue, Sep 11, 2018 at 3:26 PM, Dr. David Alan Gilbert
> > >> > <dgilbert(a)redhat.com> wrote:
> > >> > > * Marc-André Lureau (marcandre.lureau(a)redhat.com) wrote:
> > >> > >> Hi
> > >> > >>
> > >> > >> On Tue, Sep 11, 2018 at 2:32 PM, Dr. David Alan Gilbert
> > >> > >> <dgilbert(a)redhat.com> wrote:
> > >> > >> > * Marc-André Lureau (marcandre.lureau(a)redhat.com) wrote:
> > >> > >> >> Hi
> > >> > >> >>
> > >> > >> >> On Tue, Sep 11, 2018 at 12:37 PM, Michal Privoznik <mprivozn(a)redhat.com> wrote:
> > >> > >> >> > On 09/11/2018 12:46 AM, John Ferlan wrote:
> > >> > >> >> >>
> > >> > >> >> >> On 09/07/2018 07:32 AM, marcandre.lureau(a)redhat.com wrote:
> > >> > >> >> >>> From: Marc-André Lureau <marcandre.lureau(a)redhat.com>
> > >> > >> >> >>>
> > >> > >> >> >>
> > >> > >> >> >> Would be nice to have a few more words here. If you provide them I can
> > >> > >> >> >> add them... The if statement is difficult to read unless you know what
> > >> > >> >> >> each field really means.
> > >> > >> >> >>
> > >> > >> >> >> secondary question - should we document what gets used?, e.g.:
> > >> > >> >> >>
> > >> > >> >> >> https://libvirt.org/formatdomain.html#elementsMemoryBacking
> > >> > >> >> >>
> > >> > >> >> >> Seems to me the preference to use memfd is for memory backing using
> > >> > >> >> >> anonymous source for nvdimm's without a defined path, but sometimes my
> > >> > >> >> >> wording doesn't match reality.
> > >> > >> >> >
> > >> > >> >> > I don't think we want to tell users what backend we are going to use
> > >> > >> >> > under what conditions. Firstly, these conditions will change (as they
> > >> > >> >> > did in the past). Secondly, what backend libvirt decides to use is no
> > >> > >> >> > business of users. I mean, they care about providing XML that matches
> > >> > >> >> > their demands. It's libvirt's job to fulfil them.
> > >> > >> >> >
> > >> > >> >> > Look at this the other way: if a user wants to have
> > >> > >> >> > memory-backend-file for their domain, how would they enforce it once memfd
> > >> > >> >> > is merged? Sure, they can tweak their memoryBacking settings, but that
> > >> > >> >> > would work only until we decide to change the decision process for mem
> > >> > >> >> > backend.
> > >> > >> >> >
> > >> > >> >> > What I am more worried about is migration. What happens if I migrate a
> > >> > >> >> > hugepages domain from an older libvirt to a newer one (the former doesn't
> > >> > >> >> > support memfd, the latter does)? On the source the domain was started
> > >> > >> >> > with memory-backend-file (or memory-backend-ram with -mem-path). And
> > >> > >> >> > during migration, the generated cmd line would use memfd. And I don't
> > >> > >> >> > think qemu is capable of dealing with this discrepancy, is it?
> > >> > >> >>
> > >> > >> >>
> > >> > >> >> Actually, qemu doesn't care about the hostmem backend kind, it should
> > >> > >> >> handle the migration ok.
> > >> > >> >>
> > >> > >> >> However, there seems to be a bug in qemu, and the hostmem backends don't
> > >> > >> >> use the right qom object name.
> > >> > >> >
> > >> > >> > Can you give me the command lines you're using?
> > >> > >>
> > >> > >> qemu -m 4096 -object memory-backend-ram,id=mem,size=4G -numa node,memdev=mem -monitor stdio
> > >> > >> qemu -m 4096 -object memory-backend-file,id=mem,size=4G,mem-path=/tmp/foo -numa node,memdev=mem -monitor stdio
> > >> > >> qemu -m 4096 -object memory-backend-memfd,id=mem,size=4G -numa node,memdev=mem -monitor stdio
> > >> > >
> > >> > > There seem to be two different problems (at least); there's that
> > >> > > escaping problem where the /'s are shown as \x2f in info qom-tree,
> > >> >
> > >> > That's not a problem, this is done in memory_region_escape_name()
> > >> >
> > >> > > but info ramblock looks saner, but is still showing the difference:
> > >> > >
> > >> > > ./x86_64-softmmu/qemu-system-x86_64 -m 1024 -object memory-backend-ram,id=mem,size=1G -numa node,memdev=mem -monitor stdio
> > >> > > (qemu) info ramblock
> > >> > >     Block Name  PSize             Offset               Used              Total
> > >> > >            mem  4 KiB 0x0000000000000000 0x0000000040000000 0x0000000040000000
> > >> > >
> > >> > > ./x86_64-softmmu/qemu-system-x86_64 -m 1024 -object memory-backend-file,id=mem,size=1G,mem-path=/tmp/foo -numa node,memdev=mem -monitor stdio
> > >> > > (qemu) info ramblock
> > >> > >     Block Name  PSize             Offset               Used              Total
> > >> > >   /objects/mem  4 KiB 0x0000000000000000 0x0000000040000000 0x0000000040000000
> > >> > >
> > >> > > ./x86_64-softmmu/qemu-system-x86_64 -m 1024 -object memory-backend-memfd,id=mem,size=1G -numa node,memdev=mem -monitor stdio
> > >> > > QEMU 3.0.50 monitor - type 'help' for more information
> > >> > > (qemu) info ramblock
> > >> > >     Block Name  PSize             Offset               Used              Total
> > >> > >   /objects/mem  4 KiB 0x0000000000000000 0x0000000040000000 0x0000000040000000
> > >> > >
> > >> > > hostmem-file.c is using object_get_canonical_path to get the RAMBlock
> > >> > > name, whereas hostmem-ram.c is using object_get_canonical_path_**component**
> > >> > >
> > >> > > The problem is if we change either of them then again we break
> > >> > > migration compatibility.
> > >> >
> > >> > Yes, that was the object of my question :)
> > >> >
> > >> > > We could wire it to a machine type and/or property, so that
> > >> > > memory-backend-ram would use the long name on newer qemus with an
> > >> > > appropriate flag?
> > >> >
> > >> > Good idea, I can prepare a patch.
> > >>
> > >> Great; if you add the property to use the longname, then turn that
> > >> property on in the newer machine type it should work. A qemu that has
> > >> the property can then be assumed to do the right thing when set.
> > > compat properties mechanism is applicable only for device based objects
> > > and backends are not based on it. So it won't be so easy, one basically
> > > would need to re-implement or even better extend compat props mechanism
> > > to backends.
> > >
> >
> > indeed
> >
> > >
> > >> > However, libvirt will have to learn of this migration issue with older
> > >> > versions, it's probably not worth trying to make more workarounds.
> > >>
> > >> Yeh I'm not sure what your heuristics look like for these choices.
> > >> But for a VM without this fix then you can't convert from backend-ram to
> > >> memfd.
> > > I wouldn't try to migrate from one backend type to another automatically;
> > > if the domain used backend-ram then libvirt should start the target with the same
> > > backend (it's not only the ram block name in the migration stream, but could also
> > > involve ramblock's alignment, padding, guard pages or something else, as
> > > these are different backends and can potentially change their default behavior
> > > independently from each other).
> >
> > Then libvirt can't transparently use memfd, and we will go back to my
> > initial suggestion to have a new memory backing source kind in the
> > domain XML named "memfd".
> The less magic the better; the only downside is that implementation
> details of a QEMU backend seep through the abstraction libvirt is
> supposed to provide for its users, plus the question of how users are
> supposed to pick a backend variant for their needs.
>
> >
> > Are "ramblock's alignment, padding, guard pages" exposed in domain
> > XML? Didn't they change over time in qemu without libvirt noticing?
> > Why couldn't allocation with memfd be transparently changed the
> > same way?
> >
> > > Redefining meaning of 'anonymous' from backend-ram to memfd is fine only
> > > if libvirt is able to distinguish old domains with ram backend vs memfd
> > > (so it could start domains accordingly, i.e. no cross migration).
> >
> > And memory-backend-file used as anonymous memory (without explicit path etc).
> >
> > > Otherwise we would be creating time bomb, that would explode
> > > when 2 independent backends change in incompatible manner.
> >
> > If there is such a limitation, qemu should prevent it then. It seems
> > qemu lets you migrate from/to the various hostmem-* (as long as they use
> > the same name, which is the case for -file and -memfd at this point).
> > Why restrict that now?
> It works by luck, not by design. Even though qemu doesn't block it,
> it doesn't mean that's the right thing to do.
> Rule of thumb with migration is that the CLI on the destination should
> match the one on the source (i.e. no magical cli replacements).
> If it doesn't, then the user is to blame.
The rule isn't actually that strong. We normally allow the backends to
change as long as the guest visible parts don't. For example, it's
perfectly legal to migrate from a qemu that's got its virtio-blk
wired to an NFS disk to a qemu that's got it wired to iSCSI - the guest
view in the two cases is the same but the command line is quite
different. Similarly for networking you can flip to different tap
setups.
So as long as the change:
a) looks identical to the guest
Well, in the case of memory backends, the alignment of the backing
storage/memory_region affects the guest ABI (RAM layout), and there
may be other such variables.
So whoever switches memory backends has to ensure that the new backend
and its options match the old backend's properties. On the QEMU side
we don't have anything to ensure that (hopefully migration
would fail due to a different GPA layout, but that's it).
Considering that a backend author typically cares only about
their own backend, I wouldn't bet on backend switching staying
free of regressions.
Hence I don't really support the idea of magical backend switching, or
putting extra effort into the QEMU side to make it work and maintain it.
It would be more robust to add an additional type of anonymous memory
in domain description or let libvirt toggle anonymous meaning/backend
based on domain machine type/version and host capabilities (has memfd
or not/supported hugepage sizes/...).
This way old domains will continue using old backends and new ones
will use new ones.
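
To make the second option a bit more concrete, here is a rough Python
sketch of how a management layer might derive the backend from the
machine type version and host capabilities. Everything in it is an
assumption made for illustration: the HostCaps class, the function
names, and in particular the (3, 1) cut-off are hypothetical and do not
describe what libvirt actually does.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HostCaps:
    # Hypothetical capability flag: the QEMU binary offers
    # memory-backend-memfd and the host kernel can back it.
    has_memfd: bool


def machine_version(machine: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a versioned machine type like 'pc-i440fx-3.0'."""
    m = re.search(r"-(\d+)\.(\d+)$", machine)
    return (int(m.group(1)), int(m.group(2))) if m else None


def pick_anonymous_backend(machine: str, caps: HostCaps,
                           memfd_since: Tuple[int, int] = (3, 1)) -> str:
    """Return the QEMU object type to use for anonymous guest memory.

    memfd_since is an assumed cut-off, chosen only for this sketch.
    """
    version = machine_version(machine)
    # Unversioned names ('pc', 'q35') are treated conservatively in this
    # sketch and keep the old backend.
    if caps.has_memfd and version is not None and version >= memfd_since:
        return "memory-backend-memfd"
    return "memory-backend-ram"
```

With this, a domain defined as pc-i440fx-3.0 keeps memory-backend-ram on
any host, while a pc-q35-3.1 domain gets memory-backend-memfd only where
the host supports it. Since the machine type is part of the domain
definition, source and destination would reach the same answer as long
as both hosts have the same capabilities.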
b) Doesn't have any backend specific migration data
then a migration should work and I'd expect it to work.
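
For the naming problem from earlier in the thread, a small Python sketch
of the check that trips things up: RAM blocks in the migration stream
are matched by name, so a block the destination doesn't know about
cannot be loaded. The function and the dictionaries below are purely
illustrative (they are not QEMU code); the names and sizes are the ones
from the 'info ramblock' output quoted above.

```python
from typing import Dict, List


def ramblock_mismatches(source: Dict[str, int], dest: Dict[str, int]) -> List[str]:
    """Compare RAM block name -> total size maps of source and destination."""
    problems = []
    for name, size in source.items():
        if name not in dest:
            # The destination has no block registered under this name.
            problems.append(f"unknown RAM block '{name}' on destination")
        elif dest[name] != size:
            problems.append(
                f"RAM block '{name}' size differs: {size:#x} vs {dest[name]:#x}")
    return problems


# Block names as reported by 'info ramblock' for each backend.
ram_backend = {"mem": 0x40000000}
file_backend = {"/objects/mem": 0x40000000}
memfd_backend = {"/objects/mem": 0x40000000}
```

Here ramblock_mismatches(ram_backend, memfd_backend) reports the
unknown block 'mem', while the file to memfd case comes back empty,
which matches what was observed: -file and -memfd can migrate to each
other, -ram can't.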
Dave
> >
> > >
> > >> Dave
> > >>
> > >> >
> > >> > > Dave
> > >> > >
> > >> > >
> > >> > >
> > >> > >> >
> > >> > >> > Dave
> > >> > >> >
> > >> > >> >> with memory-backend-ram:
> > >> > >> >>
> > >> > >> >> (qemu) info qom-tree /objects
> > >> > >> >> /objects (container)
> > >> > >> >>   /mem (memory-backend-ram)
> > >> > >> >>     /mem[0] (qemu:memory-region)
> > >> > >> >>
> > >> > >> >> But with memory-backend-file or memory-backend-memfd:
> > >> > >> >>
> > >> > >> >> (qemu) info qom-tree /objects
> > >> > >> >> /objects (container)
> > >> > >> >>   /mem (memory-backend-file)
> > >> > >> >>     /\x2fobjects\x2fmem[0] (qemu:memory-region)
> > >> > >> >>
> > >> > >> >>
> > >> > >> >> This causes migration to fail because of the object naming mismatch.
> > >> > >> >>
> > >> > >> >> It can migrate from/to -file and -memfd, since they use the same
> > >> > >> >> "broken" name, but not with -ram.
> > >> > >> >>
> > >> > >> >> I don't know how we can solve this migration issue without breaking
> > >> > >> >> things further. Any idea David?
> > >> > >> >>
> > >> > >> >> > Or is memfd going to be used only for hugepages + <source
> > >> > >> >> > type='anonymous'/> case (which is not allowed now and thus migration
> > >> > >> >> > scenario I'm describing can't happen)?
> > >> > >> >>
> > >> > >> >> With those patches, memfd is used for anonymous memory (shared or not,
> > >> > >> >> hpt or not) with an explicit numa configuration.
> > >> > >> >>
> > >> > >> >> thanks
> > >> > >> > --
> > >> > >> > Dr. David Alan Gilbert / dgilbert(a)redhat.com / Manchester, UK
> > >> > > --
> > >> > > Dr. David Alan Gilbert / dgilbert(a)redhat.com / Manchester, UK
> > >> --
> > >> Dr. David Alan Gilbert / dgilbert(a)redhat.com / Manchester, UK
> > >
>
--
Dr. David Alan Gilbert / dgilbert(a)redhat.com / Manchester, UK