On Thu, 29 Oct 2015 16:16:57 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:
> (CCing Michal and libvir-list, so the libvirt team is aware of this
> restriction)
>
> On Thu, Oct 29, 2015 at 02:36:37PM +0100, Igor Mammedov wrote:
> > On Tue, 27 Oct 2015 14:36:35 -0200
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> >
> > > On Tue, Oct 27, 2015 at 10:14:56AM +0100, Igor Mammedov wrote:
> > > > On Tue, 27 Oct 2015 10:53:08 +0200
> > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > >
> > > > > On Tue, Oct 27, 2015 at 09:48:37AM +0100, Igor Mammedov wrote:
> > > > > > On Tue, 27 Oct 2015 10:31:21 +0200
> > > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > >
> > > > > > > On Mon, Oct 26, 2015 at 02:24:32PM +0100, Igor Mammedov wrote:
> > > > > > > > Yep, it's a workaround, but it works around QEMU's broken
> > > > > > > > virtio implementation in a simple way, without the need for
> > > > > > > > guest-side changes.
> > > > > > > >
> > > > > > > > Without a foreseeable virtio fix it makes memory hotplug
> > > > > > > > unusable, and even more so because even if there were a
> > > > > > > > virtio fix, it wouldn't help old guests, since you've said
> > > > > > > > that a virtio fix would require changes on both the QEMU
> > > > > > > > and guest sides.
> > > > > > >
> > > > > > > What makes it not foreseeable?
> > > > > > > Apparently only the fact that we have a workaround in place,
> > > > > > > so no one works on it. I can code it up pretty quickly, but
> > > > > > > I'm flat out of time for testing as I'm going on vacation
> > > > > > > soon, and hard freeze is pretty close.
> > > > > > I can lend a hand for the testing part.
> > > > > >
> > > > > > >
> > > > > > > GPA space is kind of cheap, but wasting it in chunks of 512M
> > > > > > > seems way too aggressive.
> > > > > >
> > > > > > The hotplug region is sized with a 1GB alignment reserve per
> > > > > > DIMM, so we aren't actually wasting anything here.
> > > > > >
> > > > >
> > > > > If I allocate two 1G DIMMs, what will be the gap size? 512M? 1G?
> > > > > It's too much either way.
> > > >
> > > > The minimum would be 512M, and if the backend uses 1GB huge pages
> > > > the gap will be the backend's natural alignment (i.e. 1GB).
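
To illustrate the allocation being discussed: a rough sketch of how a
DIMM's GPA (and so the gap left after the previous DIMM) could follow the
backend's page size. Names like next_dimm_base are illustrative, not
QEMU's actual code:

  #include <stdint.h>

  #define MIN_GAP (512ULL << 20)          /* 512M minimum gap */

  static uint64_t align_up(uint64_t addr, uint64_t align)
  {
      /* align must be a power of two (512M, 1G, ...) */
      return (addr + align - 1) & ~(align - 1);
  }

  /* Pick a base address for the next DIMM inside the hotplug region.
   * The alignment follows the backend's natural page size: 512M at
   * minimum, 1G for a 1G-huge-page backend -- which is why the
   * resulting layout depends on host-side backend configuration. */
  static uint64_t next_dimm_base(uint64_t free_start,
                                 uint64_t backend_page_size)
  {
      uint64_t align = backend_page_size > MIN_GAP ? backend_page_size
                                                   : MIN_GAP;
      return align_up(free_start, align);
  }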
> > >
> > > Is backend configuration even allowed to affect the machine ABI? We
> > > need to be able to change backend configuration when migrating the
> > > VM to another host.
> >
> > For now, one has to use the same type of backend on both sides, i.e.
> > if the source uses a 1GB-huge-page backend then the target also needs
> > to use it.
> >
> The page size of the backend doesn't even depend on QEMU arguments, but
> on the kernel command line or hugetlbfs mount options. So it's possible
> to have exactly the same QEMU command line on source and destination
> (with an explicit versioned machine type) and get a VM that can't be
> migrated? That means we are breaking our guarantees about migration and
> guest ABI.
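
This point is easy to demonstrate: the page size of a hugetlbfs-backed
region is a property of the host's mount, which can only be discovered at
runtime. A minimal standalone sketch of such a probe (not QEMU code):

  #include <stdio.h>
  #include <sys/vfs.h>

  int main(int argc, char **argv)
  {
      struct statfs fs;

      if (argc < 2) {
          fprintf(stderr, "usage: %s <path-on-hugetlbfs>\n", argv[0]);
          return 1;
      }
      if (statfs(argv[1], &fs) < 0) {
          perror("statfs");
          return 1;
      }
      /* On hugetlbfs, f_bsize reports the huge page size (e.g. 2M or
       * 1G), which is set by mount options and kernel configuration,
       * not by anything on the QEMU command line. */
      printf("backend page size: %ld bytes\n", (long)fs.f_bsize);
      return 0;
  }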
> > We could change this for the next machine type to always force the
> > maximum alignment (1GB); then it would be possible to change between
> > backends with different alignments.
>
> I'm not sure what the best solution is here. If always using 1GB is too
> aggressive, we could require management to ask for an explicit alignment
> as a -machine option if they know they will need a specific backend page
> size.
>
> BTW, are you talking about the behavior introduced by
> aa8580cddf011e8cedcf87f7a0fdea7549fc4704 ("pc: memhp: force gaps between
> DIMM's GPA") only, or was the backend page size already affecting GPA
> allocation before that commit?
Backend alignment has been there since the beginning; we have always
over-reserved 1GB per slot, since we don't know in advance what alignment
a hotplugged backend would require.
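
For reference, that over-reservation amounts to sizing the hotplug
address space roughly as below. This is a simplified sketch with an
illustrative name (hotplug_mem_size), not the exact QEMU code:

  #include <stdint.h>

  #define GIB (1ULL << 30)

  /* Size the memory-hotplug address space at machine init. Since the
   * alignment a future backend will require isn't known yet, reserve a
   * full 1G of slack for each configured DIMM slot on top of the
   * memory that can actually be plugged (maxram - ram). */
  static uint64_t hotplug_mem_size(uint64_t maxram, uint64_t ram,
                                   int slots)
  {
      return (maxram - ram) + (uint64_t)slots * GIB;
  }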