(CCing Michal and libvir-list, so the libvirt team is aware of this
restriction.)
On Thu, Oct 29, 2015 at 02:36:37PM +0100, Igor Mammedov wrote:
> On Tue, 27 Oct 2015 14:36:35 -0200
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> > On Tue, Oct 27, 2015 at 10:14:56AM +0100, Igor Mammedov wrote:
> > > On Tue, 27 Oct 2015 10:53:08 +0200
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Oct 27, 2015 at 09:48:37AM +0100, Igor Mammedov wrote:
> > > > > On Tue, 27 Oct 2015 10:31:21 +0200
> > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > >
> > > > > > On Mon, Oct 26, 2015 at 02:24:32PM +0100, Igor Mammedov wrote:
> > > > > > > Yep, it's a workaround, but it works around QEMU's broken
> > > > > > > virtio implementation in a simple way, without needing
> > > > > > > guest-side changes.
> > > > > > >
> > > > > > > Without a foreseeable virtio fix, it makes memory hotplug
> > > > > > > unusable. Even if there were a virtio fix, it wouldn't help
> > > > > > > old guests, since you've said that a virtio fix would
> > > > > > > require changes on both the QEMU and guest sides.
> > > > > >
> > > > > > What makes it not foreseeable?
> > > > > > Apparently only the fact that we have a workaround in place,
> > > > > > so no one works on it. I can code it up pretty quickly, but
> > > > > > I'm flat out of time for testing as I'm going on vacation
> > > > > > soon, and hard freeze is pretty close.
> > > > > I can lend a hand for the testing part.
> > > > >
> > > > > >
> > > > > > GPA space is kind of cheap, but wasting it in chunks of 512M
> > > > > > seems way too aggressive.
> > > > > The hotplug region is sized with a 1GB alignment reserve per
> > > > > DIMM, so we aren't actually wasting anything here.
> > > > >
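[Editor's note: to make the sizing argument above concrete, here is a
rough, hypothetical sketch of a hotplug GPA window sized with one
alignment reserve per DIMM slot. The function name and layout are
illustrative only, not QEMU's actual code.]

```python
GiB = 1 << 30

def hotplug_region_size(max_hotplug_mem, slots, per_dimm_reserve=GiB):
    """Size the hotplug GPA window with one worst-case alignment
    reserve per possible DIMM slot, so that aligning each DIMM's base
    address can never overflow the window. Illustrative sketch only."""
    return max_hotplug_mem + slots * per_dimm_reserve
```

For example, 16 GiB of hotpluggable memory spread over 4 slots would get
a 20 GiB window, so the per-DIMM gaps come out of the reserve rather
than out of usable space.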
> > > >
> > > > If I allocate two 1G DIMMs, what will be the gap size? 512M? 1G?
> > > > It's too much either way.
> > > The minimum would be 512M, and if the backend uses 1GB huge pages,
> > > the gap will be the backend's natural alignment (i.e. 1GB).
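[Editor's note: the behavior described here, a minimum 512M gap rounded
up to the backend's natural alignment, can be sketched as follows. This
is a simplified model of the behavior discussed in the thread, not
QEMU's actual allocator.]

```python
MiB = 1 << 20
GiB = 1 << 30

def align_up(addr, alignment):
    # Round addr up to the next multiple of alignment (a power of two).
    return (addr + alignment - 1) & ~(alignment - 1)

def next_dimm_base(prev_dimm_end, backend_page_size, min_gap=512 * MiB):
    """Place the next DIMM after a gap of at least min_gap, with the
    base aligned to the backend's natural alignment. Simplified model
    of the behavior discussed in this thread."""
    alignment = max(min_gap, backend_page_size)
    return align_up(prev_dimm_end + min_gap, alignment)
```

With a 4 KiB page backend, a DIMM ending at 1 GiB is followed by a
512 MiB gap; with a 1 GiB hugepage backend, the next base is rounded up
to the 1 GiB boundary, so the gap grows to 1 GiB, matching Igor's
answer above.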
> >
> > Is backend configuration even allowed to affect the machine ABI? We
> > need to be able to change the backend configuration when migrating
> > the VM to another host.
> For now, one has to use the same type of backend on both sides, i.e.
> if the source uses a 1GB huge page backend, then the target also needs
> to use it.
The page size of the backend doesn't even depend on QEMU arguments, but
on the kernel command line or hugetlbfs mount options. So it's possible
to have exactly the same QEMU command line on source and destination
(with an explicit versioned machine type) and still get a VM that can't
be migrated? That means we are breaking our guarantees about migration
and guest ABI.
> We could change this for the next machine type to always force the
> maximum alignment (1GB); then it would be possible to change between
> backends with different alignments.
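[Editor's note: Igor's proposal amounts to always aligning by the worst
case regardless of backend, which makes the GPA layout independent of
the backend page size. Again a hypothetical sketch, not actual QEMU
code.]

```python
GiB = 1 << 30

def align_up(addr, alignment):
    # Round addr up to the next multiple of alignment (a power of two).
    return (addr + alignment - 1) & ~(alignment - 1)

def next_dimm_base_forced(prev_dimm_end, max_align=GiB):
    """The backend page size is deliberately not a parameter: with a
    forced worst-case (1 GiB) gap and alignment, source and destination
    hosts may use backends with different page sizes without changing
    the guest-visible GPA layout."""
    return align_up(prev_dimm_end + max_align, max_align)
```

The cost is that every DIMM now pays the 1 GiB gap, even with a 4 KiB
page backend, which is the "too aggressive" concern raised below.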
I'm not sure what the best solution is here. If always using 1GB is too
aggressive, we could require management to ask for an explicit alignment
via a -machine option if they know they will need a specific backend
page size.
BTW, are you talking only about the behavior introduced by commit
aa8580cddf011e8cedcf87f7a0fdea7549fc4704 ("pc: memhp: force gaps between
DIMM's GPA"), or was the backend page size already affecting GPA
allocation before that commit?
--
Eduardo