On Thu, 29 Oct 2015 16:16:57 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:
> (CCing Michal and libvir-list, so the libvirt team is aware of this
> restriction)
>
> On Thu, Oct 29, 2015 at 02:36:37PM +0100, Igor Mammedov wrote:
> > On Tue, 27 Oct 2015 14:36:35 -0200
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> >
> > > On Tue, Oct 27, 2015 at 10:14:56AM +0100, Igor Mammedov wrote:
> > > > On Tue, 27 Oct 2015 10:53:08 +0200
> > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > >
> > > > > On Tue, Oct 27, 2015 at 09:48:37AM +0100, Igor Mammedov wrote:
> > > > > > On Tue, 27 Oct 2015 10:31:21 +0200
> > > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > >
> > > > > > > On Mon, Oct 26, 2015 at 02:24:32PM +0100, Igor Mammedov wrote:
> > > > > > > > Yep, it's a workaround, but it works around QEMU's broken
> > > > > > > > virtio implementation in a simple way, without the need for
> > > > > > > > guest-side changes.
> > > > > > > >
> > > > > > > > Without a foreseeable virtio fix it makes memory hotplug
> > > > > > > > unusable, and even more so because even if there were a
> > > > > > > > virtio fix, it wouldn't help old guests, since you've said
> > > > > > > > that a virtio fix would require changes on both the QEMU
> > > > > > > > and guest sides.
> > > > > > >
> > > > > > > What makes it not foreseeable?
> > > > > > > Apparently only the fact that we have a workaround in place,
> > > > > > > so no one works on it. I can code it up pretty quickly, but
> > > > > > > I'm flat out of time for testing as I'm going on vacation
> > > > > > > soon, and hard freeze is pretty close.
> > > > > > I can lend a hand for the testing part.
> > > > > >
> > > > > > >
> > > > > > > GPA space is kind of cheap, but wasting it in chunks of 512M
> > > > > > > seems way too aggressive.
> > > > > >
> > > > > > The hotplug region is sized with a 1GB alignment reserve per
> > > > > > DIMM, so we aren't actually wasting anything here.
> > > > > >
> > > > >
> > > > > If I allocate two 1G DIMMs, what will be the gap size? 512M? 1G?
> > > > > It's too much either way.
> > > >
> > > > The minimum would be 512M, and if the backend uses 1GB huge pages
> > > > the gap will be the backend's natural alignment (i.e. 1GB).
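
To illustrate the allocation being discussed: a rough sketch of how a
DIMM's GPA (and so the gap left after the previous DIMM) could follow the
backend's page size. Names like next_dimm_base are illustrative, not
QEMU's actual code:

  #include <stdint.h>

  #define MIN_GAP (512ULL << 20)          /* 512M minimum gap */

  static uint64_t align_up(uint64_t addr, uint64_t align)
  {
      /* align must be a power of two (512M, 1G, ...) */
      return (addr + align - 1) & ~(align - 1);
  }

  /* Pick a base address for the next DIMM inside the hotplug region.
   * The alignment follows the backend's natural page size: 512M at
   * minimum, 1G for a 1G-huge-page backend -- which is why the
   * resulting layout depends on host-side backend configuration. */
  static uint64_t next_dimm_base(uint64_t free_start,
                                 uint64_t backend_page_size)
  {
      uint64_t align = backend_page_size > MIN_GAP ? backend_page_size
                                                   : MIN_GAP;
      return align_up(free_start, align);
  }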
> > >
> > > Is backend configuration even allowed to affect the machine ABI? We
> > > need to be able to change backend configuration when migrating the
> > > VM to another host.
> >
> > For now, one has to use the same type of backend on both sides, i.e.
> > if the source uses a 1GB-huge-page backend then the target also needs
> > to use it.
> >
> The page size of the backend doesn't even depend on QEMU arguments, but
> on the kernel command line or hugetlbfs mount options. So it's possible
> to have exactly the same QEMU command line on source and destination
> (with an explicit versioned machine type) and get a VM that can't be
> migrated? That means we are breaking our guarantees about migration and
> guest ABI.
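
This point is easy to demonstrate: the page size of a hugetlbfs-backed
region is a property of the host's mount, which can only be discovered at
runtime. A minimal standalone sketch of such a probe (not QEMU code):

  #include <stdio.h>
  #include <sys/vfs.h>

  int main(int argc, char **argv)
  {
      struct statfs fs;

      if (argc < 2) {
          fprintf(stderr, "usage: %s <path-on-hugetlbfs>\n", argv[0]);
          return 1;
      }
      if (statfs(argv[1], &fs) < 0) {
          perror("statfs");
          return 1;
      }
      /* On hugetlbfs, f_bsize reports the huge page size (e.g. 2M or
       * 1G), which is set by mount options and kernel configuration,
       * not by anything on the QEMU command line. */
      printf("backend page size: %ld bytes\n", (long)fs.f_bsize);
      return 0;
  }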
> > We could change this for the next machine type to always force the
> > maximum alignment (1GB); then it would be possible to change between
> > backends with different alignments.
>
> I'm not sure what the best solution is here. If always using 1GB is too
> aggressive, we could require management to ask for an explicit alignment
> as a -machine option if they know they will need a specific backend page
> size.
>
> BTW, are you talking about the behavior introduced by
> aa8580cddf011e8cedcf87f7a0fdea7549fc4704 ("pc: memhp: force gaps between
> DIMM's GPA") only, or was the backend page size already affecting GPA
> allocation before that commit?
Backend alignment has been there since the beginning; we have always
over-reserved 1GB per slot, since we don't know in advance what alignment
a hotplugged backend would require.
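
For reference, that over-reservation amounts to sizing the hotplug
address space roughly as below. This is a simplified sketch with an
illustrative name (hotplug_mem_size), not the exact QEMU code:

  #include <stdint.h>

  #define GIB (1ULL << 30)

  /* Size the memory-hotplug address space at machine init. Since the
   * alignment a future backend will require isn't known yet, reserve a
   * full 1G of slack for each configured DIMM slot on top of the
   * memory that can actually be plugged (maxram - ram). */
  static uint64_t hotplug_mem_size(uint64_t maxram, uint64_t ram,
                                   int slots)
  {
      return (maxram - ram) + (uint64_t)slots * GIB;
  }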