[PATCH] qemu.conf: add max-ram-below-4g option

Limit the amount of RAM below 4G. This helps in scenarios like GPU
passthrough, when the GPA used by a DMA device conflicts with the decode
window of a host bridge and the address translation request is not issued
to the IOMMU, which causes address overlapping. Note that currently this
can be triggered by some abnormal hardware behavior.

In the general case, this option needs to be configured when virtual
machines share the same host, which is why qemu.conf is used to provide
a per-host configuration.

Signed-off-by: yezhiyong <yezhiyong@bytedance.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: zhangruien <zhangruien@bytedance.com>
---
 src/qemu/qemu.conf      | 8 ++++++++
 src/qemu/qemu_command.c | 4 ++++
 src/qemu/qemu_conf.c    | 5 +++++
 src/qemu/qemu_conf.h    | 1 +
 4 files changed, 18 insertions(+)

diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf
index 8722dc169c..f09c89486e 100644
--- a/src/qemu/qemu.conf
+++ b/src/qemu/qemu.conf
@@ -898,6 +898,14 @@
 # NOTE: big files will be stored here
 #memory_backing_dir = "/var/lib/libvirt/qemu/ram"
 
+# Limit the amount of RAM below 4G. This helps in scenarios like
+# GPU passthrough, when the GPA used by a DMA device conflicts with
+# the decode window of a host bridge and the address translation
+# request is not issued to the IOMMU, which causes address overlapping.
+# Note that currently this can be triggered by some abnormal hardware
+# behavior.
+#max_ram_below_4g = "2G"
+
 # Path to the SCSI persistent reservations helper. This helper is
 # used whenever <reservations/> are enabled for SCSI LUN devices.
 #pr_helper = "/usr/bin/qemu-pr-helper"
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 2f69a79bc0..fd5e14c500 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -6960,6 +6960,10 @@ qemuBuildMachineCommandLine(virCommand *cmd,
                           cfg->dumpGuestCore ? "on" : "off");
     }
 
+    if (cfg->maxRamBelow4G)
+        virBufferAsprintf(&buf, ",max-ram-below-4g=%s",
+                          cfg->maxRamBelow4G);
+
     if (def->mem.nosharepages)
         virBufferAddLit(&buf, ",mem-merge=off");
 
diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c
index 916a3d36ee..b718995870 100644
--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -384,6 +384,8 @@ static void virQEMUDriverConfigDispose(void *obj)
 
     g_strfreev(cfg->capabilityfilters);
 
     g_free(cfg->deprecationBehavior);
+
+    g_free(cfg->maxRamBelow4G);
 }
 
@@ -1001,6 +1003,9 @@ virQEMUDriverConfigLoadMemoryEntry(virQEMUDriverConfig *cfg,
     g_autofree char *dir = NULL;
     int rc;
 
+    if (virConfGetValueString(conf, "max_ram_below_4g", &cfg->maxRamBelow4G) < 0)
+        return -1;
+
     if ((rc = virConfGetValueString(conf, "memory_backing_dir", &dir)) < 0) {
         return -1;
     } else if (rc > 0) {
diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h
index 2f64e39a18..ff558e2fdb 100644
--- a/src/qemu/qemu_conf.h
+++ b/src/qemu/qemu_conf.h
@@ -216,6 +216,7 @@ struct _virQEMUDriverConfig {
     bool virtiofsdDebug;
 
     char *memoryBackingDir;
+    char *maxRamBelow4G;
 
     uid_t swtpm_user;
     gid_t swtpm_group;
-- 
2.24.3 (Apple Git-128)
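To make the effect of `max-ram-below-4g` concrete: QEMU's PC machines map part of guest RAM below 4G and relocate the remainder above 4G, and this option only caps the below-4G part. The C sketch below reproduces that split as we understand it from QEMU's q35 board code. The `struct ram_split` type and `q35_split_ram()` helper are hypothetical, and the 0xb0000000/0x80000000 defaults are recalled from q35 and may differ between QEMU versions and machine types (i440fx may use different values).

```c
#include <stdint.h>

#define GiB (1ULL << 30)

/* Hypothetical result type: how guest RAM is split around the 4G boundary. */
struct ram_split {
    uint64_t below_4g;   /* bytes mapped below 4G */
    uint64_t above_4g;   /* bytes relocated above 4G */
};

/* Hypothetical helper mirroring, as we understand it, QEMU's q35 logic.
 * max_below_4g == 0 means the max-ram-below-4g option was not given. */
static struct ram_split q35_split_ram(uint64_t ram_size, uint64_t max_below_4g)
{
    struct ram_split s;
    /* Default boundary: leave room below 4G for the PCI hole and MMCFG. */
    uint64_t lowmem = ram_size >= 0xb0000000ULL ? 0x80000000ULL : 0xb0000000ULL;

    /* An unset option imposes no extra limit. */
    if (max_below_4g == 0)
        max_below_4g = 4 * GiB;

    /* The option acts as min(default, user limit): it can only lower
     * the boundary, never raise it. */
    if (lowmem > max_below_4g)
        lowmem = max_below_4g;

    if (ram_size >= lowmem) {
        s.below_4g = lowmem;
        s.above_4g = ram_size - lowmem;
    } else {
        s.below_4g = ram_size;
        s.above_4g = 0;
    }
    return s;
}
```

Because the limit can only lower the boundary, setting it moves RAM out of the 32-bit region and into the above-4G region, which changes the guest-visible memory map.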

On Wed, Apr 21, 2021 at 04:25:28PM +0800, yezhiyong wrote:
Limit the amount of RAM below 4G. This helps in scenarios like GPU passthrough, when the GPA used by a DMA device conflicts with the decode window of a host bridge and the address translation request is not issued to the IOMMU, which causes address overlapping. Note that currently this can be triggered by some abnormal hardware behavior.
In the general case, this option needs to be configured when virtual machines share the same host, which is why qemu.conf is used to provide a per-host configuration.
This memory layout setting affects the guest ABI, which in turn means it is something that must be preserved upon live migration. Using qemu.conf for this is dangerous because nothing will ensure that the guest ABI is preserved. IOW, supporting this feature definitely needs an attribute in the main domain XML schema.
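Daniel's migration concern can be illustrated with a minimal sketch: when the value lives in the domain definition, the destination's definition can be compared with the source's before migrating, in the spirit of the guest-visible comparisons libvirt performs in virDomainDefCheckABIStability. The `struct mem_def` type and `mem_def_check_abi_stability()` below are hypothetical simplifications, not libvirt API; a value read from each host's own qemu.conf never passes through such a comparison.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, heavily simplified memory definition. Sizes are in KiB;
 * below4g_kib == 0 means "not set". */
struct mem_def {
    unsigned long long total_kib;
    unsigned long long below4g_kib;
};

/* Return true if the destination keeps the guest-visible RAM layout of
 * the source; otherwise report the mismatch and return false. */
static bool mem_def_check_abi_stability(const struct mem_def *src,
                                        const struct mem_def *dst)
{
    if (src->total_kib != dst->total_kib) {
        fprintf(stderr,
                "Target memory size %llu KiB does not match source %llu KiB\n",
                dst->total_kib, src->total_kib);
        return false;
    }
    if (src->below4g_kib != dst->below4g_kib) {
        fprintf(stderr,
                "Target below-4G memory %llu KiB does not match source %llu KiB\n",
                dst->below4g_kib, src->below4g_kib);
        return false;
    }
    return true;
}
```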
Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Wed, Apr 21, 2021 at 16:25:28 +0800, yezhiyong wrote:
Limit the amount of RAM below 4G. This helps in scenarios like GPU passthrough, when the GPA used by a DMA device conflicts with the decode window of a host bridge and the address translation request is not issued to the IOMMU, which causes address overlapping. Note that currently this can be triggered by some abnormal hardware behavior.
In the general case, this option needs to be configured when virtual machines share the same host, which is why qemu.conf is used to provide a per-host configuration.
Is this really something you want to set for EVERY VM all the time? It doesn't seem to me to be the case. If there is any reason not to have this for all VMs, it must not be done via a qemu.conf option but rather via an XML knob.

I agree with you. If it is added to the XML file, where should it be placed? I think the following may be OK:

<domain>
  ...
  <maxMemory slots='16' unit='KiB'>1524288</maxMemory>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>524288</currentMemory>
  <below4gMemory unit='GiB'>2</below4gMemory>
  ...
</domain>

On Wed, Apr 21, 2021 at 4:46 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Wed, Apr 21, 2021 at 16:25:28 +0800, yezhiyong wrote:
Limit the amount of RAM below 4G. This helps in scenarios like GPU passthrough, when the GPA used by a DMA device conflicts with the decode window of a host bridge and the address translation request is not issued to the IOMMU, which causes address overlapping. Note that currently this can be triggered by some abnormal hardware behavior.
In the general case, this option needs to be configured when virtual machines share the same host, which is why qemu.conf is used to provide a per-host configuration.
Is this really something you want to set for EVERY VM all the time? It doesn't seem to me to be the case.
If there is any reason not to have this for all VMs, it must not be done via a qemu.conf option but rather via an XML knob.

On Wed, Apr 21, 2021 at 06:19:47PM +0800, Zhiyong Ye wrote:
I agree with you. If it is added to the XML file, where should it be placed? I think the following may be OK:

<domain>
  ...
  <maxMemory slots='16' unit='KiB'>1524288</maxMemory>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>524288</currentMemory>
  <below4gMemory unit='GiB'>2</below4gMemory>
  ...
</domain>
I think we could probably just make it an attribute on the existing element, e.g. <memory unit="MiB" below4g="2000">4000</memory>.

Regards,
Daniel
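A rough sketch of how the proposed attribute could be consumed, assuming (as in Daniel's example) that `below4g` is expressed in the same unit as the `<memory>` element. The helpers `scale_to_bytes()` and `format_below4g_opt()` are hypothetical, only KiB/MiB/GiB are handled, and the value is emitted in bytes, which we believe QEMU's size parser accepts (the original patch passed a suffixed string such as "2G" instead). The 4 GiB ceiling mirrors the limit we believe QEMU enforces for this property; rejecting a value larger than the guest's RAM is purely a design choice of this sketch.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: convert a value in a libvirt-style binary unit to
 * bytes. Returns false for an unknown unit or on overflow. */
static bool scale_to_bytes(unsigned long long value, const char *unit,
                           unsigned long long *bytes)
{
    unsigned long long factor;

    if (strcmp(unit, "KiB") == 0)
        factor = 1ULL << 10;
    else if (strcmp(unit, "MiB") == 0)
        factor = 1ULL << 20;
    else if (strcmp(unit, "GiB") == 0)
        factor = 1ULL << 30;
    else
        return false;                      /* unknown unit */

    if (value > ULLONG_MAX / factor)
        return false;                      /* would overflow */

    *bytes = value * factor;
    return true;
}

/* Hypothetical helper: turn <memory unit=U below4g=B>M</memory> into the
 * ",max-ram-below-4g=..." machine suboption. below4g == 0 means the
 * attribute is absent, so nothing is emitted. Returns 0 on success and -1
 * on invalid input or when buf is too small. */
static int format_below4g_opt(unsigned long long memory,
                              unsigned long long below4g,
                              const char *unit, char *buf, size_t buflen)
{
    unsigned long long mem_bytes, below_bytes;
    int n;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';

    if (!scale_to_bytes(memory, unit, &mem_bytes) ||
        !scale_to_bytes(below4g, unit, &below_bytes))
        return -1;

    if (below4g == 0)
        return 0;                          /* keep QEMU's default layout */

    /* The limit must lie within the 32-bit address space and, in this
     * sketch, must not exceed the guest's total RAM. */
    if (below_bytes > (4ULL << 30) || below_bytes > mem_bytes)
        return -1;

    n = snprintf(buf, buflen, ",max-ram-below-4g=%llu", below_bytes);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return 0;
}
```

For Daniel's example, <memory unit="MiB" below4g="2000">4000</memory> would yield `,max-ram-below-4g=2097152000` to append to the `-machine` argument.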

Sure, I'll push a new version later.

On Wed, Apr 21, 2021 at 6:35 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Wed, Apr 21, 2021 at 06:19:47PM +0800, Zhiyong Ye wrote:
I agree with you. If it is added to the XML file, where should it be placed? I think the following may be OK:

<domain>
  ...
  <maxMemory slots='16' unit='KiB'>1524288</maxMemory>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>524288</currentMemory>
  <below4gMemory unit='GiB'>2</below4gMemory>
  ...
</domain>
I think we could probably just make it an attribute on the existing element, e.g. <memory unit="MiB" below4g="2000">4000</memory>.
participants (4)
- Daniel P. Berrangé
- Peter Krempa
- yezhiyong
- Zhiyong Ye