[libvirt] [RFC] vhost-user + shared memory + NUMA

Hello!

vhost-user has a small limitation: guest memory must be shared. However, Libvirt satisfies this simple requirement only in a very complicated case:
1. We have to specify a NUMA configuration, because the "shared" attribute is available only for node descriptors inside the "NUMA" section.
2. We have to specify a huge page size, because memory-backend-file is used only in this case.

Isn't it a problem? In order to do a simple thing (use vhost-user) we have to add two more quite unobvious things, making the whole setup significantly more complicated. This creates even more problems on the level above; for example, in order to get OpenStack working with userspace networking, we have to edit all flavors and rebuild all instances. And no documentation mentions it.

Shouldn't Libvirt simply detect usage of vhost-user and build some minimal configuration with shared memory? For example, it could be memory-backend-file on /dev/mem. If the community agrees that it's a good idea that improves usability, I can propose patches.

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia
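
For readers unfamiliar with the setup described above, here is a rough sketch of the three pieces a domain needs today: a huge page size, a NUMA cell marked as shared, and the vhost-user interface itself. The XML is a fragment, not a complete domain, and the sizes, CPU range and socket path are made-up example values. The `summarize` helper is hypothetical and only exists to show which elements matter.

```python
import xml.etree.ElementTree as ET

# Fragment of a libvirt domain definition (not a complete domain).
# Page size, cell memory, CPU range and socket path are example values.
DOMAIN_XML = """
<domain type='kvm'>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB'/>
    </hugepages>
  </memoryBacking>
  <cpu>
    <numa>
      <cell id='0' cpus='0-1' memory='1048576' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <devices>
    <interface type='vhostuser'>
      <source type='unix' path='/tmp/vhost-user0.sock' mode='client'/>
      <model type='virtio'/>
    </interface>
  </devices>
</domain>
""".strip()


def summarize(xml_text):
    """Report the parts of the fragment that vhost-user depends on."""
    root = ET.fromstring(xml_text)
    return {
        # Huge pages are what makes memory-backend-file get used.
        "hugepages": root.find("./memoryBacking/hugepages") is not None,
        # NUMA cells are the only place where memory can be marked shared.
        "shared_cells": [
            cell.get("id")
            for cell in root.findall("./cpu/numa/cell")
            if cell.get("memAccess") == "shared"
        ],
        "vhostuser_ifaces": len(
            root.findall("./devices/interface[@type='vhostuser']")
        ),
    }


print(summarize(DOMAIN_XML))
```

Dropping either the `<hugepages>` element or the `memAccess='shared'` attribute from this fragment is the situation the message complains about: the vhost-user interface is still defined, but the guest memory is no longer shared.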

On Thu, Feb 11, 2016 at 01:28:47PM +0300, Pavel Fedin wrote:
> Hello!
> vhost-user has a small limitation: guest memory must be shared. However, this simple requirement is satisfied by Libvirt only in very complicated case: 1. We have to specify NUMA configuration, because we can have "shared" attribute only for node descriptors inside "NUMA" section. 2. We have to specify huge page size, because memory-backend-file is used only in this case.
> Isn't it a problem? In order to do a simple thing (use vhost-user) we have to add two more quite unobvious things, making the whole stuff significantly more complicated. This creates even more problems on the level above, for example in order to get OpenStack working with userspace networking, we have to edit all flavors and rebuild all instances. And no single documentation mentions it.
> Shouldn't Libvirt simply detect usage of vhost-user, and build some minimal configuration with shared memory? For example, it could be memory-backend-file on /dev/mem. If the community agrees that it's a good idea, improving the usability, I can propose patches.

Historically QEMU had a pointless check on the path passed in, to enforce that it was only hugetlbfs, so you could not just pass in a regular tmpfs file. I think we removed that in QEMU 2.5. I think it is valid to enhance <memoryBacking> to allow specification of "shared" memory backing, which would map to a regular tmpfs.

I don't think we should magically do anything based on the existence of vhost-user though - changes in the way the guest memory is allocated should always require explicit user configuration. We could however report an error VIR_ERR_CONFIG_UNSUPPORTED if the user provided a vhost-user device and forgot to request shared memory, given that it's an unusable combination.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
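
The error-reporting idea in the reply above could look roughly like the following sketch. It is written in Python against the domain XML purely for illustration; libvirt itself is C, and `ConfigUnsupported` is only a stand-in for VIR_ERR_CONFIG_UNSUPPORTED, not a real binding. The rule that every NUMA cell must be shared is an assumption drawn from the statement that guest memory must be shared.

```python
import xml.etree.ElementTree as ET


class ConfigUnsupported(Exception):
    """Stand-in for VIR_ERR_CONFIG_UNSUPPORTED (illustrative only)."""


def check_vhostuser_memory(xml_text):
    """Reject a domain that has vhost-user devices but no shared memory."""
    root = ET.fromstring(xml_text)

    # No vhost-user interface: memory sharing is not required.
    if not root.findall("./devices/interface[@type='vhostuser']"):
        return

    # Assumption: all guest memory must be shared, so every NUMA cell
    # has to carry memAccess='shared'. No cells at all means no way to
    # request sharing with the current schema.
    cells = root.findall("./cpu/numa/cell")
    if not cells or any(c.get("memAccess") != "shared" for c in cells):
        raise ConfigUnsupported(
            "vhost-user interface requires shared guest memory"
        )
```

The check fails at definition time instead of silently changing how memory is allocated, which keeps the allocation policy under explicit user control.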

Hello!
> Historically QEMU had a pointless check on the path passed in, to enforce that it was only hugetlbfs, so could not just pass in a regular tmpfs file. I think we removed that in QEMU 2.5. I think it is a valid enhance <memoryBacking> to allow specification of "shared" memory backing which would be mapping to a regular tmpfs.
> I don't think we should magically do anything based on existence of vhost-user though - changes in way the guest memory is allocated should always require explicit user configuration.

Ok, then would it be a good compromise if we require <memoryBacking>, and only implicitly add "shared" if we have vhost-user devices? This way we would not change the way the guest memory is allocated. IMHO being able to manually specify "shared" both in <numa> and in <memoryBacking> would be ambiguous.

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia

On Thu, Feb 11, 2016 at 01:54:49PM +0300, Pavel Fedin wrote:
> Hello!
> > Historically QEMU had a pointless check on the path passed in, to enforce that it was only hugetlbfs, so could not just pass in a regular tmpfs file. I think we removed that in QEMU 2.5. I think it is a valid enhance <memoryBacking> to allow specification of "shared" memory backing which would be mapping to a regular tmpfs.
> > I don't think we should magically do anything based on existence of vhost-user though - changes in way the guest memory is allocated should always require explicit user configuration.
> Ok, then would it be a good compromise if we require <memoryBacking>, and only implicitly add "shared" if we have vhost-user devices? This way we would not change the way the guest memory is allocated.
Adding shared implicitly *will* change the way guest memory is allocated, as it will have to use tmpfs to make it shared.
> IMHO being able to manually specify "shared" both in <numa> and in <memoryBacking> would be ambiguous.
That's not really any different to what we have already with NUMA. The top level setting would apply as the default, and the NUMA level settings override it if needed.

Regards,
Daniel
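
The default-plus-override rule described in this reply can be sketched as follows. The thread does not define any syntax for a top-level "shared" setting, so the `<access mode='...'/>` child of `<memoryBacking>` used here is purely an assumption for illustration, as is falling back to "private" when nothing is specified. Only the per-cell `memAccess` attribute reflects what the thread says exists today.

```python
import xml.etree.ElementTree as ET


def effective_access(xml_text):
    """Return the resolved access mode for each NUMA cell, keyed by cell id."""
    root = ET.fromstring(xml_text)

    # Hypothetical top-level setting; assumed to default to "private".
    top = root.find("./memoryBacking/access")
    default = top.get("mode") if top is not None else "private"

    # A cell-level memAccess attribute overrides the top-level default.
    return {
        cell.get("id"): cell.get("memAccess", default)
        for cell in root.findall("./cpu/numa/cell")
    }


EXAMPLE = """
<domain>
  <memoryBacking>
    <access mode='shared'/>
  </memoryBacking>
  <cpu>
    <numa>
      <cell id='0'/>
      <cell id='1' memAccess='private'/>
    </numa>
  </cpu>
</domain>
""".strip()

print(effective_access(EXAMPLE))
```

In the example, cell 0 inherits the top-level value while cell 1 keeps its own explicit setting, so specifying "shared" in both places is resolved by precedence rather than being ambiguous.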

Hello!
> > Ok, then would it be a good compromise if we require <memoryBacking>, and only implicitly add "shared" if we have vhost-user devices? This way we would not change the way the guest memory is allocated.
> Adding shared implicitly *will* change the way guest memory is allocated, as it will have to use tmpfs to make it shared.
Perhaps you didn't get my idea. I meant that we would still need to specify <memoryBacking> with huge pages, just no <numa>. Therefore, the memory would still be allocated via the file backend from hugetlbfs. Only the mode would be changed implicitly (private -> shared).
> > IMHO being able to manually specify "shared" both in <numa> and in <memoryBacking> would be ambiguous.
> That's not really any different to what we have already with NUMA. The top level setting would apply as the default, and the NUMA level settings override it if needed.
Well, the only small drawback would be the necessity to add "shared" explicitly. This would require additional patching of clients (e.g. OpenStack).

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia

participants (2): Daniel P. Berrange, Pavel Fedin