[libvirt] Add support for QEMU file descriptor sets

Add two API calls to virCommand for generating the parameters necessary for passing to the QEMU -add-fd option, for example "set=10,fd=20", and the file descriptor set for the path= option, such as for example "/dev/fdset/10" Regards, Stefan --- src/libvirt_private.syms | 2 + src/util/vircommand.c | 65 +++++++++++++++++++++++++++++++++++++++++++---- src/util/vircommand.h | 6 ++++ 3 files changed, 68 insertions(+), 5 deletions(-) Index: libvirt/src/libvirt_private.syms =================================================================== --- libvirt.orig/src/libvirt_private.syms +++ libvirt/src/libvirt_private.syms @@ -165,6 +165,8 @@ virCommandSetPreExecHook; virCommandSetWorkingDirectory; virCommandToString; virCommandTransferFD; +virCommandTransferFDGetDevSet; +virCommandTransferFDGetFDSet; virCommandWait; virCommandWriteArgLog; virFork; Index: libvirt/src/util/vircommand.c =================================================================== --- libvirt.orig/src/util/vircommand.c +++ libvirt/src/util/vircommand.c @@ -109,7 +109,7 @@ struct _virCommand { * Returns true if @set contains @fd, * false otherwise. */ -static bool +static int virCommandFDIsSet(int fd, const int *set, int set_size) @@ -118,9 +118,9 @@ virCommandFDIsSet(int fd, while (i < set_size) if (set[i++] == fd) - return true; + return i--; - return false; + return -1; } /* @@ -145,7 +145,7 @@ virCommandFDSet(int fd, if (fd < 0 || !set || !set_size) return -1; - if (virCommandFDIsSet(fd, *set, *set_size)) + if (virCommandFDIsSet(fd, *set, *set_size) >= 0) return 0; if (VIR_REALLOC_N(*set, *set_size + 1) < 0) { @@ -523,7 +523,7 @@ virExecWithHook(const char *const*argv, for (i = 3; i < openmax; i++) { if (i == infd || i == childout || i == childerr) continue; - if (!keepfd || !virCommandFDIsSet(i, keepfd, keepfd_size)) { + if (!keepfd || virCommandFDIsSet(i, keepfd, keepfd_size) < 0) { tmpfd = i; VIR_MASS_CLOSE(tmpfd); } else if (virSetInherit(i, true) < 0) { @@ -896,6 +896,61 @@ virCommandTransferFD(virCommandPtr cmd, VIR_FORCE_CLOSE(fd); } +/** + * virCommandTransferFDGetFDSet: + * @cmd: the command to modify + * @fd: fd to reassign to the child + * + * Get the parameters for the the QEMU -add-fd command line option + * for the given file descriptor. The file descriptor must previously + * have been 'transferred' in a virCommandTransfer() call. + * This function for example returns "set=10,fd=20" for file descriptor 20. + */ +char * +virCommandTransferFDGetFDSet(virCommandPtr cmd, int fd) +{ + char *result = NULL; + int idx = virCommandFDIsSet(fd, cmd->transfer, cmd->transfer_size); + + if (idx >= 0) { + if (virAsprintf(&result, "set=%d,fd=%d", idx, fd) < 0) { + virReportOOMError(); + } + } else { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("file descriptor %d has not been transferred"), fd); + } + + return result; +} + +/** + * virCommandTransferFDGetDevSet: + * @cmd: the command to modify + * @fd: fd to reassign to the child + * + * Get the parameters for the the QEMU path= parameter where a file + * descriptor is accessed via a file descriptor set, for example + * /dev/fdset/10. The file descriptor must previously have been + * 'transferred' in a virCommandTransfer() call. + */ +char * +virCommandTransferFDGetDevSet(virCommandPtr cmd, int fd) +{ + char *result = NULL; + int idx = virCommandFDIsSet(fd, cmd->transfer, cmd->transfer_size); + + if (idx >= 0) { + if (virAsprintf(&result, "/dev/fdset/%d", idx) < 0) { + virReportOOMError(); + } + } else { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("file descriptor %d has not been transferred"), fd); + } + return result; +} + /** * virCommandSetPidFile: Index: libvirt/src/util/vircommand.h =================================================================== --- libvirt.orig/src/util/vircommand.h +++ libvirt/src/util/vircommand.h @@ -58,6 +58,12 @@ void virCommandPreserveFD(virCommandPtr void virCommandTransferFD(virCommandPtr cmd, int fd); +char *virCommandTransferFDGetFDSet(virCommandPtr cmd, + int fd); + +char *virCommandTransferFDGetDevSet(virCommandPtr cmd, + int fd); + void virCommandSetPidFile(virCommandPtr cmd, const char *pidfile) ATTRIBUTE_NONNULL(2);

On Tue, Jan 22, 2013 at 01:09:50PM -0500, Stefan Berger wrote:
Add two API calls to virCommand for generating the parameters necessary for passing to the QEMU -add-fd option, for example "set=10,fd=20", and the file descriptor set for the path= option, such as for example "/dev/fdset/10"
Regards, Stefan
--- src/libvirt_private.syms | 2 + src/util/vircommand.c | 65 +++++++++++++++++++++++++++++++++++++++++++---- src/util/vircommand.h | 6 ++++ 3 files changed, 68 insertions(+), 5 deletions(-)
This is the wrong place to be adding QEMU specific APIs - it should be in qemu_command.{c,h} Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 01/22/2013 01:29 PM, Daniel P. Berrange wrote:
On Tue, Jan 22, 2013 at 01:09:50PM -0500, Stefan Berger wrote:
Add two API calls to virCommand for generating the parameters necessary for passing to the QEMU -add-fd option, for example "set=10,fd=20", and the file descriptor set for the path= option, such as for example "/dev/fdset/10"
Regards, Stefan
--- src/libvirt_private.syms | 2 + src/util/vircommand.c | 65 +++++++++++++++++++++++++++++++++++++++++++---- src/util/vircommand.h | 6 ++++ 3 files changed, 68 insertions(+), 5 deletions(-) This is the wrong place to be adding QEMU specific APIs - it should be in qemu_command.{c,h}
Assuming I move the code into qemu_command.c, are the function calls ok? Stefan

On 01/22/2013 12:07 PM, Stefan Berger wrote:
On 01/22/2013 01:29 PM, Daniel P. Berrange wrote:
On Tue, Jan 22, 2013 at 01:09:50PM -0500, Stefan Berger wrote:
Add two API calls to virCommand for generating the parameters necessary for passing to the QEMU -add-fd option, for example "set=10,fd=20", and the file descriptor set for the path= option, such as for example "/dev/fdset/10"
Regards, Stefan
--- src/libvirt_private.syms | 2 + src/util/vircommand.c | 65 +++++++++++++++++++++++++++++++++++++++++++---- src/util/vircommand.h | 6 ++++ 3 files changed, 68 insertions(+), 5 deletions(-) This is the wrong place to be adding QEMU specific APIs - it should be in qemu_command.{c,h}
Assuming I move the code into qemu_command.c, are the function calls ok?
Thanks for starting on this; it is something I have also been planning to work on. First, we need more than just moving it into qemu_command.c; we also need a qemu_capabilities change to add a capability to know when the feature is available. Or even two: 'add-fd' monitor command was added in 1.2, but -add-fd command line option wasn't until 1.3. Then again, maybe it's just simpler to state that if you are targeting qemu 1.2, there is nothing worth using the new add-fd QMP command for that cannot already be done with older fallbacks, and it is only 1.3 and later where the new fd passing becomes essential. Then, based on the capability being present or absent, any code in qemu_command that wants to pass an fd to qemu needs to call into a helper routine that will either use the new -add-fd argument, fall back to older ad-hoc processing, or error out if there is no fallback for that particular usage. Then we also need to use the QMP commands to update the fdsets after qemu is started; depending on what the fd is used for, we can close the fd once qemu has dup'd it into internal use, so that qemu doesn't hang on to the fd that libvirt passed in forever even when it's internal use is hot-unplugged. We also need to figure out how to use the QMP commands during hotplug operations, instead of the command line additions at qemu startup. And there's still the nagging issue that even as of qemu 1.4, there is still no way to specify block device backing chains via fds; so even if we can pass in an fd for some use cases (like TPM), we are still waiting for qemu to give us a way to do it in all use cases (like NFS security for qcow2 backing chains across snapshots, block copy, and block commit). But one thing at a time. How much of this work are you planning on attempting, and how much do you need me to do? -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 01/25/2013 06:35 PM, Eric Blake wrote:
On 01/22/2013 12:07 PM, Stefan Berger wrote:
On 01/22/2013 01:29 PM, Daniel P. Berrange wrote:
On Tue, Jan 22, 2013 at 01:09:50PM -0500, Stefan Berger wrote: Thanks for starting on this; it is something I have also been planning to work on.
Glad to be able to help. When doing this my main concern was getting my stack of TPM patches for libvirt going again to be able to test, so the approach may have been a lot more 'naive' than your goal of supporting the new functionality as a replacement for all existing file descriptors. Considering that TPM would appear after -add-fd appeared, things are a bit easier...
First, we need more than just moving it into qemu_command.c; we also
I did this in the meantime.
need a qemu_capabilities change to add a capability to know when the feature is available. Or even two: 'add-fd' monitor command was added
A new capability for the command line support should be fairly easy.
in 1.2, but -add-fd command line option wasn't until 1.3. Then again, maybe it's just simpler to state that if you are targeting qemu 1.2, there is nothing worth using the new add-fd QMP command for that cannot already be done with older fallbacks, and it is only 1.3 and later where the new fd passing becomes essential.
Then, based on the capability being present or absent, any code in qemu_command that wants to pass an fd to qemu needs to call into a helper routine that will either use the new -add-fd argument, fall back to older ad-hoc processing, or error out if there is no fallback for that particular usage.
Let's take this command line parameter here as an example. tap,fd=21,id=hostnet1,vhost=on,vhostfd=23 We could write a function where we pass "fd" and the '21' as parameters and it creates "fd=21", which we then append. We could do this as one patch with all of them, so also with "vhostfd" and '23'. Not many changes here, just one more VIR_FREE() call needed for each file descriptor string. In a subsequent patch we now test inside that function whether the new command line parameter is available using the capability (from another patch) and create fd=/dev/set/10 and vhostfd=/dev/set/20 respectively *and* already add "-add-fd set=10,fd=21" and "-add-fd set=20,fd=23" to virCommand using the functions I already posted (as the lowest layer). Would it be that simple, or do we need to add more parameters to -add-fd set=10,fd=21,xyz under certain circumstances?
Then we also need to use the QMP commands to update the fdsets after qemu is started; depending on what the fd is used for, we can close the fd once qemu has dup'd it into internal use, so that qemu doesn't hang on to the fd that libvirt passed in forever even when it's internal use is hot-unplugged.
Hm, isn't that already solved by the virCommandTransferFD command plus subsequent closing of the fd post forking? I thought the -add-fd way of passing fd's was a straight forward replacement for example for the command line above, but maybe there's more to it... I quickly checked the qemu_command.c sources and it looks like all fd's used (searched by 'fd=') are virCommandTransferFD'ed except for 'ioeventfd=%s' which looks like it's Type-to-string conversion.
We also need to figure out how to use the QMP commands during hotplug operations, instead of the command line additions at qemu startup.
And there's still the nagging issue that even as of qemu 1.4, there is still no way to specify block device backing chains via fds; so even if we can pass in an fd for some use cases (like TPM), we are still waiting for qemu to give us a way to do it in all use cases (like NFS security for qcow2 backing chains across snapshots, block copy, and block commit). But one thing at a time.
How much of this work are you planning on attempting, and how much do you need me to do?
Initially my plan was to only support TPM (which doesn't hotplug for example) ... :-/ Stefan

On 01/26/2013 08:44 PM, Stefan Berger wrote:
On 01/25/2013 06:35 PM, Eric Blake wrote:
On 01/22/2013 12:07 PM, Stefan Berger wrote:
On 01/22/2013 01:29 PM, Daniel P. Berrange wrote:
On Tue, Jan 22, 2013 at 01:09:50PM -0500, Stefan Berger wrote: Thanks for starting on this; it is something I have also been planning to work on.
Glad to be able to help. When doing this my main concern was getting my stack of TPM patches for libvirt going again to be able to test, so the approach may have been a lot more 'naive' than your goal of supporting the new functionality as a replacement for all existing file descriptors. Considering that TPM would appear after -add-fd appeared, things are a bit easier...
First, we need more than just moving it into qemu_command.c; we also
I did this in the meantime.
need a qemu_capabilities change to add a capability to know when the feature is available. Or even two: 'add-fd' monitor command was added
A new capability for the command line support should be fairly easy.
in 1.2, but -add-fd command line option wasn't until 1.3. Then again, maybe it's just simpler to state that if you are targeting qemu 1.2, there is nothing worth using the new add-fd QMP command for that cannot already be done with older fallbacks, and it is only 1.3 and later where the new fd passing becomes essential.
Then, based on the capability being present or absent, any code in qemu_command that wants to pass an fd to qemu needs to call into a helper routine that will either use the new -add-fd argument, fall back to older ad-hoc processing, or error out if there is no fallback for that particular usage.
Let's take this command line parameter here as an example.
tap,fd=21,id=hostnet1,vhost=on,vhostfd=23
We could write a function where we pass "fd" and the '21' as parameters and it creates "fd=21", which we then append. We could do this as one patch with all of them, so also with "vhostfd" and '23'. Not many changes here, just one more VIR_FREE() call needed for each file descriptor string.
In a subsequent patch we now test inside that function whether the new command line parameter is available using the capability (from another patch) and create fd=/dev/set/10 and vhostfd=/dev/set/20 respectively *and* already add "-add-fd set=10,fd=21" and "-add-fd set=20,fd=23" to virCommand using the functions I already posted (as the lowest layer).
Would it be that simple, or do we need to add more parameters to -add-fd set=10,fd=21,xyz under certain circumstances?
Ok, so likely this was a bad example and fd= does not need to be replaced with the fd sets. There are five occurrences of path= in the code, which could be handled similarly as described above with an additional open() and virCommandTransferFD(). My guess is that you want to eliminate the syscall open() usage within QEMU. Is that right ? Regards, Stefan

On 01/26/2013 09:17 PM, Stefan Berger wrote:
In a subsequent patch we now test inside that function whether the new command line parameter is available using the capability (from another patch) and create fd=/dev/set/10 and vhostfd=/dev/set/20 respectively *and* already add "-add-fd set=10,fd=21" and "-add-fd set=20,fd=23" to virCommand using the functions I already posted (as the lowest layer).
Would it be that simple, or do we need to add more parameters to -add-fd set=10,fd=21,xyz under certain circumstances?
Ideally, we'd also want to use the opaque parameter of each -add-fd call, so that when we later use QMP commands to query what fdsets exist, the opaque parameter can tell us what filenames we passed in (it makes bookkeeping easier if you know that fd 5 was tied to /dev/tpm, compared to just knowing that it is an open fd but not what it came from). Then, if I remember right, qemu's add-fd commands are set up so that every time code in qemu calls qemu_open(), it dup's the original fd; so even if we use -add-fd on the command line, qemu doesn't directly read() and write() the fd we passed, but its own copy. Depending on the type of fd (such as passing the write end of a pipe, where libvirt keeps the read end, but the read end won't see an EOF until _all_ copies of the write end have been closed), that means we need to tell qemu to close the original once all copies are in place. So even if we don't need to support hotplug right away, we _do_ need to support at least the remove-fd QMP command after 'qemu -S -add-fd...' has exec'd, but before issuing the 'cont' QMP command to run the guest.
Ok, so likely this was a bad example and fd= does not need to be replaced with the fd sets. There are five occurrences of path= in the code, which could be handled similarly as described above with an additional open() and virCommandTransferFD(). My guess is that you want to eliminate the syscall open() usage within QEMU. Is that right ?
Yes, at least one end goal of using -add-fd on the command line (and 'add-fd' over the monitor during hotplug) is that we can use SELinux to prevent qemu from calling any open(). This in turn means that qemu can use _only_ file descriptors that have been handed in by the management app, and are therefore securely labeled. This comes into play when using NFS: there is no way to apply SELinux labels to NFS files (well, NFS 4 might eventually have a solution, but it's not there yet; and lots of people are still on NFS 3). Therefore, if you use NFS storage (which is surprisingly common when setting up shared storage for migration purposes), you have to enable the 'virt_use_nfs' SELinux bool, but this means that SELinux no longer prevents qemu from opening _any_ file residing on NFS, and thus a compromised guest could read or even write disk images associated with unrelated guests. But if qemu is forced to use only the fds that it inherits from management, then we won't need the virt_use_nfs sledgehammer, or its loss of security when enabled. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Mon, Jan 28, 2013 at 02:54:28PM -0700, Eric Blake wrote:
On 01/26/2013 09:17 PM, Stefan Berger wrote:
In a subsequent patch we now test inside that function whether the new command line parameter is available using the capability (from another patch) and create fd=/dev/set/10 and vhostfd=/dev/set/20 respectively *and* already add "-add-fd set=10,fd=21" and "-add-fd set=20,fd=23" to virCommand using the functions I already posted (as the lowest layer).
Would it be that simple, or do we need to add more parameters to -add-fd set=10,fd=21,xyz under certain circumstances?
Ideally, we'd also want to use the opaque parameter of each -add-fd call, so that when we later use QMP commands to query what fdsets exist, the opaque parameter can tell us what filenames we passed in (it makes bookkeeping easier if you know that fd 5 was tied to /dev/tpm, compared to just knowing that it is an open fd but not what it came from).
That implies that you trust the output from QEMU monitor commands to be telling you the truth. We can't do that, so IMHO we can't rely on the 'opaque' parameter. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 01/29/2013 03:23 AM, Daniel P. Berrange wrote:
On Mon, Jan 28, 2013 at 02:54:28PM -0700, Eric Blake wrote:
On 01/26/2013 09:17 PM, Stefan Berger wrote:
In a subsequent patch we now test inside that function whether the new command line parameter is available using the capability (from another patch) and create fd=/dev/set/10 and vhostfd=/dev/set/20 respectively *and* already add "-add-fd set=10,fd=21" and "-add-fd set=20,fd=23" to virCommand using the functions I already posted (as the lowest layer).
Would it be that simple, or do we need to add more parameters to -add-fd set=10,fd=21,xyz under certain circumstances?
Ideally, we'd also want to use the opaque parameter of each -add-fd call, so that when we later use QMP commands to query what fdsets exist, the opaque parameter can tell us what filenames we passed in (it makes bookkeeping easier if you know that fd 5 was tied to /dev/tpm, compared to just knowing that it is an open fd but not what it came from).
That implies that you trust the output from QEMU monitor commands to be telling you the truth. We can't do that, so IMHO we can't rely on the 'opaque' parameter.
Fair enough - libvirt will have to track in vm->priv anything that it needs for reconstructing state, even across libvirtd restarts. Still, we _ought_ to populate the opaque member with useful information, so that debugging is easier (that is, while libvirt can't rely on query-fdsets, a developer using 'virsh qemu-monitor-command' on a trusted guest certainly should be able to). -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
participants (3)
-
Daniel P. Berrange
-
Eric Blake
-
Stefan Berger