[libvirt] [RFC]: Secure migration

All,

I've been looking at implementing secure migration for libvirt. I've taken a look at two approaches; I'll outline them, and then tell you which one I think we should do, and why. Note that in the discussion below, I'm talking mostly about the qemu migration protocol; I believe both of the approaches will lend themselves to, say, Xen, but the details will be different.

For the sake of the outlines, I'm using three specific terms: "src" refers to the source of the migration, that is, where the VM is running *before* migration begins; "dst" refers to the destination of the migration, that is, where we want the VM to be at the end; and "controller" refers to the controlling program (usually virsh), which can be run on an entirely separate machine.

--------------------------------------------------------------------------------

1) virsh on the controller connects to the src, and initiates the migration command. In turn, this causes the controller to also connect to the dst. Now, during the "Prepare" step on the dst, we set up a qemu container to listen to some port on localhost. Next, the "Perform" step is called on the src machine. This causes an external program to be forked on the src; this external program connects to the dst (using virConnectOpen), then waits for migration data from the qemu instance. As the migration data comes in, it uses a new RPC call (something like "writeMigrationData") to pass the data to the dst. The dst then takes that data, and writes it out to the waiting qemu container.

Pros: Uses existing RPC mechanism, so you don't have to open a new port on the destination
Cons: There is a "hidden" dependency between the src and dst, which may be difficult for users to get right (more on this below).

2) virsh on the controller connects to the src, and initiates the migration command. In turn, this causes the controller to also connect to the dst. Now, during the "Prepare" step on the dst, we set up a qemu container to listen to some port (call it 1234) on localhost. The dst also forks an external program (or a thread) to listen for an incoming gnutls connection. Next, the "Perform" step is called on the src machine. This forks an external program (or thread) to listen for incoming data from a localhost migration, do the gnutls handshake with the dst, and dump the data over the gnutls connection to the dst.

Pros: Works mostly transparently; the user doesn't have to set anything up differently than they do today for "unsecure" migration
Cons: Requires opening a new (but still well-known) port in the firewall on the dst side.

--------------------------------------------------------------------------------

More about the "hidden" dependencies for 1). There are quite a few cases where the controller might have access to both machines, but the src libvirtd does not have access to the dst. A few examples:

a) libvirtd is using TLS + the "tls_allowed_dn_list" in /etc/libvirt/libvirtd.conf on the dst machine. It is configured so that the dst can be accessed via TLS from the controller machine, but not from the src machine.

b) libvirtd is using SASL digest-md5 on the dst machine. When the src machine tries to connect, it needs a name and password to do so, which it doesn't have. (Technically, I guess we could proxy that request back to the controller, have the user fill it in there, and then return the answer to the src, and then the dst, but it seems roundabout.)

c) libvirtd is using SASL gssapi on the dst machine. When the src machine tries to connect to the dst, it needs to have the right configuration (i.e. /etc/krb5.conf and /etc/sasl2/libvirt.conf need to work), and it also has to get some kind of principal from the kerberos server (but which principal?).

Because of the hidden dependency problem, I think solution 2) is actually more viable; yes, it also has a dependency (opening a hole in the firewall), but that can be documented and will work no matter what other authentication you are doing between the controller and src and dst machines. However, I am open to being convinced otherwise. Thoughts?

--
Chris Lalancette
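
The data path of option 1 can be sketched roughly as follows. This is only an illustration in Python, not libvirt code: `write_migration_data` stands in for the proposed (not yet existing) "writeMigrationData" RPC call, and the 64 KiB block size, the function names, and the in-memory buffers are all assumptions made for the example.

```python
import io

CHUNK_SIZE = 64 * 1024  # assumed payload size per RPC message; not from the proposal


def iter_migration_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield successive blocks read from the qemu migration stream."""
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        yield block


def relay(src_stream, write_migration_data):
    """Forward every block to the dst via the (hypothetical) RPC call."""
    sent = 0
    for block in iter_migration_chunks(src_stream):
        write_migration_data(block)
        sent += len(block)
    return sent


# Stand-in for the dst side: the RPC handler writes into the waiting qemu.
dst_qemu_input = io.BytesIO()
payload = bytes(range(256)) * 1000  # 256000 bytes of fake migration data
total = relay(io.BytesIO(payload), dst_qemu_input.write)
```

In a real implementation the callback would issue an RPC over the already-authenticated src-to-dst connection, which is where the "hidden" dependency comes from.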

From: libvir-list-bounces@redhat.com [mailto:libvir-list-bounces@redhat.com] On Behalf Of Chris Lalancette
...
2) virsh on the controller connects to the src, and initiates the migration command. In turn, this causes the controller to also connect to the dst. Now, during the "Prepare" step on the dst, we set up a qemu container to listen to some port (call it 1234) on localhost. It also forks an external program (or a thread) to listen for an incoming gnutls connection. Next, the "Perform" step is called on the src machine. This forks an external program (or thread) to listen for incoming data from a localhost migration, do the gnutls handshake with the dst, and dump the data over the gnutls connection to the dst.

[IH] How is the connection secured? Do you assume both hosts share Kerberos/certificate trust? Does the controller pass a shared encryption key to both parties?
(I also like this approach better, since it keeps the existing qemu migration, which is hard enough to stabilize.)

Itamar Heim wrote:
From: libvir-list-bounces@redhat.com [mailto:libvir-list-bounces@redhat.com] On Behalf Of Chris Lalancette
...
2) virsh on the controller connects to the src, and initiates the migration command. In turn, this causes the controller to also connect to the dst. Now, during the "Prepare" step on the dst, we set up a qemu container to listen to some port (call it 1234) on localhost. It also forks an external program (or a thread) to listen for an incoming gnutls connection. Next, the "Perform" step is called on the src machine. This forks an external program (or thread) to listen for incoming data from a localhost migration, do the gnutls handshake with the dst, and dump the data over the gnutls connection to the dst.

[IH] How is the connection secured? Do you assume both hosts share Kerberos/certificate trust? Does the controller pass a shared encryption key to both parties?
(I also like this approach better, since it keeps the existing qemu migration, which is hard enough to stabilize.)

Yes, in this case, the controller is passing a shared encryption key, which both sides will use to do the encryption and "prove" that they are the intended recipients. Note that this requires the user to have secure channels to each of the remote libvirtds (otherwise it would be trivial to sniff the encryption key as we pass it to them), but that's pretty much a requirement in any production situation anyway.

--
Chris Lalancette
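
One way the "prove that they are the intended recipients" step could work is a challenge-response over the shared key. The sketch below uses HMAC-SHA256 from the Python standard library; that choice, the key and nonce sizes, and all function names are assumptions for illustration and are not specified anywhere in this thread. It covers only the proof of key possession, not the encryption of the migration stream.

```python
import hashlib
import hmac
import secrets


def new_migration_key():
    """Controller side: generate a random key to hand to src and dst."""
    return secrets.token_bytes(32)


def make_challenge():
    """Dst side: a fresh random nonce for this migration attempt."""
    return secrets.token_bytes(16)


def answer_challenge(key, challenge):
    """Src side: prove knowledge of the key without sending it."""
    return hmac.new(key, challenge, hashlib.sha256).digest()


def verify_answer(key, challenge, answer):
    """Dst side: constant-time comparison against the expected answer."""
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, answer)


key = new_migration_key()
challenge = make_challenge()
good = verify_answer(key, challenge, answer_challenge(key, challenge))
bad = verify_answer(key, challenge, answer_challenge(new_migration_key(), challenge))
```

As noted above, the key itself still has to reach both libvirtds over channels that are already secure.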

On Tue, Mar 03, 2009 at 01:46:25PM +0100, Chris Lalancette wrote:
--------------------------------------------------------------------------------

1) virsh on the controller connects to the src, and initiates the migration command. In turn, this causes the controller to also connect to the dst. Now, during the "Prepare" step on the dst, we set up a qemu container to listen to some port on localhost. Next, the "Perform" step is called on the src machine. This causes an external program to be forked on the src; this external program connects to the dst (using virConnectOpen), then waits for migration data from the qemu instance. As the migration data comes in, it uses a new RPC call (something like "writeMigrationData") to pass the data to the dst. The dst then takes that data, and writes it out to the waiting qemu container.
Pros: Uses existing RPC mechanism, so you don't have to open a new port on the destination
Cons: There is a "hidden" dependency between the src and dst, which may be difficult for users to get right (more on this below).

2) virsh on the controller connects to the src, and initiates the migration command. In turn, this causes the controller to also connect to the dst. Now, during the "Prepare" step on the dst, we set up a qemu container to listen to some port (call it 1234) on localhost. It also forks an external program (or a thread) to listen for an incoming gnutls connection. Next, the "Perform" step is called on the src machine. This forks an external program (or thread) to listen for incoming data from a localhost migration, do the gnutls handshake with the dst, and dump the data over the gnutls connection to the dst.
Pros: Works mostly transparently; the user doesn't have to set anything up differently than they do today for "unsecure" migration
Cons: Requires opening a new (but still well-known) port in the firewall on the dst side.

--------------------------------------------------------------------------------
I'm having a little trouble understanding the finer differences between these two architectures, but the way I imagined it is like this:

 - libvirtd on source has a unix domain / localhost TCP socket for QEMU migrations
 - qemu on src does a 'migrate to unix domain socket'
 - libvirtd on source connects to libvirtd on dest using the normal RPC protocol (secured with whatever the admin's configured)
 - libvirtd on dest spawns qemu and lets it restore from stdio, which is fed from the data it gets over the RPC layer

  +--------source--------+                 +-----------dest-----------+
  |                      |                 |                          |
  qemu ---------> libvirtd --------------> libvirtd -------------> qemu
      UNIX socket          TCP libvirt RPC          -incoming stdio

I don't believe there is any need to spawn external programs here really (well, perhaps '-incoming cat' for new QEMU lacking the 'stdio' protocol).
a) libvirtd is using TLS + the "tls_allowed_dn_list" in /etc/libvirt/libvirtd.conf on the dst machine. They have it configured so that they can access the machine via TLS on the controller machine, but not from the src machine.
That's not a real problem - that's just a configuration setup thing - the admin already needs to correctly manage this setup for API usage, so it's no harder to get it right for migration.
b) libvirtd is using SASL digest-md5 on the dst machine. When the src machine tries to connect, it needs a name and password to do so, which it doesn't have. (technically, I guess we could proxy that response back to the controller, and have the user fill it there, and then return back to the src, and then the dst, but it seems roundabout)
There could be a dedicated "migration" user account and password, configured in the libvirtd config under /etc/libvirt/, that is used for migration only. You don't particularly want to relay the credentials via the virsh client, because that potentially gives the client credentials / authorization capabilities you don't want it to have.
c) libvirtd is using SASL gssapi on the dst machine. When the src machine tries to connect to the dst, it needs to have the right configuration (i.e. /etc/krb5.conf and /etc/sasl2/libvirt.conf need to work), and it also has to get some kind of principal from the kerberos server (but which principal?).
Each server has a kerberos principal, so I'd expect they can directly auth to each other, assuming suitable ACL configs.
Because of the hidden dependency problem, I think solution 2) is actually more viable; yes, it also has a dependency (opening a hole in the firewall), but that can be documented and will work no matter what other authentication you are doing between the controller and src and dst machines. However, I am open to being convinced otherwise. Thoughts?
These are all just minor auth credentials/acl config tasks that the admin has to deal with for normal remote usage already, so I don't see that they present a particular problem for migration.

There is also the possibility of plugging in a one-time-key auth credential for migrations, where the dest passes the necessary key to the source libvirtd, via the client app initiating the migration - this is what the 'cookie' parameter in Prepare/Perform was there for.

Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
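
The one-time-key idea can be sketched as a small dst-side store: a key is issued during Prepare, travels back as the cookie, and is accepted exactly once when the src presents it. This is a minimal Python illustration under stated assumptions; the class, its method names, the per-domain bookkeeping, and the 32-byte key size are hypothetical and do not describe how libvirt actually uses the cookie parameter.

```python
import hmac
import secrets


class OneTimeKeys:
    """Dst-side store of keys that may each be redeemed exactly once."""

    def __init__(self):
        self._pending = {}

    def prepare(self, domain_name):
        """Issue a key during Prepare; the caller returns it as the cookie."""
        cookie = secrets.token_bytes(32)
        self._pending[domain_name] = cookie
        return cookie

    def redeem(self, domain_name, cookie):
        """Accept the cookie once, then forget it so replays fail."""
        expected = self._pending.pop(domain_name, None)
        if expected is None:
            return False
        return hmac.compare_digest(expected, cookie)


dst = OneTimeKeys()
cookie = dst.prepare("guest1")
first = dst.redeem("guest1", cookie)
replay = dst.redeem("guest1", cookie)
```

Note that in this sketch a failed attempt also consumes the pending key, so a wrong guess forces a new Prepare; that is a design choice of the example, not something stated in the thread.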

From: libvir-list-bounces@redhat.com [mailto:libvir-list-bounces@redhat.com] On Behalf Of Daniel P. Berrange
...
c) libvirtd is using SASL gssapi on the dst machine. When the src machine tries to connect to the dst, it needs to have the right configuration (i.e. /etc/krb5.conf and /etc/sasl2/libvirt.conf need to work), and it also has to get some kind of principal from the kerberos server (but which principal?).
Each server has a kerberos principal, so I'd expect they can directly auth to each other, assuming suitable ACL configs.

[IH] you are assuming anyone using Libvirt will be using Kerberos?

On Tue, Mar 03, 2009 at 12:52:41PM -0500, Itamar Heim wrote:
From: libvir-list-bounces@redhat.com [mailto:libvir-list-bounces@redhat.com] On Behalf Of Daniel P. Berrange
...
c) libvirtd is using SASL gssapi on the dst machine. When the src machine tries to connect to the dst, it needs to have the right configuration (i.e. /etc/krb5.conf and /etc/sasl2/libvirt.conf need to work), and it also has to get some kind of principal from the kerberos server (but which principal?).
Each server has a kerberos principal, so I'd expect they can directly auth to each other, assuming suitable ACL configs.

[IH] you are assuming anyone using Libvirt will be using Kerberos?

No, libvirt supports any SASL auth method - I was just referring to Chris' question about gssapi credentials.

Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

Daniel P. Berrange wrote:
On Tue, Mar 03, 2009 at 01:46:25PM +0100, Chris Lalancette wrote:
--------------------------------------------------------------------------------

1) virsh on the controller connects to the src, and initiates the migration command. In turn, this causes the controller to also connect to the dst. Now, during the "Prepare" step on the dst, we set up a qemu container to listen to some port on localhost. Next, the "Perform" step is called on the src machine. This causes an external program to be forked on the src; this external program connects to the dst (using virConnectOpen), then waits for migration data from the qemu instance. As the migration data comes in, it uses a new RPC call (something like "writeMigrationData") to pass the data to the dst. The dst then takes that data, and writes it out to the waiting qemu container.
Pros: Uses existing RPC mechanism, so you don't have to open a new port on the destination
Cons: There is a "hidden" dependency between the src and dst, which may be difficult for users to get right (more on this below).

2) virsh on the controller connects to the src, and initiates the migration command. In turn, this causes the controller to also connect to the dst. Now, during the "Prepare" step on the dst, we set up a qemu container to listen to some port (call it 1234) on localhost. It also forks an external program (or a thread) to listen for an incoming gnutls connection. Next, the "Perform" step is called on the src machine. This forks an external program (or thread) to listen for incoming data from a localhost migration, do the gnutls handshake with the dst, and dump the data over the gnutls connection to the dst.
Pros: Works mostly transparently; the user doesn't have to set anything up differently than they do today for "unsecure" migration
Cons: Requires opening a new (but still well-known) port in the firewall on the dst side.

--------------------------------------------------------------------------------
I'm having a little trouble understanding the finer differences between these two architectures, but the way I imagined it is like this:

 - libvirtd on source has a unix domain / localhost TCP socket for QEMU migrations
 - qemu on src does a 'migrate to unix domain socket'
 - libvirtd on source connects to libvirtd on dest using the normal RPC protocol (secured with whatever the admin's configured)
 - libvirtd on dest spawns qemu and lets it restore from stdio, which is fed from the data it gets over the RPC layer

  +--------source--------+                 +-----------dest-----------+
  |                      |                 |                          |
  qemu ---------> libvirtd --------------> libvirtd -------------> qemu
      UNIX socket          TCP libvirt RPC          -incoming stdio

I don't believe there is any need to spawn external programs here really (well, perhaps '-incoming cat' for new QEMU lacking the 'stdio' protocol).
Yes, this setup is what my option 1 above is. Option 2 is very similar, except we use a separate channel not hooked up with the normal RPC mechanism to transfer the data.
a) libvirtd is using TLS + the "tls_allowed_dn_list" in /etc/libvirt/libvirtd.conf on the dst machine. They have it configured so that they can access the machine via TLS on the controller machine, but not from the src machine.
That's not a real problem - that's just a configuration setup thing - the admin already needs to correctly manage this setup for API usage, so it's no harder to get it right for migration.
b) libvirtd is using SASL digest-md5 on the dst machine. When the src machine tries to connect, it needs a name and password to do so, which it doesn't have. (technically, I guess we could proxy that response back to the controller, and have the user fill it there, and then return back to the src, and then the dst, but it seems roundabout)
There could be a dedicated "migration" user account and password, configured in the libvirtd config under /etc/libvirt/, that is used for migration only. You don't particularly want to relay the credentials via the virsh client, because that potentially gives the client credentials / authorization capabilities you don't want it to have.
c) libvirtd is using SASL gssapi on the dst machine. When the src machine tries to connect to the dst, it needs to have the right configuration (i.e. /etc/krb5.conf and /etc/sasl2/libvirt.conf need to work), and it also has to get some kind of principal from the kerberos server (but which principal?).
Each server has a kerberos principal, so I'd expect they can directly auth to each other, assuming suitable ACL configs.
Because of the hidden dependency problem, I think solution 2) is actually more viable; yes, it also has a dependency (opening a hole in the firewall), but that can be documented and will work no matter what other authentication you are doing between the controller and src and dst machines. However, I am open to being convinced otherwise. Thoughts?
These are all just minor auth credentials/acl config tasks that the admin has to deal with for normal remote usage already, so I don't see that they present a particular problem for migration
Yes, they are certainly all solvable from the admin's point-of-view, so they are not show stoppers. The thing is that I think admins will have a difficult time discovering what the problems are when migration doesn't work for them. There are just so many combinations that it's very easy for the admin to get one of them wrong, and then it may be difficult to figure out exactly what they need to do to get it working. On the other hand, having a dedicated channel makes it relatively easy; if the admin is having problems, then the answer is going to be "open port XYZ on the destination", and that will usually solve the problem.

--
Chris Lalancette
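
The "open port XYZ on the destination" diagnosis is easy to check mechanically. The following Python sketch tests whether a TCP port accepts connections; the helper name and the 2-second timeout are assumptions for the example, and the self-check uses a throwaway listener on localhost rather than a real migration port.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Self-check against a listener on an ephemeral localhost port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
open_result = port_is_open("127.0.0.1", port)
listener.close()
closed_result = port_is_open("127.0.0.1", port)
```

Run from the src against the dst, a False result points at the firewall or at a listener that never started; it says nothing about whether authentication would succeed.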

On Wed, Mar 04, 2009 at 11:03:17AM +0100, Chris Lalancette wrote:
These are all just minor auth credentials/acl config tasks that the admin has to deal with for normal remote usage already, so I don't see that they present a particular problem for migration
Yes, they are certainly all solvable from the admin's point-of-view, so they are not show stoppers. The thing is that I think admins will have a difficult time discovering what the problems are when migration doesn't work for them. There are just so many combinations that it's very easy for the admin to get one of them wrong, and then it may be difficult to figure out exactly what they need to do to get it working. On the other hand, having a dedicated channel makes it relatively easy; if the admin is having problems, then the answer is going to be "open port XYZ on the destination", and that will usually solve the problem.
From my POV, getting the auth fixed is a matter of installing the proper files on a machine; it is the responsibility of the sysadmins and purely within their realm. On the other hand, opening a new port is a decision involving network admins and security; it's not the same scope within a company with strict policies. I would really stay with the existing RPC model and avoid the requirement of adding a new open port. From a pure sysadmin "upgrade" perspective this can turn into a nightmare.

Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/

On Thu, Mar 05, 2009 at 09:58:46AM +0100, Daniel Veillard wrote:
On Wed, Mar 04, 2009 at 11:03:17AM +0100, Chris Lalancette wrote:
These are all just minor auth credentials/acl config tasks that the admin has to deal with for normal remote usage already, so I don't see that they present a particular problem for migration
Yes, they are certainly all solvable from the admin's point-of-view, so they are not show stoppers. The thing is that I think admins will have a difficult time discovering what the problems are when migration doesn't work for them. There are just so many combinations that it's very easy for the admin to get one of them wrong, and then it may be difficult to figure out exactly what they need to do to get it working. On the other hand, having a dedicated channel makes it relatively easy; if the admin is having problems, then the answer is going to be "open port XYZ on the destination", and that will usually solve the problem.
From my POV, getting the auth fixed is a matter of installing the proper files on a machine; it is the responsibility of the sysadmins and purely within their realm. On the other hand, opening a new port is a decision involving network admins and security; it's not the same scope within a company with strict policies. I would really stay with the existing RPC model and avoid the requirement of adding a new open port. From a pure sysadmin "upgrade" perspective this can turn into a nightmare.
We've discussed this further on IRC and decided that if we need a better authentication system for migration, then we should extend the server RPC auth call to return a choice of multiple auth schemes. So, for example, we could allow normal virsh clients to run SASL/TLS, and migration to run a one-time-key. The REMOTE_AUTH_LIST RPC command already allows for this:

    struct remote_auth_list_ret {
        remote_auth_type types<REMOTE_AUTH_TYPE_LIST_MAX>;
    };

We currently just always return a 1-element list. We can easily add more auth options to the list without breaking existing clients too.

Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
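
The multi-scheme idea amounts to the server offering a list and each client picking the first entry it understands. Below is a minimal Python sketch of that selection logic. The enum is only loosely modelled on remote_auth_type: the numeric values are illustrative, ONE_TIME_KEY is a hypothetical addition that does not exist in the protocol, and whether existing clients really skip unknown entries this way is an assumption.

```python
from enum import IntEnum


class AuthType(IntEnum):
    # Illustrative values only; ONE_TIME_KEY is a hypothetical new scheme.
    NONE = 0
    SASL = 1
    POLKIT = 2
    ONE_TIME_KEY = 100


def choose_auth(offered, supported):
    """Pick the first server-offered scheme that the client also supports."""
    for auth in offered:
        if auth in supported:
            return auth
    return None


# The server now returns more than one element in its auth list.
server_offer = [AuthType.ONE_TIME_KEY, AuthType.SASL]
old_client = choose_auth(server_offer, {AuthType.SASL, AuthType.NONE})
migration_client = choose_auth(server_offer, {AuthType.ONE_TIME_KEY})
```

Here the older client falls through to SASL while the migration connection selects the one-time key, which is the compatibility property claimed above.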
participants (4)
- Chris Lalancette
- Daniel P. Berrange
- Daniel Veillard
- Itamar Heim