[Libvir] PATCH: RFC supporting SASL authentication

History: we currently provide two TCP sockets, one clear text, no auth, the other TLS with x509 client certificate verification for auth. We also provide a UNIX domain socket relying on file perms to restrict access. I have previously provided a policykit patch for the latter allowing admin defined auth policy for UNIX socket access. The PolicyKit patch was flawed in that it did not provide a way for a client app to determine whether the UNIX socket needed PolicyKit auth ahead of time or not. This required apps to make assumptions prior to connecting which is not really viable. This mail is working towards a more flexible authentication solution, and goes straight for the big picture by integrating SASL authentication. This gives us integration with Kerberos (GSSAPI) and PAM and whatever the hell else SAS supports[1]. The critical decision in all this is the wire protocol & how to adapt it to slot in authentication in a way that is compatible with existing clients. As it stands the decision to use clear data vs TLS on the wire is based on the socket the client connects to - we do a TLS handshake at the very start. We need to be able to support SASL on all sockets, TCP, TLS & UNIX since they can all benefit from stuff like Kerberos. So I decided we need to integrate at the RPC layer for this to work. The general protocol picture ---------------------------- It helps to look at qemud/remote_protocol.x when reading this next ... Every API basically has an RPC call & reply pair. The reply messages have a status field, either 'REMOTE_OK' or 'REMOTE_ERROR'. In the former case the normal API return values follow on the wire. In the latter case a virErrroPtr object is serialized on the wire. For this patch I decided to add a 3rd code, REMOTE_AUTH. If an application tries to make an RPC call on a socket requiring authentication, and has not yet authenticated the server will return REMOTE_AUTH code. It then also returns an int (remote_auth_type) specifying the authentication method to use. I have defined two: enum remote_auth_type { REMOTE_AUTH_NONE = 0, REMOTE_AUTH_SASL = 1 }; With plans to add REMOTE_AUTH_POLKIT in a future patch. A legacy client getting back REMOTE_AUTH code will just quit the connection attempt since they don't support authentication. If the admin so desires they can still provide the TLS socket in a non-authenticated mode and only turn on SASL for the TCP socket. So the decision about whether to enable legacy clients is admin controlled. This is the best we can do. A new client getting back REMOTE_AUTH code will then read the remote_auth_type off the wire. If the requested type is one that the client supports then it can begin the authentication process, otherwise we virRaiseError and stop connecting. The SASL specific picture ------------------------- For the SASL auth the process involves a multi-phase handshake looking something like: client server 1. -> ask for mechanism list -> new ctx 2. new ctx <- list of mechanisms <- 3. start auth -> initial auth data -> start auth 4. step auth <- reply auth data <- 5. -> step auth data -> step auth goto 4. <- reply auth data <- Authentication can complete at step 4 in this process, or steps 4 & 5 can repeat an arbitrary number of times. So, to implement this if it neccessary to define 3 new RPC calls internal to the remote driver/daemon (aka not exposed to public API like the rest of the RPC calls). These are: REMOTE_PROC_AUTH_SASL_INIT = 66, REMOTE_PROC_AUTH_SASL_START = 67, REMOTE_PROC_AUTH_SASL_STEP = 68 These are basically just punting back & forth the data going in & out of the appropriate SASL apis. sasl_{server,client}_{new,start,step} See the man pages for more info. So on the server end, if a socket is configured to require SASL auth, the server will reject all RPC calls *except* for those three above with the REMOTE_AUTH code. Once SASL auth has completed, it will allow all the normal RPC calls. The effect is basically that the client is not able to call virConnectOpen & friends until auth has completed. On the client end, the fun is in the 'call' method of remote_internal.c. This has been split in two. The original method is now 'onecall', and there is a thin wrapper named 'call'. 'call' simply invokves onecall, and if it gets back a REMOTE_AUTH code, will do the SASL handshake & then re-run the original call. So again the effect is basically that the first virConnectOpen will cause the auth handshake to be performed. The SASL implementation details ------------------------------- This is the bare minimum SASL integration. I have not attempted to hook up any callbacks for gathering credentials. This basically means that the only SASL mechanism which works is Kerberos GSSAPI - its credentials are fetched out-of-band & so don't require callbacks. We do need to consider callbacks later so we can do username/password auth, and all the various other methods SASL has. As well as authentication, some SASL mechanisms provide a way to negotiate a data encryption layer for the subsequent session. GSSAPI is the only commonly used mechanism which supports this. I have not implemented this yet though. What we would do though is to enable this capability on the the plain TCP socket only. This would make the TCP socket truely secure, and avoid any extra overhead on the TLS socket or UNIX domain socket. I have set the wire packet size for the SASL negotiation to 8192 bytes at a time. This has been sufficient so far, but I need to validate this before we commit, because this will be wire ABI sensitive. Or I could change the XDR spec to be variable length. Anyway needs re-visiting This only deals with authentication. I have not attempted any authoriztion controls. So anyone who has a valid kerberos principle can connect to the server. We clearly need a local group list as we do for the x509 client certificates. Ultimately we could try LDAP lookups & other intersting suggestions. The SASL stuff is detected in configure & enabled/disabled as appropriate. I need to add extra config file params though to let the sysadmin control what socket uses what auth mechanism. The setup / usage details ------------------------- As I said, I only used GSSAPI so far. To use this all your clients need to be able to kinit & get a ticket. For testing I have setup my own personal Kerberos server using Fedora 7 & FreeIPA (http://freeipa.org/). Each libvirt server needs to be issued with a Kerberos service principle. I am using the word 'libvirt' as the service name. The service principle must match the FQDN of the server host. So on your Kerberos server you can issue a service principle with kadmin.local
addprinc libvirt/cherry.virt.boston.redhat.com@VIRT.BOSTON.REDHAT.COM ktadd -k /tmp/cherry-libvirt.tab libvirt/cherry.virt.boston.redhat.com@VIRT.BOSTON.REDHAT.COM quit
Then copy the /tmp/cherry-libvirt.tab file to /etc/libvirt/krb5.tab on the server in question. (Change the hostnames & REALM as needed of course). If you want to change the GSSAPI mechanism, the settings are in the file /etc/sasl2/libvirt.conf. Though again only GSSAPI works so far. If you edit the /etc/libvirt/libvirtd.conf file you can enable listen_tcp=1 and run run libvirtd --listen.
From the client machine it should now be possible to do
$ kinit berrange@VIRT.BOSTON.REDHAT.COM $ virsh --connect xen+tcp://cherry.virt.boston.redhat.com/ You probably screwed something up though along the way because Kerberos is nasty like that. If you use --verbose with libvirtd it'll show details of any errors during authentication. On the client end you can add in a param on the connect URI of ?debug=stderr and it'll print some info about the auth process to stderr. Check that it shows 'GSSAPI' as a valid mechanism. If it doesn't, then your server is not configured as it should be - check keytab file - check you have cyrus-sasl-gssapi RPM. If that's ok, then check 'klist' on the client - if the first phase of chatter between the client & the KDC worked you should see the principle of the server cached on the client. If you don't, then check the KDC logs in /var/log/krb5kdc.log Kerberos/GSSAPI error reporting sucks really bad. That said, I'm sure I'm missing something, because the errors I get out of SASL are even worse than normal. BTW, you can see various 'XXX' in my patch. Basically all of them need to be addressed before I'd consider this patch suitable to merge. configure.in | 39 +++ include/libvirt/virterror.h | 1 libvirt.spec.in | 3 qemud/Makefile.am | 21 +- qemud/internal.h | 8 qemud/libvirtd.init.in | 3 qemud/libvirtd.sysconf | 3 qemud/qemud.c | 29 ++ qemud/remote.c | 370 ++++++++++++++++++++++++++++++++++-- qemud/remote_dispatch_localvars.h | 5 qemud/remote_dispatch_proc_switch.h | 24 ++ qemud/remote_dispatch_prototypes.h | 3 qemud/remote_protocol.c | 164 +++++++++++++++ qemud/remote_protocol.h | 59 +++++ qemud/remote_protocol.x | 53 ++++- src/Makefile.am | 3 src/remote_internal.c | 307 +++++++++++++++++++++++++++++ src/virsh.c | 4 src/virterror.c | 6 tests/Makefile.am | 2 20 files changed, 1073 insertions(+), 34 deletions(-) Regards, Dan. [1] Not entirely true. Some auth methods require credentials to be fetched from the user, so we can only support methods for which we have callbacks. This patch ignores the callback question, so only support GSSAPI where the credentials are fetched out-of-band. This can be addressed later. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Wed, Oct 17, 2007 at 10:21:56PM +0100, Daniel P. Berrange wrote:
For this patch I decided to add a 3rd code, REMOTE_AUTH. If an application tries to make an RPC call on a socket requiring authentication, and has not yet authenticated the server will return REMOTE_AUTH code. It then also returns an int (remote_auth_type) specifying the authentication method to use. I have defined two:
enum remote_auth_type { REMOTE_AUTH_NONE = 0, REMOTE_AUTH_SASL = 1 };
With plans to add REMOTE_AUTH_POLKIT in a future patch.
I really like this, very similar to how HTTP auth works, and the good think is precisely that it does work in a very general and flexible way !
A legacy client getting back REMOTE_AUTH code will just quit the connection attempt since they don't support authentication. If the admin so desires they can still provide the TLS socket in a non-authenticated mode and only turn on SASL for the TCP socket. So the decision about whether to enable legacy clients is admin controlled. This is the best we can do.
A new client getting back REMOTE_AUTH code will then read the remote_auth_type off the wire. If the requested type is one that the client supports then it can begin the authentication process, otherwise we virRaiseError and stop connecting.
Yup the fallbacks are fine by me. The only thing that could be improved in the global picture would be a way for the client to export the authentication methods it supports on the first step, unfortunately that would break wire compatibility (unless XDR has some way to carry metadata payload, but I don't think taht's the case).
The SASL specific picture -------------------------
For the SASL auth the process involves a multi-phase handshake looking something like:
client server 1. -> ask for mechanism list -> new ctx
2. new ctx <- list of mechanisms <-
3. start auth -> initial auth data -> start auth
4. step auth <- reply auth data <-
5. -> step auth data -> step auth
goto 4. <- reply auth data <-
Authentication can complete at step 4 in this process, or steps 4 & 5 can repeat an arbitrary number of times.
So, to implement this if it neccessary to define 3 new RPC calls internal to the remote driver/daemon (aka not exposed to public API like the rest of the RPC calls). These are:
REMOTE_PROC_AUTH_SASL_INIT = 66, REMOTE_PROC_AUTH_SASL_START = 67, REMOTE_PROC_AUTH_SASL_STEP = 68
These are basically just punting back & forth the data going in & out of the appropriate SASL apis. sasl_{server,client}_{new,start,step} See the man pages for more info.
So on the server end, if a socket is configured to require SASL auth, the server will reject all RPC calls *except* for those three above with the REMOTE_AUTH code. Once SASL auth has completed, it will allow all the normal RPC calls. The effect is basically that the client is not able to
I'm really wondering about the plural for 'normal RPC calls', i.e. once the caller has been authentified are we sure we can keep the socket alive on both ends for the duration of the virConnectPtr connection ? I hope it's the case for such complex handshake and that at the client level this stays true (for example remote monitoring via virt-manager effectively keeps the same connection).
call virConnectOpen & friends until auth has completed.
On the client end, the fun is in the 'call' method of remote_internal.c. This has been split in two. The original method is now 'onecall', and there is a thin wrapper named 'call'. 'call' simply invokves onecall, and if it gets back a REMOTE_AUTH code, will do the SASL handshake & then re-run the original call. So again the effect is basically that the first virConnectOpen will cause the auth handshake to be performed.
Makes sense to me
The SASL implementation details -------------------------------
This is the bare minimum SASL integration. I have not attempted to hook up any callbacks for gathering credentials. This basically means that the only SASL mechanism which works is Kerberos GSSAPI - its credentials are fetched out-of-band & so don't require callbacks. We do need to consider callbacks later so we can do username/password auth, and all the various other methods SASL has.
Designing an asynchronous API for generic authentication which could be used by the multiple potential back-ends sounds a challenge to me, first because callbacks in APIs always make life harder (think threading and asynchronous operations), but also because when it comes to security it's so easy to get things wrong without noticing !
As well as authentication, some SASL mechanisms provide a way to negotiate a data encryption layer for the subsequent session. GSSAPI is the only commonly used mechanism which supports this. I have not implemented this yet though. What we would do though is to enable this capability on the the plain TCP socket only. This would make the TCP socket truely secure, and avoid any extra overhead on the TLS socket or UNIX domain socket.
I have set the wire packet size for the SASL negotiation to 8192 bytes at a time. This has been sufficient so far, but I need to validate this before we commit, because this will be wire ABI sensitive. Or I could change the XDR spec to be variable length. Anyway needs re-visiting
Hum, we will need to ask some security expert there, yup.
This only deals with authentication. I have not attempted any authoriztion controls. So anyone who has a valid kerberos principle can connect to the server. We clearly need a local group list as we do for the x509 client certificates. Ultimately we could try LDAP lookups & other intersting suggestions.
We definitely need an extensible framework in any case, there is just too many possibilities, and it's usually corporate policies that could be fairly rigid, we need to be able to adapt.
The SASL stuff is detected in configure & enabled/disabled as appropriate. I need to add extra config file params though to let the sysadmin control what socket uses what auth mechanism.
The setup / usage details -------------------------
I really need to play with this ... [...]
-- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
diff -r 2ebd10b676a9 configure.in --- a/configure.in Wed Oct 17 15:03:04 2007 -0400 +++ b/configure.in Wed Oct 17 15:03:07 2007 -0400 @@ -329,6 +329,40 @@ AC_CHECK_TYPE(gnutls_session, [#include <gnutls/gnutls.h>]) CFLAGS="$old_cflags" LDFLAGS="$old_ldflags" + + +dnl Cyrus SASL +AC_ARG_WITH(sasl, + [ --with-sasl use cyrus SASL for authentication], + [], + [with_sasl=yes]) + +SASL_CFLAGS= +SASL_LIBS= +if test "$with_sasl" != "no"; then + if test "$with_sasl" != "yes"; then + SASL_CFLAGS="-I$with_sasl" + SASL_LIBS="-L$with_sasl"
Hum, using the same flags for -I and -L looks suspicious to me [...]
+/* + * Initializes the SASL session in prepare for authentication + * and gives the client a list of allowed mechansims to choose + * + * XXX request wire encryption for non-TLS links + * XXX callbacks for stuff like password verification ? + * XXX fill in real hostname (from config file?) + * XXX fill in user realm (from config file ?) + * XXX fill in local & remote IP + */ +static int +remoteDispatchAuthSaslInit (struct qemud_client *client, + remote_message_header *req, + void *args ATTRIBUTE_UNUSED, + remote_auth_sasl_init_ret *ret) +{ +#if HAVE_SASL + const char *mechlist = NULL; + int err; + + qemudDebug("Initialize SASL auth\n"); + if (client->auth != REMOTE_AUTH_SASL || + client->saslconn != NULL) { + qemudLog(QEMUD_ERR, "client tried illegal SASL init request"); + remoteDispatchFailAuth(client, req); + return -2; + } + + err = sasl_server_new("libvirt", + NULL, /* XXX Server FQDN */ + NULL, /* XXX User realm */ + NULL, /* XXX IP local */ + NULL, /* XXX IP remote */ + NULL, /* XXX Callbacks */
Hum, what kind of callback is allowed there ?
+ SASL_SUCCESS_DATA, + &client->saslconn); + if (err != SASL_OK) { + qemudLog(QEMUD_ERR, "sasl context setup failed %d (%s)", + err, sasl_errstring(err, NULL, NULL)); + remoteDispatchFailAuth(client, req); + client->saslconn = NULL; + return -2; + } + + err = sasl_listmech(client->saslconn, + NULL, /* XXX username */ + "", /* Prefix */ + ",", /* Separator */ + "", /* Suffix */ + &mechlist, + NULL, + NULL);
Also we start to see the identity problems showing up there. Maybe I'm logged as veillard on the monitoring machine but the realm used to monitor domain instances on the cluster is 'virtadmin' and I could have token for both kind but we risk using the wrong one when doing the connection. [...]
--- a/qemud/remote_protocol.x Wed Oct 17 15:03:04 2007 -0400 +++ b/qemud/remote_protocol.x Wed Oct 17 15:05:13 2007 -0400 @@ -80,6 +80,10 @@ const REMOTE_NETWORK_NAME_LIST_MAX = 256
/* Upper limit on list of scheduler parameters. */ const REMOTE_DOMAIN_SCHEDULER_PARAMETERS_MAX = 16; + +/* Upper limit on SASL auth negotiation packet */ +/* XXX is this large enough for all SASL mechs ? */ +const REMOTE_AUTH_SASL_DATA_MAX = 8192;
/* UUID. VIR_UUID_BUFLEN definition comes from libvirt.h */ typedef opaque remote_uuid[VIR_UUID_BUFLEN]; @@ -121,6 +125,12 @@ struct remote_error { int int1; int int2; remote_network net; +}; + +/* Sent back with REMOTE_NEED_AUTH */ +enum remote_auth_type { + REMOTE_AUTH_NONE = 0, + REMOTE_AUTH_SASL = 1 };
/* Wire encoding of virVcpuInfo. */ @@ -610,6 +620,37 @@ struct remote_network_set_autostart_args struct remote_network_set_autostart_args { remote_nonnull_network net; int autostart; +}; + +struct remote_auth_sasl_init_ret { + remote_nonnull_string mechlist; +}; + +struct remote_auth_sasl_start_args { + remote_nonnull_string mech; + int nil; + unsigned datalen; + char data[REMOTE_AUTH_SASL_DATA_MAX]; +}; + +struct remote_auth_sasl_start_ret { + int complete; + int nil; + unsigned datalen; + char data[REMOTE_AUTH_SASL_DATA_MAX]; +}; + +struct remote_auth_sasl_step_args { + int nil; + unsigned datalen; + char data[REMOTE_AUTH_SASL_DATA_MAX]; +}; + +struct remote_auth_sasl_step_ret { + int complete; + int nil; + unsigned datalen; + char data[REMOTE_AUTH_SASL_DATA_MAX]; };
Hum, maybe a variable lenght string is really the good solution in the long term, even if it's a bit more complex
/*----- Protocol. -----*/ @@ -683,7 +724,10 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_MIGRATE_PERFORM = 62, REMOTE_PROC_DOMAIN_MIGRATE_FINISH = 63, REMOTE_PROC_DOMAIN_BLOCK_STATS = 64, - REMOTE_PROC_DOMAIN_INTERFACE_STATS = 65 + REMOTE_PROC_DOMAIN_INTERFACE_STATS = 65, + REMOTE_PROC_AUTH_SASL_INIT = 66, + REMOTE_PROC_AUTH_SASL_START = 67, + REMOTE_PROC_AUTH_SASL_STEP = 68 };
okay the 3 new calls
/* Custom RPC structure. */ @@ -714,7 +758,12 @@ enum remote_message_status { /* For replies, indicates that an error happened, and a struct * remote_error follows. */ - REMOTE_ERROR = 1 + REMOTE_ERROR = 1, + + /* Require authentication before this call is allowed + * remote_auth_type follows. + */ + REMOTE_AUTH = 2 };
[...]
diff -r 2ebd10b676a9 src/remote_internal.c --- a/src/remote_internal.c Wed Oct 17 15:03:04 2007 -0400 +++ b/src/remote_internal.c Wed Oct 17 16:15:14 2007 -0400 @@ -46,6 +46,9 @@ #include <gnutls/gnutls.h> #include <gnutls/x509.h> #include "gnutls_1_0_compat.h" +#if HAVE_SASL +#include <sasl/sasl.h> +#endif #include <libxml/uri.h>
#include "internal.h" @@ -71,6 +74,8 @@ struct private_data { int counter; /* Generates serial numbers for RPC. */ char *uri; /* Original (remote) URI. */ int networkOnly; /* Only used for network API */ + char *hostname; /* Original hostname */ + FILE *debugLog; /* Debug remote protocol */ };
being able to debug a silent SASL auth sounds a really good idea <grin/>
#define GET_PRIVATE(conn,retcode) \ @@ -89,7 +94,14 @@ struct private_data { return (retcode); \ }
-static int call (virConnectPtr conn, struct private_data *priv, int in_open, int proc_nr, xdrproc_t args_filter, char *args, xdrproc_t ret_filter, char *ret); +static int call (virConnectPtr conn, struct private_data *priv, + int in_open, int proc_nr, + xdrproc_t args_filter, char *args, + xdrproc_t ret_filter, char *ret); +static int onecall (virConnectPtr conn, struct private_data *priv, + int in_open, int in_auth, int proc_nr, + xdrproc_t args_filter, char *args, + xdrproc_t ret_filter, char *ret);
the call split, okay [...]
+#if HAVE_SASL +/* Perform the SASL authentication process + * + * XXX negotiate a session encryption layer for non-TLS sockets + * XXX fetch credentials from a libvirt client app callback + * XXX fill in local/remote IP details + * XXX use the real hostname from the connect URI + * XXX max packet size spec + * XXX better mechanism negotiation ? Ask client app ? + */ +static int +remoteAuthSasl (virConnectPtr conn, struct private_data *priv, + int in_open /* if we are in virConnectOpen */) { + sasl_conn_t *saslconn; + remote_auth_sasl_init_ret iret; + remote_auth_sasl_start_args sargs; + remote_auth_sasl_start_ret sret; + remote_auth_sasl_step_args pargs; + remote_auth_sasl_step_ret pret; + const char *clientout, *serverin; + unsigned int clientoutlen, serverinlen; + const char *mech; + int err, complete; + + remoteDebug(priv, "Client initialize SASL authentication"); + err = sasl_client_init(NULL); + if (err != SASL_OK) { + __virRaiseError (in_open ? NULL : conn, NULL, NULL, VIR_FROM_REMOTE, + VIR_ERR_AUTH_FAILED, VIR_ERR_ERROR, NULL, NULL, NULL, 0, 0, + "failed to initialize SASL library: %d (%s)", + err, sasl_errstring(err, NULL, NULL)); + return -1; + } + + err = sasl_client_new("libvirt", + priv->hostname, + NULL, /* XXX local addr */ + NULL, /* XXX remote addr */ + NULL, /* XXX callbacks */ + SASL_SUCCESS_DATA, + &saslconn); + if (err != SASL_OK) { + __virRaiseError (in_open ? NULL : conn, NULL, NULL, VIR_FROM_REMOTE, + VIR_ERR_AUTH_FAILED, VIR_ERR_ERROR, NULL, NULL, NULL, 0, 0, + "Failed to create SASL client context: %d (%s)", + err, sasl_errstring(err, NULL, NULL)); + return -1; + } + + /* First call is to inquire about supported mechanisms in the server */ + + memset (&iret, 0, sizeof iret); + if (onecall (conn, priv, in_open, 1, REMOTE_PROC_AUTH_SASL_INIT, + (xdrproc_t) xdr_void, (char *)NULL, + (xdrproc_t) xdr_remote_auth_sasl_init_ret, (char *) &iret) != 0) { + sasl_dispose(&saslconn); + return -1; /* virError already set by onecall */ + } + + + /* Start the auth negotiation on the client end first */ + remoteDebug(priv, "Client start negotiation mechlist '%s'", iret.mechlist); + err = sasl_client_start(saslconn, + iret.mechlist, + NULL, /* XXX interactions */ + &clientout, + &clientoutlen, + &mech); + if (err != SASL_OK && err != SASL_CONTINUE) { + __virRaiseError (in_open ? NULL : conn, NULL, NULL, VIR_FROM_REMOTE, + VIR_ERR_AUTH_FAILED, VIR_ERR_ERROR, NULL, NULL, NULL, 0, 0, + "Failed to start SASL negotiation: %d (%s)", + err, sasl_errstring(err, NULL, NULL)); + free(iret.mechlist); + sasl_dispose(&saslconn); + return -1; + } + free(iret.mechlist); + + if (clientoutlen > REMOTE_AUTH_SASL_DATA_MAX) { + __virRaiseError (in_open ? NULL : conn, NULL, NULL, VIR_FROM_REMOTE, + VIR_ERR_AUTH_FAILED, VIR_ERR_ERROR, NULL, NULL, NULL, 0, 0, + "SASL negotiation data too long: %d bytes", clientoutlen); + sasl_dispose(&saslconn); + return -1; + } + /* NB, distinction of NULL vs "" is *critical* in SASL */ + if (clientout) + memcpy(sargs.data, clientout, clientoutlen); + else + sargs.nil = 1; + sargs.datalen = clientoutlen; + sargs.mech = (char*)mech; + remoteDebug(priv, "Server start negotiation with mech %s. Data %d bytes %p", mech, clientoutlen, clientout); + + /* Now send the initial auth data to the server */ + memset (&sret, 0, sizeof sret); + if (onecall (conn, priv, in_open, 1, REMOTE_PROC_AUTH_SASL_START, + (xdrproc_t) xdr_remote_auth_sasl_start_args, (char *) &sargs, + (xdrproc_t) xdr_remote_auth_sasl_start_ret, (char *) &sret) != 0) { + sasl_dispose(&saslconn); + return -1; /* virError already set by onecall */ + } + + complete = sret.complete; + /* NB, distinction of NULL vs "" is *critical* in SASL */ + serverin = sret.nil ? NULL : sret.data; + serverinlen = sret.datalen; + remoteDebug(priv, "Client step result complete: %d. Data %d bytes %p", + complete, serverinlen, serverin); + + /* Loop-the-loop... + * Even if the server has completed, the client must *always* do at least one step + * in this loop to verify the server isn't lieing about something. Mutual auth */ + for (;;) { + err = sasl_client_step(saslconn, + serverin, + serverinlen, + NULL, /* XXX interactions */ + &clientout, + &clientoutlen); + if (err != SASL_OK && err != SASL_CONTINUE) { + __virRaiseError (in_open ? NULL : conn, NULL, NULL, VIR_FROM_REMOTE, + VIR_ERR_AUTH_FAILED, VIR_ERR_ERROR, NULL, NULL, NULL, 0, 0, + "Failed step %d %s", + err, sasl_errstring(err, NULL, NULL)); + sasl_dispose(&saslconn); + return -1; + } + remoteDebug(priv, "Client step result %d. Data %d bytes %p", err, clientoutlen, clientout); + + /* Previous server call showed completion & we're now locally complete too */ + if (complete && err == SASL_OK) + break; + + /* Not done, prepare to talk with the server for another iteration */ + /* NB, distinction of NULL vs "" is *critical* in SASL */ + if (clientout) { + memcpy(pargs.data, clientout, clientoutlen); + pargs.nil = 0; + } else { + pargs.nil = 1; + } + pargs.datalen = clientoutlen; + remoteDebug(priv, "Server step with %d bytes %p", clientoutlen, clientout); + + memset (&pret, 0, sizeof pret); + if (onecall (conn, priv, in_open, 1, REMOTE_PROC_AUTH_SASL_STEP, + (xdrproc_t) xdr_remote_auth_sasl_step_args, (char *) &pargs, + (xdrproc_t) xdr_remote_auth_sasl_step_ret, (char *) &pret) != 0) { + sasl_dispose(&saslconn); + return -1; /* virError already set by onecall */ + } + + complete = pret.complete; + /* NB, distinction of NULL vs "" is *critical* in SASL */ + serverin = pret.nil ? NULL : pret.data; + serverinlen = pret.datalen; + + remoteDebug(priv, "Client step result complete: %d. Data %d bytes %p", + complete, serverinlen, serverin); + + /* This server call shows complete, and earlier client step was OK */ + if (complete && err == SASL_OK) + break; + } + + remoteDebug(priv, "SASL authentication complete"); + /* XXX keep this around for wire encoding */ + sasl_dispose(&saslconn); + return 0; +} +#endif
Hum, head hurts at this point ... [...]
diff -r 2ebd10b676a9 src/virsh.c --- a/src/virsh.c Wed Oct 17 15:03:04 2007 -0400 +++ b/src/virsh.c Wed Oct 17 15:03:07 2007 -0400 @@ -4512,8 +4512,10 @@ vshInit(vshControl * ctl) * vshConnectionUsability, except ones which don't need a connection * such as "help". */ - if (!ctl->conn) + if (!ctl->conn) { vshError(ctl, FALSE, _("failed to connect to the hypervisor")); + return FALSE; + }
return TRUE; }
Hum, could you push that change to CVS independantly ? Thanks a lot, obviously there is a bit more to be done before commiting, but I don't think we need to define the authentication API as a prerequisite to push this in, it's okay to rely on background auth as a first step. Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

On Thu, Oct 18, 2007 at 10:44:57AM -0400, Daniel Veillard wrote:
On Wed, Oct 17, 2007 at 10:21:56PM +0100, Daniel P. Berrange wrote:
A legacy client getting back REMOTE_AUTH code will just quit the connection attempt since they don't support authentication. If the admin so desires they can still provide the TLS socket in a non-authenticated mode and only turn on SASL for the TCP socket. So the decision about whether to enable legacy clients is admin controlled. This is the best we can do.
A new client getting back REMOTE_AUTH code will then read the remote_auth_type off the wire. If the requested type is one that the client supports then it can begin the authentication process, otherwise we virRaiseError and stop connecting.
Yup the fallbacks are fine by me. The only thing that could be improved in the global picture would be a way for the client to export the authentication methods it supports on the first step, unfortunately that would break wire compatibility (unless XDR has some way to carry metadata payload, but I don't think taht's the case).
Having the client export its auth types isn't really neccessary, since the server only supports a single primary auth type for any given socket. The SASL auth type, does have many sub-auth types - eg GSSAPI, PLAIN, OTP, etc and that's what the 'list mechanisms' called below provides info about. The client asks server for all SASL mechanisms that are allowed - it then chooss the best one it supports. If we need this for the primary libvirt auth types too, I'd change the wire protocol a little, so that when sending the REMOTE_AUTH code, we would send back a list of allowed primary auth types. The client can then invoke the RPC call for whichever one they support. This would let us support both PolicyKit & SASL on the same socket, which could be nice i guess.
So, to implement this if it neccessary to define 3 new RPC calls internal to the remote driver/daemon (aka not exposed to public API like the rest of the RPC calls). These are:
REMOTE_PROC_AUTH_SASL_INIT = 66, REMOTE_PROC_AUTH_SASL_START = 67, REMOTE_PROC_AUTH_SASL_STEP = 68
These are basically just punting back & forth the data going in & out of the appropriate SASL apis. sasl_{server,client}_{new,start,step} See the man pages for more info.
So on the server end, if a socket is configured to require SASL auth, the server will reject all RPC calls *except* for those three above with the REMOTE_AUTH code. Once SASL auth has completed, it will allow all the normal RPC calls. The effect is basically that the client is not able to
I'm really wondering about the plural for 'normal RPC calls', i.e. once the caller has been authentified are we sure we can keep the socket alive on both ends for the duration of the virConnectPtr connection ? I hope it's the case for such complex handshake and that at the client level this stays true (for example remote monitoring via virt-manager effectively keeps the same connection).
I'm not sure what you mean here ? The whole remote session, takes place over a TCP socket. We are requiring authentication upfront, before allowing the virConnectOpen call. Once the client closes the socket, they have to open a new one & start the whole procss from scratch. Both the client & server cope with the other end dropping the connection in the middle of a handshake by just cleaning up their state.
The SASL implementation details -------------------------------
This is the bare minimum SASL integration. I have not attempted to hook up any callbacks for gathering credentials. This basically means that the only SASL mechanism which works is Kerberos GSSAPI - its credentials are fetched out-of-band & so don't require callbacks. We do need to consider callbacks later so we can do username/password auth, and all the various other methods SASL has.
Designing an asynchronous API for generic authentication which could be used by the multiple potential back-ends sounds a challenge to me, first because callbacks in APIs always make life harder (think threading and asynchronous operations), but also because when it comes to security it's so easy to get things wrong without noticing !
Well if we registered a global function virConnectCallbacks(...) to register a set of callbacks that would be sufficient. The callbacks would be guarenteed to always execute in the same thread as the virConnectOpen() call itself. So we'd just have to document that you should make sure that the callbacks are able to run in parallel - ie multiple threads can be in virConnectOpen calls at once. Threading isn't really the thing to worry about here. The hard bit is actally designing what data goes in & out of the callbacks. SASL has a pretty large list of possible callbacks. I listed a couple here http://www.redhat.com/archives/libvir-list/2007-January/msg00024.html but now I've looked at SASL more closely there's many more to consider & I think I can do a better job than my original proposal.
I have set the wire packet size for the SASL negotiation to 8192 bytes at a time. This has been sufficient so far, but I need to validate this before we commit, because this will be wire ABI sensitive. Or I could change the XDR spec to be variable length. Anyway needs re-visiting
Hum, we will need to ask some security expert there, yup.
Well its really a SASL implementation detail. There is a way you can specify a maximum network packet size you're prepared to accept. I need to check whether this appllies to the initial handshake, or just the post-handshake session encryption.
This only deals with authentication. I have not attempted any authoriztion controls. So anyone who has a valid kerberos principle can connect to the server. We clearly need a local group list as we do for the x509 client certificates. Ultimately we could try LDAP lookups & other intersting suggestions.
We definitely need an extensible framework in any case, there is just too many possibilities, and it's usually corporate policies that could be fairly rigid, we need to be able to adapt.
Yep, not sure what todo here really, aisde from a local group file.
diff -r 2ebd10b676a9 configure.in --- a/configure.in Wed Oct 17 15:03:04 2007 -0400 +++ b/configure.in Wed Oct 17 15:03:07 2007 -0400 @@ -329,6 +329,40 @@ AC_CHECK_TYPE(gnutls_session, [#include <gnutls/gnutls.h>]) CFLAGS="$old_cflags" LDFLAGS="$old_ldflags" + + +dnl Cyrus SASL +AC_ARG_WITH(sasl, + [ --with-sasl use cyrus SASL for authentication], + [], + [with_sasl=yes]) + +SASL_CFLAGS= +SASL_LIBS= +if test "$with_sasl" != "no"; then + if test "$with_sasl" != "yes"; then + SASL_CFLAGS="-I$with_sasl" + SASL_LIBS="-L$with_sasl"
Hum, using the same flags for -I and -L looks suspicious to me
Why, it just lets you do --with-sasl=/opt/sasl Yes, it assumes people install the libraries & headers into the same prefix, but frankly that's not unreasonable.
[...]
+/* + * Initializes the SASL session in prepare for authentication + * and gives the client a list of allowed mechansims to choose + * + * XXX request wire encryption for non-TLS links + * XXX callbacks for stuff like password verification ? + * XXX fill in real hostname (from config file?) + * XXX fill in user realm (from config file ?) + * XXX fill in local & remote IP + */ +static int +remoteDispatchAuthSaslInit (struct qemud_client *client, + remote_message_header *req, + void *args ATTRIBUTE_UNUSED, + remote_auth_sasl_init_ret *ret) +{ +#if HAVE_SASL + const char *mechlist = NULL; + int err; + + qemudDebug("Initialize SASL auth\n"); + if (client->auth != REMOTE_AUTH_SASL || + client->saslconn != NULL) { + qemudLog(QEMUD_ERR, "client tried illegal SASL init request"); + remoteDispatchFailAuth(client, req); + return -2; + } + + err = sasl_server_new("libvirt", + NULL, /* XXX Server FQDN */ + NULL, /* XXX User realm */ + NULL, /* XXX IP local */ + NULL, /* XXX IP remote */ + NULL, /* XXX Callbacks */
Hum, what kind of callback is allowed there ?
*Many* different callbacks. Rather than listing them here, take a look at /usr/include/sasl/sasl.h - search for 'struct sasl_callback' - following the struct definition are many callback function signatures. Also look at the 'sasl_callbacks' many page. This is specifically for server side calbacks. These deal with stuff like pluggable password checking API, and an authorization checking API. There is separate client side callbacks elsewhere, but I'd probably not use them. Instead I'd go for what SASL calls 'interactions'. Basically the 'step' API will fail & give you back a list of interactions that are needed. The advantage of this is that you can feed these back to the client app all in one go, rather than one-at-a-time. This makes is practical to build a UI prompting for all data at once.
+ err = sasl_listmech(client->saslconn, + NULL, /* XXX username */ + "", /* Prefix */ + ",", /* Separator */ + "", /* Suffix */ + &mechlist, + NULL, + NULL);
Also we start to see the identity problems showing up there. Maybe I'm logged as veillard on the monitoring machine but the realm used to monitor domain instances on the cluster is 'virtadmin' and I could have token for both kind but we risk using the wrong one when doing the connection.
This is server side & in fact the username field here is rarly used. The purpose here is to allow you to filter the list of authentication schemes given back to the client based on client's username. Trouble is you can't really trust the username at this point since we've done no auth, so it is of limited use. The docs say you rarely need to worry about this field, just leaving it NULL.
[...]
+#if HAVE_SASL +/* Perform the SASL authentication process + * + * XXX negotiate a session encryption layer for non-TLS sockets + * XXX fetch credentials from a libvirt client app callback + * XXX fill in local/remote IP details + * XXX use the real hostname from the connect URI + * XXX max packet size spec + * XXX better mechanism negotiation ? Ask client app ? + */ +static int +remoteAuthSasl (virConnectPtr conn, struct private_data *priv, + int in_open /* if we are in virConnectOpen */) {
[snip]
Hum, head hurts at this point ...
There is alot of line noise there with the error handling, but bascialy it comes down to -> sasl_client_new -> Run the REMOTE_PROC_AUTH_SASL_INIT rpc call -> sasl_client_start -> Run the REMOTE_PROC_AUTH_SASL_START rpc call repeat until authenticated: -> sasl_client_step -> Run the REMOTE_PROC_AUTH_SASL_STEP rpc call Each of teh rpc calls basically ends up doing the equivalent SASL func on the server eg, sasl_server_new, sasl_server_start and sasl_server_step So it is pretty symmetric in sequence of calls. Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Thu, Oct 18, 2007 at 06:22:17PM +0100, Daniel P. Berrange wrote:
On Thu, Oct 18, 2007 at 10:44:57AM -0400, Daniel Veillard wrote:
On Wed, Oct 17, 2007 at 10:21:56PM +0100, Daniel P. Berrange wrote:
A legacy client getting back REMOTE_AUTH code will just quit the connection attempt since they don't support authentication. If the admin so desires they can still provide the TLS socket in a non-authenticated mode and only turn on SASL for the TCP socket. So the decision about whether to enable legacy clients is admin controlled. This is the best we can do.
A new client getting back REMOTE_AUTH code will then read the remote_auth_type off the wire. If the requested type is one that the client supports then it can begin the authentication process, otherwise we virRaiseError and stop connecting.
Yup the fallbacks are fine by me. The only thing that could be improved in the global picture would be a way for the client to export the authentication methods it supports on the first step, unfortunately that would break wire compatibility (unless XDR has some way to carry metadata payload, but I don't think taht's the case).
Having the client export its auth types isn't really neccessary, since the server only supports a single primary auth type for any given socket. The SASL auth type, does have many sub-auth types - eg GSSAPI, PLAIN, OTP, etc and that's what the 'list mechanisms' called below provides info about. The client asks server for all SASL mechanisms that are allowed - it then chooss the best one it supports.
If we need this for the primary libvirt auth types too, I'd change the wire protocol a little, so that when sending the REMOTE_AUTH code, we would send back a list of allowed primary auth types. The client can then invoke the RPC call for whichever one they support. This would let us support both PolicyKit & SASL on the same socket, which could be nice i guess.
yes that's the symetric approach, that should work very well too
So, to implement this if it neccessary to define 3 new RPC calls internal to the remote driver/daemon (aka not exposed to public API like the rest of the RPC calls). These are:
REMOTE_PROC_AUTH_SASL_INIT = 66, REMOTE_PROC_AUTH_SASL_START = 67, REMOTE_PROC_AUTH_SASL_STEP = 68
These are basically just punting back & forth the data going in & out of the appropriate SASL apis. sasl_{server,client}_{new,start,step} See the man pages for more info.
So on the server end, if a socket is configured to require SASL auth, the server will reject all RPC calls *except* for those three above with the REMOTE_AUTH code. Once SASL auth has completed, it will allow all the normal RPC calls. The effect is basically that the client is not able to
I'm really wondering about the plural for 'normal RPC calls', i.e. once the caller has been authentified are we sure we can keep the socket alive on both ends for the duration of the virConnectPtr connection ? I hope it's the case for such complex handshake and that at the client level this stays true (for example remote monitoring via virt-manager effectively keeps the same connection).
I'm not sure what you mean here ? The whole remote session, takes place over a TCP socket. We are requiring authentication upfront, before allowing the virConnectOpen call. Once the client closes the socket, they have to open a new one & start the whole procss from scratch. Both the client & server cope with the other end dropping the connection in the middle of a handshake by just cleaning up their state.
yes I had that question at the beginning but having read the code makes thing clear, no problem, it's only at virConnectOpen() so fine.
The SASL implementation details -------------------------------
This is the bare minimum SASL integration. I have not attempted to hook up any callbacks for gathering credentials. This basically means that the only SASL mechanism which works is Kerberos GSSAPI - its credentials are fetched out-of-band & so don't require callbacks. We do need to consider callbacks later so we can do username/password auth, and all the various other methods SASL has.
Designing an asynchronous API for generic authentication which could be used by the multiple potential back-ends sounds a challenge to me, first because callbacks in APIs always make life harder (think threading and asynchronous operations), but also because when it comes to security it's so easy to get things wrong without noticing !
Well if we registered a global function virConnectCallbacks(...) to register a set of callbacks that would be sufficient. The callbacks would be guarenteed to always execute in the same thread as the virConnectOpen() call itself. So we'd just have to document that you should make sure that the callbacks are able to run in parallel - ie multiple threads can be in virConnectOpen calls at once.
Threading isn't really the thing to worry about here. The hard bit is actally
yes again, because all this happens basically synchronously at time of virConnectOpen(). Some may raise the fact that this connection may stay open longuer than for example the validity of the kerberos token used to get it. Maybe we will never need async operations because it's okay for tools to stay open (or just reconnect after some time.
designing what data goes in & out of the callbacks. SASL has a pretty large list of possible callbacks. I listed a couple here
http://www.redhat.com/archives/libvir-list/2007-January/msg00024.html
but now I've looked at SASL more closely there's many more to consider & I think I can do a better job than my original proposal.
okay :-)
I have set the wire packet size for the SASL negotiation to 8192 bytes at a time. This has been sufficient so far, but I need to validate this before we commit, because this will be wire ABI sensitive. Or I could change the XDR spec to be variable length. Anyway needs re-visiting
Hum, we will need to ask some security expert there, yup.
Well its really a SASL implementation detail. There is a way you can specify a maximum network packet size you're prepared to accept. I need to check whether this appllies to the initial handshake, or just the post-handshake session encryption.
This only deals with authentication. I have not attempted any authoriztion controls. So anyone who has a valid kerberos principle can connect to the server. We clearly need a local group list as we do for the x509 client certificates. Ultimately we could try LDAP lookups & other intersting suggestions.
We definitely need an extensible framework in any case, there is just too many possibilities, and it's usually corporate policies that could be fairly rigid, we need to be able to adapt.
Yep, not sure what todo here really, aisde from a local group file.
diff -r 2ebd10b676a9 configure.in --- a/configure.in Wed Oct 17 15:03:04 2007 -0400 +++ b/configure.in Wed Oct 17 15:03:07 2007 -0400 @@ -329,6 +329,40 @@ AC_CHECK_TYPE(gnutls_session, [#include <gnutls/gnutls.h>]) CFLAGS="$old_cflags" LDFLAGS="$old_ldflags" + + +dnl Cyrus SASL +AC_ARG_WITH(sasl, + [ --with-sasl use cyrus SASL for authentication], + [], + [with_sasl=yes]) + +SASL_CFLAGS= +SASL_LIBS= +if test "$with_sasl" != "no"; then + if test "$with_sasl" != "yes"; then + SASL_CFLAGS="-I$with_sasl" + SASL_LIBS="-L$with_sasl"
Hum, using the same flags for -I and -L looks suspicious to me
Why, it just lets you do
--with-sasl=/opt/sasl
Yes, it assumes people install the libraries & headers into the same prefix, but frankly that's not unreasonable.
prefix ? I think -I and -L indicates paths, that's why I'm surprized.
+ err = sasl_server_new("libvirt", + NULL, /* XXX Server FQDN */ + NULL, /* XXX User realm */ + NULL, /* XXX IP local */ + NULL, /* XXX IP remote */ + NULL, /* XXX Callbacks */
Hum, what kind of callback is allowed there ?
*Many* different callbacks. Rather than listing them here, take a look at /usr/include/sasl/sasl.h - search for 'struct sasl_callback' - following the struct definition are many callback function signatures. Also look at the 'sasl_callbacks' many page.
okay, I see urgh they dropped type checking for args of the call basically there is no generic signature, what I was hoping to see
This is specifically for server side calbacks. These deal with stuff like pluggable password checking API, and an authorization checking API.
There is separate client side callbacks elsewhere, but I'd probably not use them. Instead I'd go for what SASL calls 'interactions'. Basically the 'step' API will fail & give you back a list of interactions that are needed. The advantage of this is that you can feed these back to the client app all in one go, rather than one-at-a-time. This makes is practical to build a UI prompting for all data at once.
I'm not sure I understand yet, but it's my fault :-)
+ err = sasl_listmech(client->saslconn, + NULL, /* XXX username */ + "", /* Prefix */ + ",", /* Separator */ + "", /* Suffix */ + &mechlist, + NULL, + NULL);
Also we start to see the identity problems showing up there. Maybe I'm logged as veillard on the monitoring machine but the realm used to monitor domain instances on the cluster is 'virtadmin' and I could have token for both kind but we risk using the wrong one when doing the connection.
This is server side & in fact the username field here is rarly used. The purpose here is to allow you to filter the list of authentication schemes given back to the client based on client's username. Trouble is you can't really trust the username at this point since we've done no auth, so it is of limited use. The docs say you rarely need to worry about this field, just leaving it NULL.
okay
[...]
+#if HAVE_SASL +/* Perform the SASL authentication process + * + * XXX negotiate a session encryption layer for non-TLS sockets + * XXX fetch credentials from a libvirt client app callback + * XXX fill in local/remote IP details + * XXX use the real hostname from the connect URI + * XXX max packet size spec + * XXX better mechanism negotiation ? Ask client app ? + */ +static int +remoteAuthSasl (virConnectPtr conn, struct private_data *priv, + int in_open /* if we are in virConnectOpen */) {
[snip]
Hum, head hurts at this point ...
There is alot of line noise there with the error handling, but bascialy it comes down to
-> sasl_client_new
-> Run the REMOTE_PROC_AUTH_SASL_INIT rpc call
-> sasl_client_start
-> Run the REMOTE_PROC_AUTH_SASL_START rpc call
repeat until authenticated: -> sasl_client_step
-> Run the REMOTE_PROC_AUTH_SASL_STEP rpc call
Each of teh rpc calls basically ends up doing the equivalent SASL func on the server eg, sasl_server_new, sasl_server_start and sasl_server_step So it is pretty symmetric in sequence of calls.
Okay :-) thanks a lot for explaining ! Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
participants (2)
-
Daniel P. Berrange
-
Daniel Veillard