Re: [Libvir] PATCH: RFC supporting SASL authentication

18 Oct 2007

On Wed, Oct 17, 2007 at 10:21:56PM +0100, Daniel P. Berrange wrote:
...
For this patch I decided to add a 3rd code,  REMOTE_AUTH. If an application
tries to make an RPC call on a socket requiring authentication, and has not
yet authenticated the server will return REMOTE_AUTH code. It then also returns
an int (remote_auth_type) specifying the authentication method to use. I have 
defined two:
enum remote_auth_type {
    REMOTE_AUTH_NONE = 0,
    REMOTE_AUTH_SASL = 1
  };
With plans to add REMOTE_AUTH_POLKIT in a future patch.
I really like this, very similar to how HTTP auth works, and the good think
is precisely that it does work in a very general and flexible way !
...
A legacy client getting back REMOTE_AUTH code will just quit the connection
attempt since they don't support authentication. If the admin so desires
they can still provide the TLS socket in a non-authenticated mode and only
turn on SASL for the TCP socket. So the decision about whether to enable
legacy clients is admin controlled. This is the best we can do.
A new client getting back REMOTE_AUTH code will then read the remote_auth_type
off the wire. If the requested type is one that the client supports then it
can begin the authentication process, otherwise we virRaiseError and stop
connecting.
Yup the fallbacks are fine by me.
The only thing that could be improved in the global picture would be a way
for the client to export the authentication methods it supports on the first
step, unfortunately that would break wire compatibility (unless XDR has some
way to carry metadata payload, but I don't think taht's the case).
...
The SASL specific picture
-------------------------
For the SASL auth the process involves a multi-phase handshake looking
something like:
client                                server
1.           -> ask for mechanism list ->  new ctx
2.  new ctx  <-  list of mechanisms  <-
3.  start auth  -> initial auth data  ->   start auth
4.  step auth   <- reply auth data   <-
5.              -> step auth data ->      step auth
goto 4.     <- reply auth data <-
Authentication can complete at step 4 in this process, or steps 4 & 5
can repeat an arbitrary number of times.
So, to implement this if it neccessary to define 3 new RPC calls internal
to the remote driver/daemon (aka not exposed to public API like the rest
of the RPC calls). These are:
REMOTE_PROC_AUTH_SASL_INIT = 66,
    REMOTE_PROC_AUTH_SASL_START = 67,
    REMOTE_PROC_AUTH_SASL_STEP = 68
These are basically just punting back & forth the data going in & out of
the appropriate SASL apis. sasl_{server,client}_{new,start,step} See
the man pages for more info.
So on the server end, if a socket is configured to require SASL auth, the
server will reject all RPC calls *except* for those three above with the
REMOTE_AUTH code.  Once SASL auth has completed, it will allow all the 
normal RPC calls. The effect is basically that the client is not able to
I'm really wondering about the plural for 'normal RPC calls', i.e. once
the caller has been authentified are we sure we can keep the socket alive
on both ends for the duration of the virConnectPtr connection ? I hope
it's the case for such complex handshake and that at the client level
this stays true (for example remote monitoring via virt-manager effectively
keeps the same connection).
...
call virConnectOpen & friends until auth has completed.
On the client end, the fun is in the 'call' method of remote_internal.c. 
This has been split in two. The original method is now 'onecall', and there
is a thin wrapper named 'call'.  'call' simply invokves onecall, and if it
gets back a REMOTE_AUTH code, will do the SASL handshake & then re-run the
original call. So again the effect is basically that the first virConnectOpen
will cause the auth handshake to be performed.
Makes sense to me
...
The SASL implementation details
-------------------------------
This is the bare minimum SASL integration. I have not attempted to hook
up any callbacks for gathering credentials. This basically means that the
only SASL mechanism which works is Kerberos GSSAPI - its credentials are
fetched out-of-band & so don't require callbacks.  We do need to consider
callbacks later so we can do username/password auth, and all the various
other methods SASL has.
Designing an asynchronous API for generic authentication which could
be used by the multiple potential back-ends sounds a challenge to me,
first because callbacks in APIs always make life harder (think threading
and asynchronous operations), but also because when it comes to security 
it's so easy to get things wrong without noticing !
...
As well as authentication, some SASL mechanisms provide a way to negotiate
a data encryption layer for the subsequent session. GSSAPI is the only
commonly used mechanism which supports this. I have not implemented this
yet though. What we would do though is to enable this capability on the
the plain TCP socket only.  This would make the TCP socket truely secure,
and avoid any extra overhead on the TLS socket or UNIX domain socket.
I have set the wire packet size for the SASL negotiation to 8192 bytes at
a time. This has been sufficient so far, but I need to validate this
before we commit, because this will be wire ABI sensitive. Or I could
change the XDR spec to be variable length. Anyway needs re-visiting
Hum, we will need to ask some security expert there, yup.
...
This only deals with authentication. I have not attempted any authoriztion
controls. So anyone who has a valid kerberos principle can connect to the
server. We clearly need a local group list as we do for the x509 client
certificates. Ultimately we could try LDAP lookups & other intersting
suggestions.
We definitely need an extensible framework in any case, there is just
too many possibilities, and it's usually corporate policies that could
be fairly rigid, we need to be able to adapt.
...
The SASL stuff is detected in configure & enabled/disabled as appropriate.
I need to add extra config file params though to let the sysadmin control
what socket uses what auth mechanism.
The setup / usage details
-------------------------
I really need to play with this ...

[...]
...
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=|
...
diff -r 2ebd10b676a9 configure.in

--- a/configure.in	Wed Oct 17 15:03:04 2007 -0400
+++ b/configure.in	Wed Oct 17 15:03:07 2007 -0400
@@ -329,6 +329,40 @@ AC_CHECK_TYPE(gnutls_session,
  [#include <gnutls/gnutls.h>])
 CFLAGS="$old_cflags"
 LDFLAGS="$old_ldflags"
+
+
+dnl Cyrus SASL
+AC_ARG_WITH(sasl,
+  [  --with-sasl         use cyrus SASL for authentication],
+  [],
+  [with_sasl=yes])
+
+SASL_CFLAGS=
+SASL_LIBS=
+if test "$with_sasl" != "no"; then
+  if test "$with_sasl" != "yes"; then
+    SASL_CFLAGS="-I$with_sasl"
+    SASL_LIBS="-L$with_sasl"
Hum, using the same flags for -I and -L looks suspicious to me

[...]
...
+/*
+ * Initializes the SASL session in prepare for authentication
+ * and gives the client a list of allowed mechansims to choose
+ *
+ * XXX request wire encryption for non-TLS links
+ * XXX callbacks for stuff like password verification ?
+ * XXX fill in real hostname (from config file?)
+ * XXX fill in user realm (from config file ?)
+ * XXX fill in local & remote IP
+ */
+static int
+remoteDispatchAuthSaslInit (struct qemud_client *client,
+                            remote_message_header *req,
+                            void *args ATTRIBUTE_UNUSED,
+                            remote_auth_sasl_init_ret *ret)
+{
+#if HAVE_SASL
+    const char *mechlist = NULL;
+    int err;
+
+    qemudDebug("Initialize SASL auth\n");
+    if (client->auth != REMOTE_AUTH_SASL ||
+        client->saslconn != NULL) {
+        qemudLog(QEMUD_ERR, "client tried illegal SASL init request");
+        remoteDispatchFailAuth(client, req);
+        return -2;
+    }
+
+    err = sasl_server_new("libvirt",
+                          NULL, /* XXX Server FQDN */
+                          NULL, /* XXX User realm */
+                          NULL, /* XXX IP local */
+                          NULL, /* XXX IP remote */
+                          NULL, /* XXX Callbacks */
Hum, what kind of callback is allowed there ?
...
+                          SASL_SUCCESS_DATA,
+                          &client->saslconn);
+    if (err != SASL_OK) {
+        qemudLog(QEMUD_ERR, "sasl context setup failed %d (%s)",
+                 err, sasl_errstring(err, NULL, NULL));
+        remoteDispatchFailAuth(client, req);
+        client->saslconn = NULL;
+        return -2;
+    }
+
+    err = sasl_listmech(client->saslconn,
+                        NULL, /* XXX username */
+                        "", /* Prefix */
+                        ",", /* Separator */
+                        "", /* Suffix */
+                        &mechlist,
+                        NULL,
+                        NULL);
Also we start to see the identity problems showing up there. Maybe I'm
logged as veillard on the monitoring machine but the realm used to monitor
domain instances on the cluster is 'virtadmin' and I could have token for
both kind but we risk using the wrong one when doing the connection.
[...]
...
--- a/qemud/remote_protocol.x	Wed Oct 17 15:03:04 2007 -0400
+++ b/qemud/remote_protocol.x	Wed Oct 17 15:05:13 2007 -0400
@@ -80,6 +80,10 @@ const REMOTE_NETWORK_NAME_LIST_MAX = 256
/* Upper limit on list of scheduler parameters. */
 const REMOTE_DOMAIN_SCHEDULER_PARAMETERS_MAX = 16;
+
+/* Upper limit on SASL auth negotiation packet */
+/* XXX is this large enough for all SASL mechs ? */
+const REMOTE_AUTH_SASL_DATA_MAX = 8192;
/* UUID.  VIR_UUID_BUFLEN definition comes from libvirt.h */
 typedef opaque remote_uuid[VIR_UUID_BUFLEN];
@@ -121,6 +125,12 @@ struct remote_error {
     int int1;
     int int2;
     remote_network net;
+};
+
+/* Sent back with REMOTE_NEED_AUTH */
+enum remote_auth_type {
+    REMOTE_AUTH_NONE = 0,
+    REMOTE_AUTH_SASL = 1
 };
/* Wire encoding of virVcpuInfo. */
@@ -610,6 +620,37 @@ struct remote_network_set_autostart_args
 struct remote_network_set_autostart_args {
     remote_nonnull_network net;
     int autostart;
+};
+
+struct remote_auth_sasl_init_ret {
+    remote_nonnull_string mechlist;
+};
+
+struct remote_auth_sasl_start_args {
+    remote_nonnull_string mech;
+    int nil;
+    unsigned datalen;
+    char data[REMOTE_AUTH_SASL_DATA_MAX];
+};
+
+struct remote_auth_sasl_start_ret {
+    int complete;
+    int nil;
+    unsigned datalen;
+    char data[REMOTE_AUTH_SASL_DATA_MAX];
+};
+
+struct remote_auth_sasl_step_args {
+    int nil;
+    unsigned datalen;
+    char data[REMOTE_AUTH_SASL_DATA_MAX];
+};
+
+struct remote_auth_sasl_step_ret {
+    int complete;
+    int nil;
+    unsigned datalen;
+    char data[REMOTE_AUTH_SASL_DATA_MAX];
 };
Hum, maybe a variable lenght string is really the good solution
in the long term, even if it's a bit more complex
...
/*----- Protocol. -----*/
@@ -683,7 +724,10 @@ enum remote_procedure {
     REMOTE_PROC_DOMAIN_MIGRATE_PERFORM = 62,
     REMOTE_PROC_DOMAIN_MIGRATE_FINISH = 63,
     REMOTE_PROC_DOMAIN_BLOCK_STATS = 64,
-    REMOTE_PROC_DOMAIN_INTERFACE_STATS = 65
+    REMOTE_PROC_DOMAIN_INTERFACE_STATS = 65,
+    REMOTE_PROC_AUTH_SASL_INIT = 66,
+    REMOTE_PROC_AUTH_SASL_START = 67,
+    REMOTE_PROC_AUTH_SASL_STEP = 68
 };
okay the 3 new calls
...
/* Custom RPC structure. */
@@ -714,7 +758,12 @@ enum remote_message_status {
     /* For replies, indicates that an error happened, and a struct
      * remote_error follows.
      */
-    REMOTE_ERROR = 1
+    REMOTE_ERROR = 1,
+
+    /* Require authentication before this call is allowed
+     * remote_auth_type follows.
+     */
+    REMOTE_AUTH = 2
 };
[...]
...
diff -r 2ebd10b676a9 src/remote_internal.c
--- a/src/remote_internal.c	Wed Oct 17 15:03:04 2007 -0400
+++ b/src/remote_internal.c	Wed Oct 17 16:15:14 2007 -0400
@@ -46,6 +46,9 @@
 #include <gnutls/gnutls.h>
 #include <gnutls/x509.h>
 #include "gnutls_1_0_compat.h"
+#if HAVE_SASL
+#include <sasl/sasl.h>
+#endif
 #include <libxml/uri.h>
#include "internal.h"
@@ -71,6 +74,8 @@ struct private_data {
     int counter;                /* Generates serial numbers for RPC. */
     char *uri;                  /* Original (remote) URI. */
     int networkOnly;            /* Only used for network API */
+    char *hostname;             /* Original hostname */
+    FILE *debugLog;             /* Debug remote protocol */
 };
being able to debug a silent SASL auth sounds a really good idea <grin/>
...
#define GET_PRIVATE(conn,retcode)                                       \
@@ -89,7 +94,14 @@ struct private_data {
         return (retcode);                                               \
     }
-static int call (virConnectPtr conn, struct private_data *priv, int in_open, int proc_nr, xdrproc_t args_filter, char *args, xdrproc_t ret_filter, char *ret);
+static int call (virConnectPtr conn, struct private_data *priv,
+                 int in_open, int proc_nr,
+                 xdrproc_t args_filter, char *args,
+                 xdrproc_t ret_filter, char *ret);
+static int onecall (virConnectPtr conn, struct private_data *priv,
+                    int in_open, int in_auth, int proc_nr,
+                    xdrproc_t args_filter, char *args,
+                    xdrproc_t ret_filter, char *ret);
the call split, okay

[...]
...
+#if HAVE_SASL
+/* Perform the SASL authentication process
+ *
+ * XXX negotiate a session encryption layer for non-TLS sockets
+ * XXX fetch credentials from a libvirt client app callback
+ * XXX fill in local/remote IP details
+ * XXX use the real hostname from the connect URI
+ * XXX max packet size spec
+ * XXX better mechanism negotiation ? Ask client app ?
+ */
+static int
+remoteAuthSasl (virConnectPtr conn, struct private_data *priv,
+                int in_open /* if we are in virConnectOpen */) {
+    sasl_conn_t *saslconn;
+    remote_auth_sasl_init_ret iret;
+    remote_auth_sasl_start_args sargs;
+    remote_auth_sasl_start_ret sret;
+    remote_auth_sasl_step_args pargs;
+    remote_auth_sasl_step_ret pret;
+    const char *clientout, *serverin;
+    unsigned int clientoutlen, serverinlen;
+    const char *mech;
+    int err, complete;
+
+    remoteDebug(priv, "Client initialize SASL authentication");
+    err = sasl_client_init(NULL);
+    if (err != SASL_OK) {
+        __virRaiseError (in_open ? NULL : conn, NULL, NULL, VIR_FROM_REMOTE,
+                         VIR_ERR_AUTH_FAILED, VIR_ERR_ERROR, NULL, NULL, NULL, 0, 0,
+                         "failed to initialize SASL library: %d (%s)",
+                         err, sasl_errstring(err, NULL, NULL));
+        return -1;
+    }
+
+    err = sasl_client_new("libvirt",
+                          priv->hostname,
+                          NULL, /* XXX local addr */
+                          NULL, /* XXX remote addr */
+                          NULL, /* XXX callbacks */
+                          SASL_SUCCESS_DATA,
+                          &saslconn);
+    if (err != SASL_OK) {
+        __virRaiseError (in_open ? NULL : conn, NULL, NULL, VIR_FROM_REMOTE,
+                         VIR_ERR_AUTH_FAILED, VIR_ERR_ERROR, NULL, NULL, NULL, 0, 0,
+                         "Failed to create SASL client context: %d (%s)",
+                         err, sasl_errstring(err, NULL, NULL));
+        return -1;
+    }
+
+    /* First call is to inquire about supported mechanisms in the server */
+
+    memset (&iret, 0, sizeof iret);
+    if (onecall (conn, priv, in_open, 1, REMOTE_PROC_AUTH_SASL_INIT,
+                 (xdrproc_t) xdr_void, (char *)NULL,
+                 (xdrproc_t) xdr_remote_auth_sasl_init_ret, (char *) &iret) != 0) {
+        sasl_dispose(&saslconn);
+        return -1; /* virError already set by onecall */
+    }
+
+
+    /* Start the auth negotiation on the client end first */
+    remoteDebug(priv, "Client start negotiation mechlist '%s'", iret.mechlist);
+    err = sasl_client_start(saslconn,
+                            iret.mechlist,
+                            NULL, /* XXX interactions */
+                            &clientout,
+                            &clientoutlen,
+                            &mech);
+    if (err != SASL_OK && err != SASL_CONTINUE) {
+        __virRaiseError (in_open ? NULL : conn, NULL, NULL, VIR_FROM_REMOTE,
+                         VIR_ERR_AUTH_FAILED, VIR_ERR_ERROR, NULL, NULL, NULL, 0, 0,
+                         "Failed to start SASL negotiation: %d (%s)",
+                         err, sasl_errstring(err, NULL, NULL));
+        free(iret.mechlist);
+        sasl_dispose(&saslconn);
+        return -1;
+    }
+    free(iret.mechlist);
+
+    if (clientoutlen > REMOTE_AUTH_SASL_DATA_MAX) {
+        __virRaiseError (in_open ? NULL : conn, NULL, NULL, VIR_FROM_REMOTE,
+                         VIR_ERR_AUTH_FAILED, VIR_ERR_ERROR, NULL, NULL, NULL, 0, 0,
+                         "SASL negotiation data too long: %d bytes", clientoutlen);
+        sasl_dispose(&saslconn);
+        return -1;
+    }
+    /* NB, distinction of NULL vs "" is *critical* in SASL */
+    if (clientout)
+        memcpy(sargs.data, clientout, clientoutlen);
+    else
+        sargs.nil = 1;
+    sargs.datalen = clientoutlen;
+    sargs.mech = (char*)mech;
+    remoteDebug(priv, "Server start negotiation with mech %s. Data %d bytes %p", mech, clientoutlen, clientout);
+
+    /* Now send the initial auth data to the server */
+    memset (&sret, 0, sizeof sret);
+    if (onecall (conn, priv, in_open, 1, REMOTE_PROC_AUTH_SASL_START,
+                 (xdrproc_t) xdr_remote_auth_sasl_start_args, (char *) &sargs,
+                 (xdrproc_t) xdr_remote_auth_sasl_start_ret, (char *) &sret) != 0) {
+        sasl_dispose(&saslconn);
+        return -1; /* virError already set by onecall */
+    }
+
+    complete = sret.complete;
+    /* NB, distinction of NULL vs "" is *critical* in SASL */
+    serverin = sret.nil ? NULL : sret.data;
+    serverinlen = sret.datalen;
+    remoteDebug(priv, "Client step result complete: %d. Data %d bytes %p",
+                complete, serverinlen, serverin);
+
+    /* Loop-the-loop...
+     * Even if the server has completed, the client must *always* do at least one step
+     * in this loop to verify the server isn't lieing about something. Mutual auth */
+    for (;;) {
+        err = sasl_client_step(saslconn,
+                               serverin,
+                               serverinlen,
+                               NULL, /* XXX interactions */
+                               &clientout,
+                               &clientoutlen);
+        if (err != SASL_OK && err != SASL_CONTINUE) {
+            __virRaiseError (in_open ? NULL : conn, NULL, NULL, VIR_FROM_REMOTE,
+                             VIR_ERR_AUTH_FAILED, VIR_ERR_ERROR, NULL, NULL, NULL, 0, 0,
+                             "Failed step %d %s",
+                             err, sasl_errstring(err, NULL, NULL));
+            sasl_dispose(&saslconn);
+            return -1;
+        }
+        remoteDebug(priv, "Client step result %d. Data %d bytes %p", err, clientoutlen, clientout);
+
+        /* Previous server call showed completion & we're now locally complete too */
+        if (complete && err == SASL_OK)
+            break;
+
+        /* Not done, prepare to talk with the server for another iteration */
+        /* NB, distinction of NULL vs "" is *critical* in SASL */
+        if (clientout) {
+            memcpy(pargs.data, clientout, clientoutlen);
+            pargs.nil = 0;
+        } else {
+            pargs.nil = 1;
+        }
+        pargs.datalen = clientoutlen;
+        remoteDebug(priv, "Server step with %d bytes %p", clientoutlen, clientout);
+
+        memset (&pret, 0, sizeof pret);
+        if (onecall (conn, priv, in_open, 1, REMOTE_PROC_AUTH_SASL_STEP,
+                     (xdrproc_t) xdr_remote_auth_sasl_step_args, (char *) &pargs,
+                     (xdrproc_t) xdr_remote_auth_sasl_step_ret, (char *) &pret) != 0) {
+            sasl_dispose(&saslconn);
+            return -1; /* virError already set by onecall */
+        }
+
+        complete = pret.complete;
+        /* NB, distinction of NULL vs "" is *critical* in SASL */
+        serverin = pret.nil ? NULL : pret.data;
+        serverinlen = pret.datalen;
+
+        remoteDebug(priv, "Client step result complete: %d. Data %d bytes %p",
+                    complete, serverinlen, serverin);
+
+        /* This server call shows complete, and earlier client step was OK */
+        if (complete && err == SASL_OK)
+            break;
+    }
+
+    remoteDebug(priv, "SASL authentication complete");
+    /* XXX keep this around for wire encoding */
+    sasl_dispose(&saslconn);
+    return 0;
+}
+#endif
Hum, head hurts at this point ...

[...]
...
diff -r 2ebd10b676a9 src/virsh.c
--- a/src/virsh.c	Wed Oct 17 15:03:04 2007 -0400
+++ b/src/virsh.c	Wed Oct 17 15:03:07 2007 -0400
@@ -4512,8 +4512,10 @@ vshInit(vshControl * ctl)
      * vshConnectionUsability, except ones which don't need a connection
      * such as "help".
      */
-    if (!ctl->conn)
+    if (!ctl->conn) {
         vshError(ctl, FALSE, _("failed to connect to the hypervisor"));
+        return FALSE;
+    }
return TRUE;
 }
Hum, could you push that change to CVS independantly ?

  Thanks a lot, obviously there is a bit more to be done before commiting,
but I don't think we need to define the authentication API as a prerequisite
to push this in, it's okay to rely on background auth as a first step.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/