[libvirt] [PATCH 0/3] Fix remote dispatch code trying to allocate 0-sized return buffers

This happens because of the switch to glib whose method g_malloc0 actually correctly returns NULL on size 0 which doesn't make sense to do. The outcome of that is that because virAllocN always returns 0, the dispatch code never fails allocation and passes the NULL pointer to the server-side public API which actually checks that and fails. https://bugzilla.redhat.com/show_bug.cgi?id=1772842 Erik Skultety (3): rpc: gendispatch: Fix a couple of places adding trailing spaces rpc: gendispatch: Add a check for zero size client-side buffers libvirt-domain: virConnectListDomains: Return 0 on zero-size buffer src/libvirt-domain.c | 3 +++ src/rpc/gendispatch.pl | 12 ++++++++++-- 2 files changed, 13 insertions(+), 2 deletions(-) -- 2.23.0

Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/rpc/gendispatch.pl | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/rpc/gendispatch.pl b/src/rpc/gendispatch.pl index 7c868191d1..8656c8f205 100755 --- a/src/rpc/gendispatch.pl +++ b/src/rpc/gendispatch.pl @@ -1062,7 +1062,7 @@ elsif ($mode eq "server") { print "\n"; print " $conn_type $conn_var = $conn_method(client);\n"; - print " if (!$conn_var) \n"; + print " if (!$conn_var)\n"; print " goto cleanup;\n"; print "\n"; @@ -1144,7 +1144,7 @@ elsif ($mode eq "server") { print "\n"; } else { if ($modern_ret_as_list) { - print " if ((nresults = \n"; + print " if ((nresults =\n"; my $indln = " $prefix$call->{ProcName}("; print $indln; print join(",\n" . ' ' x (length $indln), @args_list); -- 2.23.0

After libvirt switched to GLib, we also started to use glib allocation primitives as of commit e85e34f3. Unlike malloc which is ambiguous with regards to size == 0 (which in our case returned a unique pointer safe to be passed to free), g_malloc0 strictly returns NULL on size == 0. This change broke our legacy APIs which rely on the caller to pre-allocate the target buffer to hold the results and pass the buffer size as an argument. Since it makes very little sense to call an API with nowhere to store the results, fix this by returning 0 directly in such case in the RPC dispatch code - there are modern API equivalents allocating the target buffer automatically anyway. https://bugzilla.redhat.com/show_bug.cgi?id=1772842 Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/rpc/gendispatch.pl | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/src/rpc/gendispatch.pl b/src/rpc/gendispatch.pl index 8656c8f205..524d31f741 100755 --- a/src/rpc/gendispatch.pl +++ b/src/rpc/gendispatch.pl @@ -1073,6 +1073,14 @@ elsif ($mode eq "server") { print " goto cleanup;\n"; print " }\n"; print "\n"; + + + print " if (args->$single_ret_list_max_var == 0) {\n"; + print " ret->$single_ret_list_name.${single_ret_list_name}_len = 0;\n"; + print " rv = 0;\n"; + print " goto cleanup;\n"; + print " }\n"; + print "\n"; } print join("\n", @getters_list); -- 2.23.0

It doesn't make sense to pass a target buffer into an API, declaring its size as 0 and expect some meaningful result. Since this used to work pre-Glib era, we shouldn't end with an error, but we can return 0 for the number of domains immediately, instead of calling into the daemon, which is exactly what the daemon would have returned anyway. Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/libvirt-domain.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index 02622cb2ca..0def40fdf7 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -62,6 +62,9 @@ virConnectListDomains(virConnectPtr conn, int *ids, int maxids) virCheckNonNullArgGoto(ids, error); virCheckNonNegativeArgGoto(maxids, error); + if (maxids == 0) + return 0; + if (conn->driver->connectListDomains) { int ret = conn->driver->connectListDomains(conn, ids, maxids); if (ret < 0) -- 2.23.0

On Mon, Nov 18, 2019 at 01:18:19PM +0100, Erik Skultety wrote:
It doesn't make sense to pass a target buffer into an API, declaring its size as 0 and expect some meaningful result. Since this used to work pre-Glib era, we shouldn't end with an error, but we can return 0 for the number of domains immediately, instead of calling into the daemon, which is exactly what the daemon would have returned anyway.
Passing in size as 0 is going to be normal practice, given the calling convention of this API design.
Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/libvirt-domain.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index 02622cb2ca..0def40fdf7 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -62,6 +62,9 @@ virConnectListDomains(virConnectPtr conn, int *ids, int maxids) virCheckNonNullArgGoto(ids, error); virCheckNonNegativeArgGoto(maxids, error);
+ if (maxids == 0) + return 0;
This is too late really, as we alrady checked 'ids'. IMHO, we should only mandate a non-NULL 'ids' parameter when maxids > 0 a few lines earlier All the other legacy APIs share the same validation flaw so will also need fixing. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Mon, Nov 18, 2019 at 01:04:01PM +0000, Daniel P. Berrangé wrote:
On Mon, Nov 18, 2019 at 01:18:19PM +0100, Erik Skultety wrote:
It doesn't make sense to pass a target buffer into an API, declaring its size as 0 and expect some meaningful result. Since this used to work pre-Glib era, we shouldn't end with an error, but we can return 0 for the number of domains immediately, instead of calling into the daemon, which is exactly what the daemon would have returned anyway.
Passing in size as 0 is going to be normal practice, given the calling convention of this API design.
Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/libvirt-domain.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index 02622cb2ca..0def40fdf7 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -62,6 +62,9 @@ virConnectListDomains(virConnectPtr conn, int *ids, int maxids) virCheckNonNullArgGoto(ids, error); virCheckNonNegativeArgGoto(maxids, error);
+ if (maxids == 0) + return 0;
This is too late really, as we alrady checked 'ids'.
Why is ^this a problem? The pointer has to be allocated prior to calling into the API, so a failure on a NULL pointer on client-side is fine. On server side, the issue is remediated much earlier in the RPC dispatch code.
IMHO, we should only mandate a non-NULL 'ids' parameter when maxids > 0 a few lines earlier
All the other legacy APIs share the same validation flaw so will also need fixing.
Right, I was too hasty, since gendispatch took care of the server side automatically, I forgot to adjust the public APIs manually, will do :). Erik

On Mon, Nov 18, 2019 at 02:26:31PM +0100, Erik Skultety wrote:
On Mon, Nov 18, 2019 at 01:04:01PM +0000, Daniel P. Berrangé wrote:
On Mon, Nov 18, 2019 at 01:18:19PM +0100, Erik Skultety wrote:
It doesn't make sense to pass a target buffer into an API, declaring its size as 0 and expect some meaningful result. Since this used to work pre-Glib era, we shouldn't end with an error, but we can return 0 for the number of domains immediately, instead of calling into the daemon, which is exactly what the daemon would have returned anyway.
Passing in size as 0 is going to be normal practice, given the calling convention of this API design.
Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/libvirt-domain.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index 02622cb2ca..0def40fdf7 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -62,6 +62,9 @@ virConnectListDomains(virConnectPtr conn, int *ids, int maxids) virCheckNonNullArgGoto(ids, error); virCheckNonNegativeArgGoto(maxids, error);
+ if (maxids == 0) + return 0;
This is too late really, as we alrady checked 'ids'.
Why is ^this a problem? The pointer has to be allocated prior to calling into the API, so a failure on a NULL pointer on client-side is fine. On server side, the issue is remediated much earlier in the RPC dispatch code.
The current regression is caused server-side, but I'm saying the problem in general is pre-existing even client side. Chances are that people have already had to work around this when calling it client side. If we fix this though, we should fix it so that we aovid the problem both client & server side. IOW, rather than taking the approach to avoiding calling the API when num==0, make it valid to do the call, so that our public API's behaviour isn't dependant on whether the client's malloc() returns NULL or not for a zero sized allocation. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Mon, Nov 18, 2019 at 01:50:03PM +0000, Daniel P. Berrangé wrote:
On Mon, Nov 18, 2019 at 02:26:31PM +0100, Erik Skultety wrote:
On Mon, Nov 18, 2019 at 01:04:01PM +0000, Daniel P. Berrangé wrote:
On Mon, Nov 18, 2019 at 01:18:19PM +0100, Erik Skultety wrote:
It doesn't make sense to pass a target buffer into an API, declaring its size as 0 and expect some meaningful result. Since this used to work pre-Glib era, we shouldn't end with an error, but we can return 0 for the number of domains immediately, instead of calling into the daemon, which is exactly what the daemon would have returned anyway.
Passing in size as 0 is going to be normal practice, given the calling convention of this API design.
Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/libvirt-domain.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index 02622cb2ca..0def40fdf7 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -62,6 +62,9 @@ virConnectListDomains(virConnectPtr conn, int *ids, int maxids) virCheckNonNullArgGoto(ids, error); virCheckNonNegativeArgGoto(maxids, error);
+ if (maxids == 0) + return 0;
This is too late really, as we alrady checked 'ids'.
Why is ^this a problem? The pointer has to be allocated prior to calling into the API, so a failure on a NULL pointer on client-side is fine. On server side, the issue is remediated much earlier in the RPC dispatch code.
The current regression is caused server-side, but I'm saying the problem in general is pre-existing even client side. Chances are that people have already had to work around this when calling it client side.
If we fix this though, we should fix it so that we aovid the problem both client & server side.
IOW, rather than taking the approach to avoiding calling the API when num==0, make it valid to do the call, so that our public API's behaviour isn't dependant on whether the client's malloc() returns NULL or not for a zero sized allocation.
I could move the same check a few lines earlier above the virCheckNonNullArgGoto check, but that doesn't go with your idea of allowing the client to make the RPC call. Personally, I find it rather pointless to make an RPC call if we can deterministically tell that the result is going to be 0, effectively a NOP. Erik

On Mon, Nov 18, 2019 at 03:20:00PM +0100, Erik Skultety wrote:
On Mon, Nov 18, 2019 at 01:50:03PM +0000, Daniel P. Berrangé wrote:
On Mon, Nov 18, 2019 at 02:26:31PM +0100, Erik Skultety wrote:
On Mon, Nov 18, 2019 at 01:04:01PM +0000, Daniel P. Berrangé wrote:
On Mon, Nov 18, 2019 at 01:18:19PM +0100, Erik Skultety wrote:
It doesn't make sense to pass a target buffer into an API, declaring its size as 0 and expect some meaningful result. Since this used to work pre-Glib era, we shouldn't end with an error, but we can return 0 for the number of domains immediately, instead of calling into the daemon, which is exactly what the daemon would have returned anyway.
Passing in size as 0 is going to be normal practice, given the calling convention of this API design.
Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/libvirt-domain.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index 02622cb2ca..0def40fdf7 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -62,6 +62,9 @@ virConnectListDomains(virConnectPtr conn, int *ids, int maxids) virCheckNonNullArgGoto(ids, error); virCheckNonNegativeArgGoto(maxids, error);
+ if (maxids == 0) + return 0;
This is too late really, as we alrady checked 'ids'.
Why is ^this a problem? The pointer has to be allocated prior to calling into the API, so a failure on a NULL pointer on client-side is fine. On server side, the issue is remediated much earlier in the RPC dispatch code.
The current regression is caused server-side, but I'm saying the problem in general is pre-existing even client side. Chances are that people have already had to work around this when calling it client side.
If we fix this though, we should fix it so that we aovid the problem both client & server side.
IOW, rather than taking the approach to avoiding calling the API when num==0, make it valid to do the call, so that our public API's behaviour isn't dependant on whether the client's malloc() returns NULL or not for a zero sized allocation.
I could move the same check a few lines earlier above the virCheckNonNullArgGoto check, but that doesn't go with your idea of allowing the client to make the RPC call. Personally, I find it rather pointless to make an RPC call if we can deterministically tell that the result is going to be 0, effectively a NOP.
Well there's access control checks performed server side. We're not going to leak any data by not doing the call, but we will hide the access control failure message if we bypass it. IMHO, we shouldn't be making assumptions client side about what the server side will do. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
participants (2)
-
Daniel P. Berrangé
-
Erik Skultety