[libvirt] Storage pool support for large scale deployments

I am using libvirt for large scale deployment of virtual machines. I ran into an issue where virConnectListStoragePools and virConnectListDefinedStoragePools return an error if there are more than 256 pools. The following patch raises this limit. Currently, I am only using in the neighborhood of 500 pools, but this is likely to grow into the tens of thousands. This patch did not cause any test failures. I wrote some test code to verify the fix worked. The test code and the results follow.

TEST CODE:

/* pool-test src code */
#include <libvirt.h>
#include <stdlib.h>
#include <stdio.h>

static int get_active_pools(virConnectPtr vir_conn)
{
    int ret_code = -1;
    int rc = -1;
    int num_pools = -1;
    char** pools = NULL;

    printf("Before virConnectNumOfStoragePools\n");
    num_pools = virConnectNumOfStoragePools(vir_conn);
    printf("After virConnectNumOfStoragePools\n");
    if (0 >= num_pools) goto cleanup;

    pools = malloc(sizeof(*pools) * num_pools);

    printf("Before virConnectListStoragePools\n");
    rc = virConnectListStoragePools(vir_conn, pools, num_pools);
    printf("After virConnectListStoragePools\n");
    if (-1 == rc) goto cleanup;

    for (int i = 0; i < num_pools; ++i) {
        if (NULL != pools[i]) free(pools[i]);
    }

    ret_code = num_pools;

cleanup:
    if (NULL != pools) free(pools);
    return ret_code;
}

static int get_inactive_pools(virConnectPtr vir_conn)
{
    int ret_code = -1;
    int rc = -1;
    int num_pools = -1;
    char** pools = NULL;

    printf("Before virConnectNumOfDefinedStoragePools\n");
    num_pools = virConnectNumOfDefinedStoragePools(vir_conn);
    printf("After virConnectNumOfDefinedStoragePools\n");
    if (0 >= num_pools) goto cleanup;

    pools = malloc(sizeof(*pools) * num_pools);

    printf("Before virConnectListDefinedStoragePools\n");
    rc = virConnectListDefinedStoragePools(vir_conn, pools, num_pools);
    printf("After virConnectListDefinedStoragePools\n");
    if (-1 == rc) goto cleanup;

    for (int i = 0; i < num_pools; ++i) {
        if (NULL != pools[i]) free(pools[i]);
    }

    ret_code = num_pools;

cleanup:
    if (NULL != pools) free(pools);
    return ret_code;
}

int main(int argc, char** argv)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    printf("Active: %d\n", get_active_pools(conn));
    printf("Inactive: %d\n", get_inactive_pools(conn));
    virConnectClose(conn);
    return 0;
}

TEST RESULTS PRE-PATCH:

# for i in `ls -1 /etc/libvirt/storage/*`; do virsh pool-start $(basename $i .xml); done &>/dev/null
# ~/pool-test
Before virConnectNumOfStoragePools
After virConnectNumOfStoragePools
Before virConnectListStoragePools
libvir: Remote error : too many remote undefineds: 407 > 256
After virConnectListStoragePools
Active: -1
Before virConnectNumOfDefinedStoragePools
After virConnectNumOfDefinedStoragePools
Inactive: -1
# for i in `ls -1 /etc/libvirt/storage/*`; do virsh pool-destroy $(basename $i .xml); done &>/dev/null
# ~/pool-test
Before virConnectNumOfStoragePools
After virConnectNumOfStoragePools
Active: -1
Before virConnectNumOfDefinedStoragePools
After virConnectNumOfDefinedStoragePools
Before virConnectListDefinedStoragePools
libvir: Remote error : too many remote undefineds: 407 > 256
After virConnectListDefinedStoragePools
Inactive: -1

TEST RESULTS POST-PATCH:

# for i in `ls -1 /etc/libvirt/storage/*`; do virsh pool-start $(basename $i .xml); done &>/dev/null
# ~/pool-test
Before virConnectNumOfStoragePools
After virConnectNumOfStoragePools
Before virConnectListStoragePools
After virConnectListStoragePools
Active: 407
Before virConnectNumOfDefinedStoragePools
After virConnectNumOfDefinedStoragePools
Inactive: -1
# for i in `ls -1 /etc/libvirt/storage/*`; do virsh pool-destroy $(basename $i .xml); done &>/dev/null
# ~/pool-test
Before virConnectNumOfStoragePools
After virConnectNumOfStoragePools
Active: -1
Before virConnectNumOfDefinedStoragePools
After virConnectNumOfDefinedStoragePools
Before virConnectListDefinedStoragePools
After virConnectListDefinedStoragePools
Inactive: 407

256 (8 bits) is insufficient for large scale deployments. 65536 (16 bits) is a more appropriate limit and should be sufficient. You are more likely to run into other system limitations first, such as the 31998 inode link limit on ext3.
---
 src/remote/remote_protocol.x | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x
index 59774b2..58f0871 100644
--- a/src/remote/remote_protocol.x
+++ b/src/remote/remote_protocol.x
@@ -103,7 +103,7 @@ const REMOTE_INTERFACE_NAME_LIST_MAX = 256;
 const REMOTE_DEFINED_INTERFACE_NAME_LIST_MAX = 256;

 /* Upper limit on lists of storage pool names. */
-const REMOTE_STORAGE_POOL_NAME_LIST_MAX = 256;
+const REMOTE_STORAGE_POOL_NAME_LIST_MAX = 65536;

 /* Upper limit on lists of storage vol names. */
 const REMOTE_STORAGE_VOL_NAME_LIST_MAX = 1024;
--
1.7.2.5

On 03/15/2012 09:42 AM, Jesse J. Cook wrote:
256 (8 bits) is insufficient for large scale deployments. 65536 (16 bits) is a more appropriate limit and should be sufficient. You are more likely to run into other system limitations first, such as the 31998 inode link limit on ext3.
---
 src/remote/remote_protocol.x | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x
index 59774b2..58f0871 100644
--- a/src/remote/remote_protocol.x
+++ b/src/remote/remote_protocol.x
@@ -103,7 +103,7 @@ const REMOTE_INTERFACE_NAME_LIST_MAX = 256;
 const REMOTE_DEFINED_INTERFACE_NAME_LIST_MAX = 256;

 /* Upper limit on lists of storage pool names. */
-const REMOTE_STORAGE_POOL_NAME_LIST_MAX = 256;
+const REMOTE_STORAGE_POOL_NAME_LIST_MAX = 65536;
It seems we have this array-length problem with a number of the RPC calls. There is a similar problem with VOL_NAME_LIST_MAX: https://bugzilla.redhat.com/show_bug.cgi?id=802357

Osier

On Thu, Mar 15, 2012 at 3:42 AM, Osier Yang <jyang@redhat.com> wrote:
On 03/15/2012 09:42 AM, Jesse J. Cook wrote:
256 (8 bits) is insufficient for large scale deployments. 65536 (16 bits) is a more appropriate limit and should be sufficient. You are more likely to run into other system limitations first, such as the 31998 inode link limit on ext3.
---
 src/remote/remote_protocol.x | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x
index 59774b2..58f0871 100644
--- a/src/remote/remote_protocol.x
+++ b/src/remote/remote_protocol.x
@@ -103,7 +103,7 @@ const REMOTE_INTERFACE_NAME_LIST_MAX = 256;
 const REMOTE_DEFINED_INTERFACE_NAME_LIST_MAX = 256;

 /* Upper limit on lists of storage pool names. */
-const REMOTE_STORAGE_POOL_NAME_LIST_MAX = 256;
+const REMOTE_STORAGE_POOL_NAME_LIST_MAX = 65536;
It seems we have this array-length problem with a number of the RPC calls. There is a similar problem with VOL_NAME_LIST_MAX:
https://bugzilla.redhat.com/show_bug.cgi?id=802357
Osier
I will need this change as well. I can patch and test. -- Jesse

On Wed, Mar 14, 2012 at 08:42:35PM -0500, Jesse J. Cook wrote:
256 (8 bits) is insufficient for large scale deployments. 65536 (16 bits) is a more appropriate limit and should be sufficient. You are more likely to run into other system limitations first, such as the 31998 inode link limit on ext3.
---
 src/remote/remote_protocol.x | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x
index 59774b2..58f0871 100644
--- a/src/remote/remote_protocol.x
+++ b/src/remote/remote_protocol.x
@@ -103,7 +103,7 @@ const REMOTE_INTERFACE_NAME_LIST_MAX = 256;
 const REMOTE_DEFINED_INTERFACE_NAME_LIST_MAX = 256;

 /* Upper limit on lists of storage pool names. */
-const REMOTE_STORAGE_POOL_NAME_LIST_MAX = 256;
+const REMOTE_STORAGE_POOL_NAME_LIST_MAX = 65536;

 /* Upper limit on lists of storage vol names. */
 const REMOTE_STORAGE_VOL_NAME_LIST_MAX = 1024;
We have to think about what the compatibility implications are for this kind of change, e.g. what the behaviour is when an old client talks to a new server, and vice versa. It might be fine, but I'd like someone to enumerate the before & after behaviour in all combinations.

Regards, Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Just to clarify, you would like to see:

v0.9.10 pre-patch client talk to v0.9.10 patch server
v0.9.10 patch client talk to v0.9.10 pre-patch server

Would the test code I used in my cover letter be sufficient? If so, I could probably test this fairly easily and quickly today.

On Thu, Mar 15, 2012 at 5:14 AM, Daniel P. Berrange <berrange@redhat.com> wrote:
On Wed, Mar 14, 2012 at 08:42:35PM -0500, Jesse J. Cook wrote:
256 (8 bits) is insufficient for large scale deployments. 65536 (16 bits) is a more appropriate limit and should be sufficient. You are more likely to run into other system limitations first, such as the 31998 inode link limit on ext3.
---
 src/remote/remote_protocol.x | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x
index 59774b2..58f0871 100644
--- a/src/remote/remote_protocol.x
+++ b/src/remote/remote_protocol.x
@@ -103,7 +103,7 @@ const REMOTE_INTERFACE_NAME_LIST_MAX = 256;
 const REMOTE_DEFINED_INTERFACE_NAME_LIST_MAX = 256;

 /* Upper limit on lists of storage pool names. */
-const REMOTE_STORAGE_POOL_NAME_LIST_MAX = 256;
+const REMOTE_STORAGE_POOL_NAME_LIST_MAX = 65536;

 /* Upper limit on lists of storage vol names. */
 const REMOTE_STORAGE_VOL_NAME_LIST_MAX = 1024;
We have to think about what the compatibility implications are for this kind of change, e.g. what the behaviour is when an old client talks to a new server, and vice versa. It might be fine, but I'd like someone to enumerate the before & after behaviour in all combinations.
Regards, Daniel

On Thu, Mar 15, 2012 at 06:22:38AM -0500, Jesse Cook wrote:
Just to clarify, you would like to see:
v0.9.10 pre-patch client talk to v0.9.10 patch server
v0.9.10 patch client talk to v0.9.10 pre-patch server
And for comparison:

v0.9.10 pre-patch client talk to v0.9.10 pre-patch server

Obviously for v0.9.10 patch client talk to v0.9.10 patch server I'd expect "just works" :-)
Would the test code I used in my cover letter be sufficient? If so, I could probably test this fairly easily and quickly today.
Yep. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/15/2012 05:51 AM, Daniel P. Berrange wrote:
On Thu, Mar 15, 2012 at 06:22:38AM -0500, Jesse Cook wrote:
Just to clarify, you would like to see:
I have not actually tried this, but based on my understanding of the RPC framework, I'd expect:
v0.9.10 pre-patch client talk to v0.9.10 patch server
If there are 256 or fewer names, this works as it always did. If there are more than 256 names, then the server tries to send back more array elements than the client is prepared to receive - the client will treat the response as an invalid RPC, and my recollection is that under these circumstances, the client then forcefully disconnects the session under the assumption that the server is broken. To avoid breaking such clients, you'd probably need some way for clients to warn the server if they are prepared to receive more than the old max (similar to how we had to add VIR_TYPED_PARAM_STRING_OKAY as a way for clients to tell the server that they were prepared to handle typed strings being returned). Basically, any time you change the RPC to allow more data to be returned than what was previously possible, the server should not return that additional data unless the client has negotiated that it is aware of the additional data.
v0.9.10 patch client talk to v0.9.10 pre-patch server
No problem for the client. The server will never successfully return more than 256 names. However, if the client requests more than 256 names, I'm not sure whether the server will silently truncate to 256 or whether it will return an RPC error because it was unable to provide the reply.
And for comparison
v0.9.10 pre-patch client talk to v0.9.10 pre-patch server
Should be the same behavior as a v0.9.10-patch client talking to a pre-patch server - you can't successfully send more than 256 names, but I'm not sure how the server will react if the client requests too many names. Actually, it might be slightly different - the client might reject the request up front without even contacting the server.
Obviously for
v0.9.10 patch client talk to v0.9.10 patch server
I'd expect "just works" :-)
Of course, if both sides are prepared to handle the new RPC protocol, it should just work. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
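The negotiation pattern described above can be made concrete with a small sketch. Everything below is illustrative only: the flag, constants, and helper name are invented for this example and are not libvirt symbols. The point is simply that the server clamps its reply for any client that has not opted in to the larger list.

/* Illustration only: the flag and limit names below are invented for this
 * sketch and are not part of libvirt. */
#define OLD_LIST_MAX 256
#define NEW_LIST_MAX 65536
#define SKETCH_FLAG_LARGE_LIST_OK 0x1   /* hypothetical client capability bit */

/* Server-side clamp: a client that never set the capability bit is never
 * sent more array elements than its compiled-in decoder will accept. */
static int clamp_reply_len(int nnames, unsigned int client_flags)
{
    int limit = (client_flags & SKETCH_FLAG_LARGE_LIST_OK) ? NEW_LIST_MAX
                                                           : OLD_LIST_MAX;
    return nnames < limit ? nnames : limit;
}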

On Thu, Mar 15, 2012 at 9:43 AM, Eric Blake <eblake@redhat.com> wrote:
On 03/15/2012 05:51 AM, Daniel P. Berrange wrote:
On Thu, Mar 15, 2012 at 06:22:38AM -0500, Jesse Cook wrote:
Just to clarify, you would like to see:
I have not actually tried this, but based on my understanding of the RPC framework, I'd expect:
v0.9.10 pre-patch client talk to v0.9.10 patch server
If there are 256 or fewer names, this works as it always did.
If there are more than 256 names, then the server tries to send back more array elements than the client is prepared to receive - the client will treat the response as an invalid RPC, and my recollection is that under these circumstances, the client then forcefully disconnects the session under the assumption that the server is broken.
To avoid breaking such clients, you'd probably need some way for clients to warn the server if they are prepared to receive more than the old max (similar to how we had to add VIR_TYPED_PARAM_STRING_OKAY as a way for clients to tell the server that they were prepared to handle typed strings being returned). Basically, any time you change the RPC to allow more data to be returned than what was previously possible, the server should not return that additional data unless the client has negotiated that it is aware of the additional data.
v0.9.10 patch client talk to v0.9.10 pre-patch server
No problem for the client. The server will never successfully return more than 256 names. However, if the client requests more than 256 names, I'm not sure whether the server will silently truncate to 256 or whether it will return an RPC error because it was unable to provide the reply.
And for comparison
v0.9.10 pre-patch client talk to v0.9.10 pre-patch server
Should be the same behavior as a v0.9.10-patch client talking to a pre-patch server - you can't successfully send more than 256 names, but I'm not sure how the server will react if the client requests too many names. Actually, it might be slightly different - the client might reject the request up front without even contacting the server.
Obviously for
v0.9.10 patch client talk to v0.9.10 patch server
I'd expect "just works" :-)
Of course, if both sides are prepared to handle the new RPC protocol, it should just work.
-- Eric Blake
v0.9.10 client did not want to play nicely with the v0.9.10 server via qemu+ssh. I got frustrated and just tried running the test from a client running an older version of libvirt. When connecting to v0.9.10, it behaved the same way the pre-patched did in my cover letter. I don't have full test results because of the communication errors I was getting. I'll try to either figure it out tomorrow or just use the older version as the client (pre-patch and patch). -- Jesse

On 03/15/2012 08:26 PM, Jesse Cook wrote:
v0.9.10 client did not want to play nicely with the v0.9.10 server via qemu+ssh. I got frustrated and just tried running the test from a client running an older version of libvirt. When connecting to v0.9.10, it behaved the same way the pre-patched did in my cover letter. I don't have full test results because of the communication errors I was getting. I'll try to either figure it out tomorrow or just use the older version as the client (pre-patch and patch).
Since qemu already uses rpc mechanism, I've found it easy to mix-and-match by doing this in two terminals both logged in as root, and both located in a libvirt.git checkout directory (if you don't mind building as root; if you do mind building as root, then have three, where the third is a regular user running as non-root to do the builds):

setup:
1# git clone libvirt ...
1# cd libvirt
1# ./autobuild.sh --system
1# apply patches I'm interested in testing
1# make
2# cd libvirt

old libvirtd server, old virsh client:
1# service libvirtd start
2# virsh ...
1# service libvirtd stop

old libvirtd, new virsh:
1# service libvirtd start
2# tools/virsh ...
1# service libvirtd stop

new libvirtd, old virsh:
1# daemon/libvirtd
2# virsh ...
1# Ctrl-C

new libvirtd, new virsh:
1# daemon/libvirtd
2# tools/virsh ...
1# Ctrl-C

That gives me testing of all four protocol combinations, assuming that the installed version is a current release without my patch, and my just-built in-tree version has my patch. No need to complicate protocol testing with multi-host qemu+ssh:// stuff.

-- 
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On Fri, Mar 16, 2012 at 7:35 AM, Eric Blake <eblake@redhat.com> wrote:
On 03/15/2012 08:26 PM, Jesse Cook wrote:
v0.9.10 client did not want to play nicely with the v0.9.10 server via qemu+ssh. I got frustrated and just tried running the test from a client running an older version of libvirt. When connecting to v0.9.10, it behaved the same way the pre-patched did in my cover letter. I don't have full test results because of the communication errors I was getting. I'll try to either figure it out tomorrow or just use the older version as the client (pre-patch and patch).
Since qemu already uses rpc mechanism, I've found it easy to mix-and-match by doing this in two terminals both logged in as root, and both located in a libvirt.git checkout directory (if you don't mind building as root; if you do mind building as root, then have three, where the third is a regular user running as non-root to do the builds):
setup:
1# git clone libvirt ...
1# cd libvirt
1# ./autobuild.sh --system
1# apply patches I'm interested in testing
1# make
2# cd libvirt

old libvirtd server, old virsh client:
1# service libvirtd start
2# virsh ...
1# service libvirtd stop

old libvirtd, new virsh:
1# service libvirtd start
2# tools/virsh ...
1# service libvirtd stop

new libvirtd, old virsh:
1# daemon/libvirtd
2# virsh ...
1# Ctrl-C

new libvirtd, new virsh:
1# daemon/libvirtd
2# tools/virsh ...
1# Ctrl-C
That gives me testing of all four protocol combinations, assuming that the installed version is a current release without my patch, and my just-built in-tree version has my patch. No need to complicate protocol testing with multi-host qemu+ssh:// stuff.
-- Eric Blake
Eric,

Thanks. v0.9.10 was giving me a lot of trouble, so I just compiled two versions of the test code in the cover letter. One uses the patched libs and one the unchanged libs.

N.B. I compiled the patched version using the following change due to the issues with using 65536:

 /* Upper limit on lists of storage pool names. */
-const REMOTE_STORAGE_POOL_NAME_LIST_MAX = 256;
+const REMOTE_STORAGE_POOL_NAME_LIST_MAX = 7084;

Results:

With the patched version of libvirt installed and unpatched libvirtd running:

for i in `ls -1 /etc/libvirt/storage/*`; do virsh pool-start $(basename $i .xml); done &>/dev/null

~/pool-test-unpatched
Before virConnectNumOfStoragePools
After virConnectNumOfStoragePools
Before virConnectListStoragePools
libvir: Remote error : too many remote undefineds: 407 > 256
After virConnectListStoragePools
Active: -1
Before virConnectNumOfDefinedStoragePools
After virConnectNumOfDefinedStoragePools
Inactive: -1

~/pool-test-patched
Before virConnectNumOfStoragePools
After virConnectNumOfStoragePools
Before virConnectListStoragePools
After virConnectListStoragePools
Active: 407
Before virConnectNumOfDefinedStoragePools
After virConnectNumOfDefinedStoragePools
Inactive: -1

for i in `ls -1 /etc/libvirt/storage/*`; do virsh pool-destroy $(basename $i .xml); done &>/dev/null

~/pool-test-unpatched
Before virConnectNumOfStoragePools
After virConnectNumOfStoragePools
Active: -1
Before virConnectNumOfDefinedStoragePools
After virConnectNumOfDefinedStoragePools
Before virConnectListDefinedStoragePools
libvir: Remote error : too many remote undefineds: 407 > 256
After virConnectListDefinedStoragePools
Inactive: -1

~/pool-test-patched
Before virConnectNumOfStoragePools
After virConnectNumOfStoragePools
Active: -1
Before virConnectNumOfDefinedStoragePools
After virConnectNumOfDefinedStoragePools
Before virConnectListDefinedStoragePools
After virConnectListDefinedStoragePools
Inactive: 407

With the unchanged version of libvirt installed and unpatched libvirtd running:

for i in `ls -1 /etc/libvirt/storage/*`; do virsh pool-start $(basename $i .xml); done &>/dev/null

~/pool-test-unpatched
Before virConnectNumOfStoragePools
After virConnectNumOfStoragePools
Before virConnectListStoragePools
libvir: Remote error : too many remote undefineds: 407 > 256
After virConnectListStoragePools
Active: -1
Before virConnectNumOfDefinedStoragePools
After virConnectNumOfDefinedStoragePools
Inactive: -1

~/pool-test-patched
Before virConnectNumOfStoragePools
After virConnectNumOfStoragePools
Before virConnectListStoragePools
libvir: RPC error : internal error maxnames > REMOTE_STORAGE_POOL_NAME_LIST_MAX
After virConnectListStoragePools
Active: -1
Before virConnectNumOfDefinedStoragePools
After virConnectNumOfDefinedStoragePools
Inactive: -1

for i in `ls -1 /etc/libvirt/storage/*`; do virsh pool-destroy $(basename $i .xml); done &>/dev/null

~/pool-test-unpatched
Before virConnectNumOfStoragePools
After virConnectNumOfStoragePools
Active: -1
Before virConnectNumOfDefinedStoragePools
After virConnectNumOfDefinedStoragePools
Before virConnectListDefinedStoragePools
libvir: Remote error : too many remote undefineds: 407 > 256
After virConnectListDefinedStoragePools
Inactive: -1

~/pool-test-patched
Before virConnectNumOfStoragePools
After virConnectNumOfStoragePools
Active: -1
Before virConnectNumOfDefinedStoragePools
After virConnectNumOfDefinedStoragePools
Before virConnectListDefinedStoragePools
libvir: RPC error : internal error maxnames > REMOTE_STORAGE_POOL_NAME_LIST_MAX
After virConnectListDefinedStoragePools
Inactive: -1

-- Jesse

On Thu, Mar 15, 2012 at 5:14 AM, Daniel P. Berrange <berrange@redhat.com> wrote:
On Wed, Mar 14, 2012 at 08:42:35PM -0500, Jesse J. Cook wrote:
256 (8 bits) is insufficient for large scale deployments. 65536 (16 bits) is a more appropriate limit and should be sufficient. You are more likely to run into other system limitations first, such as the 31998 inode link limit on ext3.
---
 src/remote/remote_protocol.x | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x
index 59774b2..58f0871 100644
--- a/src/remote/remote_protocol.x
+++ b/src/remote/remote_protocol.x
@@ -103,7 +103,7 @@ const REMOTE_INTERFACE_NAME_LIST_MAX = 256;
 const REMOTE_DEFINED_INTERFACE_NAME_LIST_MAX = 256;

 /* Upper limit on lists of storage pool names. */
-const REMOTE_STORAGE_POOL_NAME_LIST_MAX = 256;
+const REMOTE_STORAGE_POOL_NAME_LIST_MAX = 65536;

 /* Upper limit on lists of storage vol names. */
 const REMOTE_STORAGE_VOL_NAME_LIST_MAX = 1024;
We have to think about what the compatibility implications are for this kind of change, e.g. what the behaviour is when an old client talks to a new server, and vice versa. It might be fine, but I'd like someone to enumerate the before & after behaviour in all combinations.
Regards, Daniel
Sorry, I accidentally top-posted my reply originally. Just to clarify, you would like to see:

v0.9.10 pre-patch client talk to v0.9.10 patch server
v0.9.10 patch client talk to v0.9.10 pre-patch server

Would the test code I used in my cover letter be sufficient? If so, I could probably test this fairly easily and quickly today.

-- Jesse

On 03/14/2012 07:42 PM, Jesse J. Cook wrote:
256 (8 bits) is insufficient for large scale deployments. 65536 (16 bits) is a more appropriate limit and should be sufficient. You are more likely to run into other system limitations first, such as the 31998 inode link limit on ext3.
---
 src/remote/remote_protocol.x | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x
index 59774b2..58f0871 100644
--- a/src/remote/remote_protocol.x
+++ b/src/remote/remote_protocol.x
@@ -103,7 +103,7 @@ const REMOTE_INTERFACE_NAME_LIST_MAX = 256;
 const REMOTE_DEFINED_INTERFACE_NAME_LIST_MAX = 256;

 /* Upper limit on lists of storage pool names. */
-const REMOTE_STORAGE_POOL_NAME_LIST_MAX = 256;
+const REMOTE_STORAGE_POOL_NAME_LIST_MAX = 65536;
While I agree that it would be nice to avoid capping the number of pools that can actually exist, I have to say NACK to the approach in this patch.

The reason we originally chose a limit of 256 is because each storage pool name is an arbitrary-length string, and when you pile multiple strings in a row, you can generate a rather lengthy RPC message for which the receiver has to pre-allocate memory in order to receive correctly. See this comment:

/* Maximum length of a memory peek buffer message.
 * Note applications need to be aware of this limit and issue multiple
 * requests for large amounts of data. */
const REMOTE_DOMAIN_MEMORY_PEEK_BUFFER_MAX = 65536;

That is, you cannot send any RPC with more than 64k of data, because of this cap. With a cap of 256 pool names, that means each pool name can be (on average) about 256 bytes before you hit the RPC cap. Bumping REMOTE_STORAGE_POOL_NAME_LIST_MAX to 65536 is pointless; each remote_nonnull_string already occupies several bytes towards the 64k cap. I could perhaps see bumping things to 512, which lowers your average pool name to about 128 bytes before you hit the RPC cap; or even a cap of 1024 names averaging 64 bytes each. But the fact remains that you will hit a hard wall on RPC sizing, with diminishing returns if you change the pool name list maximum, which means that the _real_ solution to this has to be at a more fundamental level.

We need a new API which lets you specify a starting point and range, in order to query only part of the pool list. For example, see how the recent virDomainGetCPUStats has start_cpu and ncpus parameters, as well as a cap of only 128 cpus per RPC call. If you have a beefy machine with 256 cpus, you have to make two separate RPC calls in order to collect all of the statistics. In other words, I think we need:

virConnectListStoragePoolsRange(virConnectPtr conn,
                                char **const names,
                                int start,
                                int maxnames,
                                unsigned int flags);

where flags controls whether we are listing active or defined pools (virConnectListStoragePools and virConnectListDefinedStoragePools were added back in the days before we realized how useful a flags argument would be), where 'maxnames' can be at most 256 (per the RPC limits), but where 'start' lets you choose the point in the array at which you start your enumeration, so that multiple calls let you see the entire array.

It may also be worth considering the addition of _just_ a new ranged RPC call, but no new public API, by making the public API smart enough to automatically call the ranged RPC multiple times to build up a single return to the user. After all, if we exposed the ranged call to the user, then each client would have to add the logic to make the multiple calls; whereas our current limitation is _only_ in the RPC aspect (once the data is across the RPC and into the target process' memory space, there's no longer a 64k cap to worry about). The same need for a new ranged API or RPC applies to any situation where the current RPC limits are starting to feel too small compared to the amount of data available to transmit in a single call.

There's also an issue of concurrency - a single API call sees consistent data, but back-to-back calls over subsets of the array could race with another client adding or removing a pool in the meantime. But I'm not sure what can be done to prevent issues on that front; you already have TOCTTOU races between grabbing the list of pools and acting on that list even without a range command.
Maybe we can invent a new API for batching several commands into one transaction, with multiple RPC calls but with a guarantee that no RPCs from any other connection will alter the state between the batch calls; but then we start to get into issues with what happens if a network connection goes down in the middle of operating on a batch command. So it may be something that we declare as not worth trying to solve. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
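To make the ranged-API idea above concrete, here is a minimal client-side sketch. virConnectListStoragePoolsRange() is only the API proposed in this mail - it does not exist in libvirt - so the prototype below simply mirrors the suggested signature (the int return counting names filled in is an assumption), and the chunk size of 256 stands in for whatever per-call cap the RPC layer would impose.

/* Sketch only: chunked listing over the *proposed* ranged API from this
 * thread.  The prototype mirrors the signature suggested above; it is not
 * a real libvirt function, and the int return type is an assumption. */
#include <libvirt/libvirt.h>
#include <stdlib.h>

#define RPC_CHUNK 256   /* assumed per-call cap imposed by the RPC limits */

int virConnectListStoragePoolsRange(virConnectPtr conn, char **const names,
                                    int start, int maxnames,
                                    unsigned int flags);

/* Fetch all active pool names by issuing as many ranged calls as needed.
 * Note the TOCTTOU caveat from the mail: the pool count can change between
 * calls, so a real implementation would need to cope with short reads. */
static int list_all_pools(virConnectPtr conn, char ***names_out)
{
    int total = virConnectNumOfStoragePools(conn);
    if (total < 0)
        return -1;

    char **names = calloc(total ? total : 1, sizeof(*names));
    if (!names)
        return -1;

    for (int start = 0; start < total; start += RPC_CHUNK) {
        int want = (total - start < RPC_CHUNK) ? total - start : RPC_CHUNK;
        if (virConnectListStoragePoolsRange(conn, names + start,
                                            start, want, 0) < 0) {
            for (int i = 0; i < start; i++)
                free(names[i]);
            free(names);
            return -1;
        }
    }

    *names_out = names;
    return total;
}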

On Thu, Mar 15, 2012 at 08:31:32AM -0600, Eric Blake wrote:
On 03/14/2012 07:42 PM, Jesse J. Cook wrote:
256 (8 bits) is insufficient for large scale deployments. 65536 (16 bits) is a more appropriate limit and should be sufficient. You are more likely to run into other system limitations first, such as the 31998 inode link limit on ext3.

[...] correctly. See this comment:

/* Maximum length of a memory peek buffer message.
 * Note applications need to be aware of this limit and issue multiple
 * requests for large amounts of data. */
const REMOTE_DOMAIN_MEMORY_PEEK_BUFFER_MAX = 65536;

That is, you cannot send any RPC with more than 64k of data, because of this cap. With a cap of 256 pool names, that means each pool name can be (on average) about 256 bytes before you hit the RPC cap.
Not quite right, you meant to look at virnetrpcprotocol.x:

/* Size of message payload */
const VIR_NET_MESSAGE_PAYLOAD_MAX = 262120;

Regards, Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/15/2012 08:41 AM, Daniel P. Berrange wrote:
On Thu, Mar 15, 2012 at 08:31:32AM -0600, Eric Blake wrote:
On 03/14/2012 07:42 PM, Jesse J. Cook wrote:
256 (8 bits) is insufficient for large scale deployments. 65536 (16 bits) is a more appropriate limit and should be sufficient. You are more likely to run into other system limitations first, such as the 31998 inode link limit on ext3.

[...] correctly. See this comment:

/* Maximum length of a memory peek buffer message.
 * Note applications need to be aware of this limit and issue multiple
 * requests for large amounts of data. */
const REMOTE_DOMAIN_MEMORY_PEEK_BUFFER_MAX = 65536;

That is, you cannot send any RPC with more than 64k of data, because of this cap. With a cap of 256 pool names, that means each pool name can be (on average) about 256 bytes before you hit the RPC cap.
Not quite right, you meant to look at virnetrpcprotocol.x:
/* Size of message payload */
const VIR_NET_MESSAGE_PAYLOAD_MAX = 262120;
Okay, I read the wrong constant, so I'm off by a factor of 4. That means increasing the pool limit from 256 to 1024 would reduce the average pool name length from 1024 down to 256 bytes, which is still pretty reasonable; but increasing the pool limit to 64k reduces the average pool name length limit down to 4 bytes if all names are provided at once (and those four bytes are probably all consumed in the overhead, since RPC sends strings by listing length before contents).

For a case study on bumping limits, I only found one commit where we have done it in the past, using 'git log -p src/remote/remote_protocol.x' and grepping for '^-.*MAX.*=':

commit 8654175 changed REMOTE_MIGRATE_COOKIE_MAX from 256 to 16k

But there, we were also adding migrate v3, and v2 (and therefore old servers) never sent enough data to pass the max, so feature negotiation is built in - a client will only get more than 256 bytes if it requests migrate v3. My point remains that bumping limits only delays hitting the limit and also requires some client feature negotiation, and that the only way to raise things to unlimited is to break the operation into multiple RPC calls.

-- 
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On 03/15/2012 08:31 AM, Eric Blake wrote:
It may also be worth considering the addition of _just_ a new ranged RPC call, but no new public API, by making the public API smart enough to automatically call the ranged RPC multiple times to build up a single return to the user. After all, if we exposed the ranged call to the user, then each client will have to add the logic to make the multiple calls; whereas our current limitation is _only_ in the RPC aspect (once the data is across the RPC and into the target process' memory space, there's no longer a 64k cap to worry about).
Thinking more about this idea:

Suppose we have the following in remote_protocol.x:

Existing:

struct remote_storage_pool_list_volumes_args {
    remote_nonnull_storage_pool pool;
    int maxnames;
};
struct remote_storage_pool_list_volumes_ret {
    remote_nonnull_string names<REMOTE_STORAGE_VOL_NAME_LIST_MAX>; /* insert@1 */
};
...
REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES = 92, /* autogen skipgen priority:high */

(note the change from autogen to skipgen for src/remote)

New:

struct remote_storage_pool_list_volumes_get_transaction_args {
    remote_nonnull_storage_pool pool;
    int maxnames;
};
struct remote_storage_pool_list_volumes_get_transaction_ret {
    int transaction;
};
struct remote_storage_pool_list_volumes_transaction_args {
    remote_nonnull_storage_pool pool;
    int transaction;
    int start;
};
struct remote_storage_pool_list_volumes_ret {
    remote_nonnull_string names<REMOTE_STORAGE_VOL_NAME_LIST_MAX>; /* insert@1 */
};
...
REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES_GET_TRANSACTION = ... /* skipgen skipgen priority:high */
REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES_TRANSACTION = ... /* skipgen skipgen priority:high */

Then, in pseudocode, we change the currently-generated src/remote handler for storagePoolListVolumes to do:

if maxnames <= REMOTE_STORAGE_VOL_NAME_LIST_MAX
    use REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES /* existing code */
else {
    int transaction, i = 0;
    pass pool and maxnames through
        REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES_GET_TRANSACTION to set transaction
    while (i < maxnames) {
        pass pool and transaction through
            REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES_TRANSACTION to get 128 more names
        i += REMOTE_STORAGE_VOL_NAME_LIST_MAX;
    }
}

Then, in daemon/remote.c, we leave the existing handler (which calls back into virStoragePoolListVolumes with 256 or less entries) untouched, and add two new hand-written RPC handlers:

the handler for REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES_GET_TRANSACTION:
    register that a sequenced transaction has been started
    allocate an array of maxnames and position and call virStoragePoolListVolumes, and tie that result to the sequence
    return the sequence transaction number

the handler for REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES_TRANSACTION:
    validates that the sequence number is valid, and that the start value matches the recorded position
    prepares the return RPC with the next 128 names from the array tied to the sequence, and updates the position
    if the position reached the end, free the array and end the sequenced transaction

we also update virnetserver.c to support sequenced commands - the idea here is that any time a transaction will need more than one RPC message between client and server, then a transaction number can be generated which reserves memory tied to the transaction; as long as that transaction number exists, then further commands sent with reference to that transaction can reuse that memory; the transaction is then closed if the server and client complete the transaction, but also closed (and frees the reserved memory) if the connection disappears. To avoid a misbehaving client from DoS'ing a server, we could also require that while a transaction is open, the next RPC call from the client must reference that transaction, and must come within a reasonable time limit. To some extent, we already have a 'continue' status for a message; right now, a 'call' can only be replied by 'ok' or 'error', but this would be a case where 'call' is now replied by 'continue' to state that a transaction is in effect, and where we could reuse the 'serial' for as long as the transaction is in effect.
At any rate, I don't have code to back this up, but it's certainly food for thought on whether it makes sense to try and enhance our use of RPC to support transactions without also having to add new public API. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
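For readers trying to picture the server side of the sequenced-transaction idea, here is a rough sketch of the bookkeeping the two hand-written handlers would share. None of these types or names exist in libvirt; the chunk size of 128 and the fields come from the description in the mail, and everything else is an assumption.

/* Sketch only: per-connection state for the proposed "sequenced transaction"
 * listing.  These types and helpers are illustrative, not libvirt code. */
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define TXN_CHUNK 128   /* names returned per _TRANSACTION call, as in the mail */

struct list_txn {
    int id;            /* transaction number handed back to the client      */
    char **names;      /* full result of virStoragePoolListVolumes()        */
    int nnames;        /* total number of names held for this transaction   */
    int pos;           /* next index the client is expected to ask for      */
    time_t last_used;  /* lets the server expire stalled transactions       */
};

/* GET_TRANSACTION handler core: stash the full list, hand back an id. */
static struct list_txn *txn_begin(int id, char **names, int nnames)
{
    struct list_txn *txn = calloc(1, sizeof(*txn));
    if (!txn)
        return NULL;
    txn->id = id;
    txn->names = names;
    txn->nnames = nnames;
    txn->last_used = time(NULL);
    return txn;
}

/* _TRANSACTION handler core: copy out the next chunk, flag completion.
 * Returns the number of names copied, or -1 if 'start' does not match the
 * recorded position (the client is out of sync with the server). */
static int txn_next(struct list_txn *txn, int start, char **out, int *done)
{
    if (start != txn->pos)
        return -1;
    int n = txn->nnames - txn->pos;
    if (n > TXN_CHUNK)
        n = TXN_CHUNK;
    memcpy(out, txn->names + txn->pos, n * sizeof(*out));
    txn->pos += n;
    txn->last_used = time(NULL);
    *done = (txn->pos >= txn->nnames);
    return n;
}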

On Thu, Mar 15, 2012 at 11:47 AM, Eric Blake <eblake@redhat.com> wrote:
On 03/15/2012 08:31 AM, Eric Blake wrote:
It may also be worth considering the addition of _just_ a new ranged RPC call, but no new public API, by making the public API smart enough to automatically call the ranged RPC multiple times to build up a single return to the user. After all, if we exposed the ranged call to the user, then each client will have to add the logic to make the multiple calls; whereas our current limitation is _only_ in the RPC aspect (once the data is across the RPC and into the target process' memory space, there's no longer a 64k cap to worry about).
Thinking more about this idea:
Suppose we have the following in remote_protocol.x:
Existing:
struct remote_storage_pool_list_volumes_args {
    remote_nonnull_storage_pool pool;
    int maxnames;
};
struct remote_storage_pool_list_volumes_ret {
    remote_nonnull_string names<REMOTE_STORAGE_VOL_NAME_LIST_MAX>; /* insert@1 */
};
...
REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES = 92, /* autogen skipgen priority:high */
(note the change from autogen to skipgen for src/remote)
New:
struct remote_storage_pool_list_volumes_get_transaction_args {
    remote_nonnull_storage_pool pool;
    int maxnames;
};
struct remote_storage_pool_list_volumes_get_transaction_ret {
    int transaction;
};
struct remote_storage_pool_list_volumes_transaction_args {
    remote_nonnull_storage_pool pool;
    int transaction;
    int start;
};
struct remote_storage_pool_list_volumes_ret {
    remote_nonnull_string names<REMOTE_STORAGE_VOL_NAME_LIST_MAX>; /* insert@1 */
};
...
REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES_GET_TRANSACTION = ... /* skipgen skipgen priority:high */
REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES_TRANSACTION = ... /* skipgen skipgen priority:high */
Then, in pseudocode, we change the currently-generated src/remote handler for storagePoolListVolumes to do:
if maxnames <= REMOTE_STORAGE_VOL_NAME_LIST_MAX
    use REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES /* existing code */
else {
    int transaction, i = 0;
    pass pool and maxnames through
        REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES_GET_TRANSACTION to set transaction
    while (i < maxnames) {
        pass pool and transaction through
            REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES_TRANSACTION to get 128 more names
        i += REMOTE_STORAGE_VOL_NAME_LIST_MAX;
    }
}
Then, in daemon/remote.c, we leave the existing handler (which calls back into virStoragePoolListVolumes with 256 or less entries) untouched, and add two new hand-written RPC handlers:
the handler for REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES_GET_TRANSACTION:
    register that a sequenced transaction has been started
    allocate an array of maxnames and position and call virStoragePoolListVolumes, and tie that result to the sequence
    return the sequence transaction number

the handler for REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES_TRANSACTION:
    validates that the sequence number is valid, and that the start value matches the recorded position
    prepares the return RPC with the next 128 names from the array tied to the sequence, and updates the position
    if the position reached the end, free the array and end the sequenced transaction
we also update virnetserver.c to support sequenced commands - the idea here is that any time a transaction will need more than one RPC message between client and server, then a transaction number can be generated which reserves memory tied to the transaction; as long as that transaction number exists, then further commands sent with reference to that transaction can reuse that memory; the transaction is then closed if the server and client complete the transaction, but also closed (and frees the reserved memory) if the connection disappears. To avoid a misbehaving client from DoS'ing a server, we could also require that while a transaction is open, the next RPC call from the client must reference that transaction, and must come within a reasonable time limit. To some extent, we already have a 'continue' status for a message; right now, a 'call' can only be replied by 'ok' or 'error', but this would be a case where 'call' is now replied by 'continue' to state that a transaction is in effect, and where we could reuse the 'serial' for as long as the transaction is in effect.
At any rate, I don't have code to back this up, but it's certainly food for thought on whether it makes sense to try and enhance our use of RPC to support transactions without also having to add new public API.
-- Eric Blake
Your solution seems reasonable, though I'd have to really think about it to see if there are any major holes in it. Could you leverage compression to eliminate the need for a more sophisticated solution?

On 03/15/2012 11:30 AM, Jesse Cook wrote:
At any rate, I don't have code to back this up, but it's certainly food for thought on whether it makes sense to try and enhance our use of RPC to support transactions without also having to add new public API.
Your solution seems reasonable, though I'd have to really think about it to see if there are any major holes in it. Could you leverage compression to eliminate the need for a more sophisticated solution?
Compression is indeed another option, but it still implies a new RPC, where if maxnames is <= limit, we call the old RPC, and if it is greater, then we call the new RPC which generates the list, compresses it to less than 256k (the rpc limit), and passes the compressed data as an opaque blob, then the client decompresses the blob to reconstruct the list. Simpler than having to do transactions, if you can assume the compression is always likely to work, but more complex in that we have to decide what form of compression to use. (All of this really boils down to a protocol contract between client and server, where both sides agree on how to represent data so that any one message is not disproportionately large). Back to my earlier comments about cross-compatibility, how a server must not reply with more data than an old client was expecting so as not to crash the old client, and therefore new clients must indicate that they are newer: adding a new RPC is indeed a form of feature negotiation (if you are new enough to call the new RPC, then you can deal with the larger return of the new RPC). -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
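As a rough illustration of the compression variant (assuming zlib, which libvirt's RPC does not actually use for this), the sender-side work would amount to something like the following: join the names, deflate them, and refuse to send if the blob still exceeds the payload cap.

/* Sketch only, assuming zlib: pack a name list into one opaque blob small
 * enough for a single RPC message.  This is not how libvirt's RPC works;
 * it just makes the size check concrete.  Build with: cc blob.c -lz */
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

#define PAYLOAD_MAX 262120   /* VIR_NET_MESSAGE_PAYLOAD_MAX at the time */

/* Join names with '\n', compress, and return the blob (caller frees) or
 * NULL if allocation, compression, or the size check fails. */
static unsigned char *pack_names(char **names, int n, unsigned long *blob_len)
{
    size_t raw_len = 0;
    for (int i = 0; i < n; i++)
        raw_len += strlen(names[i]) + 1;               /* name + '\n' */

    unsigned char *raw = malloc(raw_len ? raw_len : 1);
    if (!raw)
        return NULL;

    size_t off = 0;
    for (int i = 0; i < n; i++) {
        size_t len = strlen(names[i]);
        memcpy(raw + off, names[i], len);
        raw[off + len] = '\n';
        off += len + 1;
    }

    unsigned long zlen = compressBound(raw_len);
    unsigned char *blob = malloc(zlen);
    if (!blob || compress(blob, &zlen, raw, raw_len) != Z_OK
        || zlen > PAYLOAD_MAX) {                       /* still too big */
        free(raw);
        free(blob);
        return NULL;
    }

    free(raw);
    *blob_len = zlen;
    return blob;
}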

On Thu, Mar 15, 2012 at 10:47:25AM -0600, Eric Blake wrote:
On 03/15/2012 08:31 AM, Eric Blake wrote:
It may also be worth considering the addition of _just_ a new ranged RPC call, but no new public API, by making the public API smart enough to automatically call the ranged RPC multiple times to build up a single return to the user. After all, if we exposed the ranged call to the user, then each client will have to add the logic to make the multiple calls; whereas our current limitation is _only_ in the RPC aspect (once the data is across the RPC and into the target process' memory space, there's no longer a 64k cap to worry about).
Thinking more about this idea:
Suppose we have the following in remote_protocol.x:
Existing:
struct remote_storage_pool_list_volumes_args {
    remote_nonnull_storage_pool pool;
    int maxnames;
};
struct remote_storage_pool_list_volumes_ret {
    remote_nonnull_string names<REMOTE_STORAGE_VOL_NAME_LIST_MAX>; /* insert@1 */
};
...
REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES = 92, /* autogen skipgen priority:high */
(note the change from autogen to skipgen for src/remote)
New:
struct remote_storage_pool_list_volumes_get_transaction_args {
    remote_nonnull_storage_pool pool;
    int maxnames;
};
struct remote_storage_pool_list_volumes_get_transaction_ret {
    int transaction;
};
struct remote_storage_pool_list_volumes_transaction_args {
    remote_nonnull_storage_pool pool;
    int transaction;
    int start;
};
struct remote_storage_pool_list_volumes_ret {
    remote_nonnull_string names<REMOTE_STORAGE_VOL_NAME_LIST_MAX>; /* insert@1 */
};
...
REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES_GET_TRANSACTION = ... /* skipgen skipgen priority:high */
REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES_TRANSACTION = ... /* skipgen skipgen priority:high */
Then, in pseudocode, we change the currently-generated src/remote handler for storagePoolListVolumes to do:
if maxnames <= REMOTE_STORAGE_VOL_NAME_LIST_MAX
    use REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES /* existing code */
else {
    int transaction, i = 0;
    pass pool and maxnames through
        REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES_GET_TRANSACTION to set transaction
    while (i < maxnames) {
        pass pool and transaction through
            REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES_TRANSACTION to get 128 more names
        i += REMOTE_STORAGE_VOL_NAME_LIST_MAX;
    }
}
Then, in daemon/remote.c, we leave the existing handler (which calls back into virStoragePoolListVolumes with 256 or less entries) untouched, and add two new hand-written RPC handlers:
the handler for REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES_GET_TRANSACTION:
    register that a sequenced transaction has been started
    allocate an array of maxnames and position and call virStoragePoolListVolumes, and tie that result to the sequence
    return the sequence transaction number

the handler for REMOTE_PROC_STORAGE_POOL_LIST_VOLUMES_TRANSACTION:
    validates that the sequence number is valid, and that the start value matches the recorded position
    prepares the return RPC with the next 128 names from the array tied to the sequence, and updates the position
    if the position reached the end, free the array and end the sequenced transaction
we also update virnetserver.c to support sequenced commands - the idea here is that any time a transaction will need more than one RPC message between client and server, then a transaction number can be generated which reserves memory tied to the transaction; as long as that transaction number exists, then further commands sent with reference to that transaction can reuse that memory; the transaction is then closed if the server and client complete the transaction, but also closed (and frees the reserved memory) if the connection disappears. To avoid a misbehaving client from DoS'ing a server, we could also require that while a transaction is open, the next RPC call from the client must reference that transaction, and must come within a reasonable time limit. To some extent, we already have a 'continue' status for a message; right now, a 'call' can only be replied by 'ok' or 'error', but this would be a case where 'call' is now replied by 'continue' to state that a transaction is in effect, and where we could reuse the 'serial' for as long as the transaction is in effect.
To me this all feels rather complicated to deal with. The reason for the RPC limits was to prevent memory usage DOS. The actual RPC message size is only half of the story though - the actual storage pool API still requires memory to give you back the string list. So if the list of volume names consumes 1 MB of memory, we're going to consume that just by invoking the API. Now copying the strings into an RPC message will double that to 2 MB. Using the sequence of RPC calls to iterate over this is only addressing that second part of memory usage. So I'm not really feeling that's a huge win, given the complexity it introduces.

I'm inclined to say that if people are creating setups with 1000's of volumes & guests, then they can probably spare the extra memory for us to increase the main RPC message payload limit somewhat.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/15/2012 11:48 AM, Daniel P. Berrange wrote:
Using the sequence of RPC calls to iterate over this is only addressing that second part of memory usage. So I'm not really feeling that's a huge win, given the complexity it introduces.
I'm inclined to say, that if people are creating setups with 1000's of volume & guests, then they can probably spare the extra memory for us to increase the main RPC message payload limit somewhat.
Our own docs/internals/rpc.html.in states:
Although the protocol itself defines many arbitrary sized data values in the payloads, to avoid denial of service attack there are a number of size limit checks prior to encoding or decoding data. There is a limit on the maximum size of a single RPC message, limit on the maximum string length, and limits on any other parameter which uses a variable length array. These limits can be raised, subject to agreement between client/server, without otherwise breaking compatibility of the RPC data on the wire.
Increasing limits makes sense, as long as we have a sane way to do it while ensuring that on version mismatch, a large packet from the sender doesn't crash an older receiver expecting the smaller limit. In our XDR, should we be converting some of our 'type name<LIST_MAX>' over to 'type name<>' notations, to state that they are effectively unlimited within the larger bounds of the overall packet size? Is 256k for overall packet size still okay for the problem at hand? -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Thu, Mar 15, 2012 at 1:10 PM, Eric Blake <eblake@redhat.com> wrote:
On 03/15/2012 11:48 AM, Daniel P. Berrange wrote:
Using the sequence of RPC calls to iterate over this is only addressing that second part of memory usage. So I'm not really feeling that's a huge win, given the complexity it introduces.
I'm inclined to say, that if people are creating setups with 1000's of volume & guests, then they can probably spare the extra memory for us to increase the main RPC message payload limit somewhat.
Our own docs/internals/rpc.html.in states:
Although the protocol itself defines many arbitrary sized data values in the payloads, to avoid denial of service attack there are a number of size limit checks prior to encoding or decoding data. There is a limit on the maximum size of a single RPC message, limit on the maximum string length, and limits on any other parameter which uses a variable length array. These limits can be raised, subject to agreement between client/server, without otherwise breaking compatibility of the RPC data on the wire.
Increasing limits makes sense, as long as we have a sane way to do it while ensuring that on version mismatch, a large packet from the sender doesn't crash an older receiver expecting the smaller limit.
In our XDR, should we be converting some of our 'type name<LIST_MAX>' over to 'type name<>' notations, to state that they are effectively unlimited within the larger bounds of the overall packet size? Is 256k for overall packet size still okay for the problem at hand?
-- Eric Blake
Using the previously posted snippet of code (I'm guessing that the missing 24 bytes is for RPC header info):

/* Size of message payload */
const VIR_NET_MESSAGE_PAYLOAD_MAX = 262120;

We would currently be able to squeeze 7084 things into the RPC message at 37 bytes per pool. We could easily reduce this to 33 bytes, giving us 7943 things. Now I don't know about other users, but this would give us some room to grow. Therefore, removing the list limitation reduces the priority of this from my perspective and gives us the opportunity to test other limitations that we might run into with libvirt or otherwise. When we start moving into the neighborhood of tens of thousands of things, the priority of removing the 256k limit will be much higher from our perspective, but I'm not sure when that will be. I certainly can't speak from the perspective of other users.

-- Jesse
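A quick back-of-the-envelope check of the figures above (the 37- and 33-byte per-entry costs are Jesse's estimates for his own pool names, not protocol constants):

/* Back-of-the-envelope check of the numbers quoted above; the per-entry
 * byte counts are estimates from the mail, not protocol constants. */
#include <stdio.h>

#define VIR_NET_MESSAGE_PAYLOAD_MAX 262120

int main(void)
{
    int bytes_per_entry[] = { 37, 33 };
    for (int i = 0; i < 2; i++)
        printf("%d bytes/entry -> %d entries per message\n",
               bytes_per_entry[i],
               VIR_NET_MESSAGE_PAYLOAD_MAX / bytes_per_entry[i]);
    return 0;   /* prints 7084 and 7943, matching the figures above */
}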
participants (6)
- Daniel P. Berrange
- Eric Blake
- Jesse Cook
- Jesse Cook
- Jesse J. Cook
- Osier Yang