[libvirt] [PATCH] rbd: Use RBD format 2 by default when creating images.

We used to look at the librbd code version and depending on that we would invoke rbd_create3() or rbd_create(). Since librbd version 0.67.9 we can however tell RBD that it should create rbd format 2 images even if we invoke rbd_create(). The less options we pass to librbd, the more we can lean on the sane defaults it uses. For rbd_create3() we had things like the stripe count and unit hardcoded in libvirt and that might cause problems down the road. Signed-off-by: Wido den Hollander <wido@widodh.nl> --- src/storage/storage_backend_rbd.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c index 8e8d7a7..936ad18 100644 --- a/src/storage/storage_backend_rbd.c +++ b/src/storage/storage_backend_rbd.c @@ -66,6 +66,7 @@ static int virStorageBackendRBDOpenRADOSConn(virStorageBackendRBDStatePtr ptr, const char *client_mount_timeout = "30"; const char *mon_op_timeout = "30"; const char *osd_op_timeout = "30"; + const char *rbd_default_format = "2"; if (authdef) { VIR_DEBUG("Using cephx authorization, username: %s", authdef->username); @@ -211,6 +212,14 @@ static int virStorageBackendRBDOpenRADOSConn(virStorageBackendRBDStatePtr ptr, VIR_DEBUG("Setting RADOS option rados_osd_op_timeout to %s", osd_op_timeout); rados_conf_set(ptr->cluster, "rados_osd_op_timeout", osd_op_timeout); + /* + * Librbd supports creating RBD format 2 images. We no longer have to invoke + * rbd_create3(), we can tell librbd to default to format 2. + * This leaves us to simply use rbd_create() and use the default behavior of librbd + */ + VIR_DEBUG("Setting RADOS option rbd_default_format to %s", rbd_default_format); + rados_conf_set(ptr->cluster, "rbd_default_format", rbd_default_format); + ptr->starttime = time(0); r = rados_connect(ptr->cluster); if (r < 0) { @@ -475,16 +484,7 @@ static int virStorageBackendRBDCreateImage(rados_ioctx_t io, char *name, long capacity) { int order = 0; -#if LIBRBD_VERSION_CODE > 260 - uint64_t features = 3; - uint64_t stripe_count = 1; - uint64_t stripe_unit = 4194304; - - if (rbd_create3(io, name, capacity, features, &order, - stripe_unit, stripe_count) < 0) { -#else if (rbd_create(io, name, capacity, &order) < 0) { -#endif return -1; } -- 1.9.1

On 07/14/2015 04:15 AM, Wido den Hollander wrote:
We used to look at the librbd code version and depending on that we would invoke rbd_create3() or rbd_create().
Since librbd version 0.67.9 we can however tell RBD that it should create rbd format 2 images even if we invoke rbd_create().
The less options we pass to librbd, the more we can lean on the sane defaults it uses.
For rbd_create3() we had things like the stripe count and unit hardcoded in libvirt and that might cause problems down the road.
Signed-off-by: Wido den Hollander <wido@widodh.nl> --- src/storage/storage_backend_rbd.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c index 8e8d7a7..936ad18 100644 --- a/src/storage/storage_backend_rbd.c +++ b/src/storage/storage_backend_rbd.c @@ -66,6 +66,7 @@ static int virStorageBackendRBDOpenRADOSConn(virStorageBackendRBDStatePtr ptr, const char *client_mount_timeout = "30"; const char *mon_op_timeout = "30"; const char *osd_op_timeout = "30"; + const char *rbd_default_format = "2";
if (authdef) { VIR_DEBUG("Using cephx authorization, username: %s", authdef->username); @@ -211,6 +212,14 @@ static int virStorageBackendRBDOpenRADOSConn(virStorageBackendRBDStatePtr ptr, VIR_DEBUG("Setting RADOS option rados_osd_op_timeout to %s", osd_op_timeout); rados_conf_set(ptr->cluster, "rados_osd_op_timeout", osd_op_timeout);
+ /* + * Librbd supports creating RBD format 2 images. We no longer have to invoke + * rbd_create3(), we can tell librbd to default to format 2. + * This leaves us to simply use rbd_create() and use the default behavior of librbd + */ + VIR_DEBUG("Setting RADOS option rbd_default_format to %s", rbd_default_format); + rados_conf_set(ptr->cluster, "rbd_default_format", rbd_default_format); +
I assume (from above) 0.67.9 is the first time this option is recognized?
ptr->starttime = time(0); r = rados_connect(ptr->cluster); if (r < 0) { @@ -475,16 +484,7 @@ static int virStorageBackendRBDCreateImage(rados_ioctx_t io, char *name, long capacity) { int order = 0; -#if LIBRBD_VERSION_CODE > 260 - uint64_t features = 3; - uint64_t stripe_count = 1; - uint64_t stripe_unit = 4194304; - - if (rbd_create3(io, name, capacity, features, &order, - stripe_unit, stripe_count) < 0) { -#else if (rbd_create(io, name, capacity, &order) < 0) { -#endif
Not quite my area of expertise, but since this was a build time check/change wouldn't this then impose a certain minimum version of rbd on the libvirt package build/install environment (eg librbd1-devel)? That is - if this were applied and installed on some host that didn't have at least 0.67.9, then what would happen? Or one with less than 0.56? Just trying to prevent some less than obvious issue because some build environment doesn't have the "latest and greatest" librbd.h installed John
return -1; }

On 07/14/2015 12:42 PM, John Ferlan wrote:
On 07/14/2015 04:15 AM, Wido den Hollander wrote:
We used to look at the librbd code version and depending on that we would invoke rbd_create3() or rbd_create().
Since librbd version 0.67.9 we can however tell RBD that it should create rbd format 2 images even if we invoke rbd_create().
The>> less options we pass to librbd, the more we can lean on the sane defaults it uses.
For rbd_create3() we had things like the stripe count and unit hardcoded in libvirt and that might cause problems down the road.
Hardcoding the feature bits is even worse. I think this is the right approach.
Signed-off-by: Wido den Hollander <wido@widodh.nl> --- src/storage/storage_backend_rbd.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c index 8e8d7a7..936ad18 100644 --- a/src/storage/storage_backend_rbd.c +++ b/src/storage/storage_backend_rbd.c @@ -66,6 +66,7 @@ static int virStorageBackendRBDOpenRADOSConn(virStorageBackendRBDStatePtr ptr, const char *client_mount_timeout = "30"; const char *mon_op_timeout = "30"; const char *osd_op_timeout = "30"; + const char *rbd_default_format = "2";
if (authdef) { VIR_DEBUG("Using cephx authorization, username: %s", authdef->username); @@ -211,6 +212,14 @@ static int virStorageBackendRBDOpenRADOSConn(virStorageBackendRBDStatePtr ptr, VIR_DEBUG("Setting RADOS option rados_osd_op_timeout to %s", osd_op_timeout); rados_conf_set(ptr->cluster, "rados_osd_op_timeout", osd_op_timeout);
+ /* + * Librbd supports creating RBD format 2 images. We no longer have to invoke + * rbd_create3(), we can tell librbd to default to format 2. + * This leaves us to simply use rbd_create() and use the default behavior of librbd + */ + VIR_DEBUG("Setting RADOS option rbd_default_format to %s", rbd_default_format); + rados_conf_set(ptr->cluster, "rbd_default_format", rbd_default_format); +
I assume (from above) 0.67.9 is the first time this option is recognized?
It's recognized in the bobtail (since 0.56.7) and cuttlefish (since 0.61.3) series as well. It was actually in all the dumpling (0.67.x) releases.
ptr->starttime = time(0); r = rados_connect(ptr->cluster); if (r < 0) { @@ -475,16 +484,7 @@ static int virStorageBackendRBDCreateImage(rados_ioctx_t io, char *name, long capacity) { int order = 0; -#if LIBRBD_VERSION_CODE > 260 - uint64_t features = 3; - uint64_t stripe_count = 1; - uint64_t stripe_unit = 4194304; - - if (rbd_create3(io, name, capacity, features, &order, - stripe_unit, stripe_count) < 0) { -#else if (rbd_create(io, name, capacity, &order) < 0) { -#endif
Not quite my area of expertise, but since this was a build time check/change wouldn't this then impose a certain minimum version of rbd on the libvirt package build/install environment (eg librbd1-devel)? That is - if this were applied and installed on some host that didn't have at least 0.67.9, then what would happen? Or one with less than 0.56?
The build time check was for the rbd_create3() function. This patch removes that usage, and relies on the "rbd_default*" ceph options which have no build time requirement. At runtime a non-existent option will return ENOENT, which isn't checked in this patch. I think that's fine, especially since dumpling (0.67.x) is no longer maintained [1].
Just trying to prevent some less than obvious issue because some build environment doesn't have the "latest and greatest" librbd.h installed
I'm glad you're vigilant about these, they're important. In this case the patch looks good to me: Reviewed-by: Josh Durgin <jdurgin@redhat.com> [1] http://ceph.com/docs/master/releases/

On 07/14/2015 04:13 PM, Josh Durgin wrote:
On 07/14/2015 12:42 PM, John Ferlan wrote:
On 07/14/2015 04:15 AM, Wido den Hollander wrote:
We used to look at the librbd code version and depending on that we would invoke rbd_create3() or rbd_create().
Since librbd version 0.67.9 we can however tell RBD that it should create rbd format 2 images even if we invoke rbd_create().
The>> less options we pass to librbd, the more we can lean on the sane defaults it uses.
For rbd_create3() we had things like the stripe count and unit hardcoded in libvirt and that might cause problems down the road.
Hardcoding the feature bits is even worse. I think this is the right approach.
OK - just trying to be sure... and since no one else spoke up yet, seems the approach is fine. ...
virStorageBackendRBDCreateImage(rados_ioctx_t io, char *name, long capacity) { int order = 0; -#if LIBRBD_VERSION_CODE > 260 - uint64_t features = 3; - uint64_t stripe_count = 1; - uint64_t stripe_unit = 4194304; - - if (rbd_create3(io, name, capacity, features, &order, - stripe_unit, stripe_count) < 0) { -#else if (rbd_create(io, name, capacity, &order) < 0) { -#endif
This change fails make syntax-check due to unnecessary curly braces. Before I push, the question I see a second issue... The current code returns -1 regardless of error, which is then used in a call to virReportSystemError(-r,...)... That means regardless of the error you're returning EPERM which could throw someone off if something other than PERM was returned from rbd_create I can fix that as well by : return rbd_create(io, name, capacity, &order); FWIW: This reporting snafu was introduced by commit id '761491e' which just took the return of virStorageBackendRBDCreateImage and used it as the basis for the message generated. John

On 16-07-15 18:28, John Ferlan wrote:
On 07/14/2015 04:13 PM, Josh Durgin wrote:
On 07/14/2015 12:42 PM, John Ferlan wrote:
On 07/14/2015 04:15 AM, Wido den Hollander wrote:
We used to look at the librbd code version and depending on that we would invoke rbd_create3() or rbd_create().
Since librbd version 0.67.9 we can however tell RBD that it should create rbd format 2 images even if we invoke rbd_create().
The>> less options we pass to librbd, the more we can lean on the sane defaults it uses.
For rbd_create3() we had things like the stripe count and unit hardcoded in libvirt and that might cause problems down the road.
Hardcoding the feature bits is even worse. I think this is the right approach.
OK - just trying to be sure... and since no one else spoke up yet, seems the approach is fine.
Yes, I think Josh gave the right comments.
...
virStorageBackendRBDCreateImage(rados_ioctx_t io, char *name, long capacity) { int order = 0; -#if LIBRBD_VERSION_CODE > 260 - uint64_t features = 3; - uint64_t stripe_count = 1; - uint64_t stripe_unit = 4194304; - - if (rbd_create3(io, name, capacity, features, &order, - stripe_unit, stripe_count) < 0) { -#else if (rbd_create(io, name, capacity, &order) < 0) { -#endif
This change fails make syntax-check due to unnecessary curly braces.
Before I push, the question I see a second issue...
The current code returns -1 regardless of error, which is then used in a call to virReportSystemError(-r,...)... That means regardless of the error you're returning EPERM which could throw someone off if something other than PERM was returned from rbd_create
I can fix that as well by :
return rbd_create(io, name, capacity, &order);
Seems like a sane approach to me. Less if-statements and the code still works just fine. You can apply my patch and fix this afterwards? Or do you want a new patch from me? Wido
FWIW: This reporting snafu was introduced by commit id '761491e' which just took the return of virStorageBackendRBDCreateImage and used it as the basis for the message generated.
John
participants (3)
-
John Ferlan
-
Josh Durgin
-
Wido den Hollander