[libvirt] [PATCH] json: fix interface locale dependency

libvirt creates invalid commands if wrong locale is selected. For example with locale that uses comma as a decimal point, JSON commands created with decimal numbers are invalid because comma separates the entries in JSON. But even when decimal point is affected, grouping is not, because for grouping to be enabled with *printf, there has to be a apostrophe flag specified (and supported). --- Fortunately, there should be no other place where output-formatting is affected by this problem. I tried to change this in various ways with this posted one being the cleanest from my point of view, because: - setting locale is per-proccess, not per-thread (not thread-safe) - there is no number parsing that mangles the output number because of floating point precision src/util/json.c | 23 +++++++++++++++++++++-- 1 files changed, 21 insertions(+), 2 deletions(-) diff --git a/src/util/json.c b/src/util/json.c index 5132989..753a548 100644 --- a/src/util/json.c +++ b/src/util/json.c @@ -23,6 +23,8 @@ #include <config.h> +#include <locale.h> + #include "json.h" #include "memory.h" #include "virterror_internal.h" @@ -44,7 +46,6 @@ /* XXX fixme */ #define VIR_FROM_THIS VIR_FROM_NONE - typedef struct _virJSONParserState virJSONParserState; typedef virJSONParserState *virJSONParserStatePtr; struct _virJSONParserState { @@ -200,9 +201,27 @@ virJSONValuePtr virJSONValueNewNumberUlong(unsigned long long data) virJSONValuePtr virJSONValueNewNumberDouble(double data) { virJSONValuePtr val = NULL; - char *str; + char *str, *radix, *tmp; + if (virAsprintf(&str, "%lf", data) < 0) return NULL; + + /* because printing double is locale-dependent, we could end up + * with invalid JSON code, so we have to do something like this */ + radix = localeconv()->decimal_point; + tmp = strstr(str, radix); + if (tmp) { + *tmp = '.'; + if (strlen(radix) > 1) { + /* if the current locale specifies more characters as + * decimal point then cover the others with decimal + * numbers */ + memcpy(tmp + 1, + tmp + strlen(radix), + strlen(str) - (tmp - str)); + } + } + val = virJSONValueNewNumber(str); VIR_FREE(str); return val; -- 1.7.8.6

On 06.08.2012 10:12, Martin Kletzander wrote:
libvirt creates invalid commands if wrong locale is selected. For example with locale that uses comma as a decimal point, JSON commands created with decimal numbers are invalid because comma separates the entries in JSON.
But even when decimal point is affected, grouping is not, because for grouping to be enabled with *printf, there has to be a apostrophe flag specified (and supported). --- Fortunately, there should be no other place where output-formatting is affected by this problem.
I tried to change this in various ways with this posted one being the cleanest from my point of view, because: - setting locale is per-proccess, not per-thread (not thread-safe) - there is no number parsing that mangles the output number because of floating point precision
src/util/json.c | 23 +++++++++++++++++++++-- 1 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/src/util/json.c b/src/util/json.c index 5132989..753a548 100644 --- a/src/util/json.c +++ b/src/util/json.c @@ -23,6 +23,8 @@
#include <config.h>
+#include <locale.h> + #include "json.h" #include "memory.h" #include "virterror_internal.h" @@ -44,7 +46,6 @@ /* XXX fixme */ #define VIR_FROM_THIS VIR_FROM_NONE
- typedef struct _virJSONParserState virJSONParserState; typedef virJSONParserState *virJSONParserStatePtr; struct _virJSONParserState { @@ -200,9 +201,27 @@ virJSONValuePtr virJSONValueNewNumberUlong(unsigned long long data) virJSONValuePtr virJSONValueNewNumberDouble(double data) { virJSONValuePtr val = NULL; - char *str; + char *str, *radix, *tmp; + if (virAsprintf(&str, "%lf", data) < 0) return NULL; + + /* because printing double is locale-dependent, we could end up + * with invalid JSON code, so we have to do something like this */ + radix = localeconv()->decimal_point; + tmp = strstr(str, radix); + if (tmp) { + *tmp = '.'; + if (strlen(radix) > 1) { + /* if the current locale specifies more characters as + * decimal point then cover the others with decimal + * numbers */ + memcpy(tmp + 1, + tmp + strlen(radix), + strlen(str) - (tmp - str)); + } + } + val = virJSONValueNewNumber(str); VIR_FREE(str); return val; --
When we are touching this - should we also care about thousand separator? Michal

On 08/06/2012 11:13 AM, Michal Privoznik wrote:
On 06.08.2012 10:12, Martin Kletzander wrote:
libvirt creates invalid commands if wrong locale is selected. For example with locale that uses comma as a decimal point, JSON commands created with decimal numbers are invalid because comma separates the entries in JSON.
But even when decimal point is affected, grouping is not, because for grouping to be enabled with *printf, there has to be a apostrophe flag specified (and supported). --- Fortunately, there should be no other place where output-formatting is affected by this problem.
I tried to change this in various ways with this posted one being the cleanest from my point of view, because: - setting locale is per-proccess, not per-thread (not thread-safe) - there is no number parsing that mangles the output number because of floating point precision
src/util/json.c | 23 +++++++++++++++++++++-- 1 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/src/util/json.c b/src/util/json.c index 5132989..753a548 100644 --- a/src/util/json.c +++ b/src/util/json.c @@ -23,6 +23,8 @@
#include <config.h>
+#include <locale.h> + #include "json.h" #include "memory.h" #include "virterror_internal.h" @@ -44,7 +46,6 @@ /* XXX fixme */ #define VIR_FROM_THIS VIR_FROM_NONE
- typedef struct _virJSONParserState virJSONParserState; typedef virJSONParserState *virJSONParserStatePtr; struct _virJSONParserState { @@ -200,9 +201,27 @@ virJSONValuePtr virJSONValueNewNumberUlong(unsigned long long data) virJSONValuePtr virJSONValueNewNumberDouble(double data) { virJSONValuePtr val = NULL; - char *str; + char *str, *radix, *tmp; + if (virAsprintf(&str, "%lf", data) < 0) return NULL; + + /* because printing double is locale-dependent, we could end up + * with invalid JSON code, so we have to do something like this */ + radix = localeconv()->decimal_point; + tmp = strstr(str, radix); + if (tmp) { + *tmp = '.'; + if (strlen(radix) > 1) { + /* if the current locale specifies more characters as + * decimal point then cover the others with decimal + * numbers */ + memcpy(tmp + 1, + tmp + strlen(radix), + strlen(str) - (tmp - str)); + } + } + val = virJSONValueNewNumber(str); VIR_FREE(str); return val; --
When we are touching this - should we also care about thousand separator?
Michal
Thousands are separated only if the apostrophe flag is specified (mentioned in the commit message). Martin

On Mon, Aug 06, 2012 at 10:12:17AM +0200, Martin Kletzander wrote:
libvirt creates invalid commands if wrong locale is selected. For example with locale that uses comma as a decimal point, JSON commands created with decimal numbers are invalid because comma separates the entries in JSON.
But even when decimal point is affected, grouping is not, because for grouping to be enabled with *printf, there has to be a apostrophe flag specified (and supported). --- Fortunately, there should be no other place where output-formatting is affected by this problem.
I tried to change this in various ways with this posted one being the cleanest from my point of view, because: - setting locale is per-proccess, not per-thread (not thread-safe)
Actually in glibc there is a per-thread locale: /* Switch the current thread's locale to DATASET. If DATASET is null, instead just return the current setting. The special value LC_GLOBAL_LOCALE is the initial setting for all threads and can also be installed any time, meaning the thread uses the global settings controlled by `setlocale'. */ extern __locale_t uselocale (__locale_t __dataset) __THROW;
typedef struct _virJSONParserState virJSONParserState; typedef virJSONParserState *virJSONParserStatePtr; struct _virJSONParserState { @@ -200,9 +201,27 @@ virJSONValuePtr virJSONValueNewNumberUlong(unsigned long long data) virJSONValuePtr virJSONValueNewNumberDouble(double data) { virJSONValuePtr val = NULL; - char *str; + char *str, *radix, *tmp; + if (virAsprintf(&str, "%lf", data) < 0) return NULL; + + /* because printing double is locale-dependent, we could end up + * with invalid JSON code, so we have to do something like this */ + radix = localeconv()->decimal_point; + tmp = strstr(str, radix); + if (tmp) { + *tmp = '.'; + if (strlen(radix) > 1) { + /* if the current locale specifies more characters as + * decimal point then cover the others with decimal + * numbers */ + memcpy(tmp + 1, + tmp + strlen(radix), + strlen(str) - (tmp - str)); + } + } + val = virJSONValueNewNumber(str); VIR_FREE(str); return val;
GLib has a g_ascii_dtostr() which forces uses of '.' as separator. Since GLib is LGPLv2+ licensed, we can just copy their impl, which actually uses GLibc's uselocale() if possible, otherwise has a fallback impl. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 08/06/2012 06:24 AM, Daniel P. Berrange wrote:
On Mon, Aug 06, 2012 at 10:12:17AM +0200, Martin Kletzander wrote:
libvirt creates invalid commands if wrong locale is selected. For example with locale that uses comma as a decimal point, JSON commands created with decimal numbers are invalid because comma separates the entries in JSON.
But even when decimal point is affected, grouping is not, because for grouping to be enabled with *printf, there has to be a apostrophe flag specified (and supported). --- Fortunately, there should be no other place where output-formatting is affected by this problem.
I tried to change this in various ways with this posted one being the cleanest from my point of view, because: - setting locale is per-proccess, not per-thread (not thread-safe)
Actually in glibc there is a per-thread locale:
I agree that we should be using uselocale() where appropriate, rather than hacking with localeconv()->decimal_point. Also, we need to make sure that whatever solution we come up with compiles on mingw (I'm afraid that localeconv()->decimal_point is not portable enough, yet).
GLib has a g_ascii_dtostr() which forces uses of '.' as separator. Since GLib is LGPLv2+ licensed, we can just copy their impl, which actually uses GLibc's uselocale() if possible, otherwise has a fallback impl.
gnulib also has a module 'ftoastr' for printing an unambiguous representation of floating point (one problem with the default precision of %lf and friends is that it rounds, so more than one floating point value will result in the same ambiguous output string), but alas that module is GPL at the moment, and I'm not sure whether it has a way to force the decimal point issue. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 08/06/2012 02:52 PM, Eric Blake wrote:
On 08/06/2012 06:24 AM, Daniel P. Berrange wrote:
On Mon, Aug 06, 2012 at 10:12:17AM +0200, Martin Kletzander wrote:
libvirt creates invalid commands if wrong locale is selected. For example with locale that uses comma as a decimal point, JSON commands created with decimal numbers are invalid because comma separates the entries in JSON.
But even when decimal point is affected, grouping is not, because for grouping to be enabled with *printf, there has to be a apostrophe flag specified (and supported). --- Fortunately, there should be no other place where output-formatting is affected by this problem.
I tried to change this in various ways with this posted one being the cleanest from my point of view, because: - setting locale is per-proccess, not per-thread (not thread-safe)
Actually in glibc there is a per-thread locale:
I agree that we should be using uselocale() where appropriate, rather than hacking with localeconv()->decimal_point. Also, we need to make sure that whatever solution we come up with compiles on mingw (I'm afraid that localeconv()->decimal_point is not portable enough, yet).
If you mean the construct then yes, this could be written better, my fault. If you mean the use of localeconv itself it should be more portable than nl_langinfo which was my second option. Also when I was looking at the per-thread locale, I've found a comment few lines up the code before all the per-thread locale functions: Attention: all these functions are *not* standardized in any form. This is a proof-of-concept implementation.
GLib has a g_ascii_dtostr() which forces uses of '.' as separator. Since GLib is LGPLv2+ licensed, we can just copy their impl, which actually uses GLibc's uselocale() if possible, otherwise has a fallback impl.
gnulib also has a module 'ftoastr' for printing an unambiguous representation of floating point (one problem with the default precision of %lf and friends is that it rounds, so more than one floating point value will result in the same ambiguous output string), but alas that module is GPL at the moment, and I'm not sure whether it has a way to force the decimal point issue.
I was going through the code of both of these and thanks ftoastr is under GPL, because I didn't quite understand it. Looking at the g_ascii_dtostr I've found that what is being done there and it amused me a bit. After couple of checks, conditions and whatnot, it get's the current decimal_point string from the localeconv() and replaces it with a '.', long story short, the only difference is that instead of strcpy(), there is memmove() used. Not that I don't want to change the code, this is only some info I've found and I don't know whether to change the code and if, then to what to change it. Martin

On Mon, Aug 06, 2012 at 03:59:42PM +0200, Martin Kletzander wrote:
On 08/06/2012 02:52 PM, Eric Blake wrote:
On 08/06/2012 06:24 AM, Daniel P. Berrange wrote:
On Mon, Aug 06, 2012 at 10:12:17AM +0200, Martin Kletzander wrote:
libvirt creates invalid commands if wrong locale is selected. For example with locale that uses comma as a decimal point, JSON commands created with decimal numbers are invalid because comma separates the entries in JSON.
But even when decimal point is affected, grouping is not, because for grouping to be enabled with *printf, there has to be a apostrophe flag specified (and supported). --- Fortunately, there should be no other place where output-formatting is affected by this problem.
I tried to change this in various ways with this posted one being the cleanest from my point of view, because: - setting locale is per-proccess, not per-thread (not thread-safe)
Actually in glibc there is a per-thread locale:
I agree that we should be using uselocale() where appropriate, rather than hacking with localeconv()->decimal_point. Also, we need to make sure that whatever solution we come up with compiles on mingw (I'm afraid that localeconv()->decimal_point is not portable enough, yet).
If you mean the construct then yes, this could be written better, my fault. If you mean the use of localeconv itself it should be more portable than nl_langinfo which was my second option.
Also when I was looking at the per-thread locale, I've found a comment few lines up the code before all the per-thread locale functions:
Attention: all these functions are *not* standardized in any form. This is a proof-of-concept implementation.
The fallback code that GLib uses has been around since 2001 and compiles on Win32, many UNIX & Linux so I think that's good enough from a portability POV.
GLib has a g_ascii_dtostr() which forces uses of '.' as separator. Since GLib is LGPLv2+ licensed, we can just copy their impl, which actually uses GLibc's uselocale() if possible, otherwise has a fallback impl.
gnulib also has a module 'ftoastr' for printing an unambiguous representation of floating point (one problem with the default precision of %lf and friends is that it rounds, so more than one floating point value will result in the same ambiguous output string), but alas that module is GPL at the moment, and I'm not sure whether it has a way to force the decimal point issue.
I was going through the code of both of these and thanks ftoastr is under GPL, because I didn't quite understand it. Looking at the g_ascii_dtostr I've found that what is being done there and it amused me a bit. After couple of checks, conditions and whatnot, it get's the current decimal_point string from the localeconv() and replaces it with a '.', long story short, the only difference is that instead of strcpy(), there is memmove() used.
Not that I don't want to change the code, this is only some info I've found and I don't know whether to change the code and if, then to what to change it.
FYI looking at the GIT history, the glib fallback code is from 2001 and the new uselocale() based code from 2011, and has greater speed than the old code. So I still think we should just copy what glib has into a src/util/util.c Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 08/06/2012 04:35 PM, Daniel P. Berrange wrote:
On Mon, Aug 06, 2012 at 03:59:42PM +0200, Martin Kletzander wrote:
On 08/06/2012 02:52 PM, Eric Blake wrote:
On 08/06/2012 06:24 AM, Daniel P. Berrange wrote:
GLib has a g_ascii_dtostr() which forces uses of '.' as separator. Since GLib is LGPLv2+ licensed, we can just copy their impl, which actually uses GLibc's uselocale() if possible, otherwise has a fallback impl.
gnulib also has a module 'ftoastr' for printing an unambiguous representation of floating point (one problem with the default precision of %lf and friends is that it rounds, so more than one floating point value will result in the same ambiguous output string), but alas that module is GPL at the moment, and I'm not sure whether it has a way to force the decimal point issue.
I was going through the code of both of these and thanks ftoastr is under GPL, because I didn't quite understand it. Looking at the g_ascii_dtostr I've found that what is being done there and it amused me a bit. After couple of checks, conditions and whatnot, it get's the current decimal_point string from the localeconv() and replaces it with a '.', long story short, the only difference is that instead of strcpy(), there is memmove() used.
Not that I don't want to change the code, this is only some info I've found and I don't know whether to change the code and if, then to what to change it.
FYI looking at the GIT history, the glib fallback code is from 2001 and the new uselocale() based code from 2011, and has greater speed than the old code.
So I still think we should just copy what glib has into a src/util/util.c
I missed that non-fallback code because I cloned a wrong repo, sorry. I'll send next version (that will probably be still more like a RFC) later. Martin
participants (4)
-
Daniel P. Berrange
-
Eric Blake
-
Martin Kletzander
-
Michal Privoznik