On 6/19/23 14:31, Daniel P. Berrangé wrote:
On Mon, Jun 19, 2023 at 02:03:48PM +0200, Michal Privoznik wrote:
> It's weird that in 2023 there's no reliable and portable way to
> parse strings, but okay.
>
> We started with strtol(). Except, it doesn't work the same on
> Linux and Windows. On Windows it behaves a bit different when it
> comes to parsing strings with base = 0. So we've switched to
> g_ascii_strtoll() which promised to solve this. And it did.
> Except, it introduced another problem. I don't really understand
> why, but I've seen random test failures and all of them claimed
> inability to parse a number (specifically <memory/> from the
> hard coded domain XML from test driver, which IMO is just a
> coincidence because it's the first number we parse).
What platforms are you seeing the failures on ? All platforms or just
on Windows, or just some other specific one?
I've tried only Linux so far. Windows is out of question.
> The nature of the problem suggests there's a race condition
> somewhere. I suspected glib but I don't know their code well to
> prove it.
The code in question is:
guint64
g_ascii_strtoull (const gchar *nptr,
gchar **endptr,
guint base)
{
#if defined(USE_XLOCALE) && defined(HAVE_STRTOULL_L)
return strtoull_l (nptr, endptr, base, get_C_locale ());
#else
gboolean negative;
guint64 result;
result = g_parse_long_long (nptr, (const gchar **) endptr, base, &negative);
/* Return the result of the appropriate sign. */
return negative ? -result : result;
#endif
}
gint64
g_ascii_strtoll (const gchar *nptr,
gchar **endptr,
guint base)
{
#if defined(USE_XLOCALE) && defined(HAVE_STRTOLL_L)
return strtoll_l (nptr, endptr, base, get_C_locale ());
#else
gboolean negative;
guint64 result;
result = g_parse_long_long (nptr, (const gchar **) endptr, base, &negative);
if (negative && result > (guint64) G_MININT64)
{
errno = ERANGE;
return G_MININT64;
}
else if (!negative && result > (guint64) G_MAXINT64)
{
errno = ERANGE;
return G_MAXINT64;
}
else if (negative)
return - (gint64) result;
else
return (gint64) result;
#endif
}
On Linux, we should take the first branch on the #ifdef, which
is calling strtoll_l just as your patch is switching us to.
I guess Windows would use the g_parse_long_long fallback. I don't
see anything in that code which could be non-thread safe. It is
just a copy of strtol cdoe from glibc.
On the Linux case get_C_locale is
static locale_t
get_C_locale (void)
{
static gsize initialized = FALSE;
static locale_t C_locale = NULL;
if (g_once_init_enter (&initialized))
{
C_locale = newlocale (LC_ALL_MASK, "C", NULL);
g_once_init_leave (&initialized, TRUE);
}
return C_locale;
}
and only way that could fail is if newlocale isn't threadsafe, and
we have other code in libvirt calling newlocale at exact same time
as the *first* call to get_C_locale.
Yeah, that's why I don't understand what's going on. Anyway, try running
tests repeatedly (you can use oneliner from the commit message) and
you'll run into the problem.
I've tried to debug the problem over weekend but was really
unsuccessful. Firstly, I suspected that glib version of pthread_once()
was broken. So I've rewritten get_C_locale() to use pthread_once() but
that didn't help. Which tends to point into direction of glibc,
supported by the fact that glibc impl of newlocale() does indeed have a
global RW lock. But I haven't found anything obviously wrong there either.
>
> Anyway, using strtoll_l() directly and taking glib out of the
> picture solves the issue.
I'm pretty surprised if it fixes anything ! I'm a little concerned
we might have just masked a bug that's still hiding elsewhere.
I understand that. And if you have any idea how to get to the root of
the problem, I'm more than happy to try it.
Michal