
On Tue, Jun 20, 2023 at 01:25:23PM +0200, Michal Prívozník wrote:
On 6/19/23 17:02, Daniel P. Berrangé wrote:
On Mon, Jun 19, 2023 at 02:45:57PM +0200, Michal Prívozník wrote:
On 6/19/23 14:31, Daniel P. Berrangé wrote:
On Mon, Jun 19, 2023 at 02:03:48PM +0200, Michal Privoznik wrote:
It's weird that in 2023 there's no reliable and portable way to parse strings, but okay.
We started with strtol(). Except, it doesn't work the same on Linux and Windows. On Windows it behaves a bit different when it comes to parsing strings with base = 0. So we've switched to g_ascii_strtoll() which promised to solve this. And it did. Except, it introduced another problem. I don't really understand why, but I've seen random test failures and all of them claimed inability to parse a number (specifically <memory/> from the hard coded domain XML from test driver, which IMO is just a coincidence because it's the first number we parse).
What platforms are you seeing the failures on ? All platforms or just on Windows, or just some other specific one?
I've tried only Linux so far. Windows is out of question.
snip
On the Linux case get_C_locale is
static locale_t get_C_locale (void) { static gsize initialized = FALSE; static locale_t C_locale = NULL;
if (g_once_init_enter (&initialized)) { C_locale = newlocale (LC_ALL_MASK, "C", NULL); g_once_init_leave (&initialized, TRUE); }
return C_locale; }
and only way that could fail is if newlocale isn't threadsafe, and we have other code in libvirt calling newlocale at exact same time as the *first* call to get_C_locale.
Yeah, that's why I don't understand what's going on. Anyway, try running tests repeatedly (you can use oneliner from the commit message) and you'll run into the problem.
Yes, I've hit the problem with the error message
error: XML error: Invalid value '8388608' for element or attribute './memory[1]'
interestingly, I've also hit a few other error messages, some failures without error message (I suspect virsh simply crashed with no output), and most intriguing is that it 'virshtest' simply hangs when I run it directly.
<snip/>
So the cause of the hang is exceedingly obvious.
Very few glib APIs are safe to uses in between fork & exec, and we are (yet again) being burnt.
Yep, and switching to plain strtol() fixes this hang. But it does not solve ...
What I can't explain, however, is why we sometimes get an error message instead of a hang.
.. this. I bet it has something to do with fork() + exec() because when I set up logging and run those tests I can see which one is failing (with that "unable to parse NNN" message), but when I run it manually with the exact arguments I don't see any hiccups.
Ok, I've found the root cause. g_mutex_lock() sometimes splatters errno on contended locks: https://gitlab.gnome.org/GNOME/glib/-/issues/3034 most of the time this won't matter, as the APIs have a return value that identifies failure, and only then does the caller look at errno. The strtoll/g_ascii_strtoll functions are special though because we have to directly inspect errno to identify failure. The splattering of errno breaks this :-( IOW, as of today, the g_ascii_strtoll functions are all broken and unusable on Linux when GLib is built to natively use futex() instead of pthread_mutex_t. We can avoid this, however, by making virInitialize invoke g_ascii_strtoll() on a dummy string. This triggers the glib one-time initializer, and thereafter we shouldn't hit the locking bug because g_once_init_enter will go via a fast-path that avoids the futex syscall.
Third, we should move virExec and virFork and related helpers into a standalone file, so that it is very clear which bits of code are running in between fork+exec and thus need careful design to avoid anything except async safe POSIX.
I've tried to do this, but it's bigger task than I thought. Plenty of dependencies across the file.
No problem, not an urgent thing. Just something I think we ought to do at somepoint With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|