[PATCH v2 0/3] util/xml: make xml parser error reporting less terrible

libxml2 has terrible error reporting when failing to open files, which has led to significant wasted time on wild goose chases. This improves it by taking libxml2 out of the business of opening & reading files. Daniel P. Berrangé (3): util/xml: fix extraction of XML context util/xml: don't assume libxml2 has the filename of the document util/xml: open XML files before calling libxml2 src/util/virxml.c | 29 +++++++++++++------ .../broken-xml-invalid.x86_64-latest.err | 4 +-- .../nonexistent-file.x86_64-latest.err | 2 +- 3 files changed, 23 insertions(+), 12 deletions(-) -- 2.46.0

Currently given an input of '<dom\n' we emit an error: error: Failed to define domain from tests/qemuxmlconfdata/broken-xml-invalid.xml error: at line 2: Couldn't find end of Start Tag dom line 1 (null) ^ With this fix we emit: error: Failed to define domain from tests/qemuxmlconfdata/broken-xml-invalid.xml error: at line 2: Couldn't find end of Start Tag dom line 1 <dom ----^ Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> --- src/util/virxml.c | 4 ++++ tests/qemuxmlconfdata/broken-xml-invalid.x86_64-latest.err | 4 ++-- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/src/util/virxml.c b/src/util/virxml.c index 51173303fe..91f736bf39 100644 --- a/src/util/virxml.c +++ b/src/util/virxml.c @@ -1047,6 +1047,10 @@ catchXMLError(void *ctx, const char *msg G_GNUC_UNUSED, ...) cur = ctxt->input->cur; base = ctxt->input->base; + /* skip backwards over NUL terminator */ + if ((cur > base) && *cur == '\0') + cur--; + /* skip backwards over any end-of-lines */ while ((cur > base) && ((*(cur) == '\n') || (*(cur) == '\r'))) cur--; diff --git a/tests/qemuxmlconfdata/broken-xml-invalid.x86_64-latest.err b/tests/qemuxmlconfdata/broken-xml-invalid.x86_64-latest.err index 35a1801371..a3bacd5d3a 100644 --- a/tests/qemuxmlconfdata/broken-xml-invalid.x86_64-latest.err +++ b/tests/qemuxmlconfdata/broken-xml-invalid.x86_64-latest.err @@ -1,3 +1,3 @@ ABS_SRCDIR/qemuxmlconfdata/broken-xml-invalid.xml:2: Couldn't find end of Start Tag dom line 1 -(null) -^ +<dom +----^ -- 2.46.0

On Tue, Oct 22, 2024 at 10:27:52 +0100, Daniel P. Berrangé wrote:
Currently given an input of '<dom\n' we emit an error:
error: Failed to define domain from tests/qemuxmlconfdata/broken-xml-invalid.xml error: at line 2: Couldn't find end of Start Tag dom line 1 (null) ^
With this fix we emit:
error: Failed to define domain from tests/qemuxmlconfdata/broken-xml-invalid.xml error: at line 2: Couldn't find end of Start Tag dom line 1 <dom ----^
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> --- src/util/virxml.c | 4 ++++ tests/qemuxmlconfdata/broken-xml-invalid.x86_64-latest.err | 4 ++-- 2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/src/util/virxml.c b/src/util/virxml.c index 51173303fe..91f736bf39 100644 --- a/src/util/virxml.c +++ b/src/util/virxml.c @@ -1047,6 +1047,10 @@ catchXMLError(void *ctx, const char *msg G_GNUC_UNUSED, ...) cur = ctxt->input->cur; base = ctxt->input->base;
+ /* skip backwards over NUL terminator */
Preferrably also note "why" we're skipping backwards.
+ if ((cur > base) && *cur == '\0') + cur--; + /* skip backwards over any end-of-lines */ while ((cur > base) && ((*(cur) == '\n') || (*(cur) == '\r'))) cur--; diff --git a/tests/qemuxmlconfdata/broken-xml-invalid.x86_64-latest.err b/tests/qemuxmlconfdata/broken-xml-invalid.x86_64-latest.err index 35a1801371..a3bacd5d3a 100644 --- a/tests/qemuxmlconfdata/broken-xml-invalid.x86_64-latest.err +++ b/tests/qemuxmlconfdata/broken-xml-invalid.x86_64-latest.err @@ -1,3 +1,3 @@ ABS_SRCDIR/qemuxmlconfdata/broken-xml-invalid.xml:2: Couldn't find end of Start Tag dom line 1 -(null) -^ +<dom +----^
Reviewed-by: Peter Krempa <pkrempa@redhat.com>

The libxml2 error handling gets the filename from a libxml2 struct, but it is better to not assume libxml2 knows the filename being parsed, as we might have simply provided it a pre-loaded string. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> --- src/util/virxml.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/src/util/virxml.c b/src/util/virxml.c index 91f736bf39..797aa6f6ca 100644 --- a/src/util/virxml.c +++ b/src/util/virxml.c @@ -46,6 +46,7 @@ /* Internal data to be passed to SAX parser and used by error handler. */ struct virParserData { int domcode; + const char *filename; }; @@ -1022,7 +1023,7 @@ static void catchXMLError(void *ctx, const char *msg G_GNUC_UNUSED, ...) { xmlParserCtxtPtr ctxt = (xmlParserCtxtPtr) ctx; - + struct virParserData *private = ctxt->_private; const xmlChar *cur, *base; unsigned int n, col; /* GCC warns if signed, because compared with sizeof() */ int domcode = VIR_FROM_XML; @@ -1030,6 +1031,10 @@ catchXMLError(void *ctx, const char *msg G_GNUC_UNUSED, ...) g_autofree char *contextstr = NULL; g_autofree char *pointerstr = NULL; const xmlError *lastError = xmlCtxtGetLastError(ctxt); + const char *filename = NULL; + + if (private) + filename = private->filename; /* conditions for error printing */ if (!ctxt || @@ -1040,9 +1045,8 @@ catchXMLError(void *ctx, const char *msg G_GNUC_UNUSED, ...) lastError->message == NULL) return; - if (ctxt->_private) - domcode = ((struct virParserData *) ctxt->_private)->domcode; - + if (private) + domcode = private->domcode; cur = ctxt->input->cur; base = ctxt->input->base; @@ -1083,10 +1087,10 @@ catchXMLError(void *ctx, const char *msg G_GNUC_UNUSED, ...) pointerstr = virBufferContentAndReset(&buf); - if (lastError->file) { + if (filename) { virGenericReportError(domcode, VIR_ERR_XML_DETAIL, _("%1$s:%2$d: %3$s%4$s\n%5$s"), - lastError->file, + filename, lastError->line, lastError->message, contextstr, @@ -1152,6 +1156,7 @@ virXMLParseHelper(int domcode, abort(); private.domcode = domcode; + private.filename = filename; pctxt->_private = &private; pctxt->sax->error = catchXMLError; -- 2.46.0

On Tue, Oct 22, 2024 at 10:27:53 +0100, Daniel P. Berrangé wrote:
The libxml2 error handling gets the filename from a libxml2 struct, but it is better to not assume libxml2 knows the filename being parsed, as we might have simply provided it a pre-loaded string.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> --- src/util/virxml.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
Reviewed-by: Peter Krempa <pkrempa@redhat.com>

Libxml2 has awful error reporting behaviour when reading files. When we fail to load a file from the test driver we see: $ virsh -c test:///wibble.xml I/O warning : failed to load external entity "/wibble.xml" error: failed to connect to the hypervisor error: XML error: failed to parse xml document '/wibble.xml' where the I/O warning line is something printed by libxml2 itself, which also lacks any useful detail. Switching to our own file reading code we can massively improve things: $ ./build/tools/virsh -c test:///wibble.xml error: failed to connect to the hypervisor error: Failed to open file '/wibble.xml': No such file or directory Using 10 MB as an upper limit on XML file size ought to be sufficient for any XML files libvirt is reading. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> --- src/util/virxml.c | 8 +++++--- tests/qemuxmlconfdata/nonexistent-file.x86_64-latest.err | 2 +- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/src/util/virxml.c b/src/util/virxml.c index 797aa6f6ca..deed258bb0 100644 --- a/src/util/virxml.c +++ b/src/util/virxml.c @@ -1142,6 +1142,7 @@ virXMLParseHelper(int domcode, xmlNodePtr rootnode; const char *docname; int parseFlags = XML_PARSE_NONET | XML_PARSE_NOWARNING; + g_autofree char *xmlStrPtr = NULL; if (filename) docname = filename; @@ -1165,10 +1166,11 @@ virXMLParseHelper(int domcode, } if (filename) { - xml = xmlCtxtReadFile(pctxt, filename, NULL, parseFlags); - } else { - xml = xmlCtxtReadDoc(pctxt, BAD_CAST xmlStr, url, NULL, parseFlags); + if (virFileReadAll(filename, 1024*1024*10, &xmlStrPtr) < 0) + return NULL; + xmlStr = xmlStrPtr; } + xml = xmlCtxtReadDoc(pctxt, BAD_CAST xmlStr, url, NULL, parseFlags); if (!xml) { if (virGetLastErrorCode() == VIR_ERR_OK) { diff --git a/tests/qemuxmlconfdata/nonexistent-file.x86_64-latest.err b/tests/qemuxmlconfdata/nonexistent-file.x86_64-latest.err index 0ddf1ea510..2aedf3eded 100644 --- a/tests/qemuxmlconfdata/nonexistent-file.x86_64-latest.err +++ b/tests/qemuxmlconfdata/nonexistent-file.x86_64-latest.err @@ -1 +1 @@ -XML error: failed to parse xml document 'ABS_SRCDIR/qemuxmlconfdata/nonexistent-file.xml' +Failed to open file 'ABS_SRCDIR/qemuxmlconfdata/nonexistent-file.xml': No such file or directory -- 2.46.0

On Tue, Oct 22, 2024 at 10:27:54AM +0100, Daniel P. Berrangé wrote:
Libxml2 has awful error reporting behaviour when reading files. When we fail to load a file from the test driver we see:
$ virsh -c test:///wibble.xml I/O warning : failed to load external entity "/wibble.xml" error: failed to connect to the hypervisor error: XML error: failed to parse xml document '/wibble.xml'
where the I/O warning line is something printed by libxml2 itself, which also lacks any useful detail.
Switching to our own file reading code we can massively improve things:
$ ./build/tools/virsh -c test:///wibble.xml error: failed to connect to the hypervisor error: Failed to open file '/wibble.xml': No such file or directory
Using 10 MB as an upper limit on XML file size ought to be sufficient for any XML files libvirt is reading.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> --- src/util/virxml.c | 8 +++++--- tests/qemuxmlconfdata/nonexistent-file.x86_64-latest.err | 2 +- 2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/src/util/virxml.c b/src/util/virxml.c index 797aa6f6ca..deed258bb0 100644 --- a/src/util/virxml.c +++ b/src/util/virxml.c @@ -1142,6 +1142,7 @@ virXMLParseHelper(int domcode, xmlNodePtr rootnode; const char *docname; int parseFlags = XML_PARSE_NONET | XML_PARSE_NOWARNING; + g_autofree char *xmlStrPtr = NULL;
if (filename) docname = filename; @@ -1165,10 +1166,11 @@ virXMLParseHelper(int domcode, }
if (filename) { - xml = xmlCtxtReadFile(pctxt, filename, NULL, parseFlags); - } else { - xml = xmlCtxtReadDoc(pctxt, BAD_CAST xmlStr, url, NULL, parseFlags); + if (virFileReadAll(filename, 1024*1024*10, &xmlStrPtr) < 0) + return NULL; + xmlStr = xmlStrPtr; } + xml = xmlCtxtReadDoc(pctxt, BAD_CAST xmlStr, url, NULL, parseFlags);
if (!xml) { if (virGetLastErrorCode() == VIR_ERR_OK) { diff --git a/tests/qemuxmlconfdata/nonexistent-file.x86_64-latest.err b/tests/qemuxmlconfdata/nonexistent-file.x86_64-latest.err index 0ddf1ea510..2aedf3eded 100644 --- a/tests/qemuxmlconfdata/nonexistent-file.x86_64-latest.err +++ b/tests/qemuxmlconfdata/nonexistent-file.x86_64-latest.err @@ -1 +1 @@ -XML error: failed to parse xml document 'ABS_SRCDIR/qemuxmlconfdata/nonexistent-file.xml' +Failed to open file 'ABS_SRCDIR/qemuxmlconfdata/nonexistent-file.xml': No such file or directory
I know it doesn't matter as the file is literally not there, but isn't that supposed to be @ABS_SRCDIR@ or similar? Anyway the series looks good, so: Reviewed-by: Richard W.M. Jones <rjones@redhat.com> I can't remember which (of probably many) bugs this relates to, but maybe one should be mentioned in the commit. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW

On Tue, Oct 22, 2024 at 10:33:43AM +0100, Richard W.M. Jones wrote:
On Tue, Oct 22, 2024 at 10:27:54AM +0100, Daniel P. Berrangé wrote:
Libxml2 has awful error reporting behaviour when reading files. When we fail to load a file from the test driver we see:
$ virsh -c test:///wibble.xml I/O warning : failed to load external entity "/wibble.xml" error: failed to connect to the hypervisor error: XML error: failed to parse xml document '/wibble.xml'
where the I/O warning line is something printed by libxml2 itself, which also lacks any useful detail.
Switching to our own file reading code we can massively improve things:
$ ./build/tools/virsh -c test:///wibble.xml error: failed to connect to the hypervisor error: Failed to open file '/wibble.xml': No such file or directory
Using 10 MB as an upper limit on XML file size ought to be sufficient for any XML files libvirt is reading.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> --- src/util/virxml.c | 8 +++++--- tests/qemuxmlconfdata/nonexistent-file.x86_64-latest.err | 2 +- 2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/src/util/virxml.c b/src/util/virxml.c index 797aa6f6ca..deed258bb0 100644 --- a/src/util/virxml.c +++ b/src/util/virxml.c @@ -1142,6 +1142,7 @@ virXMLParseHelper(int domcode, xmlNodePtr rootnode; const char *docname; int parseFlags = XML_PARSE_NONET | XML_PARSE_NOWARNING; + g_autofree char *xmlStrPtr = NULL;
if (filename) docname = filename; @@ -1165,10 +1166,11 @@ virXMLParseHelper(int domcode, }
if (filename) { - xml = xmlCtxtReadFile(pctxt, filename, NULL, parseFlags); - } else { - xml = xmlCtxtReadDoc(pctxt, BAD_CAST xmlStr, url, NULL, parseFlags); + if (virFileReadAll(filename, 1024*1024*10, &xmlStrPtr) < 0) + return NULL; + xmlStr = xmlStrPtr; } + xml = xmlCtxtReadDoc(pctxt, BAD_CAST xmlStr, url, NULL, parseFlags);
if (!xml) { if (virGetLastErrorCode() == VIR_ERR_OK) { diff --git a/tests/qemuxmlconfdata/nonexistent-file.x86_64-latest.err b/tests/qemuxmlconfdata/nonexistent-file.x86_64-latest.err index 0ddf1ea510..2aedf3eded 100644 --- a/tests/qemuxmlconfdata/nonexistent-file.x86_64-latest.err +++ b/tests/qemuxmlconfdata/nonexistent-file.x86_64-latest.err @@ -1 +1 @@ -XML error: failed to parse xml document 'ABS_SRCDIR/qemuxmlconfdata/nonexistent-file.xml' +Failed to open file 'ABS_SRCDIR/qemuxmlconfdata/nonexistent-file.xml': No such file or directory
I know it doesn't matter as the file is literally not there, but isn't that supposed to be @ABS_SRCDIR@ or similar?
No, for some reason in our test code we replace the literal string "ABS_SRCDIR" without the common markers. I guess it was unique enough, in practice, for tests.
Anyway the series looks good, so:
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
I can't remember which (of probably many) bugs this relates to, but maybe one should be mentioned in the commit.
It was something that libguestfs hit back in august IIRC but I couldn't find the details. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
participants (3)
-
Daniel P. Berrangé
-
Peter Krempa
-
Richard W.M. Jones