[libvirt] [RFC PATCH] build: detect doc build errors

I'm still stumped by xsltproc complaining about &nbsp; not being a valid
XML entity, hence the (hackish) exemption in docs/Makefile.am that adds
--html for a couple of .html.in files.  But for the remaining files,
this does make input validation stricter, and caught several bugs.

Hence, this is an RFC (either we live with my hack that caught all the
issues in the prior patch, or someone with more xsltproc knowledge than
me will step in and teach it how to resolve html entities while
processing the documents as xml instead of html).

* docs/Makefile.am (maintainer-clean-local): Remove generated docs
in VPATH build.
(%.html.tmp): Don't use looser --html; our input should be strict
xhtml.  HACK - use --html when entities like &nbsp; are involved.
(html/index.html): Exit on formatting problems.
(rebuild): Run full doc build on request.
---
 docs/Makefile.am |   24 ++++++++++++++----------
 1 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/docs/Makefile.am b/docs/Makefile.am
index db4bc59..2d1afe4 100644
--- a/docs/Makefile.am
+++ b/docs/Makefile.am
@@ -123,7 +123,7 @@ internals/%.html.tmp: internals/%.html.in subsite.xsl page.xsl sitemap.html.in
 	  echo "Generating $@"; \
 	  $(MKDIR_P) "$(builddir)/internals"; \
 	  name=`echo $@ | sed -e 's/.tmp//'`; \
-	  $(XSLTPROC) --stringparam pagename $$name --nonet --html \
+	  $(XSLTPROC) --stringparam pagename $$name --nonet \
 	    $(top_srcdir)/docs/subsite.xsl $< > $@ \
 	    || { rm $@ && exit 1; }; fi
 
@@ -131,7 +131,8 @@ internals/%.html.tmp: internals/%.html.in subsite.xsl page.xsl sitemap.html.in
 	@if [ -x $(XSLTPROC) ] ; then \
 	  echo "Generating $@"; \
 	  name=`echo $@ | sed -e 's/.tmp//'`; \
-	  $(XSLTPROC) --stringparam pagename $$name --nonet --html \
+	  $(XSLTPROC) --stringparam pagename $$name --nonet \
+	    $$(grep -qE '&(nbsp|uuml|mdash);' $< && printf %s --html) \
 	    $(top_srcdir)/docs/site.xsl $< > $@ \
 	    || { rm $@ && exit 1; }; fi
 
@@ -147,21 +148,22 @@ internals/%.html.tmp: internals/%.html.in subsite.xsl page.xsl sitemap.html.in
 
 
 html/index.html: libvirt-api.xml newapi.xsl page.xsl sitemap.html.in
-	-@if [ -x $(XSLTPROC) ] ; then \
+	@if [ -x $(XSLTPROC) ] ; then \
 	  echo "Rebuilding the HTML pages from the XML API" ; \
 	  $(XSLTPROC) --nonet -o $(srcdir)/ \
 	  $(srcdir)/newapi.xsl $(srcdir)/libvirt-api.xml ; fi
-	-@if test -x $(XMLLINT) && test -x $(XMLCATALOG) ; then \
-	  if $(XMLCATALOG) '$(XML_CATALOG_FILE)' "-//W3C//DTD XHTML 1.0 Strict//EN" \
-	    > /dev/null ; then \
+	@if test -x $(XMLLINT) && test -x $(XMLCATALOG) ; then \
+	  if $(XMLCATALOG) '$(XML_CATALOG_FILE)' \
+	    "-//W3C//DTD XHTML 1.0 Strict//EN" > /dev/null ; then \
 	  echo "Validating the resulting XHTML pages" ; \
 	  SGML_CATALOG_FILES='$(XML_CATALOG_FILE)' \
-	  $(XMLLINT) --catalogs --nonet --valid --noout $(srcdir)/html/*.html ; \
+	  $(XMLLINT) --catalogs --nonet --valid --noout $(srcdir)/html/*.html \
+	    || { rm $(srcdir)/$@ && exit 1; }; \
 	  else echo "missing XHTML1 DTD" ; fi ; fi
 
 $(addprefix $(srcdir)/,$(devhelphtml)): $(srcdir)/libvirt-api.xml $(devhelpxsl)
 	-@echo Rebuilding devhelp files
-	-@if [ -x $(XSLTPROC) ] ; then \
+	@if [ -x $(XSLTPROC) ] ; then \
 	  $(XSLTPROC) --nonet -o $(srcdir)/devhelp/ \
 	  $(top_srcdir)/docs/devhelp/devhelp.xsl $(srcdir)/libvirt-api.xml ; fi
 
@@ -183,9 +185,11 @@ clean-local:
 	rm -f *~ *.bak *.hierarchy *.signals *-unused.txt *.html
 
 maintainer-clean-local: clean-local
-	rm -rf $(srcdir)/libvirt-api.xml $(srcdir)/libvirt-refs.xml todo.html.in
+	rm -rf $(srcdir)/libvirt-api.xml $(srcdir)/libvirt-refs.xml \
+	       todo.html.in $(srcdir)/*.html $(srcdir)/devhelp/*.html \
+	       $(srcdir)/html/*.html $(srcdir)/internals/*.html
 
-rebuild: api all
+rebuild: maintainer-clean-local api all
 
 install-data-local:
 	$(mkinstalldirs) $(DESTDIR)$(HTML_DIR)
--
1.7.4
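For anyone unfamiliar with the idiom in the %.html.tmp rule, the following is a minimal shell sketch of what the command substitution expands to when run outside of make (where `$$(` is written `$(`). The function name `html_flag` and the two sample files are hypothetical and exist only for illustration; the grep pattern is the one from the patch.

```shell
#!/bin/sh
# Print "--html" only when the file uses one of the entities the patch
# greps for; print nothing otherwise. (html_flag is a hypothetical name.)
html_flag() {
    grep -qE '&(nbsp|uuml|mdash);' "$1" && printf %s --html
}

tmpdir=$(mktemp -d)
printf '<p>a&nbsp;b</p>\n' > "$tmpdir/with-entity.html.in"
printf '<p>a &amp; b</p>\n' > "$tmpdir/plain.html.in"

# Left unquoted on purpose, as in the Makefile rule: an empty expansion
# vanishes from the argument list instead of becoming an empty argument.
set -- --nonet $(html_flag "$tmpdir/with-entity.html.in")
echo "with-entity: $*"
set -- --nonet $(html_flag "$tmpdir/plain.html.in")
echo "plain: $*"

rm -rf "$tmpdir"
```

The first echo shows `--nonet --html` and the second only `--nonet`, so files without those entities go through the strict XML parser.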

On Fri, Apr 01, 2011 at 04:06:50PM -0600, Eric Blake wrote:
> I'm still stumped by xsltproc complaining about &nbsp; not being a valid
> XML entity, hence the (hackish) exemption in docs/Makefile.am that adds
> --html for a couple of .html.in files.  But for the remaining files,
> this does make input validation stricter, and caught several bugs.

The only solution would be to add a DTD to every .html.in file that uses
any entity beyond the five hardcoded in the parser (lt, gt, amp, quot
and apos).
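To see which files would be affected, a rough sketch along these lines could list the named entities in a file that fall outside the five predefined ones. The helper name `extra_entities` and the sample input are hypothetical, and `grep -o` is assumed to be available (GNU and BSD grep both provide it, though POSIX does not require it).

```shell
#!/bin/sh
# List the named entity references in a file that are not among the five
# XML predefined ones (lt, gt, amp, quot, apos). Numeric references such
# as &#160; are not matched because the name must start with a letter.
# (extra_entities is a hypothetical helper name.)
extra_entities() {
    grep -oE '&[A-Za-z][A-Za-z0-9]*;' "$1" \
        | grep -vxE '&(lt|gt|amp|quot|apos);' \
        | sort -u
}

sample=$(mktemp)
printf '<p>M&uuml;ller&nbsp;&amp;&nbsp;Co &lt;x&gt;</p>\n' > "$sample"
extra_entities "$sample"
rm -f "$sample"
```

For the sample line this prints `&nbsp;` and `&uuml;`, one per line; a file that prints nothing should parse as XML without any DTD.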

> Hence, this is an RFC (either we live with my hack that caught all the
> issues in the prior patch, or someone with more xsltproc knowledge than
> me will step in and teach it how to resolve html entities while
> processing the documents as xml instead of html).

We would need to add

  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

to all the .html.in files for consistency, since we now expect them to
be well-formed XML, possibly using HTML entities.
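As a rough sketch of that step, the helper below writes a copy of a page with the DOCTYPE added when it is missing. The name `with_doctype` is hypothetical, and the sketch assumes that any XML declaration sits alone on the first line of the file. Since the rules run with --nonet, the DTD would presumably have to be resolved through a local XML catalog; that part is not covered here.

```shell
#!/bin/sh
DOCTYPE='<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'

# Write a copy of $1 to stdout with the DOCTYPE added, unless the file
# already has one. If line 1 is an XML declaration, the DOCTYPE goes
# right after it, since the declaration must stay first.
# (with_doctype is a hypothetical helper name.)
with_doctype() {
    if grep -q '<!DOCTYPE' "$1"; then
        cat "$1"
    else
        awk -v dt="$DOCTYPE" '
            NR == 1 && /^<\?xml/ { print; print dt; next }
            NR == 1              { print dt }
            { print }
        ' "$1"
    fi
}

page=$(mktemp)
printf '<?xml version="1.0"?>\n<html><body><p>a&nbsp;b</p></body></html>\n' > "$page"
with_doctype "$page"
rm -f "$page"
```

Writing to stdout rather than editing in place keeps the sketch safe to try on a checkout; the output can be reviewed before replacing any .html.in file.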

I think I will check this in a few hours while preparing the release.
Depending on the size of the resulting diff, I may put this in as is,
add the DTD and go full XML, or keep as-is ...

Daniel

--
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
participants (2)
- Daniel Veillard
- Eric Blake