On 09/28/2010 04:06 PM, Stefan Berger wrote:
Eric Blake <eblake@redhat.com> wrote on 09/28/2010 03:26:48 PM:
> Re: [libvirt] [PATCH v2 3/5] Extend nwfilter schema to accept comment attributes
>
> Eric Blake to: Stefan Berger, 09/28/2010 03:27 PM
> Cc: libvir-list
>
> On 09/28/2010 04:28 AM, Stefan Berger wrote:
> >> okay. It also leaves out 8-bit bytes - could that be a problem for i18n
> >> where people want comments with native-language accented characters?
> >> That is, are we being too strict here? Maybe a better pattern would be
> >> to reject specific non-printing ASCII bytes we want to avoid, assuming
> >> you can use escape sequences like [^\001]?
> >
> > Looking at
> >
> > http://www.asciitable.com/
> >
> > I should probably include 0x20-0x7E and 128-175, 224-238 - maybe even
> > more? So the regex then becomes
> >
> > [ -~€-¯à-î]{0,256}
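To make the proposed byte ranges concrete, here is a minimal C sketch, assuming the comment arrives as a NUL-terminated byte string. The function name comment_ok_ranges is hypothetical and not part of libvirt, and the 256 limit here counts bytes, whereas the schema's {0,256} counts characters. The sketch also illustrates the multi-byte concern: "é" in UTF-8 is the byte pair 0xC3 0xA9, and 0xC3 (195) falls outside every proposed range.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: accept a comment only if every byte lies in one of
 * the ranges proposed above. Assumes a NUL-terminated byte string; the
 * 256 limit counts bytes, not characters. */
bool comment_ok_ranges(const char *s)
{
    size_t len = strlen(s);
    if (len > 256)
        return false;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        bool ok = (c >= 0x20 && c <= 0x7E) || /* printable 7-bit ASCII  */
                  (c >= 128 && c <= 175) ||   /* 0x80-0xAF              */
                  (c >= 224 && c <= 238);     /* 0xE0-0xEE              */
        if (!ok)
            return false;
    }
    return true;
}
```

For example, comment_ok_ranges("allow ssh") returns true, while comment_ok_ranges("caf\xc3\xa9") (UTF-8 "café") returns false, and so does a comment containing a tab.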
>
> True ASCII is strictly 7-bit; any locale where isprint() returns true on
> 8-bit bytes is a superset single-byte encoding, such as ISO-8859-1, or
> 'extended ascii' from the URL you posted above. But I'm also thinking
> about multi-byte encodings, like UTF-8, where we cannot a priori write a
> regex that will accept all valid Unicode printable characters, in part
> because you have to look at more than one byte at a time to determine if
> you have a printable character. Which goes back to my suggestion of an
> inverse charset - rejecting bytes that are known to be non-printable
> ASCII, and letting everything else through, whether or not it is a
> printable byte sequence in the current locale. So what about this idea:
> exclude control characters except for tab, and let space and everything
> after through (I don't know if it needs to be adjusted to also reject �):
>
> [^-
-]{0,256}
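The inverse-charset idea can be sketched in C as a byte filter, assuming (from the description above) that the excluded set is 0x01-0x08 and 0x0A-0x1F, i.e. every control character below space except tab. The name comment_ok_inverse is hypothetical. Because no byte at 0x20 or above is rejected, UTF-8 sequences pass without being decoded; as before, the 256 limit counts bytes rather than characters.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: reject only the 7-bit control bytes 0x01-0x08 and
 * 0x0A-0x1F (tab, 0x09, stays allowed). Every byte from 0x20 upward,
 * including DEL (0x7F) and all 8-bit bytes, is accepted, so multi-byte
 * UTF-8 sequences pass unchanged. The 256 limit counts bytes. */
bool comment_ok_inverse(const char *s)
{
    size_t len = strlen(s);
    if (len > 256)
        return false;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if ((c >= 0x01 && c <= 0x08) || (c >= 0x0A && c <= 0x1F))
            return false;
    }
    return true;
}
```

With this filter, comment_ok_inverse("caf\xc3\xa9") and comment_ok_inverse("a\tb") return true, while comment_ok_inverse("a\nb") returns false.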

Fine by me. We may just give the impression of accepting Unicode while the
code does not handle it.
... except that xmllint does not like the control-character references in
the pattern, with or without the preceding ^ (among other things):
xmllint --relaxng ./docs/schemas/nwfilter.rng tests/nwfilterxml2xmlout/comment-test.xml
./docs/schemas/nwfilter.rng:862: parser error : xmlParseCharRef: invalid xmlChar value 1
<param name="pattern">[^-
-]{0,256}</param>
^
./docs/schemas/nwfilter.rng:862: parser error : CharRef: invalid hexadecimal value
<param name="pattern">[^-
-]{0,256}</param>
^
./docs/schemas/nwfilter.rng:862: parser error : xmlParseCharRef: invalid xmlChar value 0
<param name="pattern">[^-
-]{0,256}</param>
^
./docs/schemas/nwfilter.rng:862: parser error : CharRef: invalid hexadecimal value
<param name="pattern">[^-
-]{0,256}</param>
^
./docs/schemas/nwfilter.rng:862: parser error : xmlParseCharRef: invalid xmlChar value 0
<param name="pattern">[^-
-]{0,256}</param>
^
./docs/schemas/nwfilter.rng:862: parser error : CharRef: invalid hexadecimal value
<param name="pattern">[^-
-]{0,256}</param>
^
./docs/schemas/nwfilter.rng:862: parser error : xmlParseCharRef: invalid xmlChar value 0
<param name="pattern">[^-
-]{0,256}</param>
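The xmlParseCharRef errors above are consistent with a rule in XML 1.0 itself: a character reference must name a character allowed by the Char production, which admits only tab, LF and CR below 0x20. So a reference such as &#x1; is not well-formed in any XML 1.0 document, whatever the surrounding regex looks like. A minimal C sketch of that production follows; xml10_is_char is a hypothetical name for illustration, not a libxml2 API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the XML 1.0 "Char" production:
 *   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
 *          | [#x10000-#x10FFFF]
 * A character reference must name a code point for which this returns
 * true; otherwise a conforming parser reports an error. */
bool xml10_is_char(uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}
```

Here xml10_is_char(0x1) and xml10_is_char(0x1F) return false, while tab (0x9) and LF (0xA) are allowed.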
Stefan