On Tue, Aug 28, 2018 at 02:10:55PM +0200, Erik Skultety wrote:
> On Tue, Aug 28, 2018 at 11:35:02AM +0100, Daniel P. Berrangé wrote:
> > On Mon, Aug 27, 2018 at 05:50:22PM +0200, Simon Kobyda wrote:
> > > On Fri, 2018-08-24 at 12:10 +0200, Michal Privoznik wrote:
> > > > On 08/24/2018 11:36 AM, Daniel P. Berrangé wrote:
> > > > > > On Fri, Aug 24, 2018 at 10:59:04AM +0200, Michal Privoznik wrote:
> > > > >
> > > > > But first fix the build failures :-)
> > > > >
> > > > > On CentOS / RHEL:
> > > > >
> > > > >
https://travis-ci.org/libvirt/libvirt/jobs/420024141
> > > > >
> > > > >
> > > > > 4)
> > > > > testUnicode
.
> > > > > ..
> > > > > Offset 30
> > > > > Expect [государство
> > > > > -----------------------------------------
> > > > > 1 fedora28 running
> > > > > 2 🙊🙉🙈rhel7.5🙆🙆🙅]
> > > > > Actual
> > > > > [
> > > > > государство
> > > > >
-----------------------------------------------------------------
> > > > > ------------------------------------------------------------
> > > > > 1 fedora28
> > > > > running
> > > > > 2
\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xffrhel7.5\xff\x
> > > > > ff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff]
> > > > >
> > > >
> > > > Okay, this is probably due to ancient gcc that's there (4.8.0) and is
> > > > supposed to be fixed by adding -finput-charset= onto gcc command line.
> > > > Haven't tested it though.
> > >
> > > I tried but it didn't help. From what I understood, CentOS has problems
> > > with Unicode characters such as 🙊🙉🙈🙆🙆🙅. On that system, it can convert
> > > any of those characters to wchar_t successfully and properly, but when
> > > we pass that character to iswprint, it returns 0 (considers those wide
> > > characters nonprintable).
> >
> > On the plus side, it appears that when this problem hits, the code is
> > still correctly doing the column alignment taking account of these
> > unexpected escape sequences.
> >
> > So how about storing 2 sets of expected data for this test case.
> >

Two is not enough. My clang 5.0.1 produces a test that displays the
monkeys correctly, but does not count their width properly:

$ VIR_TEST_RANGE=4 VIR_TEST_DEBUG=1 ./run tests/vshtabletest
TEST: vshtabletest
4) testUnicode ...
Offset 24
Expect [ государство
-----------------------------------------
1 fedora28 ]
Actual [государство
-----------------------------------
1 fedora28]
... FAILED

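For reference, one common way to count terminal columns for a multibyte string is to convert it to wide characters and ask wcswidth(). The sketch below is only an illustration of that approach; display_width() is a made-up helper name, and it is not necessarily how vshtable computes widths.

```c
#define _XOPEN_SOURCE 700   /* needed for wcswidth() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper, not libvirt code: number of terminal columns
 * occupied by a multibyte string in the current locale.
 * Returns -1 if the string cannot be converted or if wcswidth()
 * considers any character nonprintable. */
static int
display_width(const char *str)
{
    size_t len;
    wchar_t *wstr;
    int width;

    assert(str != NULL);

    /* First pass: count the wide characters needed. */
    len = mbstowcs(NULL, str, 0);
    if (len == (size_t)-1)
        return -1;

    wstr = calloc(len + 1, sizeof(*wstr));
    if (!wstr)
        return -1;

    /* Second pass: do the actual conversion. */
    if (mbstowcs(wstr, str, len + 1) == (size_t)-1) {
        free(wstr);
        return -1;
    }

    width = wcswidth(wstr, len);
    free(wstr);
    return width;
}
```

The result depends on the active locale and on the character tables of the C library, so the caller would need setlocale(LC_CTYPE, "") with a UTF-8 locale before the emoji can be measured at all.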
> > In the unit test then call iswprint() to figure out which of the
> > two expected data sets to compare against.
>
> How does it help us during runtime when someone uses such characters in a
> domain's name? It would still return a row consisting of escape sequences.

Not necessarily, see above.

> So what's the point of providing 2 sets of expected data for a test just so
> it can pass, rather than using Unicode characters we know would pass, with
> everything else being a platform limitation which is out of our hands?

I still see a benefit in having testUnicodeBasic that passes everywhere
(does it?), and conditionally running the monkey test on platforms where
iswprint returns the proper results.

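A rough sketch of what such a runtime probe could look like, assuming the test already runs under a UTF-8 locale. The helper name all_chars_printable() is made up for illustration and is not an existing libvirt function:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not libvirt code: returns true only if every
 * character of the multibyte string converts to wchar_t and is
 * reported as printable by iswprint() in the current locale. */
static bool
all_chars_printable(const char *str)
{
    mbstate_t state;
    const char *p = str;
    size_t left;

    assert(str != NULL);

    memset(&state, 0, sizeof(state));
    left = strlen(str);

    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, p, left, &state);

        /* Invalid or truncated multibyte sequence. */
        if (n == (size_t)-1 || n == (size_t)-2)
            return false;
        /* Embedded NUL; cannot happen with strlen(), kept for safety. */
        if (n == 0)
            break;
        /* This is the check that fails for the emoji on older glibc. */
        if (!iswprint((wint_t)wc))
            return false;

        p += n;
        left -= n;
    }
    return true;
}
```

The monkey test could then call this on one of the emoji after setlocale() and skip itself when it returns false, while testUnicodeBasic runs unconditionally.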
Why? To see that glibc is new enough to support it? One would assume that if it
works for n (given your example I'm not so sure it actually does...), it would
work for n+1 too, so I still don't see the point in this specific test case.
Thanks for the example though.

Erik