Re: [libvirt] [PATCH v3 1/3] vsh: Add API for printing tables.

22 Aug 2018


On Wed, Aug 22, 2018 at 12:50:12PM +0200, Simon Kobyda wrote:
...
On Tue, 2018-08-21 at 11:46 +0100, Daniel P. Berrangé wrote:
...
On Tue, Aug 21, 2018 at 12:27:34PM +0200, Michal Privoznik wrote:
...
On 08/21/2018 11:18 AM, Simon Kobyda wrote:
...
On Thu, 2018-08-16 at 12:28 +0100, Daniel P. Berrangé wrote:
...
On Thu, Aug 16, 2018 at 12:56:24PM +0200, Simon Kobyda wrote:
...
After asking around I have found the right solution that we need to use
for measuring string width. mbstowcs()/wcswidth() will get the answer
wrong wrt zero-width characters, combining characters, non-printable
characters, etc. We need to use the libunistring library:
https://www.gnu.org/software/libunistring/manual/libunistring.html#uniwidth_...
...
I've tried what you've suggested, but it seems that it doesn't work
well with all Unicode characters. I'm looking into the code of the
library: each uN_strwidth function calls uN_width, and that function
calls uc_width to calculate the width of each character.
And if we look into the code of uc_width here:
http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/uniwidth/width...
...
it seems that this library covers only certain Unicode ranges, e.g.
Hangul characters, angle brackets, CJK characters... But it doesn't
cover all multiple-width characters. Example: when I pass it any emoji
(e.g. 🙉, 🦀, 🏙), it returns a width of 1 column for each character,
even though these characters are 2 columns wide on a terminal.
BTW, it seems the unistring library imports those functions from gnulib.
I guess the only option then is to try smartcols [1]. If it is good for
util-linux it's going to be good for us too. Although, I'd prefer to
have our own wrappers over their API.
[1] https://github.com/karelzak/util-linux/tree/master/libsmartcols
The util-linux code does something that uses mbstowcs / wcwidth to
convert the characters and count their width, sort of like the original
version of this patch. They have further code that decides to convert
certain unicode characters into "\xNN" escaped sequences, which avoids
the problems I raised wrt non-printable strings.
https://github.com/karelzak/util-linux/blob/master/lib/mbsalign.c
So we could pull that helper API into our code, since it's LGPL
licensed.
I'm unclear whether this correctly handles all the cases though, as
there are no unit tests for it in util-linux AFAICT.
Really the only way for us to be sure is to provide a unit test which
stresses the code with a variety of unicode input strings.
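For illustration only, the mbstowcs() / wcwidth() plus "\xNN" escaping idea
discussed above could be sketched roughly as below. This is not the
util-linux code and not libvirt API: the name demo_escaped_width is
hypothetical, and charging a flat 4 columns per rejected character is a
simplifying assumption (a real implementation would more likely escape
each byte of the character).

```c
#define _XOPEN_SOURCE 700   /* exposes wcwidth() in <wchar.h> on glibc */
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper, not util-linux or libvirt API.
 *
 * Returns the number of terminal columns for the multibyte string s in
 * the current LC_CTYPE locale. Every character that wcwidth() rejects
 * (non-printable) is charged 4 columns, as if it were printed as one
 * "\xNN" escape. Returns -1 if s is not a valid multibyte string in
 * the current locale, or if allocation fails. */
int demo_escaped_width(const char *s)
{
    assert(s != NULL);

    size_t n = mbstowcs(NULL, s, 0);    /* count wide characters only */
    if (n == (size_t)-1)
        return -1;

    wchar_t *w = calloc(n + 1, sizeof(*w));
    if (!w)
        return -1;
    mbstowcs(w, s, n + 1);

    int total = 0;
    for (size_t i = 0; i < n; i++) {
        int cols = wcwidth(w[i]);       /* -1 for non-printable characters */
        total += cols < 0 ? 4 : cols;
    }

    free(w);
    return total;
}
```

A caller would need setlocale(LC_CTYPE, "") beforehand for non-ASCII input
to be decoded; the widths themselves still come from the C library's
tables, so this does nothing for characters the library misclassifies.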
About unit tests: right now I've got tests for non-printable,
zero-width and combining characters, and for the opposite
(right-to-left) writing direction. Has anybody got an idea what else
could be problematic with mbstowcs()/wcswidth(), and should therefore
be tested?
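A table-driven test along these lines might look like the sketch below.
Everything in it is an assumption for illustration: the names
(demo_measure, demo_run_width_cases), the locale names tried, and the
expected column counts, which are what a terminal would be expected to
use rather than values confirmed against any particular C library. The
CJK and emoji rows are added because they came up earlier in the thread.

```c
#define _XOPEN_SOURCE 700   /* exposes wcswidth() in <wchar.h> on glibc */
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

struct width_case {
    const char *name;
    const char *input;      /* UTF-8 bytes */
    int expected;           /* assumed result, see lead-in */
};

static const struct width_case cases[] = {
    { "ascii",         "abc",                              3 },
    { "combining",     "e\xcc\x81",                        1 }, /* e + U+0301 */
    { "zero-width",    "a\xe2\x80\x8b" "b",                2 }, /* U+200B */
    { "right-to-left", "\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d", 4 }, /* Hebrew */
    { "cjk",           "\xe6\x97\xa5\xe6\x9c\xac",         4 }, /* 2 wide chars */
    { "emoji",         "\xf0\x9f\xa6\x80",                 2 }, /* U+1F980 */
    /* -1 assumes the function under test rejects non-printable input */
    { "non-printable", "a\x01" "b",                       -1 },
};

/* Hypothetical function under test: plain mbstowcs()/wcswidth().
 * Returns -2 if the string cannot be converted or is too long. */
int demo_measure(const char *s)
{
    wchar_t buf[64];

    assert(s != NULL);
    size_t n = mbstowcs(buf, s, 64);
    if (n == (size_t)-1 || n == 64)
        return -2;
    return wcswidth(buf, n);
}

/* Returns the number of mismatching cases, or -1 if no UTF-8 locale
 * could be selected. Note that it changes the process-wide LC_CTYPE. */
int demo_run_width_cases(void)
{
    if (!setlocale(LC_CTYPE, "C.UTF-8") &&
        !setlocale(LC_CTYPE, "en_US.UTF-8"))
        return -1;

    int failures = 0;
    for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
        int got = demo_measure(cases[i].input);
        if (got != cases[i].expected) {
            printf("FAIL %s: expected %d, got %d\n",
                   cases[i].name, cases[i].expected, got);
            failures++;
        }
    }
    return failures;
}
```

Which rows fail will depend on the C library's width tables, which is
exactly what such a test is meant to expose.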
I think that sounds reasonable enough for now - passing such tests would
already be massively better than the code that exists today with strlen().

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|