Re: [libvirt] [PATCH v3 1/3] vsh: Add API for printing tables.

Wednesday, 22 August 2018

On Tue, 2018-08-21 at 11:46 +0100, Daniel P. Berrangé wrote:
 On Tue, Aug 21, 2018 at 12:27:34PM +0200, Michal Privoznik wrote:
 > On 08/21/2018 11:18 AM, Simon Kobyda wrote:
 > > On Thu, 2018-08-16 at 12:28 +0100, Daniel P. Berrangé wrote:
 > > > On Thu, Aug 16, 2018 at 12:56:24PM +0200, Simon Kobyda wrote:
 > > > > 
 > > > 
 > > > After asking around I have found the right solution that we need to
 > > > use for measuring string width.  mbstowcs()/wcswidth() will get the
 > > > answer wrong wrt zero-width characters, combining characters,
 > > > non-printable characters, etc. We need to use the libunistring library:
 > > > 
 > > >   
 > > > 
https://www.gnu.org/software/libunistring/manual/libunistring.html#uniwid...
 > > > 
 > > > 
 > > 
 > > I've tried what you've suggested, but it seems that it doesn't work
 > > well with all Unicode characters. I'm looking into the code of the
 > > library, and each uN_strwidth function calls uN_width, which in turn
 > > calls uc_width to calculate the width of each character.
 > > And if we look into the code of uc_width here:
 > > 
 > > 
http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/uniwidth/wi...
 > > it seems that this library is limited to certain Unicode ranges,
 > > e.g.: Hangul characters, angle brackets, CJK characters... But it
 > > doesn't cover all multiple-width characters. Example: if I pass it
 > > any emoji (e.g. 🙉, 🦀, 🏙), it returns a width of 1 column for each
 > > character, even though these characters have a width of 2 columns
 > > on a terminal.
 > > 
 > > BTW, it seems the unistring library imports those functions from
 > > gnulib.
 > 
 > I guess the only option then is to try smartcols [1]. If it is good
 > for util-linux it's going to be good for us too. Although, I'd prefer
 > to have our own wrappers over their API.
 > 
 > [1] https://github.com/karelzak/util-linux/tree/master/libsmartcols

 The util-linux code uses mbstowcs / wcwidth to convert the characters
 and count their width, sort of like the original version of this
 patch. They have further code that decides to convert certain Unicode
 characters into "\xNN" escaped sequences, which avoids the problems I
 raised wrt non-printable strings.

    https://github.com/karelzak/util-linux/blob/master/lib/mbsalign.c

 So we could pull that helper API into our code, since it's LGPL
 licensed. I'm unclear if this correctly handles all the cases or not
 though, as there are no unit tests for it in util-linux AFAICT.

 Really the only way for us to be sure is to provide a unit test which
 stresses the code with a variety of Unicode input strings.

About unit tests: right now I've got tests for non-printable,
zero-width and combining characters, and for right-to-left text.
Does anybody have an idea what else could be problematic with
mbstowcs()/wcswidth(), and should therefore be tested?
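
For reference, the mbstowcs()/wcswidth() approach being discussed looks
roughly like the sketch below. It is only an illustration, not the code
from the patch: display_width is a hypothetical name, and it assumes
the caller has already selected a suitable (e.g. UTF-8) locale with
setlocale().

```c
#define _XOPEN_SOURCE 700   /* needed for the wcswidth() declaration */
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not the function from the patch).
 * Returns the number of terminal columns needed to print the
 * multibyte string 's' in the current locale, or -1 if 's' is not a
 * valid multibyte string or contains a character that wcswidth()
 * treats as non-printable. */
static int
display_width(const char *s)
{
    size_t len;
    wchar_t *wstr;
    int width;

    assert(s != NULL);

    /* First pass: count the wide characters without converting. */
    len = mbstowcs(NULL, s, 0);
    if (len == (size_t) -1)
        return -1;              /* invalid multibyte sequence */

    wstr = calloc(len + 1, sizeof(*wstr));
    if (!wstr)
        return -1;

    /* Second pass: do the actual conversion. */
    if (mbstowcs(wstr, s, len + 1) == (size_t) -1) {
        free(wstr);
        return -1;
    }

    /* wcswidth() yields -1 as soon as it meets a non-printable char. */
    width = wcswidth(wstr, len);
    free(wstr);
    return width;
}
```

A test would call setlocale(LC_ALL, "") (or request a specific UTF-8
locale) and then compare display_width() against the expected column
count for each input. The result for wide characters such as emoji
comes from the C library's own width tables, so it may differ between
libc versions.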

Simon Kobyda.
