On 04/30/2014 03:47 AM, Peter Krempa wrote:
> errno = 0;
> val = strtoul(s, &p, base); /* exempt from syntax-check */
> +
> + /* This one's tricky. We _want_ to allow "-1" as shorthand for
Well we want to allow it in special cases ... I'd rather see a fix for
those instances rather than having a broken-by-design wrapper that
copies strange semantics of strtoul.
Okay, so I'll definitely prepare two variants of each of the three
"parse to unsigned" wrappers - one that allows '-', the other rejects
it. Callers can choose which of the two they want.
> + * UINT_MAX, but strtoul() treats "-1" as ULONG_MAX; casting from
> + * ulong back to uint changes the values only on platforms where
> + * long is a larger size. */
> + if ((val & 0xffffffff00000000ULL) == 0xffffffff00000000ULL &&
> + memchr(s, '-', p - s))
> + val &= 0xffffffff;
Uhh ... I think this makes the wrapper even worse. We might want special
handlers for -1 that makes sense in some cases. If I parsed the above
statement correctly any negative value passed to this function would be
returned as UINT_MAX. That would convert the weird semantics of strtoul
to even weirder one.
No. The intent is as follows. Parsing to 'long' or 'unsigned long'
behaves identically to '[unsigned] long long' on 64-bit platforms and to
'[unsigned] int' on 32-bit platforms (really, that just means using
strtol/strtoul as-is). Parsing to '[unsigned] long long' behaves
identically across platforms (really, just using strtoll/strtoull
as-is). Parsing to '[unsigned] int' behaves as though using a 32-bit
strtol (on 32-bit platforms, it is easy; on 64-bit platforms we have to
do some fudge work). Whether we reject '-' on the unsigned parse is
determined by using a new function, so we have three variants (signed,
unsigned with sign, unsigned without sign) for three types (int, long,
long long).
Remember, the POSIX requirements are that strtol() accepts numbers in
the range LONG_MIN to LONG_MAX, while strtoul() accepts numbers in the
range LONG_MIN to ULONG_MAX (yes, strtoul has a larger range).
The interesting boundary cases for int are:
Anything between 0 and 2147483647 parses as itself, for all three int
variants
Anything between 2147483648 and 4294967295 parses as itself for both
unsigned int variants, but rejected as out-of-bounds for signed int
(this is the behavior required of 32-bit strtol/strtoul)
Anything at 4294967296 or larger is rejected as out-of-bounds for all
three int variants
-0 is parsed as 0 for signed int and unsigned int with sign, and is
rejected as out-of-bounds for unsigned int without sign
Anything between -1 and -2147483647 parses as itself for signed int,
parses as the negation (4294967295 through 2147483649) for unsigned int
with sign, and is rejected as out-of-bounds for unsigned int without sign
-2147483648 parses as itself for signed int, parses as 2147483648 for
unsigned int with sign, and is rejected as out-of-bounds for unsigned
int without sign
Anything between -2147483649 and -4294967295 is rejected as out of
bounds for signed int and unsigned int without sign, and is parsed as
the negation (2147483647 through 1) for unsigned int with sign
Anything at -4294967296 or beyond is rejected as out-of-bounds for all
three int variants
That's why I'm working up the test suite before pushing anything; you'll
see exactly in the testsuite what I _plan_ to accept, and I may have to
tweak the code to match that.
> +
> err = (errno || (!end_ptr && *p) || p == s || (unsigned int) val !=
> val);
> if (end_ptr)
> *end_ptr = p;
>
I think that our wrapper here should behave in a saner way than strtoul
for most of the places in the code and we should eventually add wrappers
that handle -1 as UINT_MAX and use them just in special cases.
Okay, I'm using that as guidance on the patches I'm preparing today.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org