On 08/10/2012 09:31 AM, Daniel P. Berrange wrote:
On Fri, Aug 10, 2012 at 08:58:04AM -0600, Eric Blake wrote:
> On 08/10/2012 07:47 AM, Daniel P. Berrange wrote:
>> From: "Daniel P. Berrange" <berrange(a)redhat.com>
>>
>> The current virRandomBits() API is only usable if the caller wants
>> a random number in the range [0, (n-1)] where n is a power of two.
>> This adds a virRandom() API which works for upper limits which are
>> not a power of two. It works by using virRandomBits() to generate
>> a number to the next nearest power of 2 limit, and then scales it
>> down.
>
> I like the idea, but NAK to this version of the patch. That algorithm
> is not uniform.
>
>> + while (tmp) {
>> + tmp >>= 1;
>> + bits++;
>> + }
>
> Slow. I would suggest using gcc's __builtin_clz (the way qemu does),
> except that gnulib doesn't yet have a module exposing it. I guess that
> means I'd better write a quick gnulib module. Once you have clz(), this
> is much simpler as:
>
> assert(max);
> bits = 64 - clz(max);
This part is independently useful, while we still discuss the algorithm;
I'll have the gnulib module complete later today.
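
[For reference, the clz() version can be sketched like this; it is
gcc-specific via __builtin_clzll, and bits_needed is an illustrative
name, not the eventual gnulib API:]

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: __builtin_clzll counts the leading zero bits of a 64-bit
 * value (undefined for 0, hence the assert), so the number of
 * significant bits in max is 64 minus that count.  This replaces
 * the shift-and-count loop in the patch with a single instruction
 * on most targets. */
static int bits_needed(uint64_t max)
{
    assert(max);
    return 64 - __builtin_clzll(max);
}
```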
> that is, you have a 50% chance of getting 0, but only a 25% chance of
> getting 1 or 2 - that is not a uniform distribution. The ONLY way to do
> this is:
>
> do {
>     ret = virRandomBits(bits);
> } while (ret >= max);
>
> Then, with the same input of max == 3, you have a 25% chance of calling
> virRandomBits at least twice, a 12.5% chance of calling it at least
> three times, and so on, but you are guaranteed that you will eventually
> get a result that is uniformly distributed.
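
[The rejection-sampling loop above can be sketched standalone like
this; rand() and the fake_random_bits() helper stand in for libvirt's
seeded virRandomBits() and are assumptions for illustration only:]

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for virRandomBits(): returns 'bits' random bits. */
static uint64_t fake_random_bits(int bits)
{
    uint64_t ret = 0;
    for (int i = 0; i < bits; i += 8)
        ret = (ret << 8) | (uint64_t)(rand() & 0xff);
    if (bits < 64)
        ret &= (1ULL << bits) - 1;
    return ret;
}

/* Draw the smallest number of bits with (1 << bits) >= max, and
 * retry whenever the draw lands in the excess [max, 1<<bits)
 * region.  Every value that survives is equally likely, so the
 * result is uniform over [0, max). */
static uint64_t random_below(uint64_t max)
{
    uint64_t tmp = max - 1;
    int bits = 0;
    uint64_t ret;

    while (tmp) {
        tmp >>= 1;
        bits++;
    }
    do {
        ret = fake_random_bits(bits);
    } while (ret >= max);
    return ret;
}
```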
> Hmm, it was the looping that I used originally, but I wanted to get
> something that had a deterministic upper bound, instead of just a
> probabilistic bound.
Oh, I see your point - you don't want to drag on for a long time on the
rare occasions where probabilities are against you.
> The alternative I wanted to use was drand48_r() which returns a double
> in the range [0.0, 1.0), with 48 bits of entropy. You could multiply by
> 'max', and cast to int. But this isn't portable and is not fixed by
> GNULIB.
Might not be too hard to make drand48_r into a gnulib module (harder
than clz, but not intractable, so I'll take a shot at it, but no
promises). Using drand48_r does still hit non-uniformity due to
rounding, but with much smaller fuzz factor (that is, instead of my
example of 50/25/25, you'd be more like 25.000004, 24.999993,
24.999993), which I can live with as being in the noise. Furthermore,
since it only gives 48 bits of entropy, if 'max' is larger than 1<<47
you'll have to do some legwork yourself (I'm not quite sure what that
legwork involves to still be fair), so I'd feel more comfortable if we
capped this function to uint32_t and require the use of powers-of-2 for
anything larger than a 32-bit max.
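
[A minimal sketch of that capped drand48_r() approach, glibc-specific;
the rng_init() seeding wrapper is an illustrative assumption:]

```c
#define _GNU_SOURCE   /* drand48_r/srand48_r are glibc extensions */
#include <stdint.h>
#include <stdlib.h>

static struct drand48_data rng_state;

static void rng_init(long seed)
{
    srand48_r(seed, &rng_state);
}

/* Scale the [0.0, 1.0) double up to [0, max) and truncate.  As
 * noted above, this is only uniform up to a tiny rounding fuzz,
 * and max is capped at uint32_t so that 48 bits of entropy always
 * cover the requested range. */
static uint32_t random_below(uint32_t max)
{
    double d;

    drand48_r(&rng_state, &d);   /* d in [0.0, 1.0) */
    return (uint32_t)(d * max);
}
```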
> I'm wondering if we could emulate drand48_r() ourselves by doing
> something like this:
>
> #include <ieee754.h>
>
> double virRandom(void) {
>     union ieee754_double val;
>     val.ieee.negative = 0;
>     val.ieee.exponent = IEEE754_DOUBLE_BIAS;
>     val.ieee.mantissa0 = virRandomBits(20);
>     val.ieee.mantissa1 = virRandomBits(32);
>     return val.d - 1.0;
> }
No need to do that. -lm already gives us a very nice function: ldexp().
Basically, you take an integer in the range [0,1<<48), then multiply
that by 2**-48 using ldexp().
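
[That ldexp() scaling can be sketched like this; fake_bits48() stands
in for a real 48-bit entropy source and is an assumption for
illustration only:]

```c
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for a 48-bit entropy source such as virRandomBits(48). */
static uint64_t fake_bits48(void)
{
    uint64_t ret = 0;
    for (int i = 0; i < 6; i++)          /* 6 bytes = 48 bits */
        ret = (ret << 8) | (uint64_t)(rand() & 0xff);
    return ret;
}

/* Take an integer in [0, 1<<48) and multiply by 2**-48 with
 * ldexp() to get a double in [0.0, 1.0), then scale by max.
 * Since the double is strictly less than 1.0, the truncated
 * result is always strictly less than max. */
static uint32_t random_below(uint32_t max)
{
    double d = ldexp((double)fake_bits48(), -48);
    return (uint32_t)(d * max);
}
```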
--
Eric Blake eblake(a)redhat.com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org