On Tue, Aug 15, 2017 at 10:02:38AM -0400, John Ferlan wrote:
On 08/15/2017 08:01 AM, Peter Krempa wrote:
> On Mon, Aug 14, 2017 at 20:46:10 -0400, John Ferlan wrote:
>> On 08/03/2017 04:50 AM, Peter Krempa wrote:
>
> [ trimmed off-topic part ]
>
>> NB: I didn't dig into the qemumonitorjsontest algorithm, but...
>>
>> While going through the common object changes, I ended up looking at and
>> thinking about the hash table algorithms, initially it was just the
>> "grow" algorithm as I think it is a bit "aggressive"
(perhaps wasteful)
>> since as soon as 1 bucket chain exceeds 8 elements, the table is resized
>> by 8 unless the new size is 8 * 2048 - at which time no matter how full
>> a bucket is we don't resize the table
>
> Resizing the table is very expensive, so it makes sense to scale it
> aggresively. At some point though it would take too much memory.
>
> If some place in the code expects massive hash table, it can set the
> hash table to > 2048 manually.
>
Right - avoiding excessive or unnecessary resizing is something I was
considering. One thing that was at the back of my mind is "somewhat
recent" discussions about 10s of thousands of disk/volumes. IIRC it's
been discussed on qemu list as it relates to libguestfs.
Because volume names tend to be less random than a UUID, I was looking
at how the existing algorithms operate when you have say "disk00001"
through "disk99999" w/r/t bucket layout and length of chains when we
reach the maximum bucket count.
For any sensible hash algorithm, such a "predictable" naming convention
is not a realk problem - indeed if it were a problem it would be considered
a security vulnerability these days.
While murmurhash is not a cryptographic hash, a single bit difference
in names should still produce a very different hashcode value. If we
switch to siphash though, which is a cryptographic hash, we will have
a strong guarantee of a completely different hash code for such names.
So again I don't think bucket size is vs primes is important here - the
choice of hash function more directly determines the collision resistance
we have.
Regards,
Daniel
--
|:
https://berrange.com -o-
https://www.flickr.com/photos/dberrange :|
|:
https://libvirt.org -o-
https://fstop138.berrange.com :|
|:
https://entangle-photo.org -o-
https://www.instagram.com/dberrange :|