Re: [libvirt] [PATCH 3/3] tests: deterministichash: Make hash tables arch-independent

15 Aug 2017


      On 08/15/2017 02:01 PM, Peter Krempa wrote:
...
On Mon, Aug 14, 2017 at 20:46:10 -0400, John Ferlan wrote:
...
On 08/03/2017 04:50 AM, Peter Krempa wrote:
[ trimmed off-topic part ]
...
NB: I didn't dig into the qemumonitorjsontest algorithm, but...
While going through the common object changes, I ended up looking at and
thinking about the hash table algorithms, initially it was just the
"grow" algorithm as I think it is a bit "aggressive" (perhaps wasteful)
since as soon as 1 bucket chain exceeds 8 elements, the table is resized
by 8 unless the new size is 8 * 2048 - at which time no matter how full
a bucket is we don't resize the table
Resizing the table is very expensive, so it makes sense to scale it
aggresively. At some point though it would take too much memory.
If some place in the code expects massive hash table, it can set the
hash table to > 2048 manually.
...
The next thing I considered was using a prime number as the table bucket
size and it seems by just doing that will cause qemumonitorjsontest to
What would be the benefit of such change? I don't really see how prime
numbers would improve anything performance-wise.
Because if your hash function is stupid it can cluster all the keys
together. Primes, however, have way fewer divisors, therefore clustering
is harder to achieve. Of course, you can construct a hash function so
that it's utterly stupid (e.g. {return 4;}) and then all of our
optimizations are thrown out of the window. But I believe that using
prime numbers for table size is advised in the literature too.
Apparently, wiki mentions this fact too:

https://en.wikipedia.org/wiki/Hash_table#Choosing_a_hash_function

Michal

Re: [libvirt] [PATCH 3/3] tests: deterministichash: Make hash tables arch-independent

Michal Privoznik