Re: [PATCH 10/13] util: hash: Reimplement virHashTable using GHashTable

27 Oct 2020


      On Tue, Oct 27, 2020 at 01:09:38PM +0100, Peter Krempa wrote:
...
On Tue, Oct 27, 2020 at 10:04:33 +0000, Daniel Berrange wrote:
...
On Tue, Oct 27, 2020 at 10:53:12AM +0100, Peter Krempa wrote:
...
On Mon, Oct 26, 2020 at 16:08:34 +0000, Daniel Berrange wrote:
...
On Mon, Oct 26, 2020 at 04:45:50PM +0100, Peter Krempa wrote:
...
Glib's hash table provides basically the same functionality as our hash
table.
In most cases the only thing that remains in the virHash* wrappers is
NULL-checks of '@table' argument as glib's hash functions don't tolerate
NULL.
In case of iterators, we adapt the existing API of iterators to glibs to
prevent having rewrite all callers at this point.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
---
 src/libvirt_private.syms |   4 -
 src/util/meson.build     |   1 -
 src/util/virhash.c       | 416 ++++++++++-----------------------------
 src/util/virhash.h       |   4 +-
 src/util/virhashcode.c   | 125 ------------
 src/util/virhashcode.h   |  33 ----
Our hash code impl uses Murmurhash which makes some efforts to be
robust against malicious inputs triggering collisons, notably using
a random seed.
The new code uses  g_str_hash which is much weaker, and the API
docs explicitly recommend against using it if the input can be from
an untrusted user.
Yes, I've noticed that, but didn't consider it to be that much of a
problem as any untrusted input which is stored in a hash table (so that
the attacker can use crafted keys) must be in the first place
safeguarded against OOM condition by limiting the input count/size.
The problem isn't OOM, rather it is algorithmic complexity. With malicious
hash collisions the runtime lookup performance degrades to O(n) which can
cause scalability concerns in some cases.
I was pointing out that limiting the input size needed for OOM limit
conveniently limits the size of 'n'.
The worst case for a malicious actor that I can see is the block device
statistics code, where the worst case input would be based on 2 * 10 MiB
of json, where based on 200 bytes per entry you could achieve 100k hash
comparisons.
As noted though, I think we can use the better hash function we have.
The only difference will be probably that the seed will be global and
not per-table since glibs table doesn't support that. If that's not
acceptable we need to keep all the code since glibs hash table's hash
function prototype is:
A one-time seed is likely fine.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH 10/13] util: hash: Reimplement virHashTable using GHashTable

Daniel P. Berrangé