Re: [PATCH 10/13] util: hash: Reimplement virHashTable using GHashTable

Tuesday, 27 October 2020

On Tue, Oct 27, 2020 at 10:04:33 +0000, Daniel Berrange wrote:
...
 On Tue, Oct 27, 2020 at 10:53:12AM +0100, Peter Krempa wrote:
 > On Mon, Oct 26, 2020 at 16:08:34 +0000, Daniel Berrange wrote:
 > > On Mon, Oct 26, 2020 at 04:45:50PM +0100, Peter Krempa wrote:
 > > > Glib's hash table provides basically the same functionality as our
 > > > hash table.
 > > > 
 > > > In most cases the virHash* wrappers are reduced to NULL-checks of the
 > > > '@table' argument, as glib's hash functions don't tolerate NULL.
 > > > 
 > > > In the case of iterators, we adapt the existing iterator API to glib's
 > > > to avoid having to rewrite all callers at this point.
 > > > 
 > > > Signed-off-by: Peter Krempa <pkrempa(a)redhat.com>
 > > > ---
 > > >  src/libvirt_private.syms |   4 -
 > > >  src/util/meson.build     |   1 -
 > > >  src/util/virhash.c       | 416 ++++++++++-----------------------------
 > > >  src/util/virhash.h       |   4 +-
 > > >  src/util/virhashcode.c   | 125 ------------
 > > >  src/util/virhashcode.h   |  33 ----
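To make the two kinds of wrapper described in the commit message concrete, here is a minimal glib-free sketch. Every name in it (ForeignTable, foreign_lookup, foreign_find, HashIterator, hash_lookup, hash_for_each, count_until_stop) is hypothetical and not taken from the patch: ForeignTable stands in for a GHashTable, its find-style callback mimics the shape of glib's GHRFunc (key, value, user data; return true to stop), and HashIterator mimics an existing (payload, name, opaque) callback whose non-zero return aborts the walk. It illustrates the adapter idea under those assumptions; it is not the actual virhash.c code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a GHashTable: a small array of entries. */
typedef struct {
    const char *keys[8];
    void *values[8];
    size_t n;
} ForeignTable;

/* Foreign callback shape, modelled on glib's GHRFunc:
 * return true to stop the walk. */
typedef bool (*ForeignFindFunc)(void *key, void *value, void *user_data);

/* Foreign API: dereferences t unconditionally, i.e. no NULL tolerance. */
static void *foreign_lookup(ForeignTable *t, const char *key)
{
    for (size_t i = 0; i < t->n; i++) {
        if (strcmp(t->keys[i], key) == 0)
            return t->values[i];
    }
    return NULL;
}

static void *foreign_find(ForeignTable *t, ForeignFindFunc fn, void *user_data)
{
    for (size_t i = 0; i < t->n; i++) {
        if (fn((void *)t->keys[i], t->values[i], user_data))
            return t->values[i];
    }
    return NULL;
}

/* Existing callback API that callers already implement: a non-zero
 * return aborts the iteration and is handed back to the caller. */
typedef int (*HashIterator)(void *payload, const char *name, void *opaque);

/* Wrapper kind 1: the only added logic is the NULL-check. */
static void *hash_lookup(ForeignTable *t, const char *name)
{
    if (!t)
        return NULL;
    return foreign_lookup(t, name);
}

/* Wrapper kind 2: carry the old callback and its opaque pointer
 * through the foreign API's single user_data slot. */
struct ForEachData {
    HashIterator cb;
    void *opaque;
    int ret;
};

static bool for_each_adapter(void *key, void *value, void *user_data)
{
    struct ForEachData *data = user_data;

    data->ret = data->cb(value, key, data->opaque);
    return data->ret != 0;  /* non-zero means "stop" in the old API */
}

static int hash_for_each(ForeignTable *t, HashIterator cb, void *opaque)
{
    struct ForEachData data = { cb, opaque, 0 };

    if (!t)
        return 0;
    foreign_find(t, for_each_adapter, &data);
    return data.ret;
}

/* Example caller-side callback: counts entries until it sees "stop". */
static int count_until_stop(void *payload, const char *name, void *opaque)
{
    (void)payload;
    if (strcmp(name, "stop") == 0)
        return -1;
    (*(int *)opaque)++;
    return 0;
}
```

The adapter leaves callers untouched: the original callback and its opaque pointer travel through the single user_data slot, and the callback's int return value is mapped onto the foreign API's boolean stop flag.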
 > > 
 > > Our hash code impl uses Murmurhash which makes some efforts to be
 > > robust against malicious inputs triggering collisions, notably using
 > > a random seed.
 > > 
 > > The new code uses g_str_hash which is much weaker, and the API
 > > docs explicitly recommend against using it if the input can be from
 > > an untrusted user.
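As a rough illustration of why a fixed, unseeded function is weak against crafted keys, the sketch below reimplements the djb2 scheme that the glib documentation gives for g_str_hash() (start at 5381, then hash = hash * 33 + c for each byte) without depending on glib; for ASCII keys it should behave the same. Because "AB" and "B!" contribute the same value (65 * 33 + 66 = 66 * 33 + 33 = 2211), every key built from k such two-byte blocks has the same hash, and with no seed an attacker can precompute them offline. djb2_hash and make_colliding_key are hypothetical names for this example.

```c
#include <stdint.h>
#include <string.h>

/* djb2-style hash as documented for g_str_hash(): start at 5381 and
 * update h = h * 33 + c for every byte, wrapping modulo 2^32. */
static uint32_t djb2_hash(const char *s)
{
    uint32_t h = 5381;

    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        h = (h << 5) + h + *p;
    return h;
}

/* Write the i-th of 2^k colliding keys into buf (at least 2 * k + 1
 * bytes): bit j of i selects "AB" or "B!" for the j-th block. Both
 * blocks contribute 2211 (= 65 * 33 + 66 = 66 * 33 + 33), so every
 * key made of k blocks hashes to the same value. */
static void make_colliding_key(unsigned i, unsigned k, char *buf)
{
    for (unsigned j = 0; j < k; j++)
        memcpy(buf + 2 * j, ((i >> j) & 1) ? "B!" : "AB", 2);
    buf[2 * k] = '\0';
}
```

With k = 16, for example, this yields 65,536 distinct 32-byte keys that all share one hash value.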
 > 
 > Yes, I've noticed that, but didn't consider it to be much of a
 > problem, as any untrusted input which is stored in a hash table (so that
 > the attacker can use crafted keys) must in the first place be
 > safeguarded against an OOM condition by limiting the input count/size.

 The problem isn't OOM; rather, it is algorithmic complexity. With malicious
 hash collisions, the runtime lookup performance degrades to O(n), which can
 cause scalability concerns in some cases.
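The O(n) degradation can be shown with a toy separate-chaining table that counts key comparisons. This is a hypothetical sketch whose internals differ from glib's GHashTable, but the effect is the same in kind: when every key hashes to the same value, a lookup for an absent key compares against all n stored keys, while a hash that spreads keys only walks one bucket's worth. FNV-1a is used as the "reasonable" hash purely for the demo; it is unseeded and not meant as a defence.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64

typedef uint32_t (*hash_fn)(const char *key);

struct entry {
    const char *key;
    struct entry *next;
};

/* Toy separate-chaining table (hypothetical, not glib's GHashTable). */
struct table {
    struct entry *buckets[NBUCKETS];
    hash_fn hash;
    unsigned long comparisons;  /* strcmp() calls made by lookups */
};

static void table_insert(struct table *t, const char *key)
{
    size_t b = t->hash(key) % NBUCKETS;
    struct entry *e = malloc(sizeof(*e));

    if (!e)
        abort();
    e->key = key;
    e->next = t->buckets[b];  /* prepend to the chain */
    t->buckets[b] = e;
}

static int table_contains(struct table *t, const char *key)
{
    for (struct entry *e = t->buckets[t->hash(key) % NBUCKETS]; e; e = e->next) {
        t->comparisons++;
        if (strcmp(e->key, key) == 0)
            return 1;
    }
    return 0;
}

static void table_free(struct table *t)
{
    for (size_t b = 0; b < NBUCKETS; b++) {
        while (t->buckets[b]) {
            struct entry *next = t->buckets[b]->next;
            free(t->buckets[b]);
            t->buckets[b] = next;
        }
    }
}

/* What crafted keys achieve against a weak hash: one shared value. */
static uint32_t hash_all_collide(const char *key)
{
    (void)key;
    return 42;
}

/* FNV-1a: spreads ordinary keys well enough for the comparison. */
static uint32_t hash_fnv1a(const char *key)
{
    uint32_t h = 2166136261u;

    for (const unsigned char *p = (const unsigned char *)key; *p; p++) {
        h ^= *p;
        h *= 16777619u;
    }
    return h;
}

/* Insert n distinct keys "k0".."k<n-1>"; the caller owns the storage. */
static void fill_table(struct table *t, char (*keys)[16], int n)
{
    for (int i = 0; i < n; i++) {
        snprintf(keys[i], sizeof(keys[i]), "k%d", i);
        table_insert(t, keys[i]);
    }
}
```

With 1000 keys, a miss in the all-colliding table costs exactly 1000 comparisons, while in the FNV-1a table it costs only the length of one of 64 chains.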

I was pointing out that the input-size limit needed to prevent OOM
conveniently also limits the size of 'n'.

The worst case for a malicious actor that I can see is the block device
statistics code, where the worst-case input would be 2 * 10 MiB of JSON.
At roughly 200 bytes per entry that is about 105k entries
(20 * 2^20 / 200 ≈ 104,857), so you could achieve on the order of 100k
hash comparisons.

As noted, though, I think we can use the better hash function we already have.

The only difference will probably be that the seed will be global rather
than per-table, since glib's hash table doesn't support a per-table seed.
If that's not acceptable, we need to keep all of the existing code, since
the prototype of the hash function used by glib's hash table is:

guint
(*GHashFunc) (gconstpointer key);
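The consequence of that prototype can be sketched as follows. The function below has the same shape as GHashFunc (glib's guint is unsigned int and gconstpointer is const void *, so a plain-C typedef is used to avoid depending on glib), and because only the key is passed, the seed has to live in a process-wide global. MurmurHash3 x86_32, reproduced from the public-domain reference algorithm, stands in for the implementation in virhashcode.c, which this sketch does not copy; HashFunc, global_hash_seed, hash_seed_init and seeded_str_hash are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Same shape as glib's GHashFunc: guint (*)(gconstpointer). */
typedef unsigned int (*HashFunc)(const void *key);

/* Process-wide seed: the only place a GHashFunc-shaped function can
 * get it from, since the prototype carries no table argument. */
static uint32_t global_hash_seed;

static void hash_seed_init(uint32_t seed)  /* e.g. from a random source */
{
    global_hash_seed = seed;
}

static uint32_t rotl32(uint32_t x, int r)
{
    return (x << r) | (x >> (32 - r));
}

/* MurmurHash3 x86_32 (public-domain algorithm by Austin Appleby). */
static uint32_t murmur3_32(const uint8_t *data, size_t len, uint32_t seed)
{
    const uint32_t c1 = 0xcc9e2d51, c2 = 0x1b873593;
    uint32_t h = seed, k;
    size_t i;

    for (i = 0; i + 4 <= len; i += 4) {
        memcpy(&k, data + i, 4);  /* native-endian block read */
        k *= c1; k = rotl32(k, 15); k *= c2;
        h ^= k; h = rotl32(h, 13); h = h * 5 + 0xe6546b64;
    }

    k = 0;
    switch (len & 3) {  /* 1 to 3 trailing bytes */
    case 3: k ^= (uint32_t)data[i + 2] << 16; /* fallthrough */
    case 2: k ^= (uint32_t)data[i + 1] << 8;  /* fallthrough */
    case 1: k ^= data[i];
            k *= c1; k = rotl32(k, 15); k *= c2; h ^= k;
    }

    /* Finalization mix so every input bit affects every output bit. */
    h ^= (uint32_t)len;
    h ^= h >> 16; h *= 0x85ebca6b;
    h ^= h >> 13; h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}

/* Matches the GHashFunc prototype; the seed comes from the global. */
static unsigned int seeded_str_hash(const void *key)
{
    const char *s = key;

    return murmur3_32((const uint8_t *)s, strlen(s), global_hash_seed);
}
```

With glib available, a function of this shape could be passed as g_hash_table_new(seeded_str_hash, g_str_equal). Every table in the process then shares global_hash_seed, which has to be initialized once before the first table is populated: changing it later would make lookups in existing tables miss.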