On Tue, Oct 27, 2020 at 10:04:33 +0000, Daniel Berrange wrote:
On Tue, Oct 27, 2020 at 10:53:12AM +0100, Peter Krempa wrote:
> On Mon, Oct 26, 2020 at 16:08:34 +0000, Daniel Berrange wrote:
> > On Mon, Oct 26, 2020 at 04:45:50PM +0100, Peter Krempa wrote:
> > > Glib's hash table provides basically the same functionality as our
hash
> > > table.
> > >
> > > In most cases the only thing that remains in the virHash* wrappers is
> > > NULL-checks of '@table' argument as glib's hash functions
don't tolerate
> > > NULL.
> > >
> > > In case of iterators, we adapt the existing API of iterators to glibs to
> > > prevent having rewrite all callers at this point.
> > >
> > > Signed-off-by: Peter Krempa <pkrempa(a)redhat.com>
> > > ---
> > > src/libvirt_private.syms | 4 -
> > > src/util/meson.build | 1 -
> > > src/util/virhash.c | 416 ++++++++++-----------------------------
> > > src/util/virhash.h | 4 +-
> > > src/util/virhashcode.c | 125 ------------
> > > src/util/virhashcode.h | 33 ----
> >
> > Our hash code impl uses Murmurhash which makes some efforts to be
> > robust against malicious inputs triggering collisons, notably using
> > a random seed.
> >
> > The new code uses g_str_hash which is much weaker, and the API
> > docs explicitly recommend against using it if the input can be from
> > an untrusted user.
>
> Yes, I've noticed that, but didn't consider it to be that much of a
> problem as any untrusted input which is stored in a hash table (so that
> the attacker can use crafted keys) must be in the first place
> safeguarded against OOM condition by limiting the input count/size.
The problem isn't OOM, rather it is algorithmic complexity. With malicious
hash collisions the runtime lookup performance degrades to O(n) which can
cause scalability concerns in some cases.
I was pointing out that limiting the input size needed for OOM limit
conveniently limits the size of 'n'.
The worst case for a malicious actor that I can see is the block device
statistics code, where the worst case input would be based on 2 * 10 MiB
of json, where based on 200 bytes per entry you could achieve 100k hash
comparisons.
As noted though, I think we can use the better hash function we have.
The only difference will be probably that the seed will be global and
not per-table since glibs table doesn't support that. If that's not
acceptable we need to keep all the code since glibs hash table's hash
function prototype is:
guint
(*GHashFunc) (gconstpointer key);