[libvirt] [PATCH] util: hash: Append to hash buckets when adding new entries

In cases when the hash function for a name collides with other entry already in the hash we prepend to the bucket. This creates a 'stack effect' on the buckets if we then iterate through the hash. Normally this is not a problem, but in tests we want deterministic results. Since it does not matter where we add the entry and it's usually more probable that a different entry will be accessed next change it to append to the end of the bucket. Luckily we already iterate throught the bucket once thus we can easily find the last entry and just connect the new entry after it. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/util/virhash.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/src/util/virhash.c b/src/util/virhash.c index 4907c1124f..0e30106041 100644 --- a/src/util/virhash.c +++ b/src/util/virhash.c @@ -316,6 +316,7 @@ virHashAddOrUpdateEntry(virHashTablePtr table, const void *name, { size_t key, len = 0; virHashEntryPtr entry; + virHashEntryPtr last = NULL; void *new_name; if ((table == NULL) || (name == NULL)) @@ -337,6 +338,7 @@ virHashAddOrUpdateEntry(virHashTablePtr table, const void *name, return -1; } } + last = entry; len++; } @@ -347,8 +349,11 @@ virHashAddOrUpdateEntry(virHashTablePtr table, const void *name, entry->name = new_name; entry->payload = userdata; - entry->next = table->table[key]; - table->table[key] = entry; + + if (last) + last->next = entry; + else + table->table[key] = entry; table->nbElems++; -- 2.20.1

On Tue, Apr 16, 2019 at 04:09:54PM +0200, Peter Krempa wrote:
In cases when the hash function for a name collides with other entry already in the hash we prepend to the bucket. This creates a 'stack effect' on the buckets if we then iterate through the hash. Normally this is not a problem, but in tests we want deterministic results.
Since it does not matter where we add the entry and it's usually more probable that a different entry will be accessed next change it to append to the end of the bucket. Luckily we already iterate throught the bucket once thus we can easily find the last entry and just connect the new entry after it.
I'm not understanding how this change makes the tests any more deterministic than before. The determinism for hash bucket placement is already ensured by having a non-random impl forvirHashCodeGen in tests/virdeterministichashmock.c This change simply reverses the order in which we iterate over values when many keys hash to the same bucket. If this mattered for tests then surely, this change would have to update the test suite too to take account of the new reversed ordering ? Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Tue, Apr 16, 2019 at 15:17:31 +0100, Daniel Berrange wrote:
On Tue, Apr 16, 2019 at 04:09:54PM +0200, Peter Krempa wrote:
In cases when the hash function for a name collides with other entry already in the hash we prepend to the bucket. This creates a 'stack effect' on the buckets if we then iterate through the hash. Normally this is not a problem, but in tests we want deterministic results.
Since it does not matter where we add the entry and it's usually more probable that a different entry will be accessed next change it to append to the end of the bucket. Luckily we already iterate throught the bucket once thus we can easily find the last entry and just connect the new entry after it.
I'm not understanding how this change makes the tests any more deterministic than before.
The determinism for hash bucket placement is already ensured by having a non-random impl forvirHashCodeGen in tests/virdeterministichashmock.c
This change simply reverses the order in which we iterate over values when many keys hash to the same bucket. If this mattered for tests then surely, this change would have to update the test suite too to take account of the new reversed ordering ?
If you have a hash collision when adding into the hash and then use virHashForeach the items are formatted in opposite order. Thus if you use this to e.g. store things parsed from XML and then format it back, every single iteration reverses the order. This is not a problem per-se since we can use a different output file, but it's inconvenient since we can't use the same output file. Currently there aren't any collisions which would cause problems, but I have one on my private branch. Given that this change is rather simple I figured we could just stop inverting them.

On Tue, Apr 16, 2019 at 04:26:49PM +0200, Peter Krempa wrote:
On Tue, Apr 16, 2019 at 15:17:31 +0100, Daniel Berrange wrote:
On Tue, Apr 16, 2019 at 04:09:54PM +0200, Peter Krempa wrote:
In cases when the hash function for a name collides with other entry already in the hash we prepend to the bucket. This creates a 'stack effect' on the buckets if we then iterate through the hash. Normally this is not a problem, but in tests we want deterministic results.
Since it does not matter where we add the entry and it's usually more probable that a different entry will be accessed next change it to append to the end of the bucket. Luckily we already iterate throught the bucket once thus we can easily find the last entry and just connect the new entry after it.
I'm not understanding how this change makes the tests any more deterministic than before.
The determinism for hash bucket placement is already ensured by having a non-random impl forvirHashCodeGen in tests/virdeterministichashmock.c
This change simply reverses the order in which we iterate over values when many keys hash to the same bucket. If this mattered for tests then surely, this change would have to update the test suite too to take account of the new reversed ordering ?
If you have a hash collision when adding into the hash and then use virHashForeach the items are formatted in opposite order.
Thus if you use this to e.g. store things parsed from XML and then format it back, every single iteration reverses the order. This is not a problem per-se since we can use a different output file, but it's inconvenient since we can't use the same output file.
Currently there aren't any collisions which would cause problems, but I have one on my private branch.
Given that this change is rather simple I figured we could just stop inverting them.
Ok, that does make a little more sense, though generally my view would be that if we're doing something that is order sensitive like XML parsing & formatting, then we shouldn't be using a hash table to represent the data - at least not as the primary record. Which bit of code is using the hash in this case ? Is it something purely inside the test suite, or in the real XML handling code ? Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Tue, Apr 16, 2019 at 15:32:31 +0100, Daniel Berrange wrote:
On Tue, Apr 16, 2019 at 04:26:49PM +0200, Peter Krempa wrote:
On Tue, Apr 16, 2019 at 15:17:31 +0100, Daniel Berrange wrote:
On Tue, Apr 16, 2019 at 04:09:54PM +0200, Peter Krempa wrote:
[...]
Currently there aren't any collisions which would cause problems, but I have one on my private branch.
Given that this change is rather simple I figured we could just stop inverting them.
Ok, that does make a little more sense, though generally my view would be that if we're doing something that is order sensitive like XML parsing & formatting, then we shouldn't be using a hash table to represent the data - at least not as the primary record.
Which bit of code is using the hash in this case ? Is it something purely inside the test suite, or in the real XML handling code ?
I'm adding some private data for blockjobs which are formatted into the status XML. I'm storing them in a hash table as the order isn't really important here and the user should never see this. I was just bothered by the fact that I'd need to have an output file (or change the name of the blockjob) and figured that the insertion into the hash table is very simple and does not really matter how we do it.

On Tue, Apr 16, 2019 at 04:41:57PM +0200, Peter Krempa wrote:
On Tue, Apr 16, 2019 at 15:32:31 +0100, Daniel Berrange wrote:
On Tue, Apr 16, 2019 at 04:26:49PM +0200, Peter Krempa wrote:
On Tue, Apr 16, 2019 at 15:17:31 +0100, Daniel Berrange wrote:
On Tue, Apr 16, 2019 at 04:09:54PM +0200, Peter Krempa wrote:
[...]
Currently there aren't any collisions which would cause problems, but I have one on my private branch.
Given that this change is rather simple I figured we could just stop inverting them.
Ok, that does make a little more sense, though generally my view would be that if we're doing something that is order sensitive like XML parsing & formatting, then we shouldn't be using a hash table to represent the data - at least not as the primary record.
Which bit of code is using the hash in this case ? Is it something purely inside the test suite, or in the real XML handling code ?
I'm adding some private data for blockjobs which are formatted into the status XML. I'm storing them in a hash table as the order isn't really important here and the user should never see this.
I was just bothered by the fact that I'd need to have an output file (or change the name of the blockjob) and figured that the insertion into the hash table is very simple and does not really matter how we do it.
ok, that makes more sense now, thanks for explaining. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Tue, Apr 16, 2019 at 04:09:54PM +0200, Peter Krempa wrote:
In cases when the hash function for a name collides with other entry already in the hash we prepend to the bucket. This creates a 'stack effect' on the buckets if we then iterate through the hash. Normally this is not a problem, but in tests we want deterministic results.
Since it does not matter where we add the entry and it's usually more probable that a different entry will be accessed next change it to append to the end of the bucket. Luckily we already iterate throught the bucket once thus we can easily find the last entry and just connect the new entry after it.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/util/virhash.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
participants (2)
-
Daniel P. Berrangé
-
Peter Krempa