[libvirt] hash: failed on concurrent iterating.

In function 'virHashForEach/virHashSearch', 'table->iterating' prevent concurrent iterating. But most caller evoke it, after hold a read lock. This will lead the second caller failed to iterate the table. So, the caller should hold a write lock, then iterate the table.

On 04/04/2018 11:10 AM, Zhangzijian wrote:
In function 'virHashForEach/virHashSearch', 'table->iterating' prevent concurrent iterating. But most caller evoke it, after hold a read lock. This will lead the second caller failed to iterate the table. So, the caller should hold a write lock, then iterate the table.
Looks like there's another thread discussing this issue: https://www.redhat.com/archives/libvirt-users/2018-April/msg00004.html Michal

-----Original Message----- From: libvir-list-bounces@redhat.com [mailto:libvir-list-bounces@redhat.com] On Behalf Of Michal Privoznik Sent: Wednesday, April 04, 2018 5:21 PM To: zhangzijian (Cloud); libvir-list@redhat.com Cc: huangyong (Cloud) Subject: Re: [libvirt] hash: failed on concurrent iterating.
On 04/04/2018 11:10 AM, Zhangzijian wrote:
In function 'virHashForEach/virHashSearch', 'table->iterating' prevent concurrent iterating. But most caller evoke it, after hold a read lock. This will lead the second caller failed to iterate the table. So, the caller should hold a write lock, then iterate the table.
Looks like there's another thread discussing this issue:
Yes, the issue is still exist in v4.2.0. Is it applicable to using write lock instead?
https://www.redhat.com/archives/libvirt-users/2018-April/msg00004.html
Michal
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Wed, Apr 04, 2018 at 09:50:29AM +0000, Zhangzijian wrote:
-----Original Message----- From: libvir-list-bounces@redhat.com [mailto:libvir-list-bounces@redhat.com] On Behalf Of Michal Privoznik Sent: Wednesday, April 04, 2018 5:21 PM To: zhangzijian (Cloud); libvir-list@redhat.com Cc: huangyong (Cloud) Subject: Re: [libvirt] hash: failed on concurrent iterating.
On 04/04/2018 11:10 AM, Zhangzijian wrote:
In function 'virHashForEach/virHashSearch', 'table->iterating' prevent concurrent iterating. But most caller evoke it, after hold a read lock. This will lead the second caller failed to iterate the table. So, the caller should hold a write lock, then iterate the table.
Looks like there's another thread discussing this issue:
Yes, the issue is still exist in v4.2.0. Is it applicable to using write lock instead?
I hit this a month ago when working on mediated devices and creating a ton of them. You can't replace it with write lock because it would kill the sole purpose of having RW locks in our lists using hash tables in the first place. IMHO the correct way to fix this is to get rid of the table->iterating flag completely. In order to do that, you need to convert hash tables to lockable objects so that even modules that don't define any lockable abstractions on top of hash tables could benefit from this. There are several caveats like dealing with NWFilter and for example the virHashForEach method which allows calling the 'Remove' callback which of course requires a write lock, however, if you issue a standard query, this could have been (and also should) achieved with a read lock to allow concurrent reading of elements in the table. This would also require basically having a *Locked version of each virHash API because the existing APIs are being called both outside and inside of the module, so you need to take into account that you might already be holding a lock, etc. Erik

On Wed, Apr 04, 2018 at 12:37:11PM +0200, Erik Skultety wrote:
On Wed, Apr 04, 2018 at 09:50:29AM +0000, Zhangzijian wrote:
-----Original Message----- From: libvir-list-bounces@redhat.com [mailto:libvir-list-bounces@redhat.com] On Behalf Of Michal Privoznik Sent: Wednesday, April 04, 2018 5:21 PM To: zhangzijian (Cloud); libvir-list@redhat.com Cc: huangyong (Cloud) Subject: Re: [libvirt] hash: failed on concurrent iterating.
On 04/04/2018 11:10 AM, Zhangzijian wrote:
In function 'virHashForEach/virHashSearch', 'table->iterating' prevent concurrent iterating. But most caller evoke it, after hold a read lock. This will lead the second caller failed to iterate the table. So, the caller should hold a write lock, then iterate the table.
Looks like there's another thread discussing this issue:
Yes, the issue is still exist in v4.2.0. Is it applicable to using write lock instead?
I hit this a month ago when working on mediated devices and creating a ton of them. You can't replace it with write lock because it would kill the sole purpose of having RW locks in our lists using hash tables in the first place. IMHO the correct way to fix this is to get rid of the table->iterating flag completely. In order to do that, you need to convert hash tables to lockable objects so that even modules that don't define any lockable abstractions on top of hash tables could benefit from this. There are several caveats like dealing with NWFilter and for example the virHashForEach method which allows calling the 'Remove' callback which of course requires a write lock, however, if you issue a standard query, this could have been (and also should) achieved with a read lock to allow concurrent reading of elements in the table. This would also require basically having a *Locked version of each virHash API because the existing APIs are being called both outside and inside of the module, so you need to take into account that you might already be holding a lock, etc.
There are very few places in the code which actually need to be able to call Remove from within an iterator. We should just change the way that works to have a virHashTableForEachRemove() where the function called by the iterator returns 0 or 1 to indicate whether the currently pointed entry should be removed. Then we can just kill the iterating flag from the code Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Wed, Apr 04, 2018 at 12:05:34PM +0100, Daniel P. Berrangé wrote:
On Wed, Apr 04, 2018 at 12:37:11PM +0200, Erik Skultety wrote:
On Wed, Apr 04, 2018 at 09:50:29AM +0000, Zhangzijian wrote:
-----Original Message----- From: libvir-list-bounces@redhat.com [mailto:libvir-list-bounces@redhat.com] On Behalf Of Michal Privoznik Sent: Wednesday, April 04, 2018 5:21 PM To: zhangzijian (Cloud); libvir-list@redhat.com Cc: huangyong (Cloud) Subject: Re: [libvirt] hash: failed on concurrent iterating.
On 04/04/2018 11:10 AM, Zhangzijian wrote:
In function 'virHashForEach/virHashSearch', 'table->iterating' prevent concurrent iterating. But most caller evoke it, after hold a read lock. This will lead the second caller failed to iterate the table. So, the caller should hold a write lock, then iterate the table.
Looks like there's another thread discussing this issue:
Yes, the issue is still exist in v4.2.0. Is it applicable to using write lock instead?
I hit this a month ago when working on mediated devices and creating a ton of them. You can't replace it with write lock because it would kill the sole purpose of having RW locks in our lists using hash tables in the first place. IMHO the correct way to fix this is to get rid of the table->iterating flag completely. In order to do that, you need to convert hash tables to lockable objects so that even modules that don't define any lockable abstractions on top of hash tables could benefit from this. There are several caveats like dealing with NWFilter and for example the virHashForEach method which allows calling the 'Remove' callback which of course requires a write lock, however, if you issue a standard query, this could have been (and also should) achieved with a read lock to allow concurrent reading of elements in the table. This would also require basically having a *Locked version of each virHash API because the existing APIs are being called both outside and inside of the module, so you need to take into account that you might already be holding a lock, etc.
There are very few places in the code which actually need to be able to call Remove from within an iterator. We should just change the way that works to have a virHashTableForEachRemove() where the function called
We already have such a method - virHashRemoveSet. What you propose could work, but I still feel like, eventually, it should be the hash table object that implements the locking primitives rather than the abstractions we build on top of it. But for the time being, we can approach this with small steps. Erik

On 04/04/2018 11:50 AM, Zhangzijian wrote:
-----Original Message----- From: libvir-list-bounces@redhat.com [mailto:libvir-list-bounces@redhat.com] On Behalf Of Michal Privoznik Sent: Wednesday, April 04, 2018 5:21 PM To: zhangzijian (Cloud); libvir-list@redhat.com Cc: huangyong (Cloud) Subject: Re: [libvirt] hash: failed on concurrent iterating.
On 04/04/2018 11:10 AM, Zhangzijian wrote:
In function 'virHashForEach/virHashSearch', 'table->iterating' prevent concurrent iterating. But most caller evoke it, after hold a read lock. This will lead the second caller failed to iterate the table. So, the caller should hold a write lock, then iterate the table.
Looks like there's another thread discussing this issue:
Yes, the issue is still exist in v4.2.0. Is it applicable to using write lock instead?
Do you have a stack trace to share? Michal
participants (4)
-
Daniel P. Berrangé
-
Erik Skultety
-
Michal Privoznik
-
Zhangzijian