On Wed, Oct 14, 2009 at 03:16:19PM +0100, Daniel P. Berrange wrote:
On Wed, Oct 14, 2009 at 04:04:50PM +0200, Daniel Veillard wrote:
> On Wed, Oct 14, 2009 at 11:48:51AM +0100, Daniel P. Berrange wrote:
> So we went from list to array, and now to hash table :-)
Yeah, I very much regret switching from the linked list to
an array. I don't know why I didn't go straight to a hash
table when I changed it last time :-(
point is we switched everything (well nearly) to arrays,
but we are only changing domain lookup to hash table.
> <nitpick>
> I'm afraid the O(1) is an exageration since as the number grows
> you would still need to lock each object on the clash list for
> that hash. But for normal use we should end up in constant time,
> but in theory I think it's still O(n) ...
> </nitpick>
If that ever becomes a problem, which I very much doubt, then
we can easily enhance the hash table impl to automatically add
extra buckets once the number of elements passes some threshold
so you guarentee its always very close to O(1)
The important change here is really removal of the O(n) mutex
locks, whcih will always be O(1), because even if there are
many entries in each hash bucket you're still doing the lookup
based on the key, not the entry. So you'll only ever need to
lock the object you eventually find.
I was nitpicking on the formalism :-) Independantly of O(whatever)
I'm sure it's way faster to jump directly to the hash list.
But I'm afraid we need to check something, looking quickly at the
patch I don't see any locking of the hash table itself during the
lookups, maybe I missed it but there is 2 things I need to raise
- the obvious that another thread may modify the hash table,
I assume it's covered by the new R/W locks but just double
checking
- the less obvious point that the object hash can't be cached
because the hash table code allows to grow as needed (up to
some limit) that's inherited from libxml2 design where I wanted to
keep small hashes for non-complex document. Since UUID are immutable
we could drop that virHashGrow() code, go for the larger from the
start, and avoid recomputing UUID strings and hash everytime
by caching them in the domain object. Crude but might save some
cycle.
> I expect that the gain will come from the fact all lookups
intenally
> are done though the UUID, the Name/ID being only convenience API.
> I'm also wondering about the symetry for other kind of objects,
> we probably don't need this for networks (less objects, less
> operation and operation usually short), you really expect that
> to be a single optimization and just for domains, right ?
I think it will probably be worth doing it for storage pools too.
And once we've done that we might as well do it for networks too
just to be consistent everywhere, but its not critical for now.
Yup, let's implement and evaluate for domains, then we will
see if consistency really matters.
Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit
http://xmlsoft.org/
daniel(a)veillard.com | Rpmfind RPM search engine
http://rpmfind.net/
http://veillard.com/ | virtualization library
http://libvirt.org/