Re: [libvirt] [PATCH v3 00/16] virObject adjustments for common object

28 Jul 2017


      On Thu, Jul 27, 2017 at 04:46:41PM -0400, John Ferlan wrote:
...
On 07/25/2017 05:29 AM, Pavel Hrdina wrote:
...
On Thu, Jun 22, 2017 at 10:02:30AM -0400, John Ferlan wrote:
Let's move the discussion [1] into correct place.
...
Still, I must be missing something. Why is it wrong to create a new
object that would have a specific use? virObjectLockable was created at
some point in history and then used as a way to provide locking for a
virObject that only had a @ref. Someone could still create a virObject
as well and just get @refs. With the new objects based on those two
objects you get a few more features that allow for add, search, lookup,
remove.  But you call that wrong, which is confusing to me.
What's wrong, or better wording, what I don't like personally about this
approach is that in order to have features like add, search, lookup,
remove it would require that the object is derived from
virObjectLookupKeys.  The only thing it adds to the simple virObject
is two variables @uuid and @name.
We don't need the virObjectLookupKeys object at all.
First off - thanks for taking the time to put your thoughts here. It
really helped me understand where I believe you were coming from. There
are some disadvantages to working at home when it comes to sounding
boards or drawing things up on a white board for those around you in an
office to provide/receive face-2-face feedback. Black and white text can
be harsher than intended.
Now that I understand that it's less 'wrong' and more 'not necessary' or
as you put it a personal dislike I can come in off of the ledge ;-).
Still there's a lot of history w/r/t the initial posting where I was
encouraged to use/update Objects vs. doing what I did creating a new
module and essentially renaming objs and objlists. Scaling back and
considering usage of objects sent me down this current path which hasn't
received as much feedback. It may not be 100% correct, but I really was
hoping it wasn't wrong especially since I have working code.
Storing the UUID/Name in an @obj is the means I use to make certain
decisions along the way - especially w/r/t Add/Remove logic which as I
pointed out previously - there's no way to "Add" something without a key
nor "Remove" it; however, the Lookup/Search logic do not need it because
the hash table keeps track of the key and all the find/lookup vir*obj
logic would have the @def->{uuid|name} as input for the search. The
ForEach logic won't use it, although it could (more later).
Storing the keys in @obj removes a few decision points from the caller.
The consumer only cares they've created an object with 1 or 2 keys in
order to allow the objList code "find" those keys and make decisions
based on the key rather than having multiple API's that fetch using the
key name in the API name. Adding to the objlist is just that - adding
and not caring/knowing how many or which tables the object is using.
While it's certainly not necessary to have UUID/Name and a LookupKeys
object, it is convenient and more than just for UUID/Name. The new
object is (will be) used to add bool's (active, persistent, started,
etc.) that are found in multiple objects in order to have a common API
to manage that. It's not required, but something possible. There are
some other 'shared' type pieces of data (configFile, autostartLink,
autostart, etc.) that could also make use of "sharing" data and code if
The shared code for storing a list of object is what we need and want,
however adding this 'shared' data into the listing API is out of scope
of that API and not all objects uses this data.  Yes I can see the
benefit, but IMHO it doesn't belong into that API.
...
so inclined. Of course the primary reason I've adjusted all those
obj->def->X to be def = obj->def and def->X access is that moving the
@def and @newDef inside the object was another initial goal, which needs
to be rethought/reworked as that was considered too generic and possibly
prone to coding errors.
...
For example, current add/remove API is:
==== virObjectLookupHashNew ====
virObjectLookupHashNew(virClassPtr klass,
                       int tableElemsStart);
could be
virObjectListNew(int tableSize,
                 bool useName,
                 bool useUUID);
or
typedef enum {
    VIR_OBJECT_LIST_TABLE_NAME = (1 << 0),
    VIR_OBJECT_LIST_TABLE_UUID = (1 << 1),
} virObjectListTableType;
virObjectListNew(int tableSize,
                 unsigned int flags);
I don't see why the virObjectLookupHashNew() has to have the @klass
parameter and the @tableElemsStart is not correct, it's not the number
of elements, it's has table size.
Well @klass is a parameter for other virObject*New() functions, so why
should virObjectLookupHashNew not have one?  Or asked differently how
would virObjectListNew determine it was derived from some class? Does
That's the difference, the proposed VirObjectList is a final class and
currently there is no need to derive any other class from it.
...
that leave the hash tables in the calling vir*obj.c files or in some
common file, such as vircommonobjlist.c? To me that's just a shim to
virObject for each of the vir*obj.c. I don't like that solution - we
All of the code for virObjectList would be for example in
virobjectlist.c|h files, it would be a separate module that uses
virObject as a building block and also is tailored to contain virObject.
...
shouldn't create a shim that doesn't converge objects, we would take the
plunge and create object code. I also don't like ObjectList if I'm using
a hash table - it's not a representative name. Perhaps I'm missing
something else you had in mind.
I chose the "list" because that's what the API does, it's storing a data
addressed by key without any specific order, usually this type of
data structure is "map" or "dictionary", so yes, the "list" is not the
perfect description, we can use "map" or "dict", but "hash" is one of
the possible implementation of this data structure.
...
@tableElemsStart is the starting size of the hash table that can grow
beyond the starting size. Whether it changes names to @tableSize is just
window dressing to hide the bikes in the shed. I think the reason why I
used a variable was to allow different sizes for different ObjLists.
Yes, the variable should be there, you should be able to specify the
size of hash table, just the name was confusing.
...
There may have been something else but that's long since left the short
term memory. It might be for the clone or backup copy of an ObjList and
wanted to have the original table size in order to create the clone.
Doesn't seem I use it now though, so saving it could be dropped. Heck we
could drop the initial size logic too and give virHashGrow some
exercise. Although I see that has some interesting limitations w/r/t
bucket chain length and larger element count tables where a max of 2048
buckets can be created allowing bucket chains to potentially grow really
long. Perhaps not a problem today with test beds using 100's of objects,
but IIRC there was a storage example using 1000's of volumes if not 10's
of 1000's. Once you get to about 15K objs, then the probability of
longer chains increases (I played around with virhashtest a bit - which
currently only maxes at 250 objs, yawn).
FWIW: I went away from @useName or @useBool type logic because some
tables have UUID only, some have Name only, others have both. For those
with one - there is a waste of space, but if we start with table sizes
of 1, it's minimized. Again, I preferred @primary and @secondary, but
that idea was already squashed. In any case, here's a crudely drawn
table of UUID/Name and the various vir*obj consumers.
UUID    |  Name
____________________|___________
NodeDev     No      |  Yes
____________________|___________
Secret      Yes     |  No [1]
____________________|___________
NWFilter    Maybe[2]|  Yes
____________________|___________
Interface   No      |  Yes
____________________|___________
Network     Yes     |  Yes
____________________|___________
StoragePool Yes     |  Yes
____________________|___________
StorageVol  No      |  Yes
____________________|___________
Domain      Yes     |  Yes
____________________|___________
[1] I do have patches that would make the secret usage_id be a secondary key
[2] Awful piece of history that allows the nwfilter to be defined
without a UUID if saving the UUID to the file fails, thus allowing a
future start to overwrite the UUID of a previous nwfilter.
There's also the possibility that Snapshots could use this, but I don't
think it's as beneficial since "unpredictable ordering" for HashTables
could make things AFU'd and I don't suppose you'd end up with 1000's of
snapshots.
...
==== virObjectLookupHash(Add|Remove) ====
virObjectLookupHashAdd(void *tableobj,
                       virObjectLookupKeysPtr obj);
virObjectLookupHashRemove(void *tableobj,
                          virObjectLookupKeysPtr obj);
The only difference without the virObjectLookupKeys object would be
that the API will take two more parameters to give it the @uuid and
@name.
virObjectListAdd(virObjectListPtr listObj,
                 void *obj,
                 const char *name,
                 const char *uuid);
virObjectListRemove(virObjectListPtr listObj,
                    const char *name,
                    const char *uuid);
or
virObjectListRemoveByName(virObjectListPtr listObj,
                          const char *name);
virObjectListRemoveByUUID(virObjectListPtr listObj,
                          const char *uuid);
Yes, at first it may looks worse, but on the other hand it gives you
better flexibility in the way that the object added/removed from the
list doesn't even have to have the exact same variables and it will
work with every object that is derived from virObject.
Ahhh.. and this is where our fundamental difference lies. You see
obj->uuid and obj->name as unnecessary and I find them useful especially
for the purpose of adding/removing @obj to/from the @listObj.
Agree, for adding/removing you need all keys for all tables, having a
common object with predefined keys makes it easier but also adds a
limitations that you can use the list only for these objects.
...
It seems you'd rather see virObject* API's (or perhaps some new
vircommonobjlist.c) that force the callers to know which call to use
(ByName/ByUUID) and which table's were created for the object. I prefer
having a common API that doesn't need to be that specific, but does the
thing just using an argument.
Yes, that's exactly what I'm proposing, having a separate module, it's
not even an virObject* API.
...
Your characterization of "flexible" is my characterization of "control".
It's not like UUID/Name are going away - they're just being managed in a
lower layer. My view is that the @obj as created during vir*ObjNew is
what decides which tables it will need to be present in. The opposing
view is I have an object, I'm going to place it in this table and
possibly that table and I know where I want to put it and will manage
that from the calling function entirely.
One thing I would like to see changed is *Remove returning status. Not a
a lot of good happens if something during the Remove processing fails
and no one is told.
...
==== virObjectLookupHashFind ====
virObjectLookupHashFind(void *tableobj,
                        bool useUUID,
                        const char *key);
could be split into two APIs:
virObjectListFindByName(virObjectListPtr listObj,
                        const char *name);
virObjectListFindByUUID(virObjectListPtr listObj,
                        const char *UUID);
Which in my opinion is way better than having a single API with a
boolean parameter that switches what type of key it is.
Six of one, half-dozen of the other. The consumer is still going to call
one or the other ByName/ByUUID, but that leaves that decision point
higher up. It's easy enough to make ByName and ByUUID API's, but they
just turn into shims to the common code - so why bother?
Because a bool parameter is not extendable, imagine that we would like
to add another key in future, it would force us to rewrite all the APIs.
...
I still prefer the concept of @primary and @secondary. Consider the
table above - should it matter that a table stores UUID's? or Names? or
just one or two keys?  We made the decision early on during ObjNew what
our keys would be.  After that it's up to calling function to provide
some key that we'll search on. For some consumers which use both tables,
they want to Find in a specific table and that's why you'd have a
boolean to say - don't search primary, search secondary.
How about a different idea. Why not make the hash table count variable
(1, 2, 3, etc.) and let the caller decide how many to create and which
I had the same idea, however for some object it doesn't give us any
benefit to have 4 or 5 hash tables for different keys because the number
of object stored in the tables would be small.  We have to think
about balance between speed and memory consumption.
...
to use. That way the nodedev code removes the 4 or 5 FindBy type API's
and replaces it with 4 or 5 hash tables to allow for lookup by key. The
consumer can then have vir*ObjListFind{ByUUID|ByName}, but that ends up
being a virObjectLookupHashFind(tableobj, tableidx, const char *key)
where the consumer knows tableidx = 0 is UUID, tableidx = 1 is Name, etc.
...
==== virObjectLookupHashForEach ====
virObjectLookupHashForEach(void *tableobj,
                           bool useUUID,
                           virHashIterator callback,
                           virObjectLookupHashForEachDataPtr data);
could be
virObjectListForEach(virObjectListPtr listObj,
                     virHashIterator callback,
                     void *callbackData);
There are two things that I don't like.  There is no need to have the
@useUUID because both tables should contain the same objects if both are
used so this can be hidden from the user and the API will simply use one
of the available tables.  The second issue is that this API forces you
to use some predefined data structure and you cannot create your own
specifically tailored for the task that you are about to do.
The @useUUID argument also applies to virObjectLookupHashSearch().
[NB: written before rereading and writing the above paragraph regarding
configurable number of hash tables]
How do you choose which table? How do you know which data the caller
wants? Refer to the table above - some objects have UUID and some have
Name. Some consumers may even want to return a list of all objects by
UUID using an input parameter and other by Name with a different input
parameter. The *ForEach ends up being a multi-purpose backend for NumOf,
Get{Name|UUID}, and Export functions. Typically the Get{Name|UUID} is
the "primary key" for the object, but I'm just saying...
While I'm typing this response I just realized that the Get{Names|UUIDs}
type API functions wouldn't need vir*obj.c specific *Callback functions
to deref obj->def->{name|uuid} since obj->{uuid|name} exists! I'd still
need some function to determine whether @aclfilter was passed or not for
@def and I'd to know to @useUUID or not. But I could converge things
even more than I have.
...
The usage of the virObjectList* APIs could be something like this:
virObjectListPtr
virInterfaceObjListNew(void) {
    return virObjectListNew(10, VIR_OBJECT_LIST_TABLE_NAME);
}
static virInterfaceObjPtr
virInterfaceObjListFindByName(virObjectListPtr interfaces,
                              const char *name)
{
    return virObjectListFindByName(interfaces, name);
}
Pavel
And here's a perfect example of an object that only lives in one table.
If I call virObjectLookupHashForEach or virObjectLookupHashSearch and
expect to find something in objsUUID, I'm going to be sorely
disappointed. One solution is to have two API's - my solution is one API
with a parameter that dictates.
That's the thing, for APIs ForEach or Search you need to use only one
hash table.  If there is only one, it's easy to pick the correct one, if
there are two or more tables, they should contain the same objects only
mapped by different keys so it doesn't matter which table is used.

Pavel
...
Given the amount of change since this series was first posted, I have to
deal with merge conflicts, the virHashSearch new argument, and using RW
locks for all the objlists. So I'm going to have to repost the patches
again anyway.  At this time, I prefer to leave them as is for the
reasons stated regarding avoiding a shim. If that's the avenue to take,
then I'll let someone else go down that path. I can finish up cleanup
and conversion of all the vir*obj code to use vir*Obj and vir*ObjList,
but converging to a shim is something I'll avoid.
John
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list