Re: [libvirt] [PATCH v3 00/16] virObject adjustments for common object

28 Jul 2017


      On Fri, Jul 28, 2017 at 09:44:59AM -0400, John Ferlan wrote:
...
On 07/28/2017 04:51 AM, Pavel Hrdina wrote:
...
On Thu, Jul 27, 2017 at 04:46:41PM -0400, John Ferlan wrote:
...
On 07/25/2017 05:29 AM, Pavel Hrdina wrote:
...
On Thu, Jun 22, 2017 at 10:02:30AM -0400, John Ferlan wrote:
Let's move the discussion [1] into correct place.
...
Still, I must be missing something. Why is it wrong to create a new
object that would have a specific use? virObjectLockable was created at
some point in history and then used as a way to provide locking for a
virObject that only had a @ref. Someone could still create a virObject
as well and just get @refs. With the new objects based on those two
objects you get a few more features that allow for add, search, lookup,
remove.  But you call that wrong, which is confusing to me.
What's wrong, or better wording, what I don't like personally about this
approach is that in order to have features like add, search, lookup,
remove it would require that the object is derived from
virObjectLookupKeys.  The only thing it adds to the simple virObject
is two variables @uuid and @name.
We don't need the virObjectLookupKeys object at all.
First off - thanks for taking the time to put your thoughts here. It
really helped me understand where I believe you were coming from. There
are some disadvantages to working at home when it comes to sounding
boards or drawing things up on a white board for those around you in an
office to provide/receive face-2-face feedback. Black and white text can
be harsher than intended.
Now that I understand that it's less 'wrong' and more 'not necessary' or
as you put it a personal dislike I can come in off of the ledge ;-).
Still there's a lot of history w/r/t the initial posting where I was
encouraged to use/update Objects vs. doing what I did creating a new
module and essentially renaming objs and objlists. Scaling back and
considering usage of objects sent me down this current path which hasn't
received as much feedback. It may not be 100% correct, but I really was
hoping it wasn't wrong especially since I have working code.
Storing the UUID/Name in an @obj is the means I use to make certain
decisions along the way - especially w/r/t Add/Remove logic which as I
pointed out previously - there's no way to "Add" something without a key
nor "Remove" it; however, the Lookup/Search logic do not need it because
the hash table keeps track of the key and all the find/lookup vir*obj
logic would have the @def->{uuid|name} as input for the search. The
ForEach logic won't use it, although it could (more later).
Storing the keys in @obj removes a few decision points from the caller.
The consumer only cares they've created an object with 1 or 2 keys in
order to allow the objList code "find" those keys and make decisions
based on the key rather than having multiple API's that fetch using the
key name in the API name. Adding to the objlist is just that - adding
and not caring/knowing how many or which tables the object is using.
While it's certainly not necessary to have UUID/Name and a LookupKeys
object, it is convenient and more than just for UUID/Name. The new
object is (will be) used to add bool's (active, persistent, started,
etc.) that are found in multiple objects in order to have a common API
to manage that. It's not required, but something possible. There are
some other 'shared' type pieces of data (configFile, autostartLink,
autostart, etc.) that could also make use of "sharing" data and code if
The shared code for storing a list of object is what we need and want,
however adding this 'shared' data into the listing API is out of scope
of that API and not all objects uses this data.  Yes I can see the
benefit, but IMHO it doesn't belong into that API.
And I don't believe that's the right solution. Ironically from the
original series when the idea of using polymorphic or inherited objects
was suggested, there seemed to be consensus that this was the correct
approach. Although once implemented and shown to work, now you don't
like it and would prefer some sort of common listing code. I'll stick
with this, update the series and post it again with all that I've
written here a bit more fresh in memory.
I've replied to the original series [1] that I don't like it.

[1] <https://www.redhat.com/archives/libvir-list/2017-February/msg01574.html>
...
...
...
so inclined. Of course the primary reason I've adjusted all those
obj->def->X to be def = obj->def and def->X access is that moving the
@def and @newDef inside the object was another initial goal, which needs
to be rethought/reworked as that was considered too generic and possibly
prone to coding errors.
...
For example, current add/remove API is:
==== virObjectLookupHashNew ====
virObjectLookupHashNew(virClassPtr klass,
                       int tableElemsStart);
could be
virObjectListNew(int tableSize,
                 bool useName,
                 bool useUUID);
or
typedef enum {
    VIR_OBJECT_LIST_TABLE_NAME = (1 << 0),
    VIR_OBJECT_LIST_TABLE_UUID = (1 << 1),
} virObjectListTableType;
virObjectListNew(int tableSize,
                 unsigned int flags);
I don't see why the virObjectLookupHashNew() has to have the @klass
parameter and the @tableElemsStart is not correct, it's not the number
of elements, it's has table size.
Well @klass is a parameter for other virObject*New() functions, so why
should virObjectLookupHashNew not have one?  Or asked differently how
would virObjectListNew determine it was derived from some class? Does
That's the difference, the proposed VirObjectList is a final class and
currently there is no need to derive any other class from it.
...
that leave the hash tables in the calling vir*obj.c files or in some
common file, such as vircommonobjlist.c? To me that's just a shim to
virObject for each of the vir*obj.c. I don't like that solution - we
All of the code for virObjectList would be for example in
virobjectlist.c|h files, it would be a separate module that uses
virObject as a building block and also is tailored to contain virObject.
...
shouldn't create a shim that doesn't converge objects, we would take the
plunge and create object code. I also don't like ObjectList if I'm using
a hash table - it's not a representative name. Perhaps I'm missing
something else you had in mind.
I chose the "list" because that's what the API does, it's storing a data
addressed by key without any specific order, usually this type of
data structure is "map" or "dictionary", so yes, the "list" is not the
perfect description, we can use "map" or "dict", but "hash" is one of
the possible implementation of this data structure.
Lots of agita over a name... For as long as I've been doing this when
someone has an API w/ List or code with @list variables that means some
sort of forward or doubly-link linked list (flink and blink). Under the
covers even the hash table algorithm uses a forward linked list, but it
(mostly) limits the length of the list for any particular bucket to be
of a particular size before the table is made to grow in order to change
the population of the buckets.
A map is something to show you how to get from point A to point Z with
numerous route options through various points along the way that could
be interesting. Although the shortest distance between any 2 points is a
straight line - it's not always the fastest way. I associate the term
                with memory or cpu and has been a 2D/3D type concept. It
allows sharing between things and changing it's size has synchronicity
problems. A dictionary is something on a shelf that keeps everything
alphabetically (and numerically) ordered, but is not great for searches
because there's a higher likelihood of a few of the 62 letter/number
buckets in the english language being overloaded while other buckets
have very few entries. While "hash" isn't best name, it is the
implementation just as much as the current implementation is a linked
list of objects.
Sigh, map has a several meanings and dictionary as well.  In computer
science they both describe a data structure.  For example python uses
dictionary as a data structure of a list of pairs(key, value).
...
BTW: Yes what each of the existing API's returns is a forward linked
list of objects as (in the future) taken from a object containing a hash
table of all objects (such as how the domain code handles things).
...
...
@tableElemsStart is the starting size of the hash table that can grow
beyond the starting size. Whether it changes names to @tableSize is just
window dressing to hide the bikes in the shed. I think the reason why I
used a variable was to allow different sizes for different ObjLists.
Yes, the variable should be there, you should be able to specify the
size of hash table, just the name was confusing.
Again name agita. The reality is - it's just the starting size.  The
@tableSize could change over time. In fact tableSize is even a misnomer.
The libvirt hash table starting size of 10 could actually store up to 80
elements without ever changing size if it was perfectly balanced. Once
I didn't know that our code modifies the size of hash table, in that
case it might be pointless to specify the original size at all.
...
there were more than 8 elements in any one of the original 10 buckets
the table gets resized by a factor of 8 w/ a limitation that the next
size cannot be > "8 * 2048" - taking into account the size is the number
of buckets as opposed to the number of elements, but that's a different
problem and discussion.
...
...
There may have been something else but that's long since left the short
term memory. It might be for the clone or backup copy of an ObjList and
wanted to have the original table size in order to create the clone.
Doesn't seem I use it now though, so saving it could be dropped. Heck we
could drop the initial size logic too and give virHashGrow some
exercise. Although I see that has some interesting limitations w/r/t
bucket chain length and larger element count tables where a max of 2048
buckets can be created allowing bucket chains to potentially grow really
long. Perhaps not a problem today with test beds using 100's of objects,
but IIRC there was a storage example using 1000's of volumes if not 10's
of 1000's. Once you get to about 15K objs, then the probability of
longer chains increases (I played around with virhashtest a bit - which
currently only maxes at 250 objs, yawn).
FWIW: I went away from @useName or @useBool type logic because some
tables have UUID only, some have Name only, others have both. For those
with one - there is a waste of space, but if we start with table sizes
of 1, it's minimized. Again, I preferred @primary and @secondary, but
that idea was already squashed. In any case, here's a crudely drawn
table of UUID/Name and the various vir*obj consumers.
UUID    |  Name
____________________|___________
NodeDev     No      |  Yes
____________________|___________
Secret      Yes     |  No [1]
____________________|___________
NWFilter    Maybe[2]|  Yes
____________________|___________
Interface   No      |  Yes
____________________|___________
Network     Yes     |  Yes
____________________|___________
StoragePool Yes     |  Yes
____________________|___________
StorageVol  No      |  Yes
____________________|___________
Domain      Yes     |  Yes
____________________|___________
[1] I do have patches that would make the secret usage_id be a secondary key
[2] Awful piece of history that allows the nwfilter to be defined
without a UUID if saving the UUID to the file fails, thus allowing a
future start to overwrite the UUID of a previous nwfilter.
There's also the possibility that Snapshots could use this, but I don't
think it's as beneficial since "unpredictable ordering" for HashTables
could make things AFU'd and I don't suppose you'd end up with 1000's of
snapshots.
...
==== virObjectLookupHash(Add|Remove) ====
virObjectLookupHashAdd(void *tableobj,
                       virObjectLookupKeysPtr obj);
virObjectLookupHashRemove(void *tableobj,
                          virObjectLookupKeysPtr obj);
The only difference without the virObjectLookupKeys object would be
that the API will take two more parameters to give it the @uuid and
@name.
virObjectListAdd(virObjectListPtr listObj,
                 void *obj,
                 const char *name,
                 const char *uuid);
virObjectListRemove(virObjectListPtr listObj,
                    const char *name,
                    const char *uuid);
or
virObjectListRemoveByName(virObjectListPtr listObj,
                          const char *name);
virObjectListRemoveByUUID(virObjectListPtr listObj,
                          const char *uuid);
Yes, at first it may looks worse, but on the other hand it gives you
better flexibility in the way that the object added/removed from the
list doesn't even have to have the exact same variables and it will
work with every object that is derived from virObject.
Ahhh.. and this is where our fundamental difference lies. You see
obj->uuid and obj->name as unnecessary and I find them useful especially
for the purpose of adding/removing @obj to/from the @listObj.
Agree, for adding/removing you need all keys for all tables, having a
common object with predefined keys makes it easier but also adds a
limitations that you can use the list only for these objects.
Forcing naming by UUID or by Name is limiting to me. Does it really
matter what the name of a key is to the hash object? No it just needs to
be unique and for our purposes is currently a string comparison. The
fact that the driver/vir*obj's "know" that it's by UUID or by Name is a
higher level abstraction. Still I capitulated and altered my primary and
secondary to use uuid/name and that's now causing other problems. Feels
like a no win situation.
When you add a 3rd key by FOO, then by having API's with By{KEY} names
means you have to add a series of API's instead of modifying the object
handling code to handle that.
In my view, for any current driver/vir*obj that has 1 key (e.g.
secrets), if I one day decide to use a second key (say by usageID), then
all I do is add that key to the virSecretObjNew processing and voila
under the covers the object knows how to handle things by one or two
keys. The driver/vir*obj code then can use LookupByKey instead of Search
for the new key. Similarly, if there were a 3rd key then a few simple
modifications to add a new key to virObjectLookupKeys, modify the
Add/Remove algorithm to handle the 3rd key, and teach the
Find/ForEach/Search API's to handle any one of 3 keys and things should
work. There's probably a couple of other missing details, but to me less
work than new API's represent.
The nodedev code currently has the most multiple key possibilities with
ObjListFind* API's for SysfsPath, Name, WWNs, FabricWWN, Cap, and
SCSIHostByWWNs. Since a key has to exist in all objs to be valid, that
means for nodedev's, the only one that comes close is SysfsPath, but it
doesn't work for the 'computer' object, so that's not an option. Thus
(for now) nodedev is stuck using a single key. Luckily we're guaranteed
some amount of uniqueness by the host/kernel algorithms.
...
...
It seems you'd rather see virObject* API's (or perhaps some new
vircommonobjlist.c) that force the callers to know which call to use
(ByName/ByUUID) and which table's were created for the object. I prefer
having a common API that doesn't need to be that specific, but does the
thing just using an argument.
Yes, that's exactly what I'm proposing, having a separate module, it's
not even an virObject* API.
...
Your characterization of "flexible" is my characterization of "control".
It's not like UUID/Name are going away - they're just being managed in a
lower layer. My view is that the @obj as created during vir*ObjNew is
what decides which tables it will need to be present in. The opposing
view is I have an object, I'm going to place it in this table and
possibly that table and I know where I want to put it and will manage
that from the calling function entirely.
One thing I would like to see changed is *Remove returning status. Not a
a lot of good happens if something during the Remove processing fails
and no one is told.
...
==== virObjectLookupHashFind ====
virObjectLookupHashFind(void *tableobj,
                        bool useUUID,
                        const char *key);
could be split into two APIs:
virObjectListFindByName(virObjectListPtr listObj,
                        const char *name);
virObjectListFindByUUID(virObjectListPtr listObj,
                        const char *UUID);
Which in my opinion is way better than having a single API with a
boolean parameter that switches what type of key it is.
Six of one, half-dozen of the other. The consumer is still going to call
one or the other ByName/ByUUID, but that leaves that decision point
higher up. It's easy enough to make ByName and ByUUID API's, but they
just turn into shims to the common code - so why bother?
Because a bool parameter is not extendable, imagine that we would like
to add another key in future, it would force us to rewrite all the APIs.
So after X amount of years of using UUID/Name do you really believe
there's going to be a 3rd key? Even if there were, changing the @bool to
an @flags is very simple. It's not like we've never done that before and
it's not like we're exposing these API's in such a way that we need to
have a whole new API. It's an internal mechanism.
Still what purpose does a 3rd key ever serve for an object? That
probably means one didn't set unique enough primary and secondary keys.
A tertiary key means searching 3 tables to make sure none of them has
keys being used that the new object is trying to use as it's keyset. I
sincerely doubt there's much use for that. I have a feeling we'd
outright reject someone creating a 3 keys for the simple reason the
API's are designed to handle at most 2 keys which should guarantee
uniqueness as long as you're not forced by API naming scheme to have
certain named keys.
...
...
I still prefer the concept of @primary and @secondary. Consider the
table above - should it matter that a table stores UUID's? or Names? or
just one or two keys?  We made the decision early on during ObjNew what
our keys would be.  After that it's up to calling function to provide
some key that we'll search on. For some consumers which use both tables,
they want to Find in a specific table and that's why you'd have a
boolean to say - don't search primary, search secondary.
How about a different idea. Why not make the hash table count variable
(1, 2, 3, etc.) and let the caller decide how many to create and which
I had the same idea, however for some object it doesn't give us any
benefit to have 4 or 5 hash tables for different keys because the number
of object stored in the tables would be small.  We have to think
about balance between speed and memory consumption.
Right and having more than 2 keys for any object means that object *has*
to have all N keys in order for creation of a tertiary table to exist
and all keys have to be unique between 3 tables. I hadn't fully thought
through that when originally writing.
Using a variable number of hash tables is primarily beneficial for that
memory consumption conundrum for those existing vir*obj that use 1
entry, IOW: Nodedev, Interface, Secret, and Volume.
Of those only nodedev seems to be stuck with 1 key. Secrets could (and
should) add unique_id as a Name to go along with the existing UUID.
Interfaces could add MAC and Volumes could add either Key or Path.
However, in order to do so because previous reviews forced me to use
UUID and Name, that means I either "utilize" UUID for "MAC" or "Key" or
I create a tertiary table just because I have driver/vol*obj that
doesn't conform the "chosen" standard using UUID and Name as the key names.
Does it at least make more sense now why I prefer non deterministic
names for the table key names. I chose primary and secondary, but it
could have been "foo" and "bar" too. As a side note for Interface, IIRC
a virtual MAC can change that means the table key changes which makes
things tricky. So doing so is "outside" my current scope.
I'm not saying that name/uuid is perfect, I just don't like
primary/secondary.
...
...
...
to use. That way the nodedev code removes the 4 or 5 FindBy type API's
and replaces it with 4 or 5 hash tables to allow for lookup by key. The
consumer can then have vir*ObjListFind{ByUUID|ByName}, but that ends up
being a virObjectLookupHashFind(tableobj, tableidx, const char *key)
where the consumer knows tableidx = 0 is UUID, tableidx = 1 is Name, etc.
...
==== virObjectLookupHashForEach ====
virObjectLookupHashForEach(void *tableobj,
                           bool useUUID,
                           virHashIterator callback,
                           virObjectLookupHashForEachDataPtr data);
could be
virObjectListForEach(virObjectListPtr listObj,
                     virHashIterator callback,
                     void *callbackData);
There are two things that I don't like.  There is no need to have the
@useUUID because both tables should contain the same objects if both are
used so this can be hidden from the user and the API will simply use one
of the available tables.  The second issue is that this API forces you
to use some predefined data structure and you cannot create your own
specifically tailored for the task that you are about to do.
The @useUUID argument also applies to virObjectLookupHashSearch().
[NB: written before rereading and writing the above paragraph regarding
configurable number of hash tables]
How do you choose which table? How do you know which data the caller
wants? Refer to the table above - some objects have UUID and some have
Name. Some consumers may even want to return a list of all objects by
UUID using an input parameter and other by Name with a different input
parameter. The *ForEach ends up being a multi-purpose backend for NumOf,
Get{Name|UUID}, and Export functions. Typically the Get{Name|UUID} is
the "primary key" for the object, but I'm just saying...
While I'm typing this response I just realized that the Get{Names|UUIDs}
type API functions wouldn't need vir*obj.c specific *Callback functions
to deref obj->def->{name|uuid} since obj->{uuid|name} exists! I'd still
need some function to determine whether @aclfilter was passed or not for
@def and I'd to know to @useUUID or not. But I could converge things
even more than I have.
...
The usage of the virObjectList* APIs could be something like this:
virObjectListPtr
virInterfaceObjListNew(void) {
    return virObjectListNew(10, VIR_OBJECT_LIST_TABLE_NAME);
}
static virInterfaceObjPtr
virInterfaceObjListFindByName(virObjectListPtr interfaces,
                              const char *name)
{
    return virObjectListFindByName(interfaces, name);
}
Pavel
And here's a perfect example of an object that only lives in one table.
If I call virObjectLookupHashForEach or virObjectLookupHashSearch and
expect to find something in objsUUID, I'm going to be sorely
disappointed. One solution is to have two API's - my solution is one API
with a parameter that dictates.
That's the thing, for APIs ForEach or Search you need to use only one
hash table.  If there is only one, it's easy to pick the correct one, if
there are two or more tables, they should contain the same objects only
mapped by different keys so it doesn't matter which table is used.
Pavel
OK - sure since all objects have to be in both tables *if* there are two
tables, then it's a selection of one table to search. However, since my
concept of @primary and @secondary was discarded - how does one know
which to pick - UUID or Name?  Nodedev is not going to find anything by
UUID. So as another adjustment to the next posting I'll alter the names
to be less deterministic. That way the Find/Search/ForEach algorithms
can always don't need to know whether the upper layer is using one or
two tables.
It's really simple, when you create a new list for nodes you tell the
virObjectListNew that you care only about name and it will internally
allocate only one table for names and the second one will remain empty
(NULL).  In the code for Find/Search/ForEach you can check which table
is NULL and which table exists and use the one that exists.

Pavel
...
John
...
...
Given the amount of change since this series was first posted, I have to
deal with merge conflicts, the virHashSearch new argument, and using RW
locks for all the objlists. So I'm going to have to repost the patches
again anyway.  At this time, I prefer to leave them as is for the
reasons stated regarding avoiding a shim. If that's the avenue to take,
then I'll let someone else go down that path. I can finish up cleanup
and conversion of all the vir*obj code to use vir*Obj and vir*ObjList,
but converging to a shim is something I'll avoid.
John
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list