Re: [libvirt] [GSoC] Abstracting device address allocation

11 Jun 2016

...
Just as a note for now, I won't be saying much about naming stuff.  1)
it could lead us into a rabbit hole that has no way out, 2) it is not
needed for the first RFCs, 3) you'll see how other things will be named
around, so I guess you will feel yourself what the right name will be,
and finally 4) it is *very* likely that it will be done by others and
discussed more than the rest of the code ;)
Thank you :) Naming will be dealt with later.
...
We will have the possibility of adding an address pool to the allocator.
For example, when we create a root PCI bus, we tell the allocator that
the following PCI addresses are possible: {0..0}:{0..31}.{0..7},
where {a..b} means a range from a to b, inclusive.
This function call could look like this:
allocatorAddPool(
 allocator[PCI_ADDRESS_TYPE],
 &outputAddress,
 AllocatorRange(0, 0),
 AllocatorRange(0, 31),
 AllocatorRange(0, 7));
The outputAddress would be an array owned by the caller
(only filled in by the allocator).
If we were to allocate IP addresses for computers from the range
192.168.1.2 to 192.168.1.240, where the router has to have
the address 192.168.1.254:
Even though we won't be assigning IP addresses, it is really nice
analogy to use and explain stuff on since hopefully everyone on the list
knows more or less about the limits and properties of them.
...
First, we reserve the address for the router to let the Allocator know
that it's in use, and that we can track collisions in manually
assigned addresses:
allocatorReserveAddress(
 allocator[IP_ADDRESS_TYPE],
 &outputAddress,
 AllocatorValue(192),
 AllocatorValue(168),
 AllocatorValue(1),
 AllocatorValue(254));
Then, we assign the addresses for the computers:
allocatorReserveAddress(
 allocator[IP_ADDRESS_TYPE],
 &outputAddress,
 AllocatorValue(192),
 AllocatorValue(168),
 AllocatorValue(1),
 AllocatorRange(1, 254));
Please note that AllocatorValue() is now in use.
Since libvirt is pure C, no C++, you need to make sure errors are
caught, esp. allocation errors.  If AllocatorValue() is a function and
it can fail, then this won't work.  But you probably know that, right?
These functions/macros are just for the convenience of using the allocator.
An idea on how AllocatorRange/Value might work as a macro:
Every field (IP has 4 of them) of an address would be provided as a range.
For example, if we wanted to allocate an address from the range
192.168.1.1 - 192.168.1.254 without using any macros, it would be:
AllocatorReserveAddr(…, 192, 192,  168, 168, 1, 1, 1, 254);
but we can use AllocatorValue(X), which would expand to “X, X”,
and AllocatorRange(X, Y), which expands to “X, Y”, so the function
call looks nicer:

allocatorReserveAddress(…,
    AllocatorValue(192),
    allocatorValue(168),
    allocatorValue(1),
    allocatorRange(1, 254));

Another idea is to use a helper struct with two integers inside. This
struct would represent a range.
Then a call to AllocatorValue(192) would return a struct with fields
{192, 192}, and AllocatorRange(1, 254) would return a struct
with fields {1, 254}. AllocatorValue() could be a macro or a function
in this case.

These functions/macros are just for the convenience of using the allocator.
I am open to suggestions though.
...
...
There could be a different function call to simply assign any address:
allocatorReserveAny(Allocator* allocator, &outputAddress);
Let's say that we want an "sda" disk. We could create a wrapper:
allocatorReserveSCSIAddress(allocator, "sda");
All this wrapper does is representing the string "sda" as an integer
mapping to the letter (eg. 'a' = 0, 'z' = 25):
allocatorReserveAddress(allocator[SCSI_ADDRESS_TYPE], &outputAddress, 0);
If an address is already determined, because it was specified in the XML
or it's some specific device that has to be at some specific address,
we still reserve it to let the Allocator know that it's in use.
How would this work internally?
One of the possible solutions is keeping a set of ranges of addresses.
For example, suppose that we have two PCI busses to use. Our address pool
is stored as one range:
{0..1} {0..31} {0..7}
Now someone reserves address 0:13.2
Our new free addresses pool is now stored as this set of ranges:
{0..0} {0..12} {0..7},
{0..0} {12..12} {0..1},
{0..0} {12..12} {3..7},
{0..0} {13..31} {0..7},
{1..1} {0..31} {0..7}
If we kept every address separately, it would require 2*32*8=512 addresses.
The set data structure from gnulib's gl_oset.h is a candidate for keeping
the ranges in a sorted fashion. Another candidate is simply a sorted list
from gl_list.h.
We cannot use those since they are GPL and the library part of libvirt
is LGPL.  We can either create our own or just use whatever there's
provided already.
We have macros and functions for handling arrays; if you want it sorted,
then VIR_INSERT_ELEMENT and others are your friends.  With arrays we
would just keep pointers there and move those which is not that
expensive.  Sparse arrays don't make sense to me in this situation.
You could also convert it to bunch of bytes (either specifying a new
format or just converting them to a string) and use our hash table (see
virHashCreate()).
But I don't think we'll need to handle these, let me explain why a bit
below.
...
This structure would be able to handle all types of addresses that are
convertible to a fixed-length list of integers. We don't mind how many
of these integers there are, because we can use variadic arguments.
It won't allow duplicates if we stick to using it in every place where
we have some addresses.
It will also be very generic, with the possibility of writing wrappers on top
of it that might be more convenient for some particular address type.
This way we would keep qemu_* files as they are for now, and just replace
existing allocators (and manual assignment of addresses) to calls to our
new allocator.
Actually, in order for us to be able to replace the calls we have there,
we would need to go a bit different route.  The one I suggested would
actually look a bit different:
Let's say we would start by moving qemuDomainAssignSpaprVIOAddress() to,
for example, src/conf/domain_addr.c and renaming it to, for example
virDomainDeviceAddressAssignSpaprVIO().  Then we would replace all its
callers and so on.  Every time the function needs to access something in
the private data of the domain (qemuDomainObjPrivate), such data would
be instead moved into the virDomainObj (or virDomainDef.  Whenever qemu-specific
information needs to be passed into that function, it could be converted
to a parameter of that function.  Either some value, flag or some more
complex structure, e.g. allocopt (similarly to xmlopt which we have for
parsing and formating XML).  Slowly we might get to the point where lot
of the code is changed to data that driver has stored either per-domain
or globally.
I started doing it. I moved almost every function that assigns addresses
(apart from PCI addresses, they are the most complex ones) from
qemu_domain_address.c to domain_addr.c.
Please take a look at my GitHub: https://github.com/Darge/libvirt
For an overview of what happened, look at the function
qemuDomainAssignAddresses from qemu_domain_address.c.

What happened:
I changed a lot of function names from qemu* to vir* and moved
them to another files. I also moved the structures
virDomainCCWAddressSetPtr and virDomainVirtioSerialAddrSetPtr
from qemu’s private data to virDomainObj.

I had to move some definitions from domain_addr.h to
domain_conf.h, because the address sets are now
in domainObj, so virDomainObjDispose() has to dispose them.
I am not totally sure that the resources are freed properly,
because their owner changed, but I will deal with it later,
especially that we haven’t even decided whether they should belong
to domainDef or domainObj.

My next step is to make qemuDomainReleaseDeviceAddress
generic and then start doing the same thing with PCI address
assignment functions.

Is that what you meant? Does it look good at first glance?

Thank you very much for the email!

Tomasz
...
...
My ideas:
- A possibility of reserving a range of addresses might be used
to reserve a whole PCI slot at once.
I don't see a use case for that.
...
- Flags could be added, changing the behavior of particular
addresses or the whole allocator.
As above, we'll most probably need that.
...
My questions:
- Is this approach good long-term?
- Is address releasing expected to be required at any point?
(think device hotplug/unplug, etc.)
Every time device is unplugged.  At first I thought it will be only on
hot-unplug, but it needs to be freed whenever the device is removed and
the data will not be recalculated.  And that can happen even on
cold-unplug (virsh detach-device --config).
...
- What other features are needed?
We'll see :)  Let's start with either of the approaches (I would
personally suggest the one I had in mind as I think it goes nicely with
current approaches in libvirt and will have less hurdles along the way.
Think it through and let us know about the next design steps.
...
Please speak up your mind if you have remarks or thoughts about the idea.
I'd really appreciate it.
Tomasz
Have a nice day,
Martin

Re: [libvirt] [GSoC] Abstracting device address allocation

Tomasz Flendrich