On Wed, Jun 08, 2016 at 08:04:59PM +0200, Tomasz Flendrich wrote:
> Hello everyone,
>
> Let me introduce myself - I'm Tomasz Flendrich and I'm a Google Summer
> of Code student from the University of Wrocław, Poland.
>
> My goal is to create a generic address allocator (for PCI, virtio,
> SCSI, etc.), because the current handling of addresses has its flaws:
> sometimes addresses aren't properly assigned or checked for duplicates.
> Additionally, a generic solution will be very useful when more
> hypervisors add the feature of explicitly stating the addresses of
> devices.
>
> The key goal at this moment is defining a minimum viable product:
> something that can be implemented quickly, is reasonably complete on
> its own, and has potential for further development in a direction that
> makes sense.
Hi,

sorry for replying a bit late. We talked about the internals together
quite a lot, and it's good to see the design upstream on the list so
that everybody can express their opinions.
> I came up with an initial design. The internal allocator's data will
> be kept in a struct Allocator. There will be one such structure per
> address type, and all of the allocators will be kept in an array.
Just as a note for now, I won't be saying much about naming things: 1)
it could lead us into a rabbit hole that has no way out, 2) it is not
needed for the first RFCs, 3) you'll see how other things are named
around the codebase, so I guess you'll get a feel for what the right
names should be, and finally 4) it is *very* likely that naming will be
discussed by others more than the rest of the code ;)
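
Just so the indexing used below is clear, I picture that array of
allocators roughly like this (only a sketch, all of the names are made
up):

    typedef enum {
        PCI_ADDRESS_TYPE,
        SCSI_ADDRESS_TYPE,
        IP_ADDRESS_TYPE,     /* only for the analogy used below */
        ADDRESS_TYPE_LAST
    } AllocatorType;

    /* one allocator per address type, so e.g. the PCI one is
     * allocator[PCI_ADDRESS_TYPE] */
    Allocator *allocator[ADDRESS_TYPE_LAST];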
> We will have the possibility of adding an address pool to the
> allocator. For example, when we create a root PCI bus, we tell the
> allocator that the following PCI addresses are possible:
> {0..0}:{0..31}.{0..7}, where {a..b} means a range from a to b,
> inclusive. This function call could look like this:
>
>     allocatorAddPool(allocator[PCI_ADDRESS_TYPE],
>                      &outputAddress,
>                      AllocatorRange(0, 0),
>                      AllocatorRange(0, 31),
>                      AllocatorRange(0, 7));
>
> The outputAddress would be an array owned by the caller (only filled
> in by the allocator).
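
If I read the above correctly, the data behind one allocator could be
as simple as this (again just a sketch under my own assumptions, names
made up):

    typedef struct {
        unsigned int lo;    /* first value of the range, inclusive */
        unsigned int hi;    /* last value of the range, inclusive */
    } AllocatorRangeT;

    typedef struct {
        size_t ndims;             /* integers per address; 3 for PCI */
        size_t nranges;           /* number of free ranges stored */
        AllocatorRangeT *ranges;  /* nranges rows of ndims entries */
    } Allocator;

Each row of 'ranges' would then be one multi-dimensional range like
{0..0} {0..31} {0..7}.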
> If we were to allocate IP addresses for computers from the range
> 192.168.1.2 to 192.168.1.240, where the router has to have the
> address 192.168.1.254:
Even though we won't be assigning IP addresses, it is a really nice
analogy to use for explaining things, since hopefully everyone on the
list knows more or less their limits and properties.
> First, we reserve the router's address to let the Allocator know that
> it's in use, so that we can track collisions with manually assigned
> addresses:
>
>     allocatorReserveAddress(allocator[IP_ADDRESS_TYPE],
>                             &outputAddress,
>                             AllocatorValue(192),
>                             AllocatorValue(168),
>                             AllocatorValue(1),
>                             AllocatorValue(254));
> Then, we assign the addresses for the computers:
>
>     allocatorReserveAddress(allocator[IP_ADDRESS_TYPE],
>                             &outputAddress,
>                             AllocatorValue(192),
>                             AllocatorValue(168),
>                             AllocatorValue(1),
>                             AllocatorRange(2, 240));
>
> Please note the mix of AllocatorValue() and AllocatorRange() here:
> the first three octets are fixed, while the last one may be any free
> value from the given range.
Since libvirt is pure C, no C++, you need to make sure errors are
caught, esp. allocation errors. If AllocatorValue() is a function and
it can fail, then this won't work. But you probably know that, right?
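
One way to sidestep that would be making AllocatorValue() and
AllocatorRange() macros that produce a small struct by value, so there
is no call that could fail (a sketch using C99 compound literals and
the same range struct as in the earlier sketch):

    typedef struct {
        unsigned int lo;
        unsigned int hi;
    } AllocatorRangeT;

    /* a range, both ends inclusive */
    #define AllocatorRange(a, b) ((AllocatorRangeT){ (a), (b) })

    /* a fixed value is just a range with one element */
    #define AllocatorValue(v) AllocatorRange((v), (v))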
> There could be a different function call to simply assign any free
> address:
>
>     allocatorReserveAny(allocator, &outputAddress);
>
> Let's say that we want an "sda" disk. We could create a wrapper:
>
>     allocatorReserveSCSIAddress(allocator, "sda");
>
> All this wrapper does is represent the string "sda" as an integer
> mapping of the letter (e.g. 'a' = 0, 'z' = 25):
>
>     allocatorReserveAddress(allocator[SCSI_ADDRESS_TYPE],
>                             &outputAddress, AllocatorValue(0));
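
For illustration, such a wrapper could look roughly like this (a
sketch assuming the hypothetical API above; it only handles
single-letter names, so "sdaa"-style names would still need the usual
base-26 treatment):

    #include <string.h>

    static int
    allocatorReserveSCSIAddress(Allocator **allocator,
                                const char *name,
                                unsigned int *outputAddress)
    {
        /* accept only "sda".."sdz" in this sketch */
        if (strlen(name) != 3 || strncmp(name, "sd", 2) != 0 ||
            name[2] < 'a' || name[2] > 'z')
            return -1;

        return allocatorReserveAddress(allocator[SCSI_ADDRESS_TYPE],
                                       outputAddress,
                                       AllocatorValue(name[2] - 'a'));
    }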
> If an address is already determined, because it was specified in the
> XML or because some specific device has to be at some specific
> address, we still reserve it to let the Allocator know that it's in
> use.
> How would this work internally? One of the possible solutions is
> keeping a set of ranges of addresses. For example, suppose that we
> have two PCI buses to use. Our address pool is stored as one range:
>
>     {0..1} {0..31} {0..7}
>
> Now someone reserves the address 0:13.2. Our new free address pool
> is now stored as this set of ranges:
>
>     {0..0} {0..12}  {0..7},
>     {0..0} {13..13} {0..1},
>     {0..0} {13..13} {3..7},
>     {0..0} {14..31} {0..7},
>     {1..1} {0..31}  {0..7}
>
> If we kept every address separately, it would require 2*32*8 = 512
> entries instead of these five ranges.
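
To make that splitting concrete, here is a rough sketch of how
reserving one address could cut a free range apart (plain C; every
name is made up, only the arithmetic matters):

    #include <stddef.h>

    typedef struct {
        unsigned int lo;
        unsigned int hi;
    } AllocatorRangeT;

    /*
     * Cut the single address 'addr' out of the free range 'in' (both
     * 'ndims' entries long).  Writes at most 2 * ndims sub-ranges into
     * 'out' (row by row, ndims entries per row) and returns how many
     * rows were produced.
     */
    static size_t
    allocatorCutAddress(const AllocatorRangeT *in,
                        const unsigned int *addr,
                        size_t ndims,
                        AllocatorRangeT *out)
    {
        size_t nout = 0;
        size_t i, j, c;

        for (i = 0; i < ndims; i++) {
            for (c = 0; c < 2; c++) {
                AllocatorRangeT *row;

                /* skip the empty side(s) of the cut */
                if ((c == 0 && addr[i] == in[i].lo) ||
                    (c == 1 && addr[i] == in[i].hi))
                    continue;

                row = out + nout++ * ndims;
                for (j = 0; j < ndims; j++) {
                    if (j < i) {
                        /* dimensions already cut: pin to the address */
                        row[j].lo = row[j].hi = addr[j];
                    } else if (j == i) {
                        /* the dimension being cut: keep the part
                         * below (c == 0) or above (c == 1) addr[i] */
                        row[j].lo = (c == 0) ? in[i].lo : addr[i] + 1;
                        row[j].hi = (c == 0) ? addr[i] - 1 : in[i].hi;
                    } else {
                        /* untouched dimensions keep the full range */
                        row[j] = in[j];
                    }
                }
            }
        }

        return nout;
    }

Running it on the example above, in = {0..1} {0..31} {0..7} and
addr = 0:13.2, yields exactly the five ranges listed.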
> The set data structure from gnulib's gl_oset.h is a candidate for
> keeping the ranges in a sorted fashion. Another candidate is simply a
> sorted list from gl_list.h.
We cannot use those since they are GPL and the library part of libvirt
is LGPL. We can either create our own or just use whatever libvirt
already provides.
We have macros and functions for handling arrays; if you want one
sorted, then VIR_INSERT_ELEMENT and others are your friends. With
arrays we would just keep pointers in them and move those around,
which is not that expensive. Sparse arrays don't make sense to me in
this situation.
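
Something along these lines, assuming the allocator kept a sorted
array of pointers (AllocatorRangeT **ranges plus an nranges count);
VIR_INSERT_ELEMENT is existing libvirt API from src/util/viralloc.h,
the rest is made up:

    #include "viralloc.h"

    /* insert 'range' at index 'at' (found beforehand, e.g. by a
     * binary search) so that alloc->ranges stays sorted */
    static int
    allocatorInsertRange(Allocator *alloc,
                         size_t at,
                         AllocatorRangeT *range)
    {
        /* grows the array, bumps alloc->nranges and, on success,
         * clears 'range' since ownership moves into the array */
        if (VIR_INSERT_ELEMENT(alloc->ranges, at, alloc->nranges,
                               range) < 0)
            return -1;  /* allocation failure */

        return 0;
    }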
You could also convert an address to a bunch of bytes (either
specifying a new format or just converting it to a string) and use our
hash table (see virHashCreate()).
But I don't think we'll need to handle these, let me explain why a bit
below.
> This structure would be able to handle all types of addresses that
> are convertible to a fixed-length list of integers. We don't mind how
> many of these integers there are, because we can use variadic
> arguments. It won't allow duplicates as long as we stick to using it
> in every place where we have addresses.
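
A sketch of how the variadic part could work, reusing names from the
examples above (everything here is an assumption, not an actual
implementation): each dimension arrives as one AllocatorRangeT by
value, and the allocator already knows how many integers one address
has, so no extra count argument is needed:

    #include <stdarg.h>
    #include <stddef.h>

    typedef struct {        /* same two-field range as earlier */
        unsigned int lo;
        unsigned int hi;
    } AllocatorRangeT;

    typedef struct {
        size_t ndims;       /* integers per address, known since
                             * allocatorAddPool() */
        /* ... the free range set is elided here ... */
    } Allocator;

    static int
    allocatorReserveAddress(Allocator *allocator,
                            unsigned int *outputAddress,
                            ...)
    {
        va_list ap;
        size_t i;

        va_start(ap, outputAddress);
        for (i = 0; i < allocator->ndims; i++) {
            AllocatorRangeT want = va_arg(ap, AllocatorRangeT);

            /* ... search the free range set for a value within
             * [want.lo, want.hi] and narrow it accordingly ... */
            outputAddress[i] = want.lo;   /* placeholder result */
        }
        va_end(ap);

        return 0;  /* -1 when nothing in the requested ranges is free */
    }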
> It will also be very generic, with the possibility of writing
> wrappers on top of it that might be more convenient for some
> particular address type. This way we would keep the qemu_* files as
> they are for now, and just replace the existing allocators (and the
> manual assignment of addresses) with calls to our new allocator.
Actually, in order for us to be able to replace the calls we have
there, we would need to go a slightly different route. The one I
suggested would look a bit different:

Let's say we would start by moving qemuDomainAssignSpaprVIOAddress()
to, for example, src/conf/domain_addr.c and renaming it to, for
example, virDomainDeviceAddressAssignSpaprVIO(). Then we would replace
all its callers and so on. Every time the function needs to access
something in the private data of the domain (qemuDomainObjPrivate),
such data would instead be moved into the virDomainObj (or
virDomainDef). Whenever qemu-specific information needs to be passed
into that function, it could be converted to a parameter of that
function: either some value, a flag, or some more complex structure,
e.g. allocopt (similarly to the xmlopt we have for parsing and
formatting XML). Slowly we might get to the point where a lot of the
code is driven by data that the driver has stored either per-domain or
globally.
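
In terms of shape, the moved function could end up looking something
like this; apart from the name suggested above, everything (especially
the allocopt type) is made up for illustration:

    /* qemu-specific knowledge arrives through parameters instead of
     * being dug out of qemuDomainObjPrivate; 'allocopt' would play
     * the role xmlopt plays for XML handling */
    int
    virDomainDeviceAddressAssignSpaprVIO(virDomainDefPtr def,
                                         virDomainDeviceInfoPtr info,
                                         virDomainAddressAllocOptPtr allocopt);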
> My ideas:
>
> - A possibility of reserving a range of addresses might be used
>   to reserve a whole PCI slot at once.
I don't see a use case for that.
> - Flags could be added, changing the behavior of particular
>   addresses or the whole allocator.
As above, we'll most probably need that.
> My questions:
>
> - Is this approach good long-term?
> - Is address releasing expected to be required at any point?
>   (think device hotplug/unplug, etc.)
Yes, every time a device is unplugged. At first I thought it would
only be on hot-unplug, but the address needs to be freed whenever the
device is removed and the data will not be recalculated. And that can
happen even on cold-unplug (virsh detach-device --config).
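
So a releasing counterpart to the reserving calls would be needed from
the start; e.g. some hypothetical entry point like this, which would
merge the address back into the free range set:

    /* 'address' is the same caller-owned array of integers that
     * allocatorReserveAddress() filled in earlier */
    int allocatorReleaseAddress(Allocator *allocator,
                                const unsigned int *address);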
> - What other features are needed?
We'll see :) Let's start with either of the approaches (I would
personally suggest the one I had in mind, as I think it goes nicely
with current approaches in libvirt and will hit fewer hurdles along
the way). Think it through and let us know about the next design
steps.
> Please speak your mind if you have any remarks or thoughts about the
> idea. I'd really appreciate it.
>
> Tomasz
Have a nice day,
Martin