[libvirt] [GSoC] Abstracting device address allocation

Hello everyone,

Let me introduce myself - I'm Tomasz Flendrich and I'm a Google Summer of Code student from University of Wrocław, Poland.

My goal is to create a generic address allocator (for PCI, virtio, SCSI, etc.), because current handling of addresses has its flaws: sometimes addresses aren't properly assigned and checked for duplicates. Additionally, a generic solution will be very useful when more hypervisors add the feature of explicitly stating the addresses of devices.

The key goal I'm willing to achieve at this moment is defining a minimum viable product (one thing to be implemented quickly which is reasonably complete on its own, and has potential for further development in a direction which makes sense).

I came up with an initial design. The internal allocator's data will be kept in struct Allocator. There will be one structure for one address type and all of the allocators would be in an array.

We will have the possibility of adding an address pool to the allocator. For example, when we create a root PCI bus, we tell the allocator that the following PCI addresses are possible: {0..0}:{0..31}.{0..7}, where {a..b} means a range from a to b, inclusive. This function call could look like this:

allocatorAddPool(allocator[PCI_ADDRESS_TYPE], &outputAddress, AllocatorRange(0, 0), AllocatorRange(0, 31), AllocatorRange(0, 7));

The outputAddress would be an array owned by the caller (only filled in by the allocator).

If we were to allocate IP addresses for computers from the range 192.168.1.2 to 192.168.1.240, where the router has to have the address 192.168.1.254: first, we reserve the address for the router to let the Allocator know that it's in use, and so that we can track collisions in manually assigned addresses:

allocatorReserveAddress(allocator[IP_ADDRESS_TYPE], &outputAddress, AllocatorValue(192), AllocatorValue(168), AllocatorValue(1), AllocatorValue(254));

Then, we assign the addresses for the computers:

allocatorReserveAddress(allocator[IP_ADDRESS_TYPE], &outputAddress, AllocatorValue(192), AllocatorValue(168), AllocatorValue(1), AllocatorRange(1, 254));

Please note that AllocatorValue() is now in use. There could be a different function call to simply assign any address:

allocatorReserveAny(Allocator* allocator, &outputAddress);

Let's say that we want an "sda" disk. We could create a wrapper:

allocatorReserveSCSIAddress(allocator, "sda");

All this wrapper does is representing the string "sda" as an integer mapping to the letter (eg. 'a' = 0, 'z' = 25):

allocatorReserveAddress(allocator[SCSI_ADDRESS_TYPE], &outputAddress, 0);

If an address is already determined, because it was specified in the XML or it's some specific device that has to be at some specific address, we still reserve it to let the Allocator know that it's in use.

How would this work internally? One of the possible solutions is keeping a set of ranges of addresses. For example, suppose that we have two PCI busses to use. Our address pool is stored as one range:

{0..1} {0..31} {0..7}

Now someone reserves address 0:13.2. Our new pool of free addresses is now stored as this set of ranges:

{0..0} {0..12} {0..7}, {0..0} {13..13} {0..1}, {0..0} {13..13} {3..7}, {0..0} {14..31} {0..7}, {1..1} {0..31} {0..7}

If we kept every address separately, it would require 2*32*8=512 addresses.

The set data structure from gnulib's gl_oset.h is a candidate for keeping the ranges in a sorted fashion. Another candidate is simply a sorted list from gl_list.h.
This structure would be able to handle all types of addresses that are convertible to a fixed-length list of integers. We don't mind how many of these integers there are, because we can use variadic arguments. It won't allow duplicates if we stick to using it in every place where we have some addresses. It will also be very generic, with the possibility of writing wrappers on top of it that might be more convenient for some particular address type. This way we would keep qemu_* files as they are for now, and just replace existing allocators (and manual assignment of addresses) with calls to our new allocator.

My ideas:
- A possibility of reserving a range of addresses might be used to reserve a whole PCI slot at once.
- Flags could be added, changing the behavior of particular addresses or the whole allocator.

My questions:
- Is this approach good long-term?
- Is address releasing expected to be required at any point? (think device hotplug/unplug, etc.)
- What other features are needed?

Please speak your mind if you have remarks or thoughts about the idea. I'd really appreciate it.

Tomasz
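As a rough illustration of what the proposed calls could look like in C, here is a sketch; all names and signatures (AllocatorSpan, allocatorAddPool(), allocatorReserveAddress(), ...) are hypothetical renderings of the idea above, not existing libvirt code:

    /* One allocator per address type; each dimension of an address is an
     * inclusive [lo, hi] span. */
    #include <stddef.h>

    typedef struct {
        int lo;
        int hi;                 /* inclusive; lo == hi means a single value */
    } AllocatorSpan;

    #define AllocatorValue(x)    ((AllocatorSpan){ (x), (x) })
    #define AllocatorRange(a, b) ((AllocatorSpan){ (a), (b) })

    typedef struct Allocator Allocator;     /* opaque internal state */

    /* Add a pool of usable addresses, e.g. {0..0}:{0..31}.{0..7} for a
     * root PCI bus.  'ndims' is how many integers an address consists of. */
    int allocatorAddPool(Allocator *alloc, size_t ndims, ...);

    /* Reserve an address that fits the given spans (one span per dimension)
     * and write it into 'out', an array owned by the caller. */
    int allocatorReserveAddress(Allocator *alloc, int *out, size_t ndims, ...);

    /* Reserve whatever free address comes first. */
    int allocatorReserveAny(Allocator *alloc, int *out);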

On Wed, Jun 08, 2016 at 08:04:59PM +0200, Tomasz Flendrich wrote:
Hello everyone,
Let me introduce myself - I'm Tomasz Flendrich and I'm a Google Summer of Code student from University of Wrocław, Poland.
My goal is to create a generic address allocator (for PCI, virtio, SCSI, etc.), because current handling of addresses has its flaws: sometimes addresses aren't properly assigned and checked for duplicates. Additionally, a generic solution will be very useful when more hypervisors add the feature of explicitly stating the addresses of devices.
The key goal I'm willing to achieve at this moment is defining a minimum viable product (one thing to be implemented quickly which is reasonably complete on its own, and has potential for further development in the direction which makes sense).
Hi, sorry for replying a bit later. We talked about the internals together quite a lot and it's good to see the design upstream on the list so that everybody can express their opinions.
I came up with an initial design. The internal allocator's data will be kept in struct Allocator. There will be one structure for one address type and all of the allocators would be in an array.
Just as a note for now, I won't be saying much about naming stuff. 1) it could lead us into a rabbit hole that has no way out, 2) it is not needed for the first RFCs, 3) you'll see how other things will be named around, so I guess you will get a feel for what the right name should be, and finally 4) it is *very* likely that it will be done by others and discussed more than the rest of the code ;)
We will have the possibility of adding an address pool to the allocator. For example, when we create a root PCI bus, we tell the allocator that the following PCI addresses are possible: {0..0}:{0..31}.{0..7}, where {a..b} means a range from a to b, inclusive.
This function call could look like this: allocatorAddPool( allocator[PCI_ADDRESS_TYPE], &outputAddress, AllocatorRange(0, 0), AllocatorRange(0, 31), AllocatorRange(0, 7));
The outputAddress would be an array owned by the caller (only filled in by the allocator).
If we were to allocate IP addresses for computers from the range 192.168.1.2 to 192.168.1.240, where the router has to have the address 192.168.1.254:
Even though we won't be assigning IP addresses, it is really nice analogy to use and explain stuff on since hopefully everyone on the list knows more or less about the limits and properties of them.
First, we reserve the address for the router to let the Allocator know that it's in use, and that we can track collisions in manually assigned addresses:
allocatorReserveAddress( allocator[IP_ADDRESS_TYPE], &outputAddress, AllocatorValue(192), AllocatorValue(168), AllocatorValue(1), AllocatorValue(254));
Then, we assign the addresses for the computers:
allocatorReserveAddress( allocator[IP_ADDRESS_TYPE], &outputAddress, AllocatorValue(192), AllocatorValue(168), AllocatorValue(1), AllocatorRange(1, 254)); Please note that AllocatorValue() is now in use.
Since libvirt is pure C, no C++, you need to make sure errors are caught, esp. allocation errors. If AllocatorValue() is a function and it can fail, then this won't work. But you probably know that, right?
There could be a different function call to simply assign any address: allocatorReserveAny(Allocator* allocator, &outputAddress);
Let's say that we want an "sda" disk. We could create a wrapper: allocatorReserveSCSIAddress(allocator, "sda"); All this wrapper does is representing the string "sda" as an integer mapping to the letter (eg. 'a' = 0, 'z' = 25): allocatorReserveAddress(allocator[SCSI_ADDRESS_TYPE], &outputAddress, 0);
If an address is already determined, because it was specified in the XML or it's some specific device that has to be at some specific address, we still reserve it to let the Allocator know that it's in use.
How would this work internally? One of the possible solutions is keeping a set of ranges of addresses. For example, suppose that we have two PCI busses to use. Our address pool is stored as one range: {0..1} {0..31} {0..7}
Now someone reserves address 0:13.2
Our new pool of free addresses is now stored as this set of ranges: {0..0} {0..12} {0..7}, {0..0} {13..13} {0..1}, {0..0} {13..13} {3..7}, {0..0} {14..31} {0..7}, {1..1} {0..31} {0..7}
If we kept every address separately, it would require 2*32*8=512 addresses.
The set data structure from gnulib's gl_oset.h is a candidate for keeping the ranges in a sorted fashion. Another candidate is simply a sorted list from gl_list.h.
We cannot use those since they are GPL and the library part of libvirt is LGPL. We can either create our own or just use whatever is provided already. We have macros and functions for handling arrays; if you want it sorted, then VIR_INSERT_ELEMENT and others are your friends. With arrays we would just keep pointers there and move those around, which is not that expensive. Sparse arrays don't make sense to me in this situation. You could also convert it to a bunch of bytes (either specifying a new format or just converting them to a string) and use our hash table (see virHashCreate()). But I don't think we'll need to handle these; let me explain why a bit below.
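As a rough illustration of the sorted-array alternative, a standalone sketch in plain C (in libvirt proper the VIR_* array helpers and error reporting would be used instead of bare realloc()/qsort()):

    #include <stdlib.h>

    #define ADDR_DIMS 4                  /* enough for domain:bus:slot.function */

    typedef struct {
        int lo[ADDR_DIMS];               /* per-dimension lower bounds */
        int hi[ADDR_DIMS];               /* per-dimension upper bounds, inclusive */
    } FreeRange;

    typedef struct {
        FreeRange *ranges;
        size_t nranges;
    } RangeList;

    static int freeRangeCmp(const void *va, const void *vb)
    {
        const FreeRange *a = va;
        const FreeRange *b = vb;
        for (size_t i = 0; i < ADDR_DIMS; i++) {
            if (a->lo[i] != b->lo[i])
                return a->lo[i] < b->lo[i] ? -1 : 1;
        }
        return 0;
    }

    /* Append a range and keep the array ordered by its lower bound. */
    static int rangeListAdd(RangeList *list, FreeRange range)
    {
        FreeRange *tmp = realloc(list->ranges,
                                 (list->nranges + 1) * sizeof(*tmp));
        if (!tmp)
            return -1;
        list->ranges = tmp;
        list->ranges[list->nranges++] = range;
        qsort(list->ranges, list->nranges, sizeof(*list->ranges), freeRangeCmp);
        return 0;
    }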
This structure would be able to handle all types of addresses that are convertible to a fixed-length list of integers. We don't mind how many of these integers there are, because we can use variadic arguments. It won't allow duplicates if we stick to using it in every place where we have some addresses. It will also be very generic, with the possibility of writing wrappers on top of it that might be more convenient for some particular address type.
This way we would keep qemu_* files as they are for now, and just replace existing allocators (and manual assignment of addresses) to calls to our new allocator.
Actually, in order for us to be able to replace the calls we have there, we would need to go a bit different route. The one I suggested would actually look a bit different:

Let's say we would start by moving qemuDomainAssignSpaprVIOAddress() to, for example, src/conf/domain_addr.c and renaming it to, for example, virDomainDeviceAddressAssignSpaprVIO(). Then we would replace all its callers and so on. Every time the function needs to access something in the private data of the domain (qemuDomainObjPrivate), such data would instead be moved into the virDomainObj (or virDomainDef). Whenever qemu-specific information needs to be passed into that function, it could be converted to a parameter of that function: either some value, flag or some more complex structure, e.g. allocopt (similarly to xmlopt, which we have for parsing and formatting XML). Slowly we might get to the point where a lot of the code is changed to data that the driver has stored either per-domain or globally.
My ideas: - A possibility of reserving a range of addresses might be used to reserve a whole PCI slot at once.
I don't see a use case for that.
- Flags could be added, changing the behavior of particular addresses or the whole allocator.
As above, we'll most probably need that.
My questions: - Is this approach good long-term? - Is address releasing expected to be required at any point? (think device hotplug/unplug, etc.)
Every time a device is unplugged. At first I thought it would be needed only on hot-unplug, but the address needs to be freed whenever the device is removed and the data will not be recalculated. And that can happen even on cold-unplug (virsh detach-device --config).
- What other features are needed?
We'll see :) Let's start with either of the approaches (I would personally suggest the one I had in mind as I think it goes nicely with current approaches in libvirt and will have fewer hurdles along the way). Think it through and let us know about the next design steps.
Please speak up your mind if you have remarks or thoughts about the idea. I'd really appreciate it.
Tomasz
Have a nice day, Martin

Just as a note for now, I won't be saying much about naming stuff. 1) it could lead us into a rabbit hole that has no way out, 2) it is not needed for the first RFCs, 3) you'll see how other things will be named around, so I guess you will feel yourself what the right name will be, and finally 4) it is *very* likely that it will be done by others and discussed more than the rest of the code ;) Thank you :) Naming will be dealt with later.
We will have the possibility of adding an address pool to the allocator. For example, when we create a root PCI bus, we tell the allocator that the following PCI addresses are possible: {0..0}:{0..31}.{0..7}, where {a..b} means a range from a to b, inclusive.
This function call could look like this: allocatorAddPool( allocator[PCI_ADDRESS_TYPE], &outputAddress, AllocatorRange(0, 0), AllocatorRange(0, 31), AllocatorRange(0, 7));
The outputAddress would be an array owned by the caller (only filled in by the allocator).
If we were to allocate IP addresses for computers from the range 192.168.1.2 to 192.168.1.240, where the router has to have the address 192.168.1.254:
Even though we won't be assigning IP addresses, it is really nice analogy to use and explain stuff on since hopefully everyone on the list knows more or less about the limits and properties of them.
First, we reserve the address for the router to let the Allocator know that it's in use, and that we can track collisions in manually assigned addresses:
allocatorReserveAddress( allocator[IP_ADDRESS_TYPE], &outputAddress, AllocatorValue(192), AllocatorValue(168), AllocatorValue(1), AllocatorValue(254));
Then, we assign the addresses for the computers:
allocatorReserveAddress( allocator[IP_ADDRESS_TYPE], &outputAddress, AllocatorValue(192), AllocatorValue(168), AllocatorValue(1), AllocatorRange(1, 254)); Please note that AllocatorValue() is now in use.
Since libvirt is pure C, no C++, you need to make sure errors are caught, esp. allocation errors. If AllocatorValue() is a function and it can fail, then this won't work. But you probably know that, right?

These functions/macros are just for the convenience of using the allocator. An idea on how AllocatorRange/AllocatorValue might work as macros: every field (IP has 4 of them) of an address would be provided as a range. For example, if we wanted to allocate an address from the range 192.168.1.1 - 192.168.1.254 without using any macros, it would be:

allocatorReserveAddress(…, 192, 192, 168, 168, 1, 1, 1, 254);

but we can use AllocatorValue(X), which would expand to “X, X”, and AllocatorRange(X, Y), which expands to “X, Y”, so the function call looks nicer:

allocatorReserveAddress(…, AllocatorValue(192), AllocatorValue(168), AllocatorValue(1), AllocatorRange(1, 254));

Another idea is to use a helper struct with two integers inside, representing a range. Then a call to AllocatorValue(192) would return a struct with fields {192, 192}, and AllocatorRange(1, 254) would return a struct with fields {1, 254}. AllocatorValue() could be a macro or a function in this case. I am open to suggestions though.
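A rough sketch of the two variants described above (hypothetical names, not actual libvirt code):

    /* Variant 1: the macros expand into a "lo, hi" pair, so every
     * dimension of the address becomes two int arguments. */
    #define AllocatorValue(x)     (x), (x)
    #define AllocatorRange(a, b)  (a), (b)

    /*
     * allocatorReserveAddress(..., AllocatorValue(192), AllocatorValue(168),
     *                         AllocatorValue(1), AllocatorRange(1, 254));
     * expands to
     * allocatorReserveAddress(..., 192, 192, 168, 168, 1, 1, 1, 254);
     */

    /* Variant 2: a helper struct describing one range; AllocatorValue(192)
     * would then yield {192, 192} and AllocatorRange(1, 254) yields {1, 254}. */
    typedef struct {
        int lo;
        int hi;     /* inclusive */
    } AllocatorSpan;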
There could be a different function call to simply assign any address: allocatorReserveAny(Allocator* allocator, &outputAddress);
Let's say that we want an "sda" disk. We could create a wrapper: allocatorReserveSCSIAddress(allocator, "sda"); All this wrapper does is representing the string "sda" as an integer mapping to the letter (eg. 'a' = 0, 'z' = 25): allocatorReserveAddress(allocator[SCSI_ADDRESS_TYPE], &outputAddress, 0);
If an address is already determined, because it was specified in the XML or it's some specific device that has to be at some specific address, we still reserve it to let the Allocator know that it's in use.
How would this work internally? One of the possible solutions is keeping a set of ranges of addresses. For example, suppose that we have two PCI busses to use. Our address pool is stored as one range: {0..1} {0..31} {0..7}
Now someone reserves address 0:13.2
Our new pool of free addresses is now stored as this set of ranges: {0..0} {0..12} {0..7}, {0..0} {13..13} {0..1}, {0..0} {13..13} {3..7}, {0..0} {14..31} {0..7}, {1..1} {0..31} {0..7}
If we kept every address separately, it would require 2*32*8=512 addresses.
The set data structure from gnulib's gl_oset.h is a candidate for keeping the ranges in a sorted fashion. Another candidate is simply a sorted list from gl_list.h.
We cannot use those since they are GPL and the library part of libvirt is LGPL. We can either create our own or just use whatever there's provided already.
We have macros and functions for handling arrays; if you want it sorted, then VIR_INSERT_ELEMENT and others are your friends. With arrays we would just keep pointers there and move those which is not that expensive. Sparse arrays don't make sense to me in this situation.
You could also convert it to bunch of bytes (either specifying a new format or just converting them to a string) and use our hash table (see virHashCreate()).
But I don't think we'll need to handle these, let me explain why a bit below.
This structure would be able to handle all types of addresses that are convertible to a fixed-length list of integers. We don't mind how many of these integers there are, because we can use variadic arguments. It won't allow duplicates if we stick to using it in every place where we have some addresses. It will also be very generic, with the possibility of writing wrappers on top of it that might be more convenient for some particular address type.
This way we would keep qemu_* files as they are for now, and just replace existing allocators (and manual assignment of addresses) to calls to our new allocator.
Actually, in order for us to be able to replace the calls we have there, we would need to go a bit different route. The one I suggested would actually look a bit different:
Let's say we would start by moving qemuDomainAssignSpaprVIOAddress() to, for example, src/conf/domain_addr.c and renaming it to, for example virDomainDeviceAddressAssignSpaprVIO(). Then we would replace all its callers and so on. Every time the function needs to access something in the private data of the domain (qemuDomainObjPrivate), such data would be instead moved into the virDomainObj (or virDomainDef. Whenever qemu-specific information needs to be passed into that function, it could be converted to a parameter of that function. Either some value, flag or some more complex structure, e.g. allocopt (similarly to xmlopt which we have for parsing and formating XML). Slowly we might get to the point where lot of the code is changed to data that driver has stored either per-domain or globally.
I started doing it. I moved almost every function that assigns addresses (apart from PCI addresses, they are the most complex ones) from qemu_domain_address.c to domain_addr.c. Please take a look at my GitHub: https://github.com/Darge/libvirt For an overview of what happened, look at the function qemuDomainAssignAddresses from qemu_domain_address.c.

What happened: I changed a lot of function names from qemu* to vir* and moved them to other files. I also moved the structures virDomainCCWAddressSetPtr and virDomainVirtioSerialAddrSetPtr from qemu's private data to virDomainObj. I had to move some definitions from domain_addr.h to domain_conf.h, because the address sets are now in domainObj, so virDomainObjDispose() has to dispose of them. I am not totally sure that the resources are freed properly, because their owner changed, but I will deal with it later, especially since we haven't even decided whether they should belong to domainDef or domainObj.

My next step is to make qemuDomainReleaseDeviceAddress generic and then start doing the same thing with PCI address assignment functions. Is that what you meant? Does it look good at first glance?

Thank you very much for the email!

Tomasz
My ideas: - A possibility of reserving a range of addresses might be used to reserve a whole PCI slot at once.
I don't see a use case for that.
- Flags could be added, changing the behavior of particular addresses or the whole allocator.
As above, we'll most probably need that.
My questions: - Is this approach good long-term? - Is address releasing expected to be required at any point? (think device hotplug/unplug, etc.)
Every time device is unplugged. At first I thought it will be only on hot-unplug, but it needs to be freed whenever the device is removed and the data will not be recalculated. And that can happen even on cold-unplug (virsh detach-device --config).
- What other features are needed?
We'll see :) Let's start with either of the approaches (I would personally suggest the one I had in mind as I think it goes nicely with current approaches in libvirt and will have less hurdles along the way.
Think it through and let us know about the next design steps.
Please speak up your mind if you have remarks or thoughts about the idea. I'd really appreciate it.
Tomasz
Have a nice day, Martin

On 06/08/2016 02:04 PM, Tomasz Flendrich wrote:
Hello everyone,
Let me introduce myself - I'm Tomasz Flendrich and I'm a Google Summer of Code student from University of Wrocław, Poland.
My goal is to create a generic address allocator (for PCI, virtio, SCSI, etc.), because current handling of addresses has its flaws: sometimes addresses aren't properly assigned and checked for duplicates.
I know you took this sentence straight from the wiki page of GSoC ideas, but I'd be interested to see a few specific examples of the problems that you intend to solve (maybe Martin can help here, since I think he's the one who wrote that).

I know of a few problems with PCI address allocation, but abstracting the data structure and functions used for allocation isn't going to solve them. For example, if you attach a device that uses PCI with "virsh attach-device .... --live", then add another with "virsh attach-device --live --config", the new device will be assigned a different address in the persistent config than in the live domain (the result will be that the next time you shutdown and restart, the device will appear at a different address, which could confuse the guest OS). Another (which I need to fix *very* soon) is that even when a device is classified as connecting to PCIe slots rather than PCI (and there aren't enough of those right now, which is *yet another* problem I need to fix), if there aren't any empty PCIe slots available to assign, we'll fail rather than adding a new controller of the appropriate type attached to the appropriate existing controller.

But, as I said, a new data structure isn't going to solve any of those problems (and I'm not sure it will make the solution any easier to get to either).
Additionally, a generic solution will be very useful when more hypervisors add the feature of explicitly stating the addresses of devices.
The key goal I'm willing to achieve at this moment is defining a minimum viable product (one thing to be implemented quickly which is reasonably complete on its own, and has potential for further development in the direction which makes sense).
I came up with an initial design. The internal allocator's data will be kept in struct Allocator. There will be one structure for one address type and all of the allocators would be in an array.
We will have the possibility of adding an address pool to the allocator. For example, when we create a root PCI bus, we tell the allocator that the following PCI addresses are possible: {0..0}:{0..31}.{0..7}, where {a..b} means a range from a to b, inclusive.
PCI addresses are likely much more complicated than you've planned for this. For starters, each PCI bus can have a different number of slots available: the 2nd dimension, which you list as {0..31}, has different start & end indexes depending on the type of controller that provides that bus (e.g. pci-root, pcie-root, and pci-bridge have 31 slots from 1-31, dmi-to-pci-bridge and pcie-switch-upstream-port have 32 slots from 0-31, pcie-root-port and pcie-switch-downstream-port each have a single slot at 0), and each bus has different rules about what devices can be plugged into them (e.g. a pcie-root-port can only be plugged into pcie-root or a pcie-switch-downstream-port, some devices can only be plugged into pci-root, pci-bridge, or dmi-to-pci-bridge, some can be plugged into pcie-root directly if specifically requested, but it's otherwise avoided, etc). So it's not just a case of a 4-dimensional vector with the same size at each index on the same axis, and each address is not the same as any other address.

Are you intending for this allocator to have any sort of "Give me the next free address" function? If so, doing this for PCI is likely to require you to add lots of extra complexity that will be completely unnecessary for the other address types (although... there are several different kinds of SCSI controller (including virtio-scsi as well as several emulations of real hardware) that all share the same address space, and we need to know what model each controller is so that the user can be sure devices are placed on the desired controller).

Of course maybe your intent is that your allocator will only implement an "allocate this specific address" function, and leave auto-assignment up to a higher layer; but then the higher layer would still need a way to efficiently scan the entire list of free addresses, and each address would need a bit of data describing the controllers at that address so they could be matched.

(Another thing - at any time a new bus of just about any type can be added, sometimes in response to a request to allocate an address for which no matching slot is available, e.g. adding a new pcie-root-port, pcie-switch-upstream-port, and list of pcie-downstream-ports when there are no more hotpluggable PCIe slots available.)
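As an illustration of the kind of per-bus information the PCI case would need beyond plain integer ranges, a hypothetical sketch (the flag names are made up, not libvirt's actual connect flags):

    /* Hypothetical per-bus capabilities. */
    typedef enum {
        BUS_CONNECT_PCI_DEVICE     = 1 << 0,  /* conventional PCI endpoints */
        BUS_CONNECT_PCIE_DEVICE    = 1 << 1,  /* PCI Express endpoints */
        BUS_CONNECT_PCIE_ROOT_PORT = 1 << 2,  /* a pcie-root-port may plug in here */
        BUS_CONNECT_HOTPLUG        = 1 << 3,  /* slots support hotplug */
    } BusConnectFlags;

    typedef struct {
        unsigned int index;      /* bus number */
        unsigned int minSlot;    /* e.g. 1 for pci-root */
        unsigned int maxSlot;    /* e.g. 31, or 0 for a root/downstream port */
        unsigned int flags;      /* ORed BusConnectFlags */
    } BusCaps;

    /* Examples following the slot ranges described above: */
    static const BusCaps pciRoot      = { 0, 1, 31, BUS_CONNECT_PCI_DEVICE |
                                                    BUS_CONNECT_HOTPLUG };
    static const BusCaps pcieRootPort = { 1, 0, 0,  BUS_CONNECT_PCIE_DEVICE |
                                                    BUS_CONNECT_HOTPLUG };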
This function call could look like this: allocatorAddPool( allocator[PCI_ADDRESS_TYPE], &outputAddress, AllocatorRange(0, 0), AllocatorRange(0, 31), AllocatorRange(0, 7));
Again, this doesn't allow for some of the indexes on a particular dimension having a different start/end (or for other differences for a particular index, which possibly could be stored in some opaque pointer).
The outputAddress would be an array owned by the caller (only filled in by the allocator).
If we were to allocate IP addresses for computers from the range 192.168.1.2 to 192.168.1.240, where the router has to have the address 192.168.1.254:
First, we reserve the address for the router to let the Allocator know that it's in use, and that we can track collisions in manually assigned addresses:
allocatorReserveAddress( allocator[IP_ADDRESS_TYPE], &outputAddress, AllocatorValue(192), AllocatorValue(168), AllocatorValue(1), AllocatorValue(254));
Then, we assign the addresses for the computers:
allocatorReserveAddress( allocator[IP_ADDRESS_TYPE], &outputAddress, AllocatorValue(192), AllocatorValue(168), AllocatorValue(1), AllocatorRange(1, 254)); Please note that AllocatorValue() is now in use.
There could be a different function call to simply assign any address: allocatorReserveAny(Allocator* allocator, &outputAddress);
Ah, there it is! This won't work for PCI (or for scsi) without much more information available for the ranges that are being allocated from.
Let's say that we want an "sda" disk. We could create a wrapper: allocatorReserveSCSIAddress(allocator, "sda"); All this wrapper does is representing the string "sda" as an integer mapping to the letter (eg. 'a' = 0, 'z' = 25): allocatorReserveAddress(allocator[SCSI_ADDRESS_TYPE], &outputAddress, 0);
Disk device names are incredibly ugly beasts. Apparently once you get to sdz, you can start over at sdaa, sdab, etc. And the whole thing is pointless anyway, because that name isn't passed directly to the guest (there isn't a way to do that - the guest OS determines device names on its own based on the bus and order in which the disks appear).
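For illustration, the suffix-to-index conversion such a wrapper would hide is a bijective base-26 mapping; a hypothetical sketch (libvirt has a similar helper, virDiskNameToIndex()):

    #include <string.h>

    /* "sda" -> 0, "sdz" -> 25, "sdaa" -> 26, "sdab" -> 27, ... */
    static int diskNameToIndex(const char *name)
    {
        long idx = 0;

        if (strncmp(name, "sd", 2) != 0 || name[2] == '\0')
            return -1;

        for (const char *p = name + 2; *p; p++) {
            if (*p < 'a' || *p > 'z')
                return -1;
            idx = idx * 26 + (*p - 'a' + 1);   /* bijective base 26 */
        }
        return (int)(idx - 1);
    }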
If an address is already determined, because it was specified in the XML or it's some specific device that has to be at some specific address, we still reserve it to let the Allocator know that it's in use.
How would this work internally? One of the possible solutions is keeping a set of ranges of addresses. For example, suppose that we have two PCI busses to use. Our address pool is stored as one range: {0..1} {0..31} {0..7}
Now someone reserves address 0:13.2
Our new pool of free addresses is now stored as this set of ranges: {0..0} {0..12} {0..7}, {0..0} {13..13} {0..1}, {0..0} {13..13} {3..7}, {0..0} {14..31} {0..7}, {1..1} {0..31} {0..7}
If we kept every address separately, it would require 2*32*8=512 addresses.
...which isn't all that much, since we aren't talking about microcontrollers with tiny amounts of RAM.
The set data structure from gnulib's gl_oset.h is a candidate for keeping the ranges in a sorted fashion. Another candidate is simply a sorted list from gl_list.h.
Without looking at the implementation, I'm concerned that this may be adding lots of extra code complexity in order to save just a few bytes of RAM, which realistically is pointless - we only have one pool of PCI addresses per active domain, so the amount of memory used to store a bitmap of the availability of addresses is inconsequential in relation to the total memory requirement for a domain.
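A rough sketch of that simpler alternative, assuming a fixed 2x32x8 PCI address space (hypothetical code, not libvirt's virBitmap API):

    #include <stdbool.h>

    #define PCI_BUSES 2
    #define PCI_SLOTS 32
    #define PCI_FUNCS 8

    /* 2 * 32 * 8 = 512 bits, i.e. 64 bytes per domain. */
    typedef struct {
        unsigned char used[(PCI_BUSES * PCI_SLOTS * PCI_FUNCS + 7) / 8];
    } PCIAddrBitmap;

    static size_t pciBit(unsigned bus, unsigned slot, unsigned func)
    {
        return ((size_t)bus * PCI_SLOTS + slot) * PCI_FUNCS + func;
    }

    static void pciMarkUsed(PCIAddrBitmap *m, unsigned bus, unsigned slot,
                            unsigned func)
    {
        size_t b = pciBit(bus, slot, func);
        m->used[b / 8] |= 1u << (b % 8);
    }

    static bool pciIsUsed(const PCIAddrBitmap *m, unsigned bus, unsigned slot,
                          unsigned func)
    {
        size_t b = pciBit(bus, slot, func);
        return m->used[b / 8] & (1u << (b % 8));
    }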
This structure would be able to handle all types of addresses that are convertible to a fixed-length list of integers. We don't mind how many of these integers there are, because we can use variadic arguments. It won't allow duplicates if we stick to using it in every place where we have some addresses. It will also be very generic, with the possibility of writing wrappers on top of it that might be more convenient for some particular address type.
This way we would keep qemu_* files as they are for now, and just replace existing allocators (and manual assignment of addresses) to calls to our new allocator.
My ideas: - A possibility of reserving a range of addresses might be used to reserve a whole PCI slot at once.
I do see the usefulness of this in general, but in this particular case rather than reserving an entire slot at once, we may want to allocate a single function where the device is going to be, but mark the rest of that slot available for further functions to be allocated, then later mark it unavailable for further allocations *without actually allocating the rest of the functions in the slot*. The reason for this is that it would make it simpler to use the same code for detaching a "lone" device on a slot as we do for detaching the functions of a multifunction PCI device.

(You're likely thinking "what's the difference between allocating a single function + marking the other functions on the slot as unavailable vs. just marking the entire slot as allocated?"; it all has to do with being able to easily notice when the slot is once again completely available - if you mark all the functions in the slot as used, then when you free it you have to free all the functions; if you're doing that, then you have to figure out whether the device you're freeing was one function of a multifunction device, or if it was a lone device plugged into a slot by itself. Of course that may not fit in with your abstracted view of generic address allocation.)

Right now we "ham fist" this issue by just marking the entire slot as allocated when someone asks for a single function, but really we should do allocations (and frees) one function at a time, just with a knowledge of when it's appropriate to allocate a 2nd (and further) function on a slot that already has some functions allocated, and when it isn't.
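A rough sketch of the per-slot bookkeeping described above (hypothetical names): track which functions are occupied and, separately, whether further functions may still be added, so freeing the last occupied function makes the whole slot fully available again:

    #include <stdbool.h>

    typedef struct {
        unsigned char funcsInUse;   /* bit n set -> function n is occupied */
        bool aggregatable;          /* false -> no more functions may be added */
    } SlotState;

    static bool slotCanTakeFunction(const SlotState *s, unsigned func)
    {
        if (s->funcsInUse & (1u << func))
            return false;                    /* that function is taken */
        /* an empty slot is always usable; otherwise only if aggregation
         * was not switched off (e.g. by placing a lone device there) */
        return s->funcsInUse == 0 || s->aggregatable;
    }

    static void slotFreeFunction(SlotState *s, unsigned func)
    {
        s->funcsInUse &= ~(1u << func);
        if (s->funcsInUse == 0)
            s->aggregatable = true;          /* slot is completely free again */
    }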
- Flags could be added, changing the behavior of particular addresses or the whole allocator.
My questions: - Is this approach good long-term?
Again, my concern would be about making something more complex than necessary for the individual uses and their requirements. Eliminating nearly duplicate code is good, but you need to be careful that what you create doesn't require more "glue" to make each of the cases fit into the abstract model than would have been used for a separate simpler "from-scratch" implementation for each individual case.
- Is address releasing expected to be required at any point? (think device hotplug/unplug, etc.)
Yes. And we need to be aware that sometimes devices are added to the live domain but not the persistent config, sometimes only to the persistent config, and sometimes to both (and when they're added to both, they should use the same address in both live domain and persistent config, which isn't always the case today).
- What other features are needed?
You had said you wanted the qemu_* files to remain the same for now. Exactly what are you figuring on replacing? For example, for PCI addresses (since these are obviously the only ones I care about :-)) are you just planning to replace virDomainPCIAddressBus with your new virAllocator and a set of accessor functions? Even if you do nothing higher level than this, each bus will need to have a "connectFlags" that describes what type of devices can be connected to that bus (or range, or whatever you call it).
Please speak up your mind if you have remarks or thoughts about the idea. I'd really appreciate it.
Hopefully my comments haven't just made the situation more muddy :-)

Thank you for your input, I really appreciate it. My goal was to define a data structure with a minimal set of features that would make sense and could be further developed.

> I know of a few problems with PCI address allocation, but abstracting the data structure and functions used for allocation aren't going to solve them (for example, if you attach a device that uses PCI with "virsh attach-device .... --live", then then add another with "virsh attach-device --live --config", the new device will be assigned a different address in the persistent config than in the live domain (the result will be that the next time you shutdown and restart, the device will appear at a different address, which could confuse the guest OS). Another (which I need to fix *very* soon) is that even when a device is classified as connecting to PCIe slots rather than PCI (and there aren't enough of those right now, which is *yet another* problem I need to fix), if there aren't any empty PCIe slots available to assign, we'll fail rather than adding a new controller of the appropriate type and attached to the appropriate existing controller). But, as I said, a new data structure isn't going to solve any of those problems (and I'm not sure it will make the solution any easier to get to either).

That's right, the structure itself won't solve these. Using it might make address assignment more convenient though, because you will be using a uniform interface for dealing with all addresses:
1. There will be no need to use virDomainCCWAddressAsString(), because you are doing a hashtable lookup. You wouldn't have to know if there is a hashtable.
2. You also wouldn't have to iterate through controllers every time you want to assign one address, which happens in virDomainVirtioSerialAddrNext. You would just iterate through all the controllers once and add the addresses that they supply to the pool of available addresses. Then if you need an address for a device, just ask for one and the allocator will deal with it.

> PCI addresses are likely much more complicated than you've planned for this.

I spent a lot of time reading about the PCI addresses in libvirt and I am somewhat familiar with their complexity. I was trying to define a minimum viable product, so this data structure was stripped of features. Let me elaborate below on how it could be expanded.

> For starters, each PCI bus can have a different number of slots available (the 2nd dimension, which you list as {0..31}, has different start & end indexes depending on the type of controller that provides that bus. (e.g. pci-root, pcie-root, and pci-bridge have 31 slots from 1-31, dmi-to-pci-bridge and pcie-switch-upstream-port have 32 slots from 0-31, pcie-root-port and pcie-switch-downstream-port each have a single slot at 0,),

I might not have explained sufficiently how I see this data structure in use. This data structure would keep *all* the available addresses of a given type, e.g. one allocator for all the PCI addresses. Initially, the pool of possible addresses is empty. Then, you can iterate through the buses and if you encounter a PCI bus, insert its addresses that can be used (slots and functions) into the allocator.

Let's say we want to do it with a pcie-root-port that has only one slot that has 7 functions and its index is 0:

allocatorAddPool(allocator[PCIE_ADDRESS], AllocatorValue(0), AllocatorValue(1), AllocatorRange(0, 7));

If you decide that your machine also has a pci-root with 31 slots from 1 to 31 (each with 7 functions) and its index is 0, do this:

allocatorAddPool(allocator[PCI_ADDRESS], AllocatorValue(0), AllocatorRange(1, 31), AllocatorRange(0, 7));

If you also add (on index 5) a new type of pci bus that has 32 slots (from 0 to 31) and the first 16 slots have 8 functions, but the remaining slots have 3 functions only, you can use the allocatorAddPool() function twice:

allocatorAddPool(allocator[PCI_ADDRESS], AllocatorValue(5), AllocatorRange(0, 15), AllocatorRange(0, 7));
allocatorAddPool(allocator[PCI_ADDRESS], AllocatorValue(5), AllocatorRange(16, 31), AllocatorRange(0, 2));

After you do all of these on a single struct Allocator, it knows the addresses provided by all of the buses and you can proceed to asking it for some addresses. You can iterate through all the devices and request a PCI address for each of them.

> and each bus has different rules about what devices can be plugged into them (e.g. a pcie-root-port can only be plugged into pcie-root or a pcie-switch-downstream-port, some devices can only be plugged into pci-root, pci-bridge, or dmi-to-pci-bridge, some can be plugged into pcie-root directly if specifically requested, but it's otherwise avoided, etc). So it's not just a case of a 4-dimensional vector with the same size at each index on on the same axis, and each address is not the same as any other address.

This could be configured by flags. Every address could have the ability to have some flag set for it, and when reserving an address, the caller could specify that he's looking for addresses with some specific flag.

> Are you intending for this allocator to have any sort of "Give me the next free address" function?

Yes, I intend to do that. If I do it for this generic structure, it will be done for every other address type too. All the address types do have some common features, so it's better to have it in code once.

> If so, doing this for PCI is likely to require you to add lots of extra complexity that will be completely unnecessary for the other address types

It might not be a bad thing, provided that the functions from the interface that will be used for other address types (other than PCI) won't be cluttered because of PCI's complexity.

> Of course maybe your intent is that your allocator will only implement an "allocate this specific address" function, and leave auto-assignment up to a higher layer; but then the higher layer would still need a way to efficiently scan the entire list of free addresses, and each address would need a bit of data describing the controllers at that address so they they could be matched.

I want this data structure to be versatile enough so that you can use it in all the required ways. If you want some specific address, just ask for it explicitly and you will get a response whether you succeeded or not (you might've failed because the address was never added to the pool, or because it's already used by some other device).

If you want any address, it could be done with a function: allocatorReserveAny(allocator[PCI_ADDRESS]), which could be the following function in disguise:

allocatorReserveAddress(allocator[PCI_ADDRESS], AllocatorRange(INT_MIN, INT_MAX), AllocatorRange(INT_MIN, INT_MAX), AllocatorRange(INT_MIN, INT_MAX));

Since any address is between INT_MIN and INT_MAX, it would simply find the first one that is available.

> and each address would need a bit of data describing the controllers at that address so they they could be matched.

That's one of my ideas for further development. Each address could store some additional information, for example what kind of SCSI controller it is. Then when reserving an address, you could restrict that you're only interested in addresses that have that flag.

>> This function call could look like this:
>> allocatorAddPool(
>>     allocator[PCI_ADDRESS_TYPE],
>>     &outputAddress,
>>     AllocatorRange(0, 0),
>>     AllocatorRange(0, 31),
>>     AllocatorRange(0, 7));
>
> Again, this doesn't allow for some of the indexes on a particular dimension having a different start/end (or for other differences for a particular index, which possibly could be stored in some opaque pointer).

It's possible that I misunderstood you, but it does allow you to do that. You can use this allocatorAddPool() function many times to add many addresses to the big pool of possible addresses. Perhaps I shouldn't have used the word "pool", because it suggests that there is one pool only. Think of it as a function that is a quicker and more convenient way of specifying all the possible addresses explicitly, one by one:

allocatorAddAddressThatIsPossibleToUse(0, 0, 0);
allocatorAddAddressThatIsPossibleToUse(0, 0, 1);
…
allocatorAddAddressThatIsPossibleToUse(0, 0, 7);
allocatorAddAddressThatIsPossibleToUse(0, 1, 0);
etc.

You can iterate through all of your PCI buses and execute allocatorAddPool() for each of them (all using the same struct Allocator). They can have various possible addresses, starting from 0 or 1 or any other value, some of them might have one slot only and it's not a problem.

> Ah, there it is! This won't work for PCI (or for scsi) without much more information available for the ranges that are being allocated from.

I hope that I dealt with that somewhere above, but if you still see a problem that arises, please let me know.

> Disk device names are incredibly ugly beasts. Apparently once you get to sdz, you can start over at sdaa, sdab, etc. And the whole thing is pointless anyway, because that name isn't passed directly to the guest anyway (there isn't a way to do that - the guest OS determines device names on its own based on the bus and order in which the disks appear).

*sigh of relief* That's great, I prefer to deal with numbers :)

>> If we kept every address separately, it would require 2*32*8=512 addresses.
>
> ...which isn't all that much, since we aren't talking about microcontrollers with tiny amounts of RAM.

You are right. An array or a linked list will probably be enough.

>> My ideas:
>> - A possibility of reserving a range of addresses might be used to reserve a whole PCI slot at once.
>
> I do see the usefulness of this in general, but in this particular case rather than reserving an entire slot at once, we may want to allocate a single function where the device is going to be, but mark the rest of that slot available for further functions to be allocated, then later mark it unavailable for further allocations *without actually allocating the rest of the functions in the slot*. The reason for this is that it would make it simpler to use the same code for detaching a "lone" device on a slot as we do for detaching the functions of a multifunction PCI device. Right now we "ham fist" this issue by just marking the entire slot as allocated when someone asks for a single function, but really we should do allocations (and frees) one function at a time, but just with a knowledge of when it's appropriate to allocate a 2nd (and further) function on a slot that already has some functions allocated, and when it isn't.

It does fit in with my abstracted view :) We could add some distinction between "used" addresses and "unavailable" addresses (maybe add a reason for being unavailable too). Or some flags that specify how a particular address type should behave. I must learn more about how PCI addresses behave to be sure, especially if it's not in our code yet. In my idea about this data structure, the caller has to know whether he/she wants to free one function, or a whole slot. You can always make a wrapper though that would deal with it.

>> - Is this approach good long-term?
>
> Again, my concern would be about making something more complex than necessary for the individual uses and their requirements. Eliminating nearly duplicate code is good, but you need to be careful that what you create doesn't require more "glue" to make each of the cases fit into the abstract model than would have been used for a separate simpler "from-scratch" implementation for each individual case.

That's a valid concern. I am trying to make it as simple as possible while still providing all the required functionality. That's why I am asking the experienced developers what is needed right now and what might be needed in the future. If the requirements are too wide for this data structure to be sensible, I will use a different approach.

>> - Is address releasing expected to be required at any point?
>> (think device hotplug/unplug, etc.)
>
> Yes. And we need to be aware that sometimes devices are added to the live domain but not the persistent config, sometimes only to the persistent config, and sometimes to both (and when they're added to both, they should use the same address in both live domain and persistent config, which isn't always the case today).

That's really good to know. I thought that the addresses are persisted though, because I looked at the comment that said: /* if this is the live domain object, we persist the PCI addresses */ Could you elaborate on that? When/where are addresses handled wrong?

>> - What other features are needed?
>
> You had said you wanted the qemu_* files to remain the same for now. Exactly what are you figuring on replacing? For example, for PCI addresses (since these are obviously the only ones I care about :-)) are you just planning to replace virDomainPCIAddressBus with your new virAllocator and a set of accessor functions? Even if you do nothing higher level than this, each bus will need to have a "connectFlags" that describes what type of devices can be connected to that bus (or range, or whatever you call it).

The initial idea is to simply create a structure that provides an interface (through functions) which you can use to allocate addresses. That's just an idea though and it can change depending on the feedback and what my mentor requires from me. The allocator would remove these:

virDomainCCWAddressSetPtr ccwaddrs;
virDomainVirtioSerialAddrSetPtr vioserialaddrs;
virDomainPCIAddressSetPtr pciaddrs;

virDomainPCIAddressBus would be removed too, because it's a part of virDomainPCIAddressSetPtr.

> Hopefully my comments haven't just made the situation more muddy :-)

Not at all! They were really useful :)
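For illustration, the "reserve any address" trick from this mail could be a thin wrapper over the ordinary reserve call, assuming the hypothetical span-based interface sketched earlier in the thread:

    #include <limits.h>

    /* Builds on the earlier sketch: Allocator, AllocatorRange() and
     * allocatorReserveAddress() are assumed to exist with those shapes.
     * Asking for INT_MIN..INT_MAX in every dimension simply means
     * "the first free address wins". */
    static int allocatorReserveAnyPCI(Allocator *alloc, int *out)
    {
        return allocatorReserveAddress(alloc, out, 3,
                                       AllocatorRange(INT_MIN, INT_MAX),
                                       AllocatorRange(INT_MIN, INT_MAX),
                                       AllocatorRange(INT_MIN, INT_MAX));
    }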

On Mon, Jun 13, 2016 at 04:44:53AM +0200, Tomasz Flendrich wrote:
Thank you for your input, I really appreciate it.
My goal was to define a data structure with a minimal set of features that would make sense and could be further developed.
I know of a few problems with PCI address allocation, but abstracting the data structure and functions used for allocation aren't going to solve them (for example, if you attach a device that uses PCI with "virsh attach-device .... --live", then then add another with "virsh attach-device --live --config", the new device will be assigned a different address in the persistent config than in the live domain (the result will be that the next time you shutdown and restart, the device will appear at a different address, which could confuse the guest OS). Another (which I need to fix *very* soon) is that even when a device is classified as connecting to PCIe slots rather than PCI (and there aren't enough of those right now, which is *yet another* problem I need to fix), if there aren't any empty PCIe slots available to assign, we'll fail rather than adding a new controller of the appropriate type and attached to the appropriate existing controller). But, as I said, a new data structure isn't going to solve any of those problems (and I'm not sure it will make the solution any easier to get to either).
Laine probably missed my mail as a reply: https://www.redhat.com/archives/libvir-list/2016-June/msg00677.html

Funnily enough, the first thing will not be dealt with at first, but it will make the solution clearer (though not necessarily way easier at first sight). We are now working with only one set of addresses and we don't distinguish between live and config scenarios. Part of what Tomasz is doing is moving the address sets from QEMU's private data for the domain into a non-QEMU object (e.g. virDomainDef). That way each definition will have its own address allocation data. We don't have that now. Even though we could now just allocate one address and, if both _CONFIG and _LIVE flags are supplied, give it to both, in the future we would be able to create a function that allocates an empty address in both sets.

But more importantly, Tomasz's mail only drafted one of the first ideas. What I suggested aiming at from the start are the less complex address types that we have. I'd like to leave PCI for the finish line.

To elaborate on what I wanted to achieve with the project: there are two major problems in how we deal with address allocation. One of them is that all the code is concentrated in the drivers. If multiple drivers want to do some interesting things, it is coded in every single one of them. Of course each one might want to work a bit differently, but most of the code can be shared and the rest can either be left in the driver itself or just changed a bit using function arguments. The second problem is that we only deal with addresses that someone remembers to add the allocation code for. We have many types of addresses, yet we were properly allocating only one of them two years ago. Now it's three, and each one of them has its own code. And that's only in QEMU. The idea here is that adding the next address types to be handled would be just a matter of assigning a function into some structure or array.
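A rough sketch of that "one more entry in a table" idea (all names hypothetical): each address type contributes an assign and a release callback, and generic code only walks the table:

    typedef struct Allocator Allocator;

    typedef struct {
        const char *name;
        int  (*assign)(Allocator *alloc, void *device, int *addrOut);
        void (*release)(Allocator *alloc, const int *addr);
    } AddressTypeOps;

    enum { ADDR_TYPE_PCI, ADDR_TYPE_CCW, ADDR_TYPE_LAST };

    /* Illustrative stubs standing in for the real per-type implementations. */
    static int  pciAssign(Allocator *a, void *dev, int *out) { (void)a; (void)dev; (void)out; return -1; }
    static void pciRelease(Allocator *a, const int *addr)    { (void)a; (void)addr; }
    static int  ccwAssign(Allocator *a, void *dev, int *out) { (void)a; (void)dev; (void)out; return -1; }
    static void ccwRelease(Allocator *a, const int *addr)    { (void)a; (void)addr; }

    static const AddressTypeOps addressTypes[ADDR_TYPE_LAST] = {
        [ADDR_TYPE_PCI] = { "pci", pciAssign, pciRelease },
        [ADDR_TYPE_CCW] = { "ccw", ccwAssign, ccwRelease },
        /* virtio-serial, SCSI, ... would be added the same way. */
    };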
That’s right, the structure itself won’t solve these. Using it might make address assignment more convenient though, because you will be using a uniform interface for dealing with all addresses: 1. There will be no need to use virDomainCCWAddressAsString() because you are doing a hashtable lookup. You wouldn’t have to know if there is a hashtable. 2. You also wouldn’t have to iterate through controllers every time you want to assign one address, which happens in virDomainVirtioSerialAddrNext. You would just iterate through all the controllers once and add the addresses that they supply to the pool of available addresses. Then if you need an address for a device, just ask for one and the allocator will deal with it.
PCI addresses are likely much more complicated than you've planned for this.
I spent a lot of time reading about the PCI addresses in libvirt and I am somewhat familiar with their complexity. I was trying to define a minimum viable product, so this data structure was stripped out of features. Let me elaborate how it could be expanded below.
That's one of the reasons I wanted to start with something less than PCI addresses.
For starters, each PCI bus can have a different number of slots available (the 2nd dimension, which you list as {0..31}, has different start & end indexes depending on the type of controller that provides that bus. (e.g. pci-root, pcie-root, and pci-bridge have 31 slots from 1-31, dmi-to-pci-bridge and pcie-switch-upstream-port have 32 slots from 0-31, pcie-root-port and pcie-switch-downstream-port each have a single slot at 0,),
Thanks for the explanation. I believe there will be even more stuff to come, right? ;-)
I might have not explained myself sufficiently on how I see this data structure in use. This data structure would keep *all* the available addresses of a given type, e.g. one allocator for all the PCI addresses.
Initially, the pool of possible addresses is empty. Then, you can iterate through the buses and if you encounter a PCI bus, insert its addresses that can be used (slots and functions) into the allocator. Let’s say we want to do it with a pcie-root-port that has only one slot that has 7 functions and its index is 0:
allocatorAddPool(allocator[PCIE_ADDRESS], AllocatorValue(0), AllocatorValue(1), AllocatorRange(0, 7));
If you decide that your machine also has a pci-root with 31 slots from 1 to 31 (each with 7 functions) and its index is 0, do this:
allocatorAddPool(allocator[PCI_ADDRESS], AllocatorValue(0), AllocatorRange(1, 31), AllocatorRange(0, 7));
If you also add (on index 5) a new type of pci bus that has 32 slots (from 0 to 31) and the first 16 slots have 8 functions, but the remaining slots have 3 functions only, you can use the allocatorAddPool() function twice:
allocatorAddPool(allocator[PCI_ADDRESS], AllocatorValue(5), AllocatorRange(0, 15), AllocatorRange(0, 7)); allocatorAddPool(allocator[PCI_ADDRESS], AllocatorValue(5), AllocatorRange(16, 31), AllocatorRange(0, 2));
After you do all of these on a single struct Allocator, it now knows the addresses provided by all of the buses and you can proceed to asking it for some addresses. You can iterate through all the devices and request a PCI address for each of them.
and each bus has different rules about what devices can be plugged into them (e.g. a pcie-root-port can only be plugged into pcie-root or a pcie-switch-downstream-port, some devices can only be plugged into pci-root, pci-bridge, or dmi-to-pci-bridge, some can be plugged into pcie-root directly if specifically requested, but it's otherwise avoided, etc). So it's not just a case of a 4-dimensional vector with the same size at each index on on the same axis, and each address is not the same as any other address.
This could be configured by flags. Every address could have the ability to have some flag set for it, and during reserving an address, the caller could specify that he’s looking for addresses with some specific flag.
Are you intending for this allocator to have any sort of "Give me the next free address" function?
Yes, I intend to do that. If I do it for this generic structure, it will be done for every other address type too. All the address types do have some common features, so it’s better to have it in code once.
This should be more like a "give this device in this domain definition an address and make sure these conditions are met", however it might also end up being part of "make sure all addresses in this virDomainDef are assigned".
If so, doing this for PCI is likely to require you to add lots of extra complexity that will be completely unnecessary for the other address types
It might not be a bad thing, provided that the functions from the interface that will be used for other address types (other than PCI) won’t be cluttered because of PCI’s complexity.
Of course maybe your intent is that your allocator will only implement an "allocate this specific address" function, and leave auto-assignment up to a higher layer; but then the higher layer would still need a way to efficiently scan the entire list of free addresses, and each address would need a bit of data describing the controllers at that address so they they could be matched.
I want this data structure to be versatile enough so that you can use it in all the required ways.
If you want some specific address, just ask for it explicitly and you will get a response whether you succeeded or not (you might’ve failed because the address was never added to the pool, or because it’s already used by some other device).
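For example (assuming the usual 0/-1 return convention; the exact signature is still open), reserving the 0:13.2 address used earlier in this thread could look like this:

    if (allocatorReserveAddress(allocator[PCI_ADDRESS], outputAddress,
                                AllocatorValue(0),
                                AllocatorValue(13),
                                AllocatorValue(2)) < 0) {
        /* 0:13.2 was either never added to the pool, or is already taken */
    }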
If you want any address, it could be done with a function:
allocatorReserveAny(allocator[PCI_ADDRESS], &outputAddress), which could be the following call in disguise:
allocatorReserveAddress(allocator[PCI_ADDRESS], &outputAddress, AllocatorRange(INT_MIN, INT_MAX), AllocatorRange(INT_MIN, INT_MAX), AllocatorRange(INT_MIN, INT_MAX));
Since any address is between INT_MIN and INT_MAX, it would simply find the first one that is available.
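A possible implementation of that wrapper, assuming the variadic allocatorReserveAddress() from the original proposal and an address type that is known to have three dimensions (as PCI does here; a real version would look the dimensionality up per address type):

    #include <limits.h>

    int
    allocatorReserveAny(Allocator *allocator, int *outputAddress)
    {
        /* "Any value in every dimension": the first free address wins. */
        return allocatorReserveAddress(allocator, outputAddress,
                                       AllocatorRange(INT_MIN, INT_MAX),
                                       AllocatorRange(INT_MIN, INT_MAX),
                                       AllocatorRange(INT_MIN, INT_MAX));
    }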
and each address would need a bit of data describing the controllers at that address so they could be matched.
That’s one of my ideas for further development. Each address could store some additional information, for example what kind of SCSI controller sits there. Then, when reserving an address, you could specify that you’re only interested in addresses that have that flag set.
This function call could look like this: allocatorAddPool( allocator[PCI_ADDRESS_TYPE], &outputAddress, AllocatorRange(0, 0), AllocatorRange(0, 31), AllocatorRange(0, 7));
Again, this doesn't allow for some of the indexes on a particular dimension having a different start/end (or for other differences for a particular index, which possibly could be stored in some opaque pointer).
It’s possible that I misunderstood you, but it does allow you to do that. You can use this allocatorAddPool() function many times to add many addresses to the big pool of possible addresses. Perhaps I shouldn’t have used the word “pool”, because it suggests that there is one pool only.
Think of it as of a function that is a quicker and more convenient possibility of simply specifying all the possible addresses explicitly, one by one:
allocatorAddAddressThatIsPossibleToUse(0, 0, 0);
allocatorAddAddressThatIsPossibleToUse(0, 0, 1);
…
allocatorAddAddressThatIsPossibleToUse(0, 0, 7);
allocatorAddAddressThatIsPossibleToUse(0, 1, 0);
etc.
You can iterate through all of your PCI buses and call allocatorAddPool() for each of them (all on the same struct Allocator). They can have different possible addresses, starting from 0 or 1 or any other value, and some of them might have only one slot; that’s not a problem.
Ah, there it is! This won't work for PCI (or for scsi) without much more information available for the ranges that are being allocated from.
I hope that I dealt with that somewhere above, but if you still see a problem that arises, please let me know.
Disk device names are incredibly ugly beasts. Apparently once you get to sdz, you can start over at sdaa, sdab, etc. And the whole thing is pointless anyway, because that name isn't passed directly to the guest (there isn't a way to do that - the guest OS determines device names on its own based on the bus and the order in which the disks appear).
*sigh of relief* That’s great, I prefer to deal with numbers :)
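The mapping itself is simple; here is a self-contained sketch of turning such names into the integers the allocator would store (libvirt already has virDiskNameToIndex() doing essentially this):

    #include <string.h>

    /* "sda" -> 0, "sdz" -> 25, "sdaa" -> 26, ... (bijective base 26).
     * Returns -1 for names that don't match the pattern. */
    static int
    diskNameToIndex(const char *name)
    {
        const char *p;
        int idx = 0;

        if (strncmp(name, "sd", 2) != 0)
            return -1;
        for (p = name + 2; *p; p++) {
            if (*p < 'a' || *p > 'z')
                return -1;
            idx = idx * 26 + (*p - 'a' + 1);
        }
        return idx - 1;
    }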
If we kept every address separately, it would require 2*32*8=512 addresses.
...which isn't all that much, since we aren't talking about microcontrollers with tiny amounts of RAM.
You are right. An array or a linked list will probably be enough.
Linked lists are nice to teach at school, but there are like five real-world cases where they are used to store data that you can move around freely, especially if the data are pointers. An array should be fine for a long time =) I don't know why we couldn't use the existing virDomainPCIAddressSet for starters. This should be more about adding abstract "wrappers" around all these functions so that we are moving code from QEMU (and possibly other drivers) into src/conf/. Sorry for the confusion, I had it planned out a bit better, but that was when I wrote the idea, more than 3 years ago :-/
My ideas: - A possibility of reserving a range of addresses might be used to reserve a whole PCI slot at once.
I do see the usefulness of this in general, but in this particular case, rather than reserving an entire slot at once, we may want to allocate a single function where the device is going to be, but mark the rest of that slot available for further functions to be allocated, then later mark it unavailable for further allocations *without actually allocating the rest of the functions in the slot*. The reason for this is that it would make it simpler to use the same code for detaching a "lone" device on a slot as we do for detaching the functions of a multifunction PCI device.

(You're likely thinking "what's the difference between allocating a single function + marking the other functions on the slot as unavailable vs. just marking the entire slot as allocated?" It all has to do with being able to easily notice when the slot is once again completely available - if you mark all the functions in the slot as used, then when you free the device you have to free all the functions, and if you're doing that, then you have to figure out whether the device you're freeing was one function of a multifunction device, or a lone device plugged into a slot by itself. Of course that may not fit in with your abstracted view of generic address allocation.)

Right now we "ham fist" this issue by just marking the entire slot as allocated when someone asks for a single function, but really we should do allocations (and frees) one function at a time, just with a knowledge of when it is appropriate to allocate a 2nd (and further) function on a slot that already has some functions allocated, and when it isn't.
It does fit in with my abstracted view :) We could add a distinction between “used” addresses and “unavailable” addresses (maybe with a reason for being unavailable too), or some flags that specify how a particular address type should behave. I need to learn more about how PCI addresses behave to be sure, especially about the rules that aren’t reflected in our code yet.
In my idea of this data structure, the caller has to know whether they want to free one function or a whole slot; you can always write a wrapper that deals with that distinction. A rough sketch of the used/unavailable idea follows.
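A very rough sketch of how that distinction could look (all names invented, purely to show the idea of "used" vs. "kept free on purpose"):

    /* Hypothetical per-address states kept by the allocator. */
    typedef enum {
        ALLOCATOR_ADDR_FREE,         /* may be handed out */
        ALLOCATOR_ADDR_USED,         /* a device sits here */
        ALLOCATOR_ADDR_UNAVAILABLE,  /* intentionally kept empty, e.g. the other
                                        functions of a slot holding a lone device */
    } AllocatorAddrState;

    /* Freeing a lone device would then release its USED function and flip the
     * slot's UNAVAILABLE siblings back to FREE, while freeing one function of a
     * multifunction device would leave the other functions untouched. */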
- Is this approach good long-term?
Again, my concern would be about making something more complex than necessary for the individual uses and their requirements. Eliminating nearly duplicate code is good, but you need to be careful that what you create doesn't require more "glue" to make each of the cases fit into the abstract model than would have been used for a separate simpler "from-scratch" implementation for each individual case.
That’s a valid concern. I am trying to make it as simple as possible while still providing all the required functionality. That’s why I am asking the experienced developers what is needed right now and what might be needed in the future. If the requirements turn out to be too broad for this data structure to make sense, I will use a different approach.
- Is address releasing expected to be required at any point? (think device hotplug/unplug, etc.)
Yes. And we need to be aware that sometimes devices are added to the live domain but not the persistent config, sometimes only to the persistent config, and sometimes to both (and when they're added to both, they should use the same address in both live domain and persistent config, which isn't always the case today).
That’s really good to know. I thought the addresses were persisted, though, because I saw this comment: /* if this is the live domain object, we persist the PCI addresses */
Could you elaborate on that? When/where are addresses handled wrong?
- What other features are needed?
You had said you wanted the qemu_* files to remain the same for now. Exactly what are you figuring on replacing? For example, for PCI addresses (since these are obviously the only ones I care about :-)) are you just planning to replace virDomainPCIAddressBus with your new virAllocator and a set of accessor functions? Even if you do nothing higher level than this, each bus will need to have a "connectFlags" that describes what type of devices can be connected to that bus (or range, or whatever you call it).
The initial idea is to simply create a structure that provides an interface (through functions) which you can use to allocate addresses. That’s just an idea though and it can change depending on the feedback and what my mentor requires from me.
The allocator would remove these:
virDomainCCWAddressSetPtr ccwaddrs;
virDomainVirtioSerialAddrSetPtr vioserialaddrs;
virDomainPCIAddressSetPtr pciaddrs;
virDomainPCIAddressBus would be removed too, because it’s part of virDomainPCIAddressSet.
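In other words (purely illustrative, the field and constant names are invented), the per-type sets in qemuDomainObjPrivate would collapse into one array indexed by address type:

    struct _qemuDomainObjPrivate {
        ...
        /* hypothetical replacement for ccwaddrs, vioserialaddrs and pciaddrs */
        Allocator *allocators[ADDRESS_TYPE_LAST];
        ...
    };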
Hopefully my comments haven't just made the situation more muddy :-)
Not at all! They were really useful :)
All ideas and comments are welcome!