[libvirt] [GSoC] Abstracting device address allocation

Hello everyone,

Let me introduce myself - I'm Tomasz Flendrich and I'm a Google Summer of Code student from University of Wrocław, Poland.

My goal is to create a generic address allocator (for PCI, virtio, SCSI, etc.), because current handling of addresses has its flaws: sometimes addresses aren't properly assigned and checked for duplicates. Additionally, a generic solution will be very useful when more hypervisors add the feature of explicitly stating the addresses of devices.

The key goal I'm willing to achieve at this moment is defining a minimum viable product (one thing to be implemented quickly which is reasonably complete on its own, and has potential for further development in a direction which makes sense).

I came up with an initial design. The internal allocator's data will be kept in struct Allocator. There will be one structure for one address type and all of the allocators would be in an array.

We will have the possibility of adding an address pool to the allocator. For example, when we create a root PCI bus, we tell the allocator that the following PCI addresses are possible: {0..0}:{0..31}.{0..7}, where {a..b} means a range from a to b, inclusive. This function call could look like this:

allocatorAddPool(allocator[PCI_ADDRESS_TYPE], &outputAddress, AllocatorRange(0, 0), AllocatorRange(0, 31), AllocatorRange(0, 7));

The outputAddress would be an array owned by the caller (only filled in by the allocator).

If we were to allocate IP addresses for computers from the range 192.168.1.2 to 192.168.1.240, where the router has to have the address 192.168.1.254: first, we reserve the address for the router to let the Allocator know that it's in use, and so that we can track collisions in manually assigned addresses:

allocatorReserveAddress(allocator[IP_ADDRESS_TYPE], &outputAddress, AllocatorValue(192), AllocatorValue(168), AllocatorValue(1), AllocatorValue(254));

Then, we assign the addresses for the computers:

allocatorReserveAddress(allocator[IP_ADDRESS_TYPE], &outputAddress, AllocatorValue(192), AllocatorValue(168), AllocatorValue(1), AllocatorRange(1, 254));

Please note that AllocatorValue() is now in use. There could be a different function call to simply assign any address:

allocatorReserveAny(Allocator* allocator, &outputAddress);

Let's say that we want an "sda" disk. We could create a wrapper:

allocatorReserveSCSIAddress(allocator, "sda");

All this wrapper does is representing the string "sda" as an integer mapping to the letter (eg. 'a' = 0, 'z' = 25):

allocatorReserveAddress(allocator[SCSI_ADDRESS_TYPE], &outputAddress, 0);

If an address is already determined, because it was specified in the XML or it's some specific device that has to be at some specific address, we still reserve it to let the Allocator know that it's in use.

How would this work internally? One of the possible solutions is keeping a set of ranges of addresses. For example, suppose that we have two PCI busses to use. Our address pool is stored as one range:

{0..1} {0..31} {0..7}

Now someone reserves address 0:13.2. Our new pool of free addresses is now stored as this set of ranges:

{0..0} {0..12} {0..7}, {0..0} {13..13} {0..1}, {0..0} {13..13} {3..7}, {0..0} {14..31} {0..7}, {1..1} {0..31} {0..7}

If we kept every address separately, it would require 2*32*8=512 addresses.

The set data structure from gnulib's gl_oset.h is a candidate for keeping the ranges in a sorted fashion. Another candidate is simply a sorted list from gl_list.h.
This structure would be able to handle all types of addresses that are convertible to a fixed-length list of integers. We don't mind how many of these integers there are, because we can use variadic arguments. It won't allow duplicates if we stick to using it in every place where we have some addresses. It will also be very generic, with the possibility of writing wrappers on top of it that might be more convenient for some particular address type. This way we would keep qemu_* files as they are for now, and just replace existing allocators (and manual assignment of addresses) with calls to our new allocator.

My ideas:
- A possibility of reserving a range of addresses might be used to reserve a whole PCI slot at once.
- Flags could be added, changing the behavior of particular addresses or the whole allocator.

My questions:
- Is this approach good long-term?
- Is address releasing expected to be required at any point? (think device hotplug/unplug, etc.)
- What other features are needed?

Please speak your mind if you have remarks or thoughts about the idea. I'd really appreciate it.

Tomasz
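As a rough illustration of what the proposed calls could look like in C, here is a sketch; all names and signatures (AllocatorSpan, allocatorAddPool(), allocatorReserveAddress(), ...) are hypothetical renderings of the idea above, not existing libvirt code:

    /* One allocator per address type; each dimension of an address is an
     * inclusive [lo, hi] span. */
    #include <stddef.h>

    typedef struct {
        int lo;
        int hi;                 /* inclusive; lo == hi means a single value */
    } AllocatorSpan;

    #define AllocatorValue(x)    ((AllocatorSpan){ (x), (x) })
    #define AllocatorRange(a, b) ((AllocatorSpan){ (a), (b) })

    typedef struct Allocator Allocator;     /* opaque internal state */

    /* Add a pool of usable addresses, e.g. {0..0}:{0..31}.{0..7} for a
     * root PCI bus.  'ndims' is how many integers an address consists of. */
    int allocatorAddPool(Allocator *alloc, size_t ndims, ...);

    /* Reserve an address that fits the given spans (one span per dimension)
     * and write it into 'out', an array owned by the caller. */
    int allocatorReserveAddress(Allocator *alloc, int *out, size_t ndims, ...);

    /* Reserve whatever free address comes first. */
    int allocatorReserveAny(Allocator *alloc, int *out);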

On Wed, Jun 08, 2016 at 08:04:59PM +0200, Tomasz Flendrich wrote:
Hello everyone,
Let me introduce myself - I'm Tomasz Flendrich and I'm a Google Summer of Code student from University of Wrocław, Poland.
My goal is to create a generic address allocator (for PCI, virtio, SCSI, etc.), because current handling of addresses has its flaws: sometimes addresses aren't properly assigned and checked for duplicates. Additionally, a generic solution will be very useful when more hypervisors add the feature of explicitly stating the addresses of devices.
The key goal I'm willing to achieve at this moment is defining a minimum viable product (one thing to be implemented quickly which is reasonably complete on its own, and has potential for further development in the direction which makes sense).
Hi, sorry for replying a bit later. We talked about the internals together quite a lot and it's good to see the design upstream on the list so that everybody can express their opinions.
I came up with an initial design. The internal allocator's data will be kept in struct Allocator. There will be one structure for one address type and all of the allocators would be in an array.
Just as a note for now, I won't be saying much about naming stuff. 1) it could lead us into a rabbit hole that has no way out, 2) it is not needed for the first RFCs, 3) you'll see how other things will be named around, so I guess you will get a feel for what the right name should be, and finally 4) it is *very* likely that it will be done by others and discussed more than the rest of the code ;)
We will have the possibility of adding an address pool to the allocator. For example, when we create a root PCI bus, we tell the allocator that the following PCI addresses are possible: {0..0}:{0..31}.{0..7}, where {a..b} means a range from a to b, inclusive.
This function call could look like this: allocatorAddPool( allocator[PCI_ADDRESS_TYPE], &outputAddress, AllocatorRange(0, 0), AllocatorRange(0, 31), AllocatorRange(0, 7));
The outputAddress would be an array owned by the caller (only filled in by the allocator).
If we were to allocate IP addresses for computers from the range 192.168.1.2 to 192.168.1.240, where the router has to have the address 192.168.1.254:
Even though we won't be assigning IP addresses, it is really nice analogy to use and explain stuff on since hopefully everyone on the list knows more or less about the limits and properties of them.
First, we reserve the address for the router to let the Allocator know that it's in use, and that we can track collisions in manually assigned addresses:
allocatorReserveAddress( allocator[IP_ADDRESS_TYPE], &outputAddress, AllocatorValue(192), AllocatorValue(168), AllocatorValue(1), AllocatorValue(254));
Then, we assign the addresses for the computers:
allocatorReserveAddress( allocator[IP_ADDRESS_TYPE], &outputAddress, AllocatorValue(192), AllocatorValue(168), AllocatorValue(1), AllocatorRange(1, 254)); Please note that AllocatorValue() is now in use.
Since libvirt is pure C, no C++, you need to make sure errors are caught, esp. allocation errors. If AllocatorValue() is a function and it can fail, then this won't work. But you probably know that, right?
There could be a different function call to simply assign any address: allocatorReserveAny(Allocator* allocator, &outputAddress);
Let's say that we want an "sda" disk. We could create a wrapper: allocatorReserveSCSIAddress(allocator, "sda"); All this wrapper does is representing the string "sda" as an integer mapping to the letter (eg. 'a' = 0, 'z' = 25): allocatorReserveAddress(allocator[SCSI_ADDRESS_TYPE], &outputAddress, 0);
If an address is already determined, because it was specified in the XML or it's some specific device that has to be at some specific address, we still reserve it to let the Allocator know that it's in use.
How would this work internally? One of the possible solutions is keeping a set of ranges of addresses. For example, suppose that we have two PCI busses to use. Our address pool is stored as one range: {0..1} {0..31} {0..7}
Now someone reserves address 0:13.2
Our new pool of free addresses is now stored as this set of ranges: {0..0} {0..12} {0..7}, {0..0} {13..13} {0..1}, {0..0} {13..13} {3..7}, {0..0} {14..31} {0..7}, {1..1} {0..31} {0..7}
If we kept every address separately, it would require 2*32*8=512 addresses.
The set data structure from gnulib's gl_oset.h is a candidate for keeping the ranges in a sorted fashion. Another candidate is simply a sorted list from gl_list.h.
We cannot use those since they are GPL and the library part of libvirt is LGPL. We can either create our own or just use whatever is provided already. We have macros and functions for handling arrays; if you want it sorted, then VIR_INSERT_ELEMENT and others are your friends. With arrays we would just keep pointers there and move those around, which is not that expensive. Sparse arrays don't make sense to me in this situation. You could also convert it to a bunch of bytes (either specifying a new format or just converting them to a string) and use our hash table (see virHashCreate()). But I don't think we'll need to handle these; let me explain why a bit below.
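As a rough illustration of the sorted-array alternative, a standalone sketch in plain C (in libvirt proper the VIR_* array helpers and error reporting would be used instead of bare realloc()/qsort()):

    #include <stdlib.h>

    #define ADDR_DIMS 4                  /* enough for domain:bus:slot.function */

    typedef struct {
        int lo[ADDR_DIMS];               /* per-dimension lower bounds */
        int hi[ADDR_DIMS];               /* per-dimension upper bounds, inclusive */
    } FreeRange;

    typedef struct {
        FreeRange *ranges;
        size_t nranges;
    } RangeList;

    static int freeRangeCmp(const void *va, const void *vb)
    {
        const FreeRange *a = va;
        const FreeRange *b = vb;
        for (size_t i = 0; i < ADDR_DIMS; i++) {
            if (a->lo[i] != b->lo[i])
                return a->lo[i] < b->lo[i] ? -1 : 1;
        }
        return 0;
    }

    /* Append a range and keep the array ordered by its lower bound. */
    static int rangeListAdd(RangeList *list, FreeRange range)
    {
        FreeRange *tmp = realloc(list->ranges,
                                 (list->nranges + 1) * sizeof(*tmp));
        if (!tmp)
            return -1;
        list->ranges = tmp;
        list->ranges[list->nranges++] = range;
        qsort(list->ranges, list->nranges, sizeof(*list->ranges), freeRangeCmp);
        return 0;
    }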
This structure would be able to handle all types of addresses that are convertible to a fixed-length list of integers. We don't mind how many of these integers there are, because we can use variadic arguments. It won't allow duplicates if we stick to using it in every place where we have some addresses. It will also be very generic, with the possibility of writing wrappers on top of it that might be more convenient for some particular address type.
This way we would keep qemu_* files as they are for now, and just replace existing allocators (and manual assignment of addresses) to calls to our new allocator.
Actually, in order for us to be able to replace the calls we have there, we would need to go a bit different route. The one I suggested would actually look a bit different:

Let's say we would start by moving qemuDomainAssignSpaprVIOAddress() to, for example, src/conf/domain_addr.c and renaming it to, for example, virDomainDeviceAddressAssignSpaprVIO(). Then we would replace all its callers and so on. Every time the function needs to access something in the private data of the domain (qemuDomainObjPrivate), such data would instead be moved into the virDomainObj (or virDomainDef). Whenever qemu-specific information needs to be passed into that function, it could be converted to a parameter of that function: either some value, flag or some more complex structure, e.g. allocopt (similarly to xmlopt, which we have for parsing and formatting XML). Slowly we might get to the point where a lot of the code is changed to data that the driver has stored either per-domain or globally.
My ideas: - A possibility of reserving a range of addresses might be used to reserve a whole PCI slot at once.
I don't see a use case for that.
- Flags could be added, changing the behavior of particular addresses or the whole allocator.
As above, we'll most probably need that.
My questions: - Is this approach good long-term? - Is address releasing expected to be required at any point? (think device hotplug/unplug, etc.)
Every time a device is unplugged. At first I thought it would be needed only on hot-unplug, but the address needs to be freed whenever the device is removed and the data will not be recalculated. And that can happen even on cold-unplug (virsh detach-device --config).
- What other features are needed?
We'll see :) Let's start with either of the approaches (I would personally suggest the one I had in mind as I think it goes nicely with current approaches in libvirt and will have fewer hurdles along the way). Think it through and let us know about the next design steps.
Please speak up your mind if you have remarks or thoughts about the idea. I'd really appreciate it.
Tomasz
Have a nice day, Martin

Just as a note for now, I won't be saying much about naming stuff. 1) it could lead us into a rabbit hole that has no way out, 2) it is not needed for the first RFCs, 3) you'll see how other things will be named around, so I guess you will feel yourself what the right name will be, and finally 4) it is *very* likely that it will be done by others and discussed more than the rest of the code ;) Thank you :) Naming will be dealt with later.
We will have the possibility of adding an address pool to the allocator. For example, when we create a root PCI bus, we tell the allocator that the following PCI addresses are possible: {0..0}:{0..31}.{0..7}, where {a..b} means a range from a to b, inclusive.
This function call could look like this: allocatorAddPool( allocator[PCI_ADDRESS_TYPE], &outputAddress, AllocatorRange(0, 0), AllocatorRange(0, 31), AllocatorRange(0, 7));
The outputAddress would be an array owned by the caller (only filled in by the allocator).
If we were to allocate IP addresses for computers from the range 192.168.1.2 to 192.168.1.240, where the router has to have the address 192.168.1.254:
Even though we won't be assigning IP addresses, it is really nice analogy to use and explain stuff on since hopefully everyone on the list knows more or less about the limits and properties of them.
First, we reserve the address for the router to let the Allocator know that it's in use, and that we can track collisions in manually assigned addresses:
allocatorReserveAddress( allocator[IP_ADDRESS_TYPE], &outputAddress, AllocatorValue(192), AllocatorValue(168), AllocatorValue(1), AllocatorValue(254));
Then, we assign the addresses for the computers:
allocatorReserveAddress( allocator[IP_ADDRESS_TYPE], &outputAddress, AllocatorValue(192), AllocatorValue(168), AllocatorValue(1), AllocatorRange(1, 254)); Please note that AllocatorValue() is now in use.
Since libvirt is pure C, no C++, you need to make sure errors are caught, esp. allocation errors. If AllocatorValue() is a function and it can fail, then this won't work. But you probably know that, right?

These functions/macros are just for the convenience of using the allocator. An idea on how AllocatorRange/AllocatorValue might work as macros: every field (IP has 4 of them) of an address would be provided as a range. For example, if we wanted to allocate an address from the range 192.168.1.1 - 192.168.1.254 without using any macros, it would be:

allocatorReserveAddress(…, 192, 192, 168, 168, 1, 1, 1, 254);

but we can use AllocatorValue(X), which would expand to “X, X”, and AllocatorRange(X, Y), which expands to “X, Y”, so the function call looks nicer:

allocatorReserveAddress(…, AllocatorValue(192), AllocatorValue(168), AllocatorValue(1), AllocatorRange(1, 254));

Another idea is to use a helper struct with two integers inside, representing a range. Then a call to AllocatorValue(192) would return a struct with fields {192, 192}, and AllocatorRange(1, 254) would return a struct with fields {1, 254}. AllocatorValue() could be a macro or a function in this case. I am open to suggestions though.
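A rough sketch of the two variants described above (hypothetical names, not actual libvirt code):

    /* Variant 1: the macros expand into a "lo, hi" pair, so every
     * dimension of the address becomes two int arguments. */
    #define AllocatorValue(x)     (x), (x)
    #define AllocatorRange(a, b)  (a), (b)

    /*
     * allocatorReserveAddress(..., AllocatorValue(192), AllocatorValue(168),
     *                         AllocatorValue(1), AllocatorRange(1, 254));
     * expands to
     * allocatorReserveAddress(..., 192, 192, 168, 168, 1, 1, 1, 254);
     */

    /* Variant 2: a helper struct describing one range; AllocatorValue(192)
     * would then yield {192, 192} and AllocatorRange(1, 254) yields {1, 254}. */
    typedef struct {
        int lo;
        int hi;     /* inclusive */
    } AllocatorSpan;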
There could be a different function call to simply assign any address: allocatorReserveAny(Allocator* allocator, &outputAddress);
Let's say that we want an "sda" disk. We could create a wrapper: allocatorReserveSCSIAddress(allocator, "sda"); All this wrapper does is representing the string "sda" as an integer mapping to the letter (eg. 'a' = 0, 'z' = 25): allocatorReserveAddress(allocator[SCSI_ADDRESS_TYPE], &outputAddress, 0);
If an address is already determined, because it was specified in the XML or it's some specific device that has to be at some specific address, we still reserve it to let the Allocator know that it's in use.
How would this work internally? One of the possible solutions is keeping a set of ranges of addresses. For example, suppose that we have two PCI busses to use. Our address pool is stored as one range: {0..1} {0..31} {0..7}
Now someone reserves address 0:13.2
Our new pool of free addresses is now stored as this set of ranges: {0..0} {0..12} {0..7}, {0..0} {13..13} {0..1}, {0..0} {13..13} {3..7}, {0..0} {14..31} {0..7}, {1..1} {0..31} {0..7}
If we kept every address separately, it would require 2*32*8=512 addresses.
The set data structure from gnulib's gl_oset.h is a candidate for keeping the ranges in a sorted fashion. Another candidate is simply a sorted list from gl_list.h.
We cannot use those since they are GPL and the library part of libvirt is LGPL. We can either create our own or just use whatever there's provided already.
We have macros and functions for handling arrays; if you want it sorted, then VIR_INSERT_ELEMENT and others are your friends. With arrays we would just keep pointers there and move those which is not that expensive. Sparse arrays don't make sense to me in this situation.
You could also convert it to bunch of bytes (either specifying a new format or just converting them to a string) and use our hash table (see virHashCreate()).
But I don't think we'll need to handle these, let me explain why a bit below.
This structure would be able to handle all types of addresses that are convertible to a fixed-length list of integers. We don't mind how many of these integers there are, because we can use variadic arguments. It won't allow duplicates if we stick to using it in every place where we have some addresses. It will also be very generic, with the possibility of writing wrappers on top of it that might be more convenient for some particular address type.
This way we would keep qemu_* files as they are for now, and just replace existing allocators (and manual assignment of addresses) to calls to our new allocator.
Actually, in order for us to be able to replace the calls we have there, we would need to go a bit different route. The one I suggested would actually look a bit different:
Let's say we would start by moving qemuDomainAssignSpaprVIOAddress() to, for example, src/conf/domain_addr.c and renaming it to, for example virDomainDeviceAddressAssignSpaprVIO(). Then we would replace all its callers and so on. Every time the function needs to access something in the private data of the domain (qemuDomainObjPrivate), such data would be instead moved into the virDomainObj (or virDomainDef. Whenever qemu-specific information needs to be passed into that function, it could be converted to a parameter of that function. Either some value, flag or some more complex structure, e.g. allocopt (similarly to xmlopt which we have for parsing and formating XML). Slowly we might get to the point where lot of the code is changed to data that driver has stored either per-domain or globally.
I started doing it. I moved almost every function that assigns addresses (apart from PCI addresses, they are the most complex ones) from qemu_domain_address.c to domain_addr.c. Please take a look at my GitHub: https://github.com/Darge/libvirt For an overview of what happened, look at the function qemuDomainAssignAddresses from qemu_domain_address.c.

What happened: I changed a lot of function names from qemu* to vir* and moved them to other files. I also moved the structures virDomainCCWAddressSetPtr and virDomainVirtioSerialAddrSetPtr from qemu's private data to virDomainObj. I had to move some definitions from domain_addr.h to domain_conf.h, because the address sets are now in domainObj, so virDomainObjDispose() has to dispose of them. I am not totally sure that the resources are freed properly, because their owner changed, but I will deal with it later, especially since we haven't even decided whether they should belong to domainDef or domainObj.

My next step is to make qemuDomainReleaseDeviceAddress generic and then start doing the same thing with PCI address assignment functions. Is that what you meant? Does it look good at first glance?

Thank you very much for the email!

Tomasz
My ideas: - A possibility of reserving a range of addresses might be used to reserve a whole PCI slot at once.
I don't see a use case for that.
- Flags could be added, changing the behavior of particular addresses or the whole allocator.
As above, we'll most probably need that.
My questions: - Is this approach good long-term? - Is address releasing expected to be required at any point? (think device hotplug/unplug, etc.)
Every time device is unplugged. At first I thought it will be only on hot-unplug, but it needs to be freed whenever the device is removed and the data will not be recalculated. And that can happen even on cold-unplug (virsh detach-device --config).
- What other features are needed?
We'll see :) Let's start with either of the approaches (I would personally suggest the one I had in mind as I think it goes nicely with current approaches in libvirt and will have less hurdles along the way.
Think it through and let us know about the next design steps.
Please speak up your mind if you have remarks or thoughts about the idea. I'd really appreciate it.
Tomasz
Have a nice day, Martin

On 06/08/2016 02:04 PM, Tomasz Flendrich wrote:
Hello everyone,
Let me introduce myself - I'm Tomasz Flendrich and I'm a Google Summer of Code student from University of Wrocław, Poland.
My goal is to create a generic address allocator (for PCI, virtio, SCSI, etc.), because current handling of addresses has its flaws: sometimes addresses aren't properly assigned and checked for duplicates.
I know you took this sentence straight from the wiki page of GSoC ideas, but I'd be interested to see a few specific examples of the problems that you intend to solve (maybe Martin can help here, since I think he's the one who wrote that).

I know of a few problems with PCI address allocation, but abstracting the data structure and functions used for allocation isn't going to solve them. For example, if you attach a device that uses PCI with "virsh attach-device .... --live", then add another with "virsh attach-device --live --config", the new device will be assigned a different address in the persistent config than in the live domain (the result will be that the next time you shutdown and restart, the device will appear at a different address, which could confuse the guest OS). Another (which I need to fix *very* soon) is that even when a device is classified as connecting to PCIe slots rather than PCI (and there aren't enough of those right now, which is *yet another* problem I need to fix), if there aren't any empty PCIe slots available to assign, we'll fail rather than adding a new controller of the appropriate type attached to the appropriate existing controller.

But, as I said, a new data structure isn't going to solve any of those problems (and I'm not sure it will make the solution any easier to get to either).
Additionally, a generic solution will be very useful when more hypervisors add the feature of explicitly stating the addresses of devices.
The key goal I'm willing to achieve at this moment is defining a minimum viable product (one thing to be implemented quickly which is reasonably complete on its own, and has potential for further development in the direction which makes sense).
I came up with an initial design. The internal allocator's data will be kept in struct Allocator. There will be one structure for one address type and all of the allocators would be in an array.
We will have the possibility of adding an address pool to the allocator. For example, when we create a root PCI bus, we tell the allocator that the following PCI addresses are possible: {0..0}:{0..31}.{0..7}, where {a..b} means a range from a to b, inclusive.
PCI addresses are likely much more complicated than you've planned for this. For starters, each PCI bus can have a different number of slots available: the 2nd dimension, which you list as {0..31}, has different start & end indexes depending on the type of controller that provides that bus (e.g. pci-root, pcie-root, and pci-bridge have 31 slots from 1-31, dmi-to-pci-bridge and pcie-switch-upstream-port have 32 slots from 0-31, pcie-root-port and pcie-switch-downstream-port each have a single slot at 0), and each bus has different rules about what devices can be plugged into them (e.g. a pcie-root-port can only be plugged into pcie-root or a pcie-switch-downstream-port, some devices can only be plugged into pci-root, pci-bridge, or dmi-to-pci-bridge, some can be plugged into pcie-root directly if specifically requested, but it's otherwise avoided, etc). So it's not just a case of a 4-dimensional vector with the same size at each index on the same axis, and each address is not the same as any other address.

Are you intending for this allocator to have any sort of "Give me the next free address" function? If so, doing this for PCI is likely to require you to add lots of extra complexity that will be completely unnecessary for the other address types (although... there are several different kinds of SCSI controller (including virtio-scsi as well as several emulations of real hardware) that all share the same address space, and we need to know what model each controller is so that the user can be sure devices are placed on the desired controller).

Of course maybe your intent is that your allocator will only implement an "allocate this specific address" function, and leave auto-assignment up to a higher layer; but then the higher layer would still need a way to efficiently scan the entire list of free addresses, and each address would need a bit of data describing the controllers at that address so they could be matched.

(Another thing - at any time a new bus of just about any type can be added, sometimes in response to a request to allocate an address for which no matching slot is available, e.g. adding a new pcie-root-port, pcie-switch-upstream-port, and list of pcie-downstream-ports when there are no more hotpluggable PCIe slots available.)
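As an illustration of the kind of per-bus information the PCI case would need beyond plain integer ranges, a hypothetical sketch (the flag names are made up, not libvirt's actual connect flags):

    /* Hypothetical per-bus capabilities. */
    typedef enum {
        BUS_CONNECT_PCI_DEVICE     = 1 << 0,  /* conventional PCI endpoints */
        BUS_CONNECT_PCIE_DEVICE    = 1 << 1,  /* PCI Express endpoints */
        BUS_CONNECT_PCIE_ROOT_PORT = 1 << 2,  /* a pcie-root-port may plug in here */
        BUS_CONNECT_HOTPLUG        = 1 << 3,  /* slots support hotplug */
    } BusConnectFlags;

    typedef struct {
        unsigned int index;      /* bus number */
        unsigned int minSlot;    /* e.g. 1 for pci-root */
        unsigned int maxSlot;    /* e.g. 31, or 0 for a root/downstream port */
        unsigned int flags;      /* ORed BusConnectFlags */
    } BusCaps;

    /* Examples following the slot ranges described above: */
    static const BusCaps pciRoot      = { 0, 1, 31, BUS_CONNECT_PCI_DEVICE |
                                                    BUS_CONNECT_HOTPLUG };
    static const BusCaps pcieRootPort = { 1, 0, 0,  BUS_CONNECT_PCIE_DEVICE |
                                                    BUS_CONNECT_HOTPLUG };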
This function call could look like this: allocatorAddPool( allocator[PCI_ADDRESS_TYPE], &outputAddress, AllocatorRange(0, 0), AllocatorRange(0, 31), AllocatorRange(0, 7));
Again, this doesn't allow for some of the indexes on a particular dimension having a different start/end (or for other differences for a particular index, which possibly could be stored in some opaque pointer).
The outputAddress would be an array owned by the caller (only filled in by the allocator).
If we were to allocate IP addresses for computers from the range 192.168.1.2 to 192.168.1.240, where the router has to have the address 192.168.1.254:
First, we reserve the address for the router to let the Allocator know that it's in use, and that we can track collisions in manually assigned addresses:
allocatorReserveAddress( allocator[IP_ADDRESS_TYPE], &outputAddress, AllocatorValue(192), AllocatorValue(168), AllocatorValue(1), AllocatorValue(254));
Then, we assign the addresses for the computers:
allocatorReserveAddress( allocator[IP_ADDRESS_TYPE], &outputAddress, AllocatorValue(192), AllocatorValue(168), AllocatorValue(1), AllocatorRange(1, 254)); Please note that AllocatorValue() is now in use.
There could be a different function call to simply assign any address: allocatorReserveAny(Allocator* allocator, &outputAddress);
Ah, there it is! This won't work for PCI (or for scsi) without much more information available for the ranges that are being allocated from.
Let's say that we want an "sda" disk. We could create a wrapper: allocatorReserveSCSIAddress(allocator, "sda"); All this wrapper does is representing the string "sda" as an integer mapping to the letter (eg. 'a' = 0, 'z' = 25): allocatorReserveAddress(allocator[SCSI_ADDRESS_TYPE], &outputAddress, 0);
Disk device names are incredibly ugly beasts. Apparently once you get to sdz, you can start over at sdaa, sdab, etc. And the whole thing is pointless anyway, because that name isn't passed directly to the guest (there isn't a way to do that - the guest OS determines device names on its own based on the bus and order in which the disks appear).
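For illustration, the suffix-to-index conversion such a wrapper would hide is a bijective base-26 mapping; a hypothetical sketch (libvirt has a similar helper, virDiskNameToIndex()):

    #include <string.h>

    /* "sda" -> 0, "sdz" -> 25, "sdaa" -> 26, "sdab" -> 27, ... */
    static int diskNameToIndex(const char *name)
    {
        long idx = 0;

        if (strncmp(name, "sd", 2) != 0 || name[2] == '\0')
            return -1;

        for (const char *p = name + 2; *p; p++) {
            if (*p < 'a' || *p > 'z')
                return -1;
            idx = idx * 26 + (*p - 'a' + 1);   /* bijective base 26 */
        }
        return (int)(idx - 1);
    }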
If an address is already determined, because it was specified in the XML or it's some specific device that has to be at some specific address, we still reserve it to let the Allocator know that it's in use.
How would this work internally? One of the possible solutions is keeping a set of ranges of addresses. For example, suppose that we have two PCI busses to use. Our address pool is stored as one range: {0..1} {0..31} {0..7}
Now someone reserves address 0:13.2
Our new pool of free addresses is now stored as this set of ranges: {0..0} {0..12} {0..7}, {0..0} {13..13} {0..1}, {0..0} {13..13} {3..7}, {0..0} {14..31} {0..7}, {1..1} {0..31} {0..7}
If we kept every address separately, it would require 2*32*8=512 addresses.
...which isn't all that much, since we aren't talking about microcontrollers with tiny amounts of RAM.
The set data structure from gnulib's gl_oset.h is a candidate for keeping the ranges in a sorted fashion. Another candidate is simply a sorted list from gl_list.h.
Without looking at the implementation, I'm concerned that this may be adding lots of extra code complexity in order to save just a few bytes of RAM, which realistically is pointless - we only have one pool of PCI addresses per active domain, so the amount of memory used to store a bitmap of the availability of addresses is inconsequential in relation to the total memory requirement for a domain.
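A rough sketch of that simpler alternative, assuming a fixed 2x32x8 PCI address space (hypothetical code, not libvirt's virBitmap API):

    #include <stdbool.h>

    #define PCI_BUSES 2
    #define PCI_SLOTS 32
    #define PCI_FUNCS 8

    /* 2 * 32 * 8 = 512 bits, i.e. 64 bytes per domain. */
    typedef struct {
        unsigned char used[(PCI_BUSES * PCI_SLOTS * PCI_FUNCS + 7) / 8];
    } PCIAddrBitmap;

    static size_t pciBit(unsigned bus, unsigned slot, unsigned func)
    {
        return ((size_t)bus * PCI_SLOTS + slot) * PCI_FUNCS + func;
    }

    static void pciMarkUsed(PCIAddrBitmap *m, unsigned bus, unsigned slot,
                            unsigned func)
    {
        size_t b = pciBit(bus, slot, func);
        m->used[b / 8] |= 1u << (b % 8);
    }

    static bool pciIsUsed(const PCIAddrBitmap *m, unsigned bus, unsigned slot,
                          unsigned func)
    {
        size_t b = pciBit(bus, slot, func);
        return m->used[b / 8] & (1u << (b % 8));
    }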
This structure would be able to handle all types of addresses that are convertible to a fixed-length list of integers. We don't mind how many of these integers there are, because we can use variadic arguments. It won't allow duplicates if we stick to using it in every place where we have some addresses. It will also be very generic, with the possibility of writing wrappers on top of it that might be more convenient for some particular address type.
This way we would keep qemu_* files as they are for now, and just replace existing allocators (and manual assignment of addresses) to calls to our new allocator.
My ideas: - A possibility of reserving a range of addresses might be used to reserve a whole PCI slot at once.
I do see the usefulness of this in general, but in this particular case rather than reserving an entire slot at once, we may want to allocate a single function where the device is going to be, but mark the rest of that slot available for further functions to be allocated, then later mark it unavailable for further allocations *without actually allocating the rest of the functions in the slot*. The reason for this is that it would make it simpler to use the same code for detaching a "lone" device on a slot as we do for detaching the functions of a multifunction PCI device.

(You're likely thinking "what's the difference between allocating a single function + marking the other functions on the slot as unavailable vs. just marking the entire slot as allocated?"; it all has to do with being able to easily notice when the slot is once again completely available - if you mark all the functions in the slot as used, then when you free it you have to free all the functions; if you're doing that, then you have to figure out whether the device you're freeing was one function of a multifunction device, or if it was a lone device plugged into a slot by itself. Of course that may not fit in with your abstracted view of generic address allocation.)

Right now we "ham fist" this issue by just marking the entire slot as allocated when someone asks for a single function, but really we should do allocations (and frees) one function at a time, just with a knowledge of when it's appropriate to allocate a 2nd (and further) function on a slot that already has some functions allocated, and when it isn't.
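A rough sketch of the per-slot bookkeeping described above (hypothetical names): track which functions are occupied and, separately, whether further functions may still be added, so freeing the last occupied function makes the whole slot fully available again:

    #include <stdbool.h>

    typedef struct {
        unsigned char funcsInUse;   /* bit n set -> function n is occupied */
        bool aggregatable;          /* false -> no more functions may be added */
    } SlotState;

    static bool slotCanTakeFunction(const SlotState *s, unsigned func)
    {
        if (s->funcsInUse & (1u << func))
            return false;                    /* that function is taken */
        /* an empty slot is always usable; otherwise only if aggregation
         * was not switched off (e.g. by placing a lone device there) */
        return s->funcsInUse == 0 || s->aggregatable;
    }

    static void slotFreeFunction(SlotState *s, unsigned func)
    {
        s->funcsInUse &= ~(1u << func);
        if (s->funcsInUse == 0)
            s->aggregatable = true;          /* slot is completely free again */
    }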
- Flags could be added, changing the behavior of particular addresses or the whole allocator.
My questions: - Is this approach good long-term?
Again, my concern would be about making something more complex than necessary for the individual uses and their requirements. Eliminating nearly duplicate code is good, but you need to be careful that what you create doesn't require more "glue" to make each of the cases fit into the abstract model than would have been used for a separate simpler "from-scratch" implementation for each individual case.
- Is address releasing expected to be required at any point? (think device hotplug/unplug, etc.)
Yes. And we need to be aware that sometimes devices are added to the live domain but not the persistent config, sometimes only to the persistent config, and sometimes to both (and when they're added to both, they should use the same address in both live domain and persistent config, which isn't always the case today).
- What other features are needed?
You had said you wanted the qemu_* files to remain the same for now. Exactly what are you figuring on replacing? For example, for PCI addresses (since these are obviously the only ones I care about :-)) are you just planning to replace virDomainPCIAddressBus with your new virAllocator and a set of accessor functions? Even if you do nothing higher level than this, each bus will need to have a "connectFlags" that describes what type of devices can be connected to that bus (or range, or whatever you call it).
Please speak up your mind if you have remarks or thoughts about the idea. I'd really appreciate it.
Hopefully my comments haven't just made the situation more muddy :-)

Thank you for your input, I really appreciate it. My goal was to define a data structure with a minimal set of features that would make sense and could be further developed.

> I know of a few problems with PCI address allocation, but abstracting the data structure and functions used for allocation aren't going to solve them (for example, if you attach a device that uses PCI with "virsh attach-device .... --live", then then add another with "virsh attach-device --live --config", the new device will be assigned a different address in the persistent config than in the live domain (the result will be that the next time you shutdown and restart, the device will appear at a different address, which could confuse the guest OS). Another (which I need to fix *very* soon) is that even when a device is classified as connecting to PCIe slots rather than PCI (and there aren't enough of those right now, which is *yet another* problem I need to fix), if there aren't any empty PCIe slots available to assign, we'll fail rather than adding a new controller of the appropriate type and attached to the appropriate existing controller). But, as I said, a new data structure isn't going to solve any of those problems (and I'm not sure it will make the solution any easier to get to either).

That's right, the structure itself won't solve these. Using it might make address assignment more convenient though, because you will be using a uniform interface for dealing with all addresses:
1. There will be no need to use virDomainCCWAddressAsString(), because you are doing a hashtable lookup. You wouldn't have to know if there is a hashtable.
2. You also wouldn't have to iterate through controllers every time you want to assign one address, which happens in virDomainVirtioSerialAddrNext. You would just iterate through all the controllers once and add the addresses that they supply to the pool of available addresses. Then if you need an address for a device, just ask for one and the allocator will deal with it.

> PCI addresses are likely much more complicated than you've planned for this.

I spent a lot of time reading about the PCI addresses in libvirt and I am somewhat familiar with their complexity. I was trying to define a minimum viable product, so this data structure was stripped of features. Let me elaborate below on how it could be expanded.

> For starters, each PCI bus can have a different number of slots available (the 2nd dimension, which you list as {0..31}, has different start & end indexes depending on the type of controller that provides that bus. (e.g. pci-root, pcie-root, and pci-bridge have 31 slots from 1-31, dmi-to-pci-bridge and pcie-switch-upstream-port have 32 slots from 0-31, pcie-root-port and pcie-switch-downstream-port each have a single slot at 0,),

I might not have explained sufficiently how I see this data structure in use. This data structure would keep *all* the available addresses of a given type, e.g. one allocator for all the PCI addresses. Initially, the pool of possible addresses is empty. Then, you can iterate through the buses and if you encounter a PCI bus, insert its addresses that can be used (slots and functions) into the allocator.

Let's say we want to do it with a pcie-root-port that has only one slot that has 7 functions and its index is 0:

allocatorAddPool(allocator[PCIE_ADDRESS], AllocatorValue(0), AllocatorValue(1), AllocatorRange(0, 7));

If you decide that your machine also has a pci-root with 31 slots from 1 to 31 (each with 7 functions) and its index is 0, do this:

allocatorAddPool(allocator[PCI_ADDRESS], AllocatorValue(0), AllocatorRange(1, 31), AllocatorRange(0, 7));

If you also add (on index 5) a new type of pci bus that has 32 slots (from 0 to 31) and the first 16 slots have 8 functions, but the remaining slots have 3 functions only, you can use the allocatorAddPool() function twice:

allocatorAddPool(allocator[PCI_ADDRESS], AllocatorValue(5), AllocatorRange(0, 15), AllocatorRange(0, 7));
allocatorAddPool(allocator[PCI_ADDRESS], AllocatorValue(5), AllocatorRange(16, 31), AllocatorRange(0, 2));

After you do all of these on a single struct Allocator, it knows the addresses provided by all of the buses and you can proceed to asking it for some addresses. You can iterate through all the devices and request a PCI address for each of them.

> and each bus has different rules about what devices can be plugged into them (e.g. a pcie-root-port can only be plugged into pcie-root or a pcie-switch-downstream-port, some devices can only be plugged into pci-root, pci-bridge, or dmi-to-pci-bridge, some can be plugged into pcie-root directly if specifically requested, but it's otherwise avoided, etc). So it's not just a case of a 4-dimensional vector with the same size at each index on on the same axis, and each address is not the same as any other address.

This could be configured by flags. Every address could have the ability to have some flag set for it, and when reserving an address, the caller could specify that he's looking for addresses with some specific flag.

> Are you intending for this allocator to have any sort of "Give me the next free address" function?

Yes, I intend to do that. If I do it for this generic structure, it will be done for every other address type too. All the address types do have some common features, so it's better to have it in code once.

> If so, doing this for PCI is likely to require you to add lots of extra complexity that will be completely unnecessary for the other address types

It might not be a bad thing, provided that the functions from the interface that will be used for other address types (other than PCI) won't be cluttered because of PCI's complexity.

> Of course maybe your intent is that your allocator will only implement an "allocate this specific address" function, and leave auto-assignment up to a higher layer; but then the higher layer would still need a way to efficiently scan the entire list of free addresses, and each address would need a bit of data describing the controllers at that address so they they could be matched.

I want this data structure to be versatile enough so that you can use it in all the required ways. If you want some specific address, just ask for it explicitly and you will get a response whether you succeeded or not (you might've failed because the address was never added to the pool, or because it's already used by some other device).

If you want any address, it could be done with a function: allocatorReserveAny(allocator[PCI_ADDRESS]), which could be the following function in disguise:

allocatorReserveAddress(allocator[PCI_ADDRESS], AllocatorRange(INT_MIN, INT_MAX), AllocatorRange(INT_MIN, INT_MAX), AllocatorRange(INT_MIN, INT_MAX));

Since any address is between INT_MIN and INT_MAX, it would simply find the first one that is available.

> and each address would need a bit of data describing the controllers at that address so they they could be matched.

That's one of my ideas for further development. Each address could store some additional information, for example what kind of SCSI controller it is. Then when reserving an address, you could restrict that you're only interested in addresses that have that flag.

>> This function call could look like this:
>> allocatorAddPool(
>>     allocator[PCI_ADDRESS_TYPE],
>>     &outputAddress,
>>     AllocatorRange(0, 0),
>>     AllocatorRange(0, 31),
>>     AllocatorRange(0, 7));
>
> Again, this doesn't allow for some of the indexes on a particular dimension having a different start/end (or for other differences for a particular index, which possibly could be stored in some opaque pointer).

It's possible that I misunderstood you, but it does allow you to do that. You can use this allocatorAddPool() function many times to add many addresses to the big pool of possible addresses. Perhaps I shouldn't have used the word "pool", because it suggests that there is one pool only. Think of it as a function that is a quicker and more convenient way of specifying all the possible addresses explicitly, one by one:

allocatorAddAddressThatIsPossibleToUse(0, 0, 0);
allocatorAddAddressThatIsPossibleToUse(0, 0, 1);
…
allocatorAddAddressThatIsPossibleToUse(0, 0, 7);
allocatorAddAddressThatIsPossibleToUse(0, 1, 0);
etc.

You can iterate through all of your PCI buses and execute allocatorAddPool() for each of them (all using the same struct Allocator). They can have various possible addresses, starting from 0 or 1 or any other value, some of them might have one slot only and it's not a problem.

> Ah, there it is! This won't work for PCI (or for scsi) without much more information available for the ranges that are being allocated from.

I hope that I dealt with that somewhere above, but if you still see a problem that arises, please let me know.

> Disk device names are incredibly ugly beasts. Apparently once you get to sdz, you can start over at sdaa, sdab, etc. And the whole thing is pointless anyway, because that name isn't passed directly to the guest anyway (there isn't a way to do that - the guest OS determines device names on its own based on the bus and order in which the disks appear).

*sigh of relief* That's great, I prefer to deal with numbers :)

>> If we kept every address separately, it would require 2*32*8=512 addresses.
>
> ...which isn't all that much, since we aren't talking about microcontrollers with tiny amounts of RAM.

You are right. An array or a linked list will probably be enough.

>> My ideas:
>> - A possibility of reserving a range of addresses might be used to reserve a whole PCI slot at once.
>
> I do see the usefulness of this in general, but in this particular case rather than reserving an entire slot at once, we may want to allocate a single function where the device is going to be, but mark the rest of that slot available for further functions to be allocated, then later mark it unavailable for further allocations *without actually allocating the rest of the functions in the slot*. The reason for this is that it would make it simpler to use the same code for detaching a "lone" device on a slot as we do for detaching the functions of a multifunction PCI device. Right now we "ham fist" this issue by just marking the entire slot as allocated when someone asks for a single function, but really we should do allocations (and frees) one function at a time, but just with a knowledge of when it's appropriate to allocate a 2nd (and further) function on a slot that already has some functions allocated, and when it isn't.

It does fit in with my abstracted view :) We could add some distinction between "used" addresses and "unavailable" addresses (maybe add a reason for being unavailable too). Or some flags that specify how a particular address type should behave. I must learn more about how PCI addresses behave to be sure, especially if it's not in our code yet. In my idea about this data structure, the caller has to know whether he/she wants to free one function, or a whole slot. You can always make a wrapper though that would deal with it.

>> - Is this approach good long-term?
>
> Again, my concern would be about making something more complex than necessary for the individual uses and their requirements. Eliminating nearly duplicate code is good, but you need to be careful that what you create doesn't require more "glue" to make each of the cases fit into the abstract model than would have been used for a separate simpler "from-scratch" implementation for each individual case.

That's a valid concern. I am trying to make it as simple as possible while still providing all the required functionality. That's why I am asking the experienced developers what is needed right now and what might be needed in the future. If the requirements are too wide for this data structure to be sensible, I will use a different approach.

>> - Is address releasing expected to be required at any point?
>> (think device hotplug/unplug, etc.)
>
> Yes. And we need to be aware that sometimes devices are added to the live domain but not the persistent config, sometimes only to the persistent config, and sometimes to both (and when they're added to both, they should use the same address in both live domain and persistent config, which isn't always the case today).

That's really good to know. I thought that the addresses are persisted though, because I looked at the comment that said: /* if this is the live domain object, we persist the PCI addresses */ Could you elaborate on that? When/where are addresses handled wrong?

>> - What other features are needed?
>
> You had said you wanted the qemu_* files to remain the same for now. Exactly what are you figuring on replacing? For example, for PCI addresses (since these are obviously the only ones I care about :-)) are you just planning to replace virDomainPCIAddressBus with your new virAllocator and a set of accessor functions? Even if you do nothing higher level than this, each bus will need to have a "connectFlags" that describes what type of devices can be connected to that bus (or range, or whatever you call it).

The initial idea is to simply create a structure that provides an interface (through functions) which you can use to allocate addresses. That's just an idea though and it can change depending on the feedback and what my mentor requires from me. The allocator would remove these:

virDomainCCWAddressSetPtr ccwaddrs;
virDomainVirtioSerialAddrSetPtr vioserialaddrs;
virDomainPCIAddressSetPtr pciaddrs;

virDomainPCIAddressBus would be removed too, because it's a part of virDomainPCIAddressSetPtr.

> Hopefully my comments haven't just made the situation more muddy :-)

Not at all! They were really useful :)
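For illustration, the "reserve any address" trick from this mail could be a thin wrapper over the ordinary reserve call, assuming the hypothetical span-based interface sketched earlier in the thread:

    #include <limits.h>

    /* Builds on the earlier sketch: Allocator, AllocatorRange() and
     * allocatorReserveAddress() are assumed to exist with those shapes.
     * Asking for INT_MIN..INT_MAX in every dimension simply means
     * "the first free address wins". */
    static int allocatorReserveAnyPCI(Allocator *alloc, int *out)
    {
        return allocatorReserveAddress(alloc, out, 3,
                                       AllocatorRange(INT_MIN, INT_MAX),
                                       AllocatorRange(INT_MIN, INT_MAX),
                                       AllocatorRange(INT_MIN, INT_MAX));
    }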

On Mon, Jun 13, 2016 at 04:44:53AM +0200, Tomasz Flendrich wrote:
Thank you for your input, I really appreciate it.
My goal was to define a data structure with a minimal set of features that would make sense and could be further developed.
I know of a few problems with PCI address allocation, but abstracting the data structure and functions used for allocation aren't going to solve them (for example, if you attach a device that uses PCI with "virsh attach-device .... --live", then then add another with "virsh attach-device --live --config", the new device will be assigned a different address in the persistent config than in the live domain (the result will be that the next time you shutdown and restart, the device will appear at a different address, which could confuse the guest OS). Another (which I need to fix *very* soon) is that even when a device is classified as connecting to PCIe slots rather than PCI (and there aren't enough of those right now, which is *yet another* problem I need to fix), if there aren't any empty PCIe slots available to assign, we'll fail rather than adding a new controller of the appropriate type and attached to the appropriate existing controller). But, as I said, a new data structure isn't going to solve any of those problems (and I'm not sure it will make the solution any easier to get to either).
Laine probably missed my mail as a reply: https://www.redhat.com/archives/libvir-list/2016-June/msg00677.html

Funnily enough, the first thing will not be dealt with at first, but it will make the solution clearer (though not necessarily way easier at first sight). We are now working with only one set of addresses and we don't distinguish between live and config scenarios. Part of what Tomasz is doing is moving the address sets from QEMU's private data for the domain into a non-QEMU object (e.g. virDomainDef). That way each definition will have its own address allocation data. We don't have that now. Even though we could now just allocate one address and, if both _CONFIG and _LIVE flags are supplied, give it to both, in the future we would be able to create a function that allocates an empty address in both sets.

But more importantly, Tomasz's mail only drafted one of the first ideas. What I suggested aiming at from the start are the less complex address types that we have. I'd like to leave PCI for the finish line.

To elaborate on what I wanted to achieve with the project: there are two major problems in how we deal with address allocation. One of them is that all the code is concentrated in the drivers. If multiple drivers want to do some interesting things, it is coded in every single one of them. Of course each one might want to work a bit differently, but most of the code can be shared and the rest can either be left in the driver itself or just changed a bit using function arguments. The second problem is that we only deal with addresses that someone remembers to add the allocation code for. We have many types of addresses, yet we were properly allocating only one of them two years ago. Now it's three, and each one of them has its own code. And that's only in QEMU. The idea here is that adding the next address types to be handled would be just a matter of assigning a function into some structure or array.
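A rough sketch of that "one more entry in a table" idea (all names hypothetical): each address type contributes an assign and a release callback, and generic code only walks the table:

    typedef struct Allocator Allocator;

    typedef struct {
        const char *name;
        int  (*assign)(Allocator *alloc, void *device, int *addrOut);
        void (*release)(Allocator *alloc, const int *addr);
    } AddressTypeOps;

    enum { ADDR_TYPE_PCI, ADDR_TYPE_CCW, ADDR_TYPE_LAST };

    /* Illustrative stubs standing in for the real per-type implementations. */
    static int  pciAssign(Allocator *a, void *dev, int *out) { (void)a; (void)dev; (void)out; return -1; }
    static void pciRelease(Allocator *a, const int *addr)    { (void)a; (void)addr; }
    static int  ccwAssign(Allocator *a, void *dev, int *out) { (void)a; (void)dev; (void)out; return -1; }
    static void ccwRelease(Allocator *a, const int *addr)    { (void)a; (void)addr; }

    static const AddressTypeOps addressTypes[ADDR_TYPE_LAST] = {
        [ADDR_TYPE_PCI] = { "pci", pciAssign, pciRelease },
        [ADDR_TYPE_CCW] = { "ccw", ccwAssign, ccwRelease },
        /* virtio-serial, SCSI, ... would be added the same way. */
    };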
That’s right, the structure itself won’t solve these. Using it might make address assignment more convenient though, because you will be using a uniform interface for dealing with all addresses: 1. There will be no need to use virDomainCCWAddressAsString() because you are doing a hashtable lookup. You wouldn’t have to know if there is a hashtable. 2. You also wouldn’t have to iterate through controllers every time you want to assign one address, which happens in virDomainVirtioSerialAddrNext. You would just iterate through all the controllers once and add the addresses that they supply to the pool of available addresses. Then if you need an address for a device, just ask for one and the allocator will deal with it.
PCI addresses are likely much more complicated than you've planned for this.
I spent a lot of time reading about the PCI addresses in libvirt and I am somewhat familiar with their complexity. I was trying to define a minimum viable product, so this data structure was stripped out of features. Let me elaborate how it could be expanded below.
That's one of the reasons I wanted to start with something less than PCI addresses.
For starters, each PCI bus can have a different number of slots available (the 2nd dimension, which you list as {0..31}, has different start & end indexes depending on the type of controller that provides that bus. (e.g. pci-root, pcie-root, and pci-bridge have 31 slots from 1-31, dmi-to-pci-bridge and pcie-switch-upstream-port have 32 slots from 0-31, pcie-root-port and pcie-switch-downstream-port each have a single slot at 0,),
Thanks for the explanation. I believe there will be even more stuff to come, right? ;-)
I might have not explained myself sufficiently on how I see this data structure in use. This data structure would keep *all* the available addresses of a given type, e.g. one allocator for all the PCI addresses.
Initially, the pool of possible addresses is empty. Then, you can iterate through the buses and if you encounter a PCI bus, insert its addresses that can be used (slots and functions) into the allocator. Let’s say we want to do it with a pcie-root-port that has only one slot that has 7 functions and its index is 0:
allocatorAddPool(allocator[PCIE_ADDRESS], AllocatorValue(0), AllocatorValue(1), AllocatorRange(0, 7));
If you decide that your machine also has a pci-root with 31 slots from 1 to 31 (each with 7 functions) and its index is 0, do this:
allocatorAddPool(allocator[PCI_ADDRESS], AllocatorValue(0), AllocatorRange(1, 31), AllocatorRange(0, 7));
If you also add (on index 5) a new type of pci bus that has 32 slots (from 0 to 31) and the first 16 slots have 8 functions, but the remaining slots have 3 functions only, you can use the allocatorAddPool() function twice:
allocatorAddPool(allocator[PCI_ADDRESS], AllocatorValue(5), AllocatorRange(0, 15), AllocatorRange(0, 7)); allocatorAddPool(allocator[PCI_ADDRESS], AllocatorValue(5), AllocatorRange(16, 31), AllocatorRange(0, 2));
After you do all of these on a single struct Allocator, it now knows the addresses provided by all of the buses and you can proceed to asking it for some addresses. You can iterate through all the devices and request a PCI address for each of them.
and each bus has different rules about what devices can be plugged into them (e.g. a pcie-root-port can only be plugged into pcie-root or a pcie-switch-downstream-port, some devices can only be plugged into pci-root, pci-bridge, or dmi-to-pci-bridge, some can be plugged into pcie-root directly if specifically requested, but it's otherwise avoided, etc). So it's not just a case of a 4-dimensional vector with the same size at each index on on the same axis, and each address is not the same as any other address.
This could be configured by flags. Every address could have the ability to have some flag set for it, and during reserving an address, the caller could specify that he’s looking for addresses with some specific flag.
Are you intending for this allocator to have any sort of "Give me the next free address" function?
Yes, I intend to do that. If I do it for this generic structure, it will be done for every other address type too. All the address types do have some common features, so it’s better to have it in code once.
This should be more like a "give this device in this domain definition an address and make sure these conditions are met", however it might also end up being part of "make sure all addresses in this virDomainDef are assigned".
If so, doing this for PCI is likely to require you to add lots of extra complexity that will be completely unnecessary for the other address types
It might not be a bad thing, provided that the functions from the interface that will be used for other address types (other than PCI) won’t be cluttered because of PCI’s complexity.
Of course maybe your intent is that your allocator will only implement an "allocate this specific address" function, and leave auto-assignment up to a higher layer; but then the higher layer would still need a way to efficiently scan the entire list of free addresses, and each address would need a bit of data describing the controllers at that address so they they could be matched.
I want this data structure to be versatile enough so that you can use it in all the required ways.
If you want some specific address, just ask for it explicitly and you will get a response whether you succeeded or not (you might’ve failed because the address was never added to the pool, or because it’s already used by some other device).
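For example (assuming the usual 0/-1 return convention; the exact signature is still open), reserving the 0:13.2 address used earlier in this thread could look like this:

    if (allocatorReserveAddress(allocator[PCI_ADDRESS], outputAddress,
                                AllocatorValue(0),
                                AllocatorValue(13),
                                AllocatorValue(2)) < 0) {
        /* 0:13.2 was either never added to the pool, or is already taken */
    }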
If you want any address, it could be done with a function:
allocatorReserveAny(allocator[PCI_ADDRESS], &outputAddress), which could be the following call in disguise:
allocatorReserveAddress(allocator[PCI_ADDRESS], &outputAddress, AllocatorRange(INT_MIN, INT_MAX), AllocatorRange(INT_MIN, INT_MAX), AllocatorRange(INT_MIN, INT_MAX));
Since any address is between INT_MIN and INT_MAX, it would simply find the first one that is available.
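A possible implementation of that wrapper, assuming the variadic allocatorReserveAddress() from the original proposal and an address type that is known to have three dimensions (as PCI does here; a real version would look the dimensionality up per address type):

    #include <limits.h>

    int
    allocatorReserveAny(Allocator *allocator, int *outputAddress)
    {
        /* "Any value in every dimension": the first free address wins. */
        return allocatorReserveAddress(allocator, outputAddress,
                                       AllocatorRange(INT_MIN, INT_MAX),
                                       AllocatorRange(INT_MIN, INT_MAX),
                                       AllocatorRange(INT_MIN, INT_MAX));
    }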
and each address would need a bit of data describing the controllers at that address so they could be matched.
That’s one of my ideas for further development. Each address could store some additional information, for example what kind of SCSI controller sits there. Then, when reserving an address, you could specify that you’re only interested in addresses that have that flag set.
This function call could look like this: allocatorAddPool( allocator[PCI_ADDRESS_TYPE], &outputAddress, AllocatorRange(0, 0), AllocatorRange(0, 31), AllocatorRange(0, 7));
Again, this doesn't allow for some of the indexes on a particular dimension having a different start/end (or for other differences for a particular index, which possibly could be stored in some opaque pointer).
It’s possible that I misunderstood you, but it does allow you to do that. You can use this allocatorAddPool() function many times to add many addresses to the big pool of possible addresses. Perhaps I shouldn’t have used the word “pool”, because it suggests that there is one pool only.
Think of it as of a function that is a quicker and more convenient possibility of simply specifying all the possible addresses explicitly, one by one:
allocatorAddAddressThatIsPossibleToUse(0, 0, 0);
allocatorAddAddressThatIsPossibleToUse(0, 0, 1);
…
allocatorAddAddressThatIsPossibleToUse(0, 0, 7);
allocatorAddAddressThatIsPossibleToUse(0, 1, 0);
etc.
You can iterate through all of your PCI buses and call allocatorAddPool() for each of them (all on the same struct Allocator). They can have different possible addresses, starting from 0 or 1 or any other value, and some of them might have only one slot; that’s not a problem.
Ah, there it is! This won't work for PCI (or for scsi) without much more information available for the ranges that are being allocated from.
I hope that I dealt with that somewhere above, but if you still see a problem that arises, please let me know.
Disk device names are incredibly ugly beasts. Apparently once you get to sdz, you can start over at sdaa, sdab, etc. And the whole thing is pointless anyway, because that name isn't passed directly to the guest (there isn't a way to do that - the guest OS determines device names on its own based on the bus and the order in which the disks appear).
*sigh of relief* That’s great, I prefer to deal with numbers :)
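The mapping itself is simple; here is a self-contained sketch of turning such names into the integers the allocator would store (libvirt already has virDiskNameToIndex() doing essentially this):

    #include <string.h>

    /* "sda" -> 0, "sdz" -> 25, "sdaa" -> 26, ... (bijective base 26).
     * Returns -1 for names that don't match the pattern. */
    static int
    diskNameToIndex(const char *name)
    {
        const char *p;
        int idx = 0;

        if (strncmp(name, "sd", 2) != 0)
            return -1;
        for (p = name + 2; *p; p++) {
            if (*p < 'a' || *p > 'z')
                return -1;
            idx = idx * 26 + (*p - 'a' + 1);
        }
        return idx - 1;
    }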
If we kept every address separately, it would require 2*32*8=512 addresses.
...which isn't all that much, since we aren't talking about microcontrollers with tiny amounts of RAM.
You are right. An array or a linked list will probably be enough.
Linked lists are nice to teach at school, but there are like five real-world cases where they are used to store data that you can move around freely, especially if the data are pointers. An array should be fine for a long time =) I don't know why we couldn't use the existing virDomainPCIAddressSet for starters. This should be more about adding abstract "wrappers" around all these functions so that we are moving code from QEMU (and possibly other drivers) into src/conf/. Sorry for the confusion, I had it planned out a bit better, but that was when I wrote the idea, more than 3 years ago :-/
My ideas: - A possibility of reserving a range of addresses might be used to reserve a whole PCI slot at once.
I do see the usefulness of this in general, but in this particular case, rather than reserving an entire slot at once, we may want to allocate a single function where the device is going to be, but mark the rest of that slot available for further functions to be allocated, then later mark it unavailable for further allocations *without actually allocating the rest of the functions in the slot*. The reason for this is that it would make it simpler to use the same code for detaching a "lone" device on a slot as we do for detaching the functions of a multifunction PCI device.

(You're likely thinking "what's the difference between allocating a single function + marking the other functions on the slot as unavailable vs. just marking the entire slot as allocated?" It all has to do with being able to easily notice when the slot is once again completely available - if you mark all the functions in the slot as used, then when you free the device you have to free all the functions, and if you're doing that, then you have to figure out whether the device you're freeing was one function of a multifunction device, or a lone device plugged into a slot by itself. Of course that may not fit in with your abstracted view of generic address allocation.)

Right now we "ham fist" this issue by just marking the entire slot as allocated when someone asks for a single function, but really we should do allocations (and frees) one function at a time, just with a knowledge of when it is appropriate to allocate a 2nd (and further) function on a slot that already has some functions allocated, and when it isn't.
It does fit in with my abstracted view :) We could add a distinction between “used” addresses and “unavailable” addresses (maybe with a reason for being unavailable too), or some flags that specify how a particular address type should behave. I need to learn more about how PCI addresses behave to be sure, especially about the rules that aren’t reflected in our code yet.
In my idea of this data structure, the caller has to know whether they want to free one function or a whole slot; you can always write a wrapper that deals with that distinction. A rough sketch of the used/unavailable idea follows.
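A very rough sketch of how that distinction could look (all names invented, purely to show the idea of "used" vs. "kept free on purpose"):

    /* Hypothetical per-address states kept by the allocator. */
    typedef enum {
        ALLOCATOR_ADDR_FREE,         /* may be handed out */
        ALLOCATOR_ADDR_USED,         /* a device sits here */
        ALLOCATOR_ADDR_UNAVAILABLE,  /* intentionally kept empty, e.g. the other
                                        functions of a slot holding a lone device */
    } AllocatorAddrState;

    /* Freeing a lone device would then release its USED function and flip the
     * slot's UNAVAILABLE siblings back to FREE, while freeing one function of a
     * multifunction device would leave the other functions untouched. */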
- Is this approach good long-term?
Again, my concern would be about making something more complex than necessary for the individual uses and their requirements. Eliminating nearly duplicate code is good, but you need to be careful that what you create doesn't require more "glue" to make each of the cases fit into the abstract model than would have been used for a separate simpler "from-scratch" implementation for each individual case.
That’s a valid concern. I am trying to make it as simple as possible while still providing all the required functionality. That’s why I am asking the experienced developers what is needed right now and what might be needed in the future. If the requirements turn out to be too broad for this data structure to make sense, I will use a different approach.
- Is address releasing expected to be required at any point? (think device hotplug/unplug, etc.)
Yes. And we need to be aware that sometimes devices are added to the live domain but not the persistent config, sometimes only to the persistent config, and sometimes to both (and when they're added to both, they should use the same address in both live domain and persistent config, which isn't always the case today).
That’s really good to know. I thought the addresses were persisted, though, because I saw this comment: /* if this is the live domain object, we persist the PCI addresses */
Could you elaborate on that? When/where are addresses handled wrong?
- What other features are needed?
You had said you wanted the qemu_* files to remain the same for now. Exactly what are you figuring on replacing? For example, for PCI addresses (since these are obviously the only ones I care about :-)) are you just planning to replace virDomainPCIAddressBus with your new virAllocator and a set of accessor functions? Even if you do nothing higher level than this, each bus will need to have a "connectFlags" that describes what type of devices can be connected to that bus (or range, or whatever you call it).
The initial idea is to simply create a structure that provides an interface (through functions) which you can use to allocate addresses. That’s just an idea though and it can change depending on the feedback and what my mentor requires from me.
The allocator would remove these:
virDomainCCWAddressSetPtr ccwaddrs;
virDomainVirtioSerialAddrSetPtr vioserialaddrs;
virDomainPCIAddressSetPtr pciaddrs;
virDomainPCIAddressBus would be removed too, because it’s part of virDomainPCIAddressSet.
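In other words (purely illustrative, the field and constant names are invented), the per-type sets in qemuDomainObjPrivate would collapse into one array indexed by address type:

    struct _qemuDomainObjPrivate {
        ...
        /* hypothetical replacement for ccwaddrs, vioserialaddrs and pciaddrs */
        Allocator *allocators[ADDRESS_TYPE_LAST];
        ...
    };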
Hopefully my comments haven't just made the situation more muddy :-)
Not at all! They were really useful :)
All ideas and comments are welcome!