This patch series combines the series by Wolfgang Mauerer
to support proper SCSI drive hotplug with new work of my own
to introduce generic addressing for all devices.

Wolfgang's most recent postings were

  http://www.redhat.com/archives/libvir-list/2009-November/msg00574.html
  http://www.redhat.com/archives/libvir-list/2009-November/msg00701.html

When testing that series I came across a few minor issues,
but more importantly it made me realize how important it is
that we introduce explicit device addressing in our XML format.
Wolfgang's series added a new element for SCSI controllers,
with PCI address info about the controller:

  <controller type='ide' index='0'>
    <address domain='0x0000' bus='0x00' slot='0x01'
             function='0x1'/>
  </controller>

It also extended the <disk> element to include a SCSI
controller name and bus/unit ID, e.g.

  <disk>
    ...
    <controller name="<identifier>"
                pci_addr="<addr>" bus="<number>"
                unit="<number>"/>
  </disk>

I then remembered that, to support unplug of NICs, VirtIO disks
and hostdevs, Mark M had previously added PCI address information
to the internal XML state files for the <interface>, <disk> and
<hostdev> elements.
All of these places using PCI addresses suffered from the fact
that we only knew the addresses of devices we had hotplugged and
had no idea of the addresses of devices present at boot time.
A further issue with adding <controller> to the <disk>
element is that not all disk types have a concept of a controller.
For example, every VirtIO disk is in fact a full PCI device, so
having a <controller> element in <disk> is not meaningful for
VirtIO.
The solution that I believe solves all of these problems is to add a
generic <address> element to every single device. This address
element describes how the device is attached to its
logical parent device. There will be three types of address in
this series of patches, though we could imagine adding more later:
- PCI address - domain, bus, slot, function
- USB address - bus, device
- Drive address - controller, bus, unit
Anything that is a PCI device will obviously use PCI addresses.
- PCI: NICs, sound, video, all virtio, watchdog, disk controllers
- USB: all usb devices
- Drive: SCSI, IDE & Floppy disks, but *not* VirtIO/Xen disks
Xen paravirt devices aren't really covered in this scheme. I
could imagine adding a fourth address type for Xen. This would in
fact let us handle driver domains - ie a backend outside dom0.
I won't deal with Xen in this series though.
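
For disks, the mapping above could be written down roughly as
follows. This is only a sketch: disk_address_type is a hypothetical
helper, not a function from this series, and it assumes the bus is
read from the bus attribute of the disk's <target> element.

```python
# Hypothetical helper (not part of the series): which <address> type a
# disk would carry under the proposed scheme, keyed on <target bus='...'>.

# Disks that hang off a controller and so get a 'drive' address.
DRIVE_BUSES = {"scsi", "ide", "fdc"}


def disk_address_type(bus):
    """Return 'drive', 'pci', 'usb', or None for a disk bus name."""
    if bus in DRIVE_BUSES:
        return "drive"
    if bus == "virtio":
        # Every VirtIO disk is a full PCI device in its own right.
        return "pci"
    if bus == "usb":
        return "usb"
    # Anything else (e.g. Xen paravirt disks) is not covered here.
    return None


if __name__ == "__main__":
    for bus in ("scsi", "ide", "fdc", "virtio", "usb", "xen"):
        print(bus, "->", disk_address_type(bus))
```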
The XML for each address type looks like:

  <address type='pci' mode='static' domain='0x0000'
           bus='0x1e' slot='0x07' function='0x0'/>

  <address type='usb' mode='dynamic' bus='007'
           dev='003'/>

  <address type='drive' mode='dynamic' controller='1'
           bus='0' unit='5'/>

The 'mode' attribute for any of them is allowed to be either
'static' or 'dynamic'. A static address is one specified by
the end user when defining the XML, while a dynamic address is
one automatically chosen by libvirt/QEMU every time a guest
boots. The idea of static addresses is to allow management
apps to guarantee that PCI device & drive numbering never
changes. This series does not actually implement static
addressing for PCI yet, because it requires that we change
the way we generate QEMU command line arguments. It does
do static addressing for disks.
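
To make the three address forms and the 'mode' attribute concrete,
here is a minimal sketch of how an <address> element could be checked
and decoded. It is not the parser from this series: parse_address and
the REQUIRED table are hypothetical names, and it assumes PCI fields
are hex strings while USB and drive fields are decimal, as in the
examples above.

```python
import xml.etree.ElementTree as ET

# Attributes each address type is expected to carry (hypothetical table,
# following the PCI / USB / drive lists in the text).
REQUIRED = {
    "pci": ("domain", "bus", "slot", "function"),
    "usb": ("bus", "dev"),
    "drive": ("controller", "bus", "unit"),
}


def parse_address(xml_text):
    """Parse one <address> element into (type, mode, fields)."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "address":
        raise ValueError("expected an <address> element")

    addr_type = elem.get("type")
    if addr_type not in REQUIRED:
        raise ValueError("unknown address type: %r" % addr_type)

    mode = elem.get("mode")
    if mode not in ("static", "dynamic"):
        raise ValueError("mode must be 'static' or 'dynamic'")

    missing = [k for k in REQUIRED[addr_type] if elem.get(k) is None]
    if missing:
        raise ValueError("missing attributes: %s" % ", ".join(missing))

    # Assumption: PCI values are hex ('0x1e'); USB and drive are decimal.
    base = 16 if addr_type == "pci" else 10
    fields = {k: int(elem.get(k), base) for k in REQUIRED[addr_type]}
    return addr_type, mode, fields


if __name__ == "__main__":
    print(parse_address(
        "<address type='drive' mode='dynamic' controller='1' "
        "bus='0' unit='5'/>"))
```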
libvirt itself will auto-assign all drive addresses, and QEMU
will auto-assign PCI addresses in dynamic mode. When starting
a guest VM we run 'info pci' to get a list of PCI vendor/product
IDs and matching PCI addresses. We then attempt to match those
up with the devices we specified to QEMU. It sounds nasty, but
it actually works fairly well. This also makes it possible to
hotunplug any device, even those the VM was initially booted
with.
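
The matching step can be sketched roughly as below. This is an
illustration and not the code in the series: match_pci_addresses is a
hypothetical name, the input is assumed to be already parsed out of
the monitor output, the vendor/product IDs are illustrative values,
and it assumes devices sharing the same IDs appear in the order they
were given to QEMU.

```python
def match_pci_addresses(configured, detected):
    """Assign a detected PCI address to each configured device.

    configured: list of (name, (vendor, product)), in the order the
                devices were specified to QEMU.
    detected:   list of ((vendor, product), address), in the order the
                monitor reported them.
    Returns a dict of name -> address, or None when nothing matched.
    """
    unclaimed = list(detected)
    result = {}
    for name, ids in configured:
        result[name] = None
        for i, (found_ids, address) in enumerate(unclaimed):
            if found_ids == ids:
                # Claim the first unclaimed device with matching IDs.
                result[name] = address
                del unclaimed[i]
                break
    return result


if __name__ == "__main__":
    # Addresses are (domain, bus, slot, function); IDs are illustrative.
    configured = [
        ("disk0", (0x1AF4, 0x1001)),
        ("net0", (0x1AF4, 0x1000)),
        ("net1", (0x1AF4, 0x1000)),
    ]
    detected = [
        ((0x1AF4, 0x1000), (0, 0, 4, 0)),
        ((0x1AF4, 0x1001), (0, 0, 11, 0)),
        ((0x1AF4, 0x1000), (0, 0, 5, 0)),
    ]
    for name, addr in match_pci_addresses(configured, detected).items():
        print(name, addr)
```

The ordering assumption is the fragile part: two devices with the
same IDs can only be told apart by position.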
There are two ways I can envisage mgmt apps using this address
functionality:

- Boot a guest with no addresses specified, grab the XML and
  change all 'dynamic' attrs to 'static' and then define the
  persistent config with this. The addresses will then be
  unchanged forever more.
- Explicitly give a full list of addresses the very first time
  a guest is created.
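
The first approach amounts to a small XML rewrite. A minimal sketch,
assuming the live guest XML is already available as a string;
pin_addresses is a hypothetical helper name and the 'mode' attribute
is the one proposed in this series:

```python
import xml.etree.ElementTree as ET


def pin_addresses(domain_xml):
    """Return domain XML with every dynamic address marked static."""
    root = ET.fromstring(domain_xml)
    # iter() walks the whole tree, so addresses on disks, controllers,
    # interfaces and so on are all visited.
    for addr in root.iter("address"):
        if addr.get("mode") == "dynamic":
            addr.set("mode", "static")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    live = """
    <domain type='kvm'>
      <devices>
        <disk type='file' device='disk'>
          <target dev='sda' bus='scsi'/>
          <address type='drive' mode='dynamic' controller='0'
                   bus='0' unit='0'/>
        </disk>
      </devices>
    </domain>
    """
    print(pin_addresses(live))
```

The result would then be used as the persistent config, so the same
addresses are requested on every subsequent boot.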
There is one small issue with all this: we need to know every
PCI device in the guest. It turns out there are a handful of
devices in QEMU we don't represent in XML yet:

- Virtio balloon device
- Virtio console (this is easy to address with Matt's patches)
- ISA bridge
- USB controller
- Some kind of PCI bridge (not clear what this is, it has
  PCI ID 8086:7113)

If a management application is to be able to fully control
static PCI addressing, we need to represent these somehow,
so apps can give them addresses. Technically we could get
away with not representing the ISA/PCI bridges, since QEMU
always gives them the first PCI slot no matter what. We still
need the VirtIO & USB devices dealt with.
Finally, here is an example of a guest running with a huge
number of devices. Notice how we've auto-detected the PCI
address of every device, and every disk. In particular
notice how VirtIO disks got the PCI address, while SCSI
disks got the drive address.

  <domain type='kvm' id='2'>
    <name>plain</name>
    <uuid>c7a1edbd-edaf-9455-926a-d65c16db1809</uuid>
    <memory>219200</memory>
    <currentMemory>219136</currentMemory>
    <vcpu>1</vcpu>
    <os>
      <type arch='i686' machine='pc-0.11'>hvm</type>
      <kernel>/home/berrange/vmlinuz-PAE</kernel>
      <initrd>/home/berrange/initrd-PAE.img</initrd>
      <boot dev='hd'/>
    </os>
    <features>
      <acpi/>
    </features>
    <clock offset='utc'/>
    <on_poweroff>destroy</on_poweroff>
    <on_reboot>restart</on_reboot>
    <on_crash>destroy</on_crash>
    <devices>
      <emulator>/usr/bin/qemu</emulator>
      <disk type='file' device='disk'>
        <driver name='qemu' type='qcow2'/>
        <source file='/home/berrange/VirtualMachines/plain.qcow'/>
        <target dev='vda' bus='virtio'/>
        <address type='pci' mode='dynamic' domain='0x0000'
                 bus='0x00' slot='0x0b' function='0x0'/>
      </disk>
      <disk type='file' device='cdrom'>
        <source file='/home/berrange/gpxe.iso'/>
        <target dev='hdc' bus='ide'/>
        <readonly/>
        <address type='drive' mode='dynamic' controller='0'
                 bus='1' unit='0'/>
      </disk>
      <disk type='file' device='disk'>
        <source file='/home/berrange/output.img'/>
        <target dev='sda' bus='scsi'/>
        <address type='drive' mode='dynamic' controller='0'
                 bus='0' unit='0'/>
      </disk>
      <disk type='file' device='disk'>
        <source file='/home/berrange/output.img'/>
        <target dev='sdd' bus='scsi'/>
        <address type='drive' mode='dynamic' controller='0'
                 bus='0' unit='3'/>
      </disk>
      <disk type='file' device='disk'>
        <source file='/home/berrange/output.img'/>
        <target dev='sdf' bus='scsi'/>
        <address type='drive' mode='dynamic' controller='0'
                 bus='0' unit='5'/>
      </disk>
      <controller type='scsi' index='0'>
        <address type='pci' mode='dynamic' domain='0x0000'
                 bus='0x00' slot='0x0a' function='0x0'/>
      </controller>
      <controller type='ide' index='0'>
        <address type='pci' mode='dynamic' domain='0x0000'
                 bus='0x00' slot='0x01' function='0x1'/>
      </controller>
      <controller type='fdc' index='0'/>
      <interface type='user'>
        <mac address='52:54:00:5b:ef:21'/>
        <model type='ne2k_pci'/>
        <address type='pci' mode='dynamic' domain='0x0000'
                 bus='0x00' slot='0x03' function='0x0'/>
      </interface>
      <interface type='user'>
        <mac address='52:54:00:1c:dc:98'/>
        <model type='virtio'/>
        <address type='pci' mode='dynamic' domain='0x0000'
                 bus='0x00' slot='0x04' function='0x0'/>
      </interface>
      <interface type='user'>
        <mac address='52:54:00:f7:c5:0e'/>
        <model type='e1000'/>
        <address type='pci' mode='dynamic' domain='0x0000'
                 bus='0x00' slot='0x05' function='0x0'/>
      </interface>
      <interface type='user'>
        <mac address='52:54:00:56:6c:55'/>
        <model type='pcnet'/>
        <address type='pci' mode='dynamic' domain='0x0000'
                 bus='0x00' slot='0x06' function='0x0'/>
      </interface>
      <interface type='user'>
        <mac address='52:54:00:ca:0d:58'/>
        <model type='rtl8139'/>
        <address type='pci' mode='dynamic' domain='0x0000'
                 bus='0x00' slot='0x07' function='0x0'/>
      </interface>
      <serial type='pty'>
        <source path='/dev/pts/5'/>
        <target port='0'/>
      </serial>
      <console type='pty' tty='/dev/pts/5'>
        <source path='/dev/pts/5'/>
        <target port='0'/>
      </console>
      <input type='mouse' bus='ps2'/>
      <graphics type='vnc' port='5900' autoport='yes'/>
      <sound model='ac97'>
        <address type='pci' mode='dynamic' domain='0x0000'
                 bus='0x00' slot='0x08' function='0x0'/>
      </sound>
      <sound model='es1370'>
        <address type='pci' mode='dynamic' domain='0x0000'
                 bus='0x00' slot='0x09' function='0x0'/>
      </sound>
      <video>
        <model type='cirrus' vram='9216' heads='1'/>
        <address type='pci' mode='dynamic' domain='0x0000'
                 bus='0x00' slot='0x02' function='0x0'/>
      </video>
      <watchdog model='i6300esb' action='reset'>
        <address type='pci' mode='dynamic' domain='0x0000'
                 bus='0x00' slot='0x0c' function='0x0'/>
      </watchdog>
    </devices>
    <seclabel type='dynamic' model='selinux'>
      <label>system_u:system_r:svirt_t:s0:c181,c286</label>
      <imagelabel>system_u:object_r:svirt_image_t:s0:c181,c286</imagelabel>
    </seclabel>
  </domain>

Daniel