On 19.11.2013 11:36, Laine Stump wrote:
On 11/15/2013 03:35 PM, Thomas Kuther wrote:
> Hello,
>
> I'm trying to migrate a working qemu command line configuration to
> libvirt.
> The part I'm currently failing on is:
>
> $ qemu-system-x86_64 -M Q35 ... -device vfio-pci,host=05:00.0,bus=pcie.0
>
> The right way to translate this into libvirt XML seems to be using
> <hostdev>, but I seem to be unable to plug it into the pcie-root port
>
> This is what the interesting part looks like when I let "virsh edit"
> generate an <address>:
>
> <controller type='pci' index='0' model='pcie-root'/>
> <controller type='pci' index='1' model='dmi-to-pci-bridge'>
>   <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
> </controller>
> <controller type='pci' index='2' model='pci-bridge'>
>   <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
> </controller>
> [...]
> <hostdev mode='subsystem' type='pci' managed='yes'>
>   <driver name='vfio'/>
>   <source>
>     <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
>   </source>
>   <address type='pci' domain='0x0000' bus='0x02' slot='0x06' function='0x0'/>
> </hostdev>
> [...]
>
> To my understanding, this will plug the host device into the
> pci-bridge controller.
> The guest OS doesn't boot with this and resets right after the BIOS.
Ugh. That's very unfortunate. This is the first report I've heard of
something failing in such a bad way due to being plugged into a
pci-bridge slot; up until now I'd only heard that there is some extra
PCIe functionality that would be missing if a device was plugged into a
PCI slot vs. PCIe.
Can I ask what type of device this is?
It's a Marvell 88SE9172 SATA controller. Here is the lspci -vvv output:
03:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller (rev 11) (prog-if 01 [AHCI 1.0])
    Subsystem: Gigabyte Technology Co., Ltd Device b000
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 47
    Region 0: I/O ports at d040 [disabled] [size=8]
    Region 1: I/O ports at d030 [disabled] [size=4]
    Region 2: I/O ports at d020 [disabled] [size=8]
    Region 3: I/O ports at d010 [disabled] [size=4]
    Region 4: I/O ports at d000 [disabled] [size=16]
    Region 5: Memory at f7610000 (32-bit, non-prefetchable) [disabled] [size=512]
    Expansion ROM at f7600000 [disabled by cmd] [size=64K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
        Status: D3 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
        Address: 00000000 Data: 0000
    Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
        DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
    Kernel driver in use: vfio-pci
The second one I'm trying to pass through is a Renesas uPD720201 USB 3.0
Host Controller, but first I wanted to get the SATA controller working
in libvirt. I will try to leave out the SATA controller and see what
happens with only the USB3 controller.
>
> Manually setting
> <address type='pci' domain='0x0000' bus='0x00' slot='0x1E' function='0x0'/>
> causes an XML validation failure.
>
> Is there any way in libvirt XML to plug a host's PCI-E device directly
> into the pcie-root port, like it works on qemu command line?
I'm sorry to say, no. With very few (and specific) exceptions, libvirt
insists that all guest devices be plugged into a hot-pluggable PCI slot.
This rules out both the PCIe "root complex" (a.k.a. pcie.0) and the
dmi-to-pci-bridge that is plugged into pcie.0 (because the slots of a
dmi-to-pci-bridge don't support hot-plug).
This is done because, for now, almost all devices that qemu knows about
are PCI (not PCIe) devices, and if we allowed plugging them into pcie.0
now, then on the day in the future when qemu begins enforcing the
difference between PCI and PCIe (currently it doesn't), the world would
be full of libvirt configs that would no longer work.
There was some discussion about this a month or two ago either on
libvir-list or maybe it was the qemu-devel list. We decided that qemu
needs to provide some sort of introspection of the devices' connection
types so that libvirt can determine what device can plug into which
slots; at that time we'll be able to allow exactly what's proper in each
case. In the meantime we're stuck with being overly cautious in order to
prevent future catastrophe.
Understood, thanks for the explanation.
>
> I'm aware I could use something like
>
> <qemu:commandline>
> <qemu:arg value='-device'/>
> <qemu:arg value='vfio-pci,host=05:00.0,bus=pcie.0'/>
> </qemu:commandline>
>
> but I insist on running the VM as non-root, and if I got that right I
> need to configure at least one vfio device (or memory locking) in
> order for libvirt to set a proper RLIMIT_MEMLOCK value.
>
> Any help would be appreciated.
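For reference, the <qemu:commandline> passthrough quoted above is only honored when the QEMU XML namespace is declared on the root <domain> element. A minimal sketch follows. The type='kvm' attribute and the elided domain content are assumptions for illustration, and the -device argument is copied unchanged from the message above:

```xml
<!-- Sketch only: type='kvm' and the elided domain content are assumptions. -->
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <!-- name, memory, os, devices, ... go here -->
  <qemu:commandline>
    <qemu:arg value='-device'/>
    <qemu:arg value='vfio-pci,host=05:00.0,bus=pcie.0'/>
  </qemu:commandline>
</domain>
```

Arguments passed this way are handed to qemu as-is, so the RLIMIT_MEMLOCK concern raised above still applies.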
For now at least, you'll need to let it plug into the pci-bridge device
pci.2 (which, as you've found, libvirt will automatically find when you
don't specify any address). Unfortunately that doesn't do you much good,
since that particular device you're assigning actually requires that it
be plugged into the PCIe bus.
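A sketch of the auto-assigned variant, reusing the host source address from the configuration quoted earlier: the guest-side <address> element is simply left out, and libvirt fills one in on the pci-bridge when the domain is defined.

```xml
<!-- Sketch only: host source address copied from the configuration quoted earlier. -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
  </source>
  <!-- no guest-side <address> here; libvirt chooses a slot on the pci-bridge -->
</hostdev>
```

Running "virsh dumpxml" on the domain afterwards shows the address libvirt chose.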
I'm wondering as I type if possibly we could relax the enforcement of
the "PCI only" rule such that we allow explicitly placing any device on
any type of bus, but only auto-assign to a plain PCI slot. That may be a
reasonable compromise until qemu has the required new device/controller
introspection info available.
I like the idea.
Regards,
Thomas