[libvirt] Mystery of qemu being unable to open NBD Unix domain socket in current directory

I'm chasing down a very frustrating bug which only happens on i686 & Koji during the nbdkit tests and seemingly nowhere else. Anyway this is what I've been able to put together:

The libguestfs appliance (guest) is created with this XML snippet:

    <disk device="disk" type="network">
      <source protocol="nbd">
        <host transport="unix" socket="cow.sock"/>
      </source>
      <target dev="sda" bus="scsi"/>
      <driver name="qemu" type="raw" cache="writeback"/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>

It fails to start with the error:

    internal error: process exited while connecting to monitor: 2018-06-06T17:02:54.450507Z qemu-system-i386: -drive file=nbd+unix://?socket=cow.sock,format=raw,if=none,id=drive-scsi0-0-0-0,cache=writeback: Failed to connect socket cow.sock: No such file or directory [code=1 int1=-1]

The socket definitely exists in the directory of the program running libvirt. I verified that by adding ls -l commands to the build:

    srwxr-xr-x. 1 mockbuild mockbuild 0 Jun 6 17:02 cow.sock

Permissions on the containing directories are fine (the whole lot is running as mockbuild):

    + pwd
    /builddir/build/BUILD/nbdkit-1.3.1/tests
    + ls -ld /builddir
    drwx------. 5 mockbuild mockbuild 4096 Jun 6 17:00 /builddir
    + ls -ld /builddir/build
    drwxr-xr-x. 9 mockbuild mockbuild 4096 Jun 6 16:57 /builddir/build
    + ls -ld /builddir/build/BUILD
    drwxrwxr-x. 3 mockbuild 1000 4096 Jun 6 16:59 /builddir/build/BUILD
    + ls -ld /builddir/build/BUILD/nbdkit-1.3.1
    drwxr-xr-x. 11 mockbuild mockbuild 4096 Jun 6 17:00 /builddir/build/BUILD/nbdkit-1.3.1
    + ls -ld /builddir/build/BUILD/nbdkit-1.3.1/tests
    drwxr-xr-x. 4 mockbuild mockbuild 4096 Jun 6 17:02 /builddir/build/BUILD/nbdkit-1.3.1/tests

The qemu log doesn't give any clues:

    2018-06-06 17:02:54.412+0000: starting up libvirt version: 4.4.0, package: 1.fc29 (Fedora Project, 2018-06-05-11:53:38, buildvm-12.phx2.fedoraproject.org), qemu version: 2.12.0qemu-2.12.0-2.fc29, kernel: 4.16.8-300.fc28.x86_64, hostname: buildhw-10.phx2.fedoraproject.org
    LC_ALL=C
      PATH=/builddir/build/BUILD/nbdkit-1.3.1:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/sbin
      HOME=/builddir
      USER=mockbuild
      LOGNAME=mockbuild
      QEMU_AUDIO_DRV=none
      TMPDIR=/var/tmp
      /usr/bin/qemu-system-i386
      -name guest=guestfs-62ak9dzrjxfoctua,debug-threads=on
      -S
      -object secret,id=masterKey0,format=raw,file=/builddir/.config/libvirt/qemu/lib/domain-3-guestfs-62ak9dzrjxfo/master-key.aes
      -machine pc-i440fx-2.12,accel=tcg,usb=off,dump-guest-core=off
      -m 500
      -realtime mlock=off
      -smp 1,sockets=1,cores=1,threads=1
      -uuid 6094da49-2def-4256-b901-e4fe5f1409c8
      -display none
      -no-user-config
      -nodefaults
      -device sga
      -chardev socket,id=charmonitor,path=/builddir/.config/libvirt/qemu/lib/domain-3-guestfs-62ak9dzrjxfo/monitor.sock,server,nowait
      -mon chardev=charmonitor,id=monitor,mode=control
      -rtc base=utc,driftfix=slew
      -global kvm-pit.lost_tick_policy=delay
      -no-hpet
      -no-reboot
      -no-acpi
      -boot strict=on
      -kernel /var/tmp/.guestfs-1000/appliance.d/kernel
      -initrd /var/tmp/.guestfs-1000/appliance.d/initrd
      -append 'panic=1 noapic console=ttyS0 edd=off udevtimeout=6000 udev.event-timeout=6000 no_timer_check printk.time=1 cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable 8250.nr_uarts=1 root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=vt100'
      -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2
      -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3
      -drive 'file=nbd+unix://?socket=cow.sock,format=raw,if=none,id=drive-scsi0-0-0-0,cache=writeback'
      -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on
      -drive file=/tmp/libguestfsprC4lc/overlay1.qcow2,format=qcow2,if=none,id=drive-scsi0-0-1-0,cache=unsafe
      -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=1,lun=0,drive=drive-scsi0-0-1-0,id=scsi0-0-1-0,write-cache=on
      -chardev socket,id=charserial0,path=/tmp/libguestfsj35K6W/console.sock
      -device isa-serial,chardev=charserial0,id=serial0
      -chardev socket,id=charchannel0,path=/tmp/libguestfsj35K6W/guestfsd.sock
      -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.libguestfs.channel.0
      -object rng-random,id=objrng0,filename=/dev/urandom
      -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x4
      -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
      -msg timestamp=on
    2018-06-06 17:02:54.412+0000: Domain id=3 is tainted: custom-argv
    2018-06-06T17:02:54.450507Z qemu-system-i386: -drive file=nbd+unix://?socket=cow.sock,format=raw,if=none,id=drive-scsi0-0-0-0,cache=writeback: Failed to connect socket cow.sock: No such file or directory
    2018-06-06 17:02:54.453+0000: shutting down, reason=failed

At this point I don't really have any ideas. Does libvirt now run qemu in a different directory? Does the error message mean something else apart from the file not existing?

Also the nbd+unix syntax doesn't appear to be documented in qemu. Is this a new thing? Normally we use nbd:unix:...

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

On 06/06/2018 12:29 PM, Richard W.M. Jones wrote:
> I'm chasing down a very frustrating bug which only happens on i686 & Koji during the nbdkit tests and seemingly nowhere else. Anyway this is what I've been able to put together:
>
> The libguestfs appliance (guest) is created with this XML snippet:
>
>     <disk device="disk" type="network">
>       <source protocol="nbd">
>         <host transport="unix" socket="cow.sock"/>

Have you tried an absolute path?

> At this point I don't really have any ideas. Does libvirt now run qemu in a different directory? Does the error message mean something else apart from the file not existing?

Libvirtd does chdir("/") early on, so qemu is likely run with / as its current working directory (rather than the directory you started from), and a relative socket name is then resolved against /. So an absolute path should fix things.
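
A minimal sketch of the effect described above, in Python rather than qemu itself: a relative AF_UNIX socket path is resolved against the connecting process's current working directory, so the same name stops working after a chdir("/"). The helper names `try_connect` and `demo` are made up for illustration, and the sketch assumes no socket named cow.sock happens to exist in /.

```python
import errno
import os
import socket
import tempfile


def try_connect(path):
    """Return None if connecting to the Unix socket works, else the errno name."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        client.close()


def demo():
    """Connect to the same socket by relative and absolute name from two cwds."""
    original_cwd = os.getcwd()
    results = {}
    with tempfile.TemporaryDirectory() as workdir:
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            # Stand-in for the NBD server: listen on ./cow.sock in workdir.
            os.chdir(workdir)
            server.bind("cow.sock")
            server.listen(8)
            absolute = os.path.join(workdir, "cow.sock")

            results["relative, same cwd"] = try_connect("cow.sock")

            # Stand-in for what happens once the daemon has done chdir("/").
            os.chdir("/")
            results["relative, cwd is /"] = try_connect("cow.sock")
            results["absolute, cwd is /"] = try_connect(absolute)
        finally:
            server.close()
            os.chdir(original_cwd)
    return results


if __name__ == "__main__":
    for case, outcome in demo().items():
        print(f"{case}: {outcome or 'connected'}")
```

The ENOENT in the middle case corresponds to the "No such file or directory" in the qemu error: the socket exists, just not relative to the directory the connecting process is in.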

> 2018-06-06T17:02:54.450507Z qemu-system-i386: -drive file=nbd+unix://?socket=cow.sock,format=raw,if=none,id=drive-scsi0-0-0-0,cache=writeback: Failed to connect socket cow.sock: No such file or directory
>
> Also the nbd+unix syntax doesn't appear to be documented in qemu. Is this a new thing? Normally we use nbd:unix:...

nbd:unix:... is the old-style legacy form; nbd+unix:// is the URI style. The URI style is a bit more flexible (you can add options without having to add more ad-hoc parsing). Ideally, we'll be moving libvirt to an even newer -blockdev style (which can either directly use the JSON you'd hand to QMP blockdev-add, or which uses a dotted syntax similar to -drive); that's been an ongoing task for Peter.

As for lack of documentation in qemu, I'm not surprised.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
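
For comparison, here is a rough Python sketch that writes out the three spellings mentioned above for one Unix socket. The function name `nbd_unix_spellings` is made up for illustration. The legacy and URI strings follow the forms quoted in this thread; the JSON layout (`driver`, `server.type`, `server.path`) reflects my understanding of what QMP blockdev-add takes for the nbd driver and should be checked against the qemu version in use. The sketch also assumes the path contains no characters that would need escaping in any of the three forms.

```python
import json
import os


def nbd_unix_spellings(socket_path):
    """Return the legacy, URI, and JSON spellings for an NBD Unix socket."""
    if not os.path.isabs(socket_path):
        # Relative names depend on the cwd of whichever process opens them.
        raise ValueError(f"socket path must be absolute: {socket_path!r}")
    return {
        # Old-style form: nbd:unix:...
        "legacy": f"nbd:unix:{socket_path}",
        # URI form, as it appears in the -drive argument in the qemu log.
        "uri": f"nbd+unix://?socket={socket_path}",
        # Assumed blockdev-add style description of the same server.
        "blockdev_json": json.dumps(
            {"driver": "nbd", "server": {"type": "unix", "path": socket_path}}
        ),
    }


if __name__ == "__main__":
    for name, value in nbd_unix_spellings("/tmp/cow.sock").items():
        print(f"{name}: {value}")
```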

On Wed, Jun 06, 2018 at 01:25:27PM -0500, Eric Blake wrote:
> On 06/06/2018 12:29 PM, Richard W.M. Jones wrote:
> > I'm chasing down a very frustrating bug which only happens on i686 & Koji during the nbdkit tests and seemingly nowhere else. Anyway this is what I've been able to put together:
> >
> > The libguestfs appliance (guest) is created with this XML snippet:
> >
> >     <disk device="disk" type="network">
> >       <source protocol="nbd">
> >         <host transport="unix" socket="cow.sock"/>
>
> Have you tried an absolute path?

Yes, it works with an absolute path. Isn't this a bug in libvirt? Also it works on x86_64 and in earlier versions of libvirt. I'll post my nbdkit workaround soon.

Rich.
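
One way a caller could avoid the problem is to resolve the socket name to an absolute path before it ever reaches the domain XML. The Python sketch below shows the idea; it is not the actual nbdkit workaround, `nbd_disk_xml` is a hypothetical helper, and the `<address>` element from the original snippet is left out for brevity.

```python
import os
import xml.etree.ElementTree as ET


def nbd_disk_xml(socket_path):
    """Build a <disk> element for an NBD Unix socket, forcing an absolute path."""
    # Resolve against the caller's cwd now, while it is still the right one.
    absolute = os.path.abspath(socket_path)

    disk = ET.Element("disk", {"device": "disk", "type": "network"})
    source = ET.SubElement(disk, "source", {"protocol": "nbd"})
    ET.SubElement(source, "host", {"transport": "unix", "socket": absolute})
    ET.SubElement(disk, "target", {"dev": "sda", "bus": "scsi"})
    ET.SubElement(
        disk, "driver", {"name": "qemu", "type": "raw", "cache": "writeback"}
    )
    return ET.tostring(disk, encoding="unicode")


if __name__ == "__main__":
    print(nbd_disk_xml("cow.sock"))
```

Resolving the path in the client matters because only the client knows which directory the relative name was meant to be relative to.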

I don't know whether or not we decided this was a bug, but I have filed one anyway:

https://bugzilla.redhat.com/show_bug.cgi?id=1588447

Rich.

On Thu, Jun 07, 2018 at 12:20:16 +0100, Richard W.M. Jones wrote:
> I don't know whether or not we decided this was a bug, but I have filed one anyway:

Your XML does not conform to the libvirt domain XML schema, which has always called for an absolute path here, as we do with all paths in the domain XML:

    <group>
      <attribute name="transport">
        <value>unix</value>
      </attribute>
      <attribute name="socket">
        <ref name="absFilePath"/>
      </attribute>
    </group>

We just did not validate this particular code path in the parser itself (when the domain schema validation is not used).

On Thu, Jun 07, 2018 at 01:38:47PM +0200, Peter Krempa wrote:
> On Thu, Jun 07, 2018 at 12:20:16 +0100, Richard W.M. Jones wrote:
> > I don't know whether or not we decided this was a bug, but I have filed one anyway:
>
> Your XML does not conform to the libvirt domain XML schema which always called for an absolute path as we do with all paths in the domain XML:
>
>     <group>
>       <attribute name="transport">
>         <value>unix</value>
>       </attribute>
>       <attribute name="socket">
>         <ref name="absFilePath"/>
>       </attribute>
>     </group>
>
> We just did not validate this particular code path in the parser itself (when the domain schema validation is not used).

Sure. Libvirt does need to validate the paths because it's possible that the *wrong* path will be opened if a relative path is used, e.g. if a socket with the same name happens to exist in "/" (or whatever is the working directory that libvirt changes to).

Rich.
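
The kind of check being discussed can be sketched on the client side in a few lines of Python. This is not libvirt's parser or its schema validation, only an illustration of rejecting relative Unix socket paths before the XML is submitted; `relative_unix_sockets` is a hypothetical helper name.

```python
import os
import xml.etree.ElementTree as ET


def relative_unix_sockets(domain_xml):
    """Return the socket values of unix-transport <host> elements that are not absolute."""
    root = ET.fromstring(domain_xml)
    offenders = []
    for host in root.iter("host"):
        if host.get("transport") != "unix":
            continue
        socket_path = host.get("socket", "")
        if not os.path.isabs(socket_path):
            offenders.append(socket_path)
    return offenders


SNIPPET = """
<disk device="disk" type="network">
  <source protocol="nbd">
    <host transport="unix" socket="cow.sock"/>
  </source>
  <target dev="sda" bus="scsi"/>
</disk>
"""

if __name__ == "__main__":
    print(relative_unix_sockets(SNIPPET))
```

Run against the snippet from the first message, the check flags cow.sock; with an absolute path in the socket attribute it reports nothing.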

participants (3)
- Eric Blake
- Peter Krempa
- Richard W.M. Jones