On Fri, Sep 01, 2023 at 03:24:11PM -0500, Jonathon Jongsma wrote:
On 8/16/23 4:19 PM, Jonathon Jongsma wrote:
>On 8/8/23 6:00 AM, Stefano Garzarella wrote:
>>On Mon, Aug 07, 2023 at 03:41:21PM +0200, Peter Krempa wrote:
>>>On Thu, Aug 03, 2023 at 09:48:01 +0200, Stefano Garzarella wrote:
>>>>On Wed, Aug 2, 2023 at 10:33 PM Jonathon Jongsma
>>>><jjongsma(a)redhat.com> wrote:
>>>>> On 7/24/23 8:05 AM, Peter Krempa wrote:
>>>>
>>>>[...]
>>>>
>>>>> >
>>>>> > I've also noticed that using 'qcow2' format for the device doesn't work:
>>>>> >
>>>>> > error: internal error: process exited while connecting to monitor: 2023-07-24T12:54:15.818631Z qemu-system-x86_64: -blockdev {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage"}: Could not read qcow2 header: Invalid argument
>>>>> >
>>>>> > If that is supposed to work, then qemu devs will probably need to
>>>>> > know about that; if it is not supposed to work, libvirt needs to
>>>>> > add a check, because the error doesn't tell much. It's also
>>>>> > possible I've messed up when formatting the image, though, as I
>>>>> > didn't really try to figure out what's happening.
>>>>> >
>>>>>
>>>>>
>>>>> That's a good question, and I don't actually know the answer. Were
>>>>> you using an actual vdpa block device for your tests, or were you
>>>>> using the vdpa block simulator kernel module? How did you set it up?
>>>>> Adding Stefano to cc for his thoughts.
>>>>
>>>>Yep, I would also like to understand how you initialized the device
>>>>with a qcow2 format.
>>>
>>>Naively, I simply used it as 'raw' at first and formatted it from the
>>>guest OS. Then I shut down the VM and started it back up, reconfiguring
>>>the image format as qcow2. This normally works with real-file-backed
>>>storage, and since the vdpa simulator seems to persist the contents, I
>>>assumed this would work.
>>
>>Cool, I'll try that.
>>Can you try to reboot the VM, use the device as `raw`, and read the
>>qcow2 image from the guest OS?
>>
>>Note: there could be some bugs in the simulator!
>>
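For concreteness, the check I have in mind would be something like the
following from inside the guest, assuming the vDPA disk shows up there
as /dev/vdb and qemu-img is installed in the guest (just a sketch, not
tested):

  # inside the guest, with the disk attached to the VM as raw
  $ qemu-img info /dev/vdb
  # if the qcow2 header written earlier is intact, this should report
  # "file format: qcow2"
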
>>>
>>>>Theoretically, the best use case for vDPA block is that the backend
>>>>handles formats; for QEMU it should just be a virtio device. But
>>>>since it is a blockdev, we should be able to use formats anyway, so
>>>>it should work.
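
In other words, the layering under discussion would look roughly like
this on the QEMU command line (the node names here are arbitrary; a
complete working invocation appears later in this thread):

  -blockdev node-name=vdpa0,driver=virtio-blk-vhost-vdpa,path=/dev/vhost-vdpa-0,cache.direct=on \
  -blockdev qcow2,node-name=fmt0,file=vdpa0 \
  -device virtio-blk-pci,drive=fmt0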
>>>
>>>Yeah, ideally there will be no format driver in qemu used for these
>>>devices (this is not yet the case, I'll need to fix libvirt to stop
>>>using the 'raw' driver if not needed).
>>>
>>>Here I'm more interested in whether it is supposed to work, in which
>>>case we want to allow using qcow2 as a format in libvirt, or whether
>>>it's not supposed to work, in which case we should forbid it so the
>>>user doesn't hit an unhelpful error message like the one above.
>>
>>This is a good question. We certainly haven't tested it, because it's an
>>uncommon scenario, but as I said before, maybe it should work. I need to
>>check it more carefully.
>>
>>>
>>>>
>>>>For now, while waiting for real hardware, the only way to test vDPA
>>>>block support in QEMU is to use the in-kernel simulator or VDUSE.
>>>>
>>>>With the kernel simulator we only have a 128 MB ramdisk available;
>>>>with VDUSE you can use QSD (qemu-storage-daemon) with any file:
>>>>
>>>>$ modprobe -a vhost_vdpa vduse
>>>>$ qemu-storage-daemon \
>>>>    --blockdev file,filename=/path/to/image.qcow2,cache.direct=on,aio=native,node-name=file \
>>>>    --blockdev qcow2,file=file,node-name=qcow2 \
>>>>    --export vduse-blk,id=vduse0,name=vduse0,num-queues=1,node-name=qcow2,writable=on
>>>>
>>>>$ vdpa dev add name vduse0 mgmtdev vduse
>>>>
>>>>Then you have a /dev/vhost-vdpa-X device that you can use with the
>>>>`virtio-blk-vhost-vdpa` blockdev (note: VDUSE requires QEMU with a
>>>>memory-backend with `share=on`), but using raw, since the qcow2 is
>>>>handled by QSD.
>>>>Of course, we should be able to use a raw file with QSD and qcow2 in
>>>>QEMU (although it's not the optimal configuration), but I don't know
>>>>how to initialize a `virtio-blk-vhost-vdpa` blockdev with a qcow2
>>>>image :-(
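
As a side note for anyone reproducing the VDUSE setup above: tearing the
test device back down is roughly the reverse (a sketch):

  $ vdpa dev del vduse0
  # then stop the qemu-storage-daemon process to remove the VDUSE export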
>>>
>>>With the above qemu-storage-daemon invocation you should be able to do
>>>that by dropping the qcow2 format driver and simply exposing the
>>>qcow2-formatted image. It works similarly with NBD:
>>>
>>>I've formatted 2 qcow2 images:
>>>
>>># qemu-img create -f qcow2 /root/image1.qcow2 100M
>>># qemu-img create -f qcow2 /root/image2.qcow2 100M
>>>
>>>And then exported them both via vduse and nbd without interpreting the
>>>qcow2 format, thus turning QSD into just a dumb storage device:
>>>
>>># qemu-storage-daemon \
>>>    --blockdev file,filename=/root/image1.qcow2,cache.direct=on,aio=native,node-name=file1 \
>>>    --export vduse-blk,id=vduse0,name=vduse0,num-queues=1,node-name=file1,writable=on \
>>>    --blockdev file,filename=/root/image2.qcow2,cache.direct=on,aio=native,node-name=file2 \
>>>    --nbd-server addr.type=unix,addr.path=/tmp/nbd.sock \
>>>    --export nbd,id=nbd0,node-name=file2,writable=on,name=exportname
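
A quick way to sanity-check the NBD side of such a setup is to point
qemu-img at the export from the host, using the socket path and export
name above (a sketch):

  $ qemu-img info 'nbd+unix:///exportname?socket=/tmp/nbd.sock'
  # should report "file format: qcow2", since QSD serves the image bytes verbatim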
>>
>>Cool! Thanks for sharing!
>>
>>>
>>>Now when I start a VM using the NBD export in qcow2 format:
>>>
>>>  <disk type='network' device='disk'>
>>>    <driver name='qemu' type='qcow2'/>
>>>    <source protocol='nbd' name='exportname'>
>>>      <host transport='unix' socket='/tmp/nbd.sock'/>
>>>    </source>
>>>    <target dev='vda' bus='virtio'/>
>>>    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
>>>  </disk>
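
(For reference, libvirt translates a disk like that into roughly this
kind of -blockdev stack on the qemu command line; the node names below
are illustrative, not the exact ones libvirt generates:

  -blockdev '{"driver":"nbd","server":{"type":"unix","path":"/tmp/nbd.sock"},"export":"exportname","node-name":"storage0"}' \
  -blockdev '{"driver":"qcow2","file":"storage0","node-name":"format0"}'
)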
>>>
>>>The VM starts fine, but when using:
>>>
>>>  <disk type='vhostvdpa' device='disk'>
>>>    <driver name='qemu' type='qcow2' cache='none'/>
>>>    <source dev='/dev/vhost-vdpa-0'/>
>>>    <target dev='vda' bus='virtio'/>
>>>    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
>>>  </disk>
>>>
>>>I get:
>>>
>>>error: internal error: QEMU unexpectedly closed the monitor (vm='vdpa'): 2023-08-07T12:34:21.628520Z qemu-system-x86_64: -blockdev {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage"}: Could not read qcow2 header: Invalid argument
>>
>>mmm, I just tried this scenario using QEMU directly and it worked.
>>
>>These are the steps I did (qemu upstream,
>>commit 9400601a689a128c25fa9c21e932562e0eeb7a26):
>>  ./build/storage-daemon/qemu-storage-daemon \
>>    --blockdev file,filename=test.qcow2,cache.direct=on,aio=native,node-name=file \
>>    --export vduse-blk,id=vduse0,name=vduse0,num-queues=1,node-name=file,writable=on
>>
>>  vdpa dev add name vduse0 mgmtdev vduse
>>
>>  ./build/qemu-system-x86_64 -m 512M -smp 2 \
>>    -M q35,accel=kvm,memory-backend=mem \
>>    -drive file=f38-vm-build.qcow2,format=qcow2,if=none,id=hd0 \
>>    -device virtio-blk-pci,drive=hd0,bootindex=1 \
>>    -blockdev node-name=drive_src1,driver=virtio-blk-vhost-vdpa,path=/dev/vhost-vdpa-0,cache.direct=on \
>>    -blockdev qcow2,node-name=qcow2,file=drive_src1 \
>>    -device virtio-blk-pci,id=src1,bootindex=2,drive=qcow2 \
>>    -object memory-backend-file,share=on,id=mem,size=512M,mem-path="/dev/hugepages"
>>
>>Then I'm able to see /dev/vdb and /dev/vdb1.
>>(test.qcow2 has a fs on the first partition)
>>
>>I mounted vdb1 and ran md5sum on a file.
>>
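For anyone reproducing this, the in-guest verification was along these
lines (the file name is just an example):

  # inside the guest
  $ mount /dev/vdb1 /mnt
  $ md5sum /mnt/some-file
  $ umount /mnt
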
>>Then I turned off the machine, moved the `-blockdev qcow2...` from QEMU
>>to QSD, did the same steps again, and checked that the md5 is the same.
>>
>>So it seems to work here, but maybe there is something different in our
>>setups. My host kernel is: 6.4.7-200.fc38.x86_64
>>
>>Thanks,
>>Stefano
>
>
>By the way, I get the same "Could not read qcow2 header" error that
>Peter reported when I use this direct qemu command line. My laptop is
>a little bit behind, so I'm still on Fedora 37.
>
>Jonathon
>I recently upgraded my laptop to Fedora 38 and retested this. I followed
>the procedure above, using qemu-storage-daemon to export a vDPA block
>device backed by a qcow2 disk image, and it was readable inside the
>guest launched from my libvirt branch. So it looks like the "Could not
>read qcow2 header" error has already been fixed upstream.
Thanks for the update!
Yep, vDPA is under continuous development, so it's likely there was some
problem in QEMU or Linux that we fixed in a new release.
>So I don't think there's anything else blocking this libvirt
>implementation.
Yep, I agree!
>I'll post an updated patch soon.
Cool, thanks!
Stefano