Download of volume: alternating frames of 262120 and 24 bytes
Hi,

While analysing traffic generated by "virsh vol-download", I noticed that the download stream exhibits the following pattern (over qemu+ssh://):

<Start>
Read 4 bytes      (= frame length field)
Read 262144 bytes (= header + payload; VIR_NET_MESSAGE_HEADER_MAX (24) + VIR_NET_MESSAGE_LEGACY_PAYLOAD_MAX (262120))
Read 4 bytes      (= frame length field)
Read 48 bytes     (= header + payload)
<Go back to Start>

While nothing seems to be actually wrong with this pattern, the average payload on large data transfers is only (262120+24)/2 == 131072 and the CPU makes double the iterations necessary. I'll be trying to figure out where this comes from, but I imagine that will be pretty hard to track down.

Two questions:
1. Do you see the same thing?
2. Any ideas/hints where I could start looking?

Regards,

-- 
Bye, Erik.
http://efficito.com -- Hosted accounting and ERP. Robust and Flexible. No vendor lock-in.
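For illustration, reading this framing boils down to a loop like the sketch below (my own sketch, not libvirt code; it assumes the 4-byte big-endian length prefix counts the whole packet, including the length word itself):

#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    unsigned char lenbuf[4];

    /* Repeatedly read a 4-byte length field, then the rest of the packet. */
    while (fread(lenbuf, 1, sizeof(lenbuf), stdin) == sizeof(lenbuf)) {
        uint32_t len;
        memcpy(&len, lenbuf, sizeof(len));
        len = ntohl(len);                 /* length is big-endian on the wire */
        if (len <= sizeof(lenbuf)) {
            fprintf(stderr, "bogus frame length %u\n", (unsigned) len);
            return 1;
        }
        len -= sizeof(lenbuf);            /* remaining bytes: header + payload */

        unsigned char *buf = malloc(len);
        if (!buf || fread(buf, 1, len, stdin) != len) {
            free(buf);
            fprintf(stderr, "short read\n");
            return 1;
        }
        printf("frame: %u bytes (header + payload)\n", (unsigned) len);
        free(buf);
    }
    return 0;
}

Fed a recorded download stream, it should print the alternating 262144/48 byte reads described above (plus whatever handshake messages precede them).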
On Mon, Jan 19, 2026 at 07:43:42PM +0100, Erik Huelsmann wrote:
Hi,
While analysing traffic generated by "virsh vol-download", I noticed that the download stream exhibits the following pattern (over qemu+ssh://):
<Start>
Read 4 bytes      (= frame length field)
Read 262144 bytes (= header + payload; VIR_NET_MESSAGE_HEADER_MAX (24) + VIR_NET_MESSAGE_LEGACY_PAYLOAD_MAX (262120))
Read 4 bytes      (= frame length field)
Read 48 bytes     (= header + payload)
<Go back to Start>
While nothing seems to be actually wrong with this pattern, the average payload on large data transfers is only (262120+24)/2 == 131072 and the CPU makes double the iterations necessary. I'll be trying to figure out where this comes from, but I imagine that will be pretty hard to track down.
Two questions: 1. Do you see the same thing? 2. Any ideas/hints where I could start looking?
I'd suggest enabling RPC tracing with systemtap, or debug logs, to see what the messages being sent are.

The 262144 byte message will be VIR_NET_STREAM with data payload.

The 48 byte message is slightly surprising, but given you're using vol-download, I'm going to guess that the volume is sparse, and say that the 48 byte message is a VIR_NET_STREAM_HOLE message skipping over the transfer of a large area of zeros.

With regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
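As a concrete illustration of the debug-log suggestion (assuming the standard libvirt client-side logging environment variables; see the libvirt logging documentation for the exact spelling in your version): running virsh with LIBVIRT_LOG_FILTERS="1:rpc" LIBVIRT_LOG_OUTPUTS="1:file:virsh-rpc.log" in the environment should record the individual RPC messages as they are sent and received, which makes the frame sizes visible without a packet capture.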
Hi Daniel, Thanks for the quick response!
While analysing traffic generated by "virsh vol-download", I noticed that the download stream exhibits the following pattern (over qemu+ssh://):
<Start>
Read 4 bytes      (= frame length field)
Read 262144 bytes (= header + payload; VIR_NET_MESSAGE_HEADER_MAX (24) + VIR_NET_MESSAGE_LEGACY_PAYLOAD_MAX (262120))
Read 4 bytes      (= frame length field)
Read 48 bytes     (= header + payload)
<Go back to Start>
While nothing seems to be actually wrong with this pattern, the
I meant to say: while nothing seems to be wrong *with the data* being transferred in this pattern...
average payload on large data transfers is only (262120+24)/2 == 131072 and the CPU makes double the iterations necessary. I'll be trying to figure out where this comes from, but I imagine that will be pretty hard to track down.
Two questions: 1. Do you see the same thing? 2. Any ideas/hints where I could start looking?
I'd suggest enabling RPC tracing with systemtap, or debug logs, to see what the messages being sent are.
The 262144 byte message will be VIR_NET_STREAM with data payload.
The 48 byte message is slightly surprising, but given you're using vol-download, I'm going to guess that the volume is sparse, and
I didn't specify the --sparse flag; not sure if there's anything turning it on regardless, but I wouldn't expect any holes in the transfer.
say that the 48 byte message is a VIR_NET_STREAM_HOLE message skipping over the transfer of a large area of zeros.
Using Protocol::Sys::Virt, I wrote a protocol analyzer, and using the "ext" transfer mechanism I'm able to capture the transfer streams. I'm seeing the pattern over qemu+ssh:// to a remote server, but I see the same pattern in the stream I recorded using qemu+ext:///. This is what shows up:

4: 262144: ["header",{"proc":209,"prog":536903814,"serial":10,"status":2,"type":3,"vers":1},"final",false,"data","<data length: 262120>","hole",null]
4: 48: ["header",{"proc":209,"prog":536903814,"serial":10,"status":2,"type":3,"vers":1},"final",false,"data","<data length: 24>","hole",null]

The "4: 48: " means two reads, one of 4 bytes and one of 48 bytes. The second read is a header with the fields given above and a data section of 24 bytes. The important part is the "type" field, where the value 3 is VIR_NET_STREAM, just like the message right above it. From this, I conclude that both transfer data and neither transfers a hole.

I was thinking that the first data frame might be 24 bytes short, causing all future frames to be 24 bytes out of sync, but that's not the case: the initial stream data frame is 262144 bytes. Here's the RPC response and the first two stream frames:

4: 24: ["header",{"proc":209,"prog":536903814,"serial":10,"status":0,"type":1,"vers":1},"data",null,"error",null]
4: 262144: ["header",{"proc":209,"prog":536903814,"serial":10,"status":2,"type":3,"vers":1},"final",false,"data","<data length: 262120>","hole",null]
4: 48: ["header",{"proc":209,"prog":536903814,"serial":10,"status":2,"type":3,"vers":1},"final",false,"data","<data length: 24>","hole",null]

If you want, I can share the tools and the protocol-decoded data stream (the gzipped decoded stream is only ~10kB).

By the way, the "capture" tool I'm using is a simple script:

#!/usr/bin/bash
tee -p ./recording-upstream | nc -q0 -U /run/libvirt/libvirt-sock | tee -p ./recording-downstream

I'll add the protocol decoder to the Protocol::Sys::Virt distribution in the examples directory and upload to CPAN.

-- 
Bye, Erik.
http://efficito.com -- Hosted accounting and ERP. Robust and Flexible. No vendor lock-in.
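For reference, a capture script like that is hooked into virsh through the ext transport, along the lines of: virsh -c 'qemu+ext:///system?command=/path/to/capture.sh' vol-download ... (the script path and the exact URI spelling here are illustrative only).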
I'll add the protocol decoder to the Protocol::Sys::Virt distribution in the examples directory and upload to CPAN.
See Protocol::Sys::Virt v12.0.4 => https://metacpan.org/release/EHUELS/Protocol-Sys-Virt-v12.0.4

-- 
Bye, Erik.
http://efficito.com -- Hosted accounting and ERP. Robust and Flexible. No vendor lock-in.
Hi,
While analysing traffic generated by "virsh vol-download", I noticed that the download stream exhibits the following pattern (over qemu+ssh://):
<Start>
Read 4 bytes      (= frame length field)
Read 262144 bytes (= header + payload; VIR_NET_MESSAGE_HEADER_MAX (24) + VIR_NET_MESSAGE_LEGACY_PAYLOAD_MAX (262120))
Read 4 bytes      (= frame length field)
Read 48 bytes     (= header + payload)
<Go back to Start>
While nothing seems to be actually wrong with this pattern, the average payload on large data transfers is only (262120+24)/2 == 131072 and the CPU makes double the iterations necessary. I'll be trying to figure out where this comes from, but I imagine that will be pretty hard to track down.
Two questions: 1. Do you see the same thing? 2. Any ideas/hints where I could start looking?
I'd suggest enabling RPC tracing with systemtap, or debug logs, to see what the messages being sent are.
The 262144 byte message will be VIR_NET_STREAM with data payload.
The 48 byte message is slightly surprising, but given you're using vol-download, I'm going to guess that the volume is sparse, and say that the 48 byte message is a VIR_NET_STREAM_HOLE message skipping over the transfer of a large area of zeros.
I think I found the problem by trying to figure out the various call paths in the code base: the root cause is a difference in buffer sizes and how the remote driver works with those.

I found that the remote driver uses the virFDStream API from src/util/virfdstream.c to read local data. This API reads blocks of 256kB and sends those off into the (local) stream [1]. The remote driver then finds that the input message to be transferred is larger than the maximum data frame on the wire protocol (also 256kB, but *including* the 24-byte header), meaning it cuts the buffer into two parts: the maximum size of the data frame payload and "the rest". (By the way, if we had read 1MB instead of 256kB, it would have split this one input chunk into 5 chunks on the wire protocol.)

The actual splitting happens in daemonStreamHandleWrite() in remote_daemon_stream.c: it doesn't consider that there could be a next message in the queue which could fill the remaining space in the data frame; it only handles the "remaining" data in the current message.

I *do* understand the perspective of the virfdstream implementation to read chunks of 256kB (and not 256kB minus 24). I also understand that - from a simplicity perspective - it makes sense to push out one message from the transfer queue at a time, but is the intent that the average transfer then becomes 128kB minus 12 bytes?

Regards,

Erik.

[1] https://gitlab.com/libvirt/libvirt/-/blob/master/src/util/virfdstream.c?ref_...

-- 
Bye, Erik.
http://efficito.com -- Hosted accounting and ERP. Robust and Flexible. No vendor lock-in.
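To make the effect concrete, here is a toy sketch (mine, not the actual libvirt code) of the splitting described above: each chunk handed over by the FD stream is cut into wire frames whose payload is capped at VIR_NET_MESSAGE_LEGACY_PAYLOAD_MAX:

#include <stddef.h>
#include <stdio.h>

#define PAYLOAD_MAX 262120   /* VIR_NET_MESSAGE_LEGACY_PAYLOAD_MAX */

/* Print the wire-frame payload sizes that a single input chunk ends up as. */
static void emit_frames(size_t chunk)
{
    while (chunk > 0) {
        size_t payload = chunk > PAYLOAD_MAX ? PAYLOAD_MAX : chunk;
        printf("frame payload: %zu bytes\n", payload);
        chunk -= payload;
    }
}

int main(void)
{
    emit_frames(262144);        /* 256kB chunk -> 262120 + 24                    */
    emit_frames(1024 * 1024);   /* 1MB chunk   -> 4 x 262120 + 96, i.e. 5 frames */
    return 0;
}

With a 256kB read size, every chunk therefore produces one full frame plus one 24-byte runt, which is exactly the alternating pattern from the capture.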
On Mon, Jan 19, 2026 at 10:55:06PM +0100, Erik Huelsmann wrote:
Hi,
While analysing traffic generated by "virsh vol-download", I noticed that the download stream exhibits the following pattern (over qemu+ssh://):
<Start>
Read 4 bytes      (= frame length field)
Read 262144 bytes (= header + payload; VIR_NET_MESSAGE_HEADER_MAX (24) + VIR_NET_MESSAGE_LEGACY_PAYLOAD_MAX (262120))
Read 4 bytes      (= frame length field)
Read 48 bytes     (= header + payload)
<Go back to Start>
While nothing seems to be actually wrong with this pattern, the average payload on large data transfers is only (262120+24)/2 == 131072 and the CPU makes double the iterations necessary. I'll be trying to figure out where this comes from, but I imagine that will be pretty hard to track down.
Two questions: 1. Do you see the same thing? 2. Any ideas/hints where I could start looking?
I'd suggest enabling RPC tracing with systemtap, or debug logs, to see what the messages being sent are.
The 262144 byte message will be VIR_NET_STREAM with data payload.
The 48 byte message is slightly surprising, but given you're using vol-download, I'm going to guess that the volume is sparse, and say that the 48 byte message is a VIR_NET_STREAM_HOLE message skipping over the transfer of a large area of zeros.
I think I found the problem by trying to figure out the various call paths in the code base: the root cause is a difference in buffer sizes and how the remote driver works with those.
I found that the remote driver uses the virFDStream API from src/util/virfdstream.c to read local data. This API reads blocks of 256kB and sends those off into the (local) stream [1]. The remote driver then finds that the input message to be transferred is larger than the maximum data frame on the wire protocol (also 256kB, but *including* the 24-byte header), meaning it cuts the buffer into two parts: the maximum size of the data frame payload and "the rest". (By the way, if we had read 1MB instead of 256kB, it would have split this one input chunk into 5 chunks on the wire protocol.)

The actual splitting happens in daemonStreamHandleWrite() in remote_daemon_stream.c: it doesn't consider that there could be a next message in the queue which could fill the remaining space in the data frame; it only handles the "remaining" data in the current message.
I *do* understand the perspective of the virfdstream implementation to read chunks of 256kB (and not 256kB minus 24). I do understand that - from a simplicity perspective - it makes sense to push out one message from the transfer queue at a time, but is the intent that the average transfer then becomes 128kB minus 12 bytes?
Ok, this is an implementation flaw in virfdstream.

There are two scenarios: (1) reading from a local file - fdstream spawns a background helper thread; (2) reading from a non-file - fdstream reads directly from the FD.

When reading directly from the FD, the buffer passed to virFDStreamRead will be filled with as much data as is available.

When reading via the helper thread, the buffer passed to virFDStreamRead will only be filled with the data from the 1st pending message from the background thread.

We need to fix the latter so that it will fill the buffer with data from *all* pending messages from the thread, until the buffer is full.

With regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
I *do* understand the perspective of the virfdstream implementation to read chunks of 256kB (and not 256kB minus 24). I do understand that - from a simplicity perspective - it makes sense to push out one message from the transfer queue at a time, but is the intent that the average transfer then becomes 128kB minus 12 bytes?
Ok, this is an implementation flaw in virfdstream.
There are two scenarios: (1) reading from a local file - fdstream spawns a background helper thread; (2) reading from a non-file - fdstream reads directly from the FD.
When reading directly from the FD, the buffer passed to virFDStreamRead will be filled with as much data as is available.
When reading via the helper thread, the buffer passed to virFDStreamRead will only be filled with the data from the 1st pending message from the background thread.
I've studied the code and indeed I see what you are saying. As soon as the data in that message runs out, the function prepares to return and, just before it returns, pops the message from the transfer queue. It could instead go back (nearly) to the beginning to grab the next message and repeat itself. I guess that's not too hard to add.
We need to fix the latter so that it will fill the buffer with data from *all* pending messages from the thread, until the buffer is full.
Ok. I think I understand what needs to happen here. Do you mean me by "we", or were you looking at someone who's part of the "regulars"?

If it's me, I *think* I can do this, but I'll need a bit of help to confirm my understanding that *any* "hole" this function encounters is an unexpected message. I mean: should this function - after it's been changed - take into account that a "STREAM_HOLE" message may come in, and that it then needs to stop processing data but should *not* raise an error (as it would do now, because STREAM_HOLE messages are not expected at that point)?

Kind regards,

-- 
Bye, Erik.
http://efficito.com -- Hosted accounting and ERP. Robust and Flexible. No vendor lock-in.
On Tue, Jan 20, 2026 at 11:34:23PM +0100, Erik Huelsmann wrote:
I *do* understand the perspective of the virfdstream implementation to read chunks of 256kB (and not 256kB minus 24). I do understand that - from a simplicity perspective - it makes sense to push out one message from the transfer queue at a time, but is the intent that the average transfer then becomes 128kB minus 12 bytes?
Ok, this is an implementation flaw in virfdstream.
There are two scenarios: (1) reading from a local file - fdstream spawns a background helper thread; (2) reading from a non-file - fdstream reads directly from the FD.
When reading directly from the FD, the buffer passed to virFDStreamRead will be filled with as much data as is available.
When reading via the helper thread, the buffer passed to virFDStreamRead will only be filled with the data from the 1st pending message from the background thread.
I've studied the code and I see what you are saying indeed. As soon as the data in that message runs out the function prepares to return and just before it returns, pops the message from the transfer queue. It could instead go back to the beginning (nearly) to grab the next message and repeat itself. I guess that's not too hard to add.
We need to fix the latter so that it will fill the buffer with data from *all* pending messages from the thread, until the buffer is full.
Ok. I think I understand what needs to happen here. Do you mean me by "we" or were you looking at someone who's part of the "regulars"?
Anyone - if you have time that's great, otherwise just file a bug.
If it's me, I *think* I can do this, but I'll need a bit of help to confirm my understanding that *any* "hole" this function encounters, is an unexpected message; I mean: should this function - after it's been changed - take into account that a "STREAM_HOLE" message may come in and that it needs to stop processing data, but it should *not* raise an error (as it would do now, because STREAM_HOLE messages are not expected right now).
IIUC, for the *first* message on the queue, we should maintain the current logic:

    /* Shortcut, if the stream is in the trailing hole,
     * return 0 immediately. */
    if (msg->type == VIR_FDSTREAM_MSG_TYPE_HOLE &&
        msg->stream.hole.len == 0) {
        ret = 0;
        goto cleanup;
    }

If we've already copied some data out of the *first* message, and thus are looking at a subsequent message, we should simply return what data we have already copied if we see a MSG_TYPE_HOLE on the queue pending next.

With regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
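Putting that together, the helper-thread read path would become something like the sketch below (all names are made up and simplified - this is not the virfdstream code, just the intended behaviour as described above): drain queued data messages until the caller's buffer is full, return 0 when the very first message is a trailing hole, and stop - without raising an error - when a hole shows up after some data has already been copied.

#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-ins for the real virFDStreamMsg machinery; names are invented. */
enum msg_type { MSG_DATA, MSG_HOLE };

struct msg {
    enum msg_type type;
    const char *buf;     /* data messages only */
    size_t len;          /* data length, or hole length for MSG_HOLE */
    size_t offset;       /* how much of a data message was consumed already */
    struct msg *next;
};

/* Fill 'bytes' from the queued messages, as the fixed read path might:
 * consume data messages until the buffer is full, but stop (without an
 * error) at a hole once some data has been copied.  A trailing hole
 * (length 0) as the very first message still means "end of stream". */
static size_t stream_read(struct msg **queue, char *bytes, size_t nbytes)
{
    size_t filled = 0;

    while (filled < nbytes && *queue) {
        struct msg *m = *queue;

        if (m->type == MSG_HOLE) {
            if (filled == 0 && m->len == 0)
                return 0;          /* trailing hole: end of stream */
            /* Hole after data: return what we have; the next call deals
             * with the hole.  (A leading non-empty hole keeps whatever
             * special handling it has today; left out of this toy.) */
            break;
        }

        size_t avail = m->len - m->offset;
        size_t n = avail < nbytes - filled ? avail : nbytes - filled;
        memcpy(bytes + filled, m->buf + m->offset, n);
        m->offset += n;
        filled += n;

        if (m->offset == m->len)
            *queue = m->next;      /* message fully consumed: pop it */
    }
    return filled;
}

int main(void)
{
    struct msg trailing = { MSG_HOLE, NULL, 0, 0, NULL };
    struct msg second   = { MSG_DATA, "world", 5, 0, &trailing };
    struct msg first    = { MSG_DATA, "hello ", 6, 0, &second };
    struct msg *queue   = &first;
    char buf[64];

    size_t got = stream_read(&queue, buf, sizeof(buf));
    printf("read %zu bytes: %.*s\n", got, (int) got, buf);  /* both data messages in one read */
    return 0;
}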