
On Wed, Nov 25, 2020 at 2:47 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Wed, Nov 25, 2020 at 02:33:44PM +0100, Christian Ehrhardt wrote:
On Wed, Nov 25, 2020 at 1:38 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Wed, Nov 25, 2020 at 01:28:09PM +0100, Christian Ehrhardt wrote:
On Wed, Nov 25, 2020 at 10:55 AM Christian Ehrhardt <christian.ehrhardt@canonical.com> wrote:
On Tue, Nov 24, 2020 at 4:30 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Tue, Nov 24, 2020 at 16:05:53 +0100, Christian Ehrhardt wrote:
> Hi,
[...]
BTW, to narrow down what to think about: I have rebuilt 6.8 as well and it works. That confirms the offending change must be somewhere between 6.8.0 and 6.9.0.
I was able to bisect this with builds from git between v6.8 and v6.9 and identified the following offending commit: 7d959c30 rpc: Fix virt-ssh-helper detection
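For reference, a minimal sketch of the bisect workflow between those tags (the exact build/install/retest step on the two hosts is paraphrased here, and the tag names assume the usual vX.Y.0 release tags):

  $ git bisect start
  $ git bisect bad v6.9.0
  $ git bisect good v6.8.0
  # at each step: build from the checked-out tree, install on both hosts,
  # retry the tunnelled migration with ?proxy=native, then run
  # "git bisect good" or "git bisect bad" accordingly, until git reports
  # 7d959c30 as the first bad commit
  $ git bisect reset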
Ok, that makes a bit of sense. First, in 6.8 we had "f8ec7c84 rpc: use new virt-ssh-helper binary for remote tunnelling". That makes it related to tunnelling, which matches our broken use-case.
The identified commit "7d959c30 rpc: Fix virt-ssh-helper detection" might be what finally enables the new helper in practice, and that helper is then the broken piece?
With that knowledge I was able to confirm that it really is the native mode that breaks:
$ virsh migrate --unsafe --live --p2p --tunnelled h-migr-test qemu+ssh://testkvm-hirsute-to/system?proxy=netcat
  <works>
$ virsh migrate --unsafe --live --p2p --tunnelled h-migr-test qemu+ssh://testkvm-hirsute-to/system?proxy=native
  <hangs>
I recently discussed with Andrea whether we'd need AppArmor rules for virt-ssh-helper, but there are no denials and no libvirt log entries related to virt-ssh-helper. We don't need such rules anyway, since the helper is spawned by the ssh login and not under libvirtd itself.
PS output of the hanging receiving virt-ssh-helper (looks not too unhappy):

Source:
4 0 41305     1 20 0 1627796 23360 poll_s Ssl ? 0:05 /usr/sbin/libvirtd
0 0 41523 41305 20 0    9272  4984 poll_s S   ? 0:02  \_ ssh -T -e none -- testkvm-hirsute-to sh -c 'virt-ssh-helper 'qemu:///system''

Target:
4 0   213     1 20 0   13276  4132 poll_s Ss  ? 0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 250-500 startups
4 0 35148   213 20 0   19048 11320 poll_s Ss  ? 0:02  \_ sshd: root@notty
4 0 35206 35148 20 0    2584   544 do_wai Ss  ? 0:00      \_ sh -c virt-ssh-helper qemu:///system
0 0 35207 35206 20 0   81348 26684 -      R   ? 0:34          \_ virt-ssh-helper qemu:///system
I've looked at it with strace [1] and gdb for backtraces [2] - it is not dead or stuck and keeps working. Could it be just so slow that it appears to hang until it times out? Or is the event mechanism having issues and it wakes up too rarely?
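For anyone wanting to reproduce those traces, something along these lines works against the receiving helper, using the PID from the ps output above (35207); this is a sketch, not the exact invocation used:

  $ strace -f -tt -p 35207 -o virt-ssh-helper.strace
  $ gdb -p 35207 -batch -ex 'thread apply all bt'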
Let's take migration out of the picture. What if you simply do
virsh -c qemu+ssh://testkvm-hirsute-to/system?proxy=native list
does that work?
Yes it does, no hang and proper results
Ok, so that shows virt-ssh-helper is not completely broken at least.
That makes me think the issue is possibly in the streams code.
You might try the virsh "console" or "vol-upload" commands to test the streams stuff in isolation. If that also works, then the problem is specific to migration,
Thanks for the hint Daniel, it is indeed not migration specific - it seems that virt-ssh-helper is just very slow.

rm testfile; virsh -c qemu+ssh://testkvm-hirsute-to/system?proxy=netcat vol-download --pool uvtool h-migr-test.qcow testfile & for i in $(seq 1 20); do sleep 1s; ll -laFh testfile; done
[1] 42285
-rw-r--r-- 1 root root  24M Nov 25 15:20 testfile
-rw-r--r-- 1 root root 220M Nov 25 15:20 testfile
-rw-r--r-- 1 root root 396M Nov 25 15:20 testfile
-rw-r--r-- 1 root root 558M Nov 25 15:20 testfile
-rw-r--r-- 1 root root 756M Nov 25 15:20 testfile
-rw-r--r-- 1 root root 868M Nov 25 15:20 testfile
[1]+  Done    virsh -c qemu+ssh://testkvm-hirsute-to/system?proxy=netcat vol-download --pool uvtool h-migr-test.qcow testfile

rm testfile; virsh -c qemu+ssh://testkvm-hirsute-to/system?proxy=native vol-download --pool uvtool h-migr-test.qcow testfile & for i in $(seq 1 20); do sleep 1s; ll -laFh testfile; done
[1] 42307
-rw-r--r-- 1 root root 1.8M Nov 25 15:21 testfile
-rw-r--r-- 1 root root 6.8M Nov 25 15:21 testfile
-rw-r--r-- 1 root root 9.8M Nov 25 15:21 testfile
-rw-r--r-- 1 root root  13M Nov 25 15:21 testfile
-rw-r--r-- 1 root root  15M Nov 25 15:21 testfile
-rw-r--r-- 1 root root  16M Nov 25 15:21 testfile
-rw-r--r-- 1 root root  18M Nov 25 15:21 testfile
-rw-r--r-- 1 root root  19M Nov 25 15:22 testfile
-rw-r--r-- 1 root root  21M Nov 25 15:22 testfile
-rw-r--r-- 1 root root  22M Nov 25 15:22 testfile
-rw-r--r-- 1 root root  23M Nov 25 15:22 testfile
-rw-r--r-- 1 root root  25M Nov 25 15:22 testfile
-rw-r--r-- 1 root root  26M Nov 25 15:22 testfile
-rw-r--r-- 1 root root  27M Nov 25 15:22 testfile
-rw-r--r-- 1 root root  28M Nov 25 15:22 testfile
-rw-r--r-- 1 root root  29M Nov 25 15:22 testfile
-rw-r--r-- 1 root root  30M Nov 25 15:22 testfile
-rw-r--r-- 1 root root  31M Nov 25 15:22 testfile
-rw-r--r-- 1 root root  32M Nov 25 15:22 testfile
-rw-r--r-- 1 root root  32M Nov 25 15:22 testfile

That is ~150-200 MB/s vs 1-5 MB/s and, as seen, it seems to start slow AND degrade further. I'm now at 90MB overall and down to ~150 KB/s.
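The same comparison can also be boiled down to a single number with time(1); a minimal sketch using the same pool and volume as above (the native case will take a while at the rates seen here):

  rm -f testfile; time virsh -c qemu+ssh://testkvm-hirsute-to/system?proxy=netcat vol-download --pool uvtool h-migr-test.qcow testfile
  rm -f testfile; time virsh -c qemu+ssh://testkvm-hirsute-to/system?proxy=native vol-download --pool uvtool h-migr-test.qcow testfile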
and we'll probably want to collect debug level logs from src+dst hosts.
I already had debug level logs of the migration [1] attached to the launchpad bug I use to take my notes on this. They were taken with these settings:

log_filters="1:qemu 1:libvirt 3:object 3:json 3:event 1:util"
log_outputs="1:file:/var/log/libvirtd.log"

You can fetch those logs (of a migration), but I'm happy to generate logs of any other command (or with other log settings) as you'd prefer them.

The network used in this case is a bridge between two containers, but we can cut out more components: I found that the same vol-download against 127.0.0.1 gives the same results. That in turn makes it easier to gather results, as we only need one system.

Here are logs of:

virsh -c qemu+ssh://127.0.0.1/system?proxy=netcat vol-download --pool uvtool h-migr-test.qcow testfile
=> ~150-220 MB/s [2]

vs

virsh -c qemu+ssh://127.0.0.1/system?proxy=native vol-download --pool uvtool h-migr-test.qcow testfile
=> 5 MB/s degrading to ~200 KB/s and less [3]

[1]: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1904584/+attachment/5437...
[2]: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1904584/+attachment/5437...
[3]: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1904584/+attachment/5437...
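For anyone wanting to reproduce the logging setup: a sketch assuming the default config location for the system daemon, with the same filter settings as above.

  # /etc/libvirt/libvirtd.conf on both source and target
  log_filters="1:qemu 1:libvirt 3:object 3:json 3:event 1:util"
  log_outputs="1:file:/var/log/libvirtd.log"

  # then restart the daemon to pick up the change
  systemctl restart libvirtd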
Regards,
Daniel
--
|: https://berrange.com         -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org          -o-           https://fstop138.berrange.com :|
|: https://entangle-photo.org   -o-   https://www.instagram.com/dberrange :|
--
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd