On Wed, Nov 25, 2020 at 1:38 PM Daniel P. Berrangé <berrange(a)redhat.com> wrote:
On Wed, Nov 25, 2020 at 01:28:09PM +0100, Christian Ehrhardt wrote:
> On Wed, Nov 25, 2020 at 10:55 AM Christian Ehrhardt
> <christian.ehrhardt(a)canonical.com> wrote:
> >
> > On Tue, Nov 24, 2020 at 4:30 PM Peter Krempa <pkrempa(a)redhat.com> wrote:
> > >
> > > On Tue, Nov 24, 2020 at 16:05:53 +0100, Christian Ehrhardt wrote:
> > > > Hi,
> > >
> > > [...]
> >
> > BTW, to narrow down what we need to look at: I have rebuilt 6.8 as
> > well, and it works.
> > That confirms the offending change is somewhere between
> > 6.8.0 and 6.9.0.
>
> I was able to get git bisect builds working for the range between
> v6.8 and v6.9.
> The bisect identified the following offending commit:
> 7d959c30 rpc: Fix virt-ssh-helper detection
>
> OK, that makes a bit of sense. First, in 6.8, we got
> f8ec7c84 rpc: use new virt-ssh-helper binary for remote tunnelling
> That relates it to tunnelling, which matches our broken use case.
>
> The identified commit "7d959c30 rpc: Fix virt-ssh-helper detection" might
> be what finally really enables the new helper, and that helper is then
> what is broken?
>
> With that knowledge I was able to confirm that it really is the native mode
> that hangs:
>
> $ virsh migrate --unsafe --live --p2p --tunnelled h-migr-test \
>     qemu+ssh://testkvm-hirsute-to/system?proxy=netcat
> <works>
> $ virsh migrate --unsafe --live --p2p --tunnelled h-migr-test \
>     qemu+ssh://testkvm-hirsute-to/system?proxy=native
> <hangs>
>
> I recently discussed with Andrea whether we'd need apparmor rules for
> virt-ssh-helper, but there are neither denials nor libvirt log entries
> related to virt-ssh-helper.
> We shouldn't need such rules anyway, since it is spawned by the ssh login
> and not under libvirtd itself.
>
> PS output of the hanging receiving virt-ssh-helper (looks not too unhappy):
> Source:
> 4 0 41305     1 20 0 1627796 23360 poll_s Ssl ? 0:05 /usr/sbin/libvirtd
> 0 0 41523 41305 20 0    9272  4984 poll_s S   ? 0:02  \_ ssh -T -e none -- testkvm-hirsute-to sh -c 'virt-ssh-helper 'qemu:///system''
> Target:
> 4 0   213     1 20 0 13276  4132 poll_s Ss ? 0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 250-500 startups
> 4 0 35148   213 20 0 19048 11320 poll_s Ss ? 0:02  \_ sshd: root@notty
> 4 0 35206 35148 20 0  2584   544 do_wai Ss ? 0:00      \_ sh -c virt-ssh-helper qemu:///system
> 0 0 35207 35206 20 0 81348 26684 -      R  ? 0:34          \_ virt-ssh-helper qemu:///system
>
> I've looked at it with strace [1] and gdb for backtraces [2] - it is
> not dead or stuck and keeps working.
> Could it be just so slow that it appears to hang until it times out?
> Or is the event mechanism having issues and it wakes up too rarely?
Let's take migration out of the picture. What if you simply do
  virsh -c qemu+ssh://testkvm-hirsute-to/system?proxy=native list
Does that work?

Yes, it does: no hang, and proper results.
I migrated the system there (with ?proxy=netcat) to ensure there is
something to report, and indeed some data is coming back through this.
root@testkvm-hirsute-from:~# virsh migrate --unsafe --live --p2p \
    --tunnelled h-migr-test \
    qemu+ssh://testkvm-hirsute-to/system?proxy=netcat
root@testkvm-hirsute-from:~# virsh -c qemu+ssh://testkvm-hirsute-to/system?proxy=native list
 Id   Name          State
-----------------------------
 6    h-migr-test   running
--
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd