Guest vm doesn't recover after the nfs connection resume

Dear developers:

I found one issue during a regular test and I could not confirm whether it is a libvirt/qemu issue, an NFS client issue, or not an issue at all, so could you help check it? Below are the reproduction steps:

1. There is an NFS server with an exports file like:
   /nfs *(async,rw,no_root_squash)

2. The host machine soft-mounts the NFS export:
   mount nfs_server_ip:/nfs /var/lib/libvirt/images/nfs -o v4,soft

3. Start a guest VM with a disk XML like below:
   <disk type='file' device='disk'>
     <driver name='qemu' type='qcow2'/>
     <source file='/var/lib/libvirt/images/nfs/RHEL-8.6.0-20211102.1-x86_64.qcow2' index='1'/>
     <backingStore/>
     <target dev='vda' bus='virtio'/>
     <alias name='virtio-disk0'/>
   </disk>

4. During the guest VM boot, apply an iptables rule to drop the connection to the NFS server:
   iptables -A OUTPUT -d nfs_server_ip -p tcp --dport 2049 -j DROP

5. Wait until the error appears in /var/log/messages:
   kernel: nfs: server nfs_server_ip not responding, timed out

6. Delete the iptables rule to restore the connection to the NFS server:
   iptables -D OUTPUT -d nfs_server_ip -p tcp --dport 2049 -j DROP

7. Check the guest VM: the boot process shows errors and cannot recover.
   error: ../../grub-core/disk/i386/pc/biosdisk.c:546:failure reading sector 0x7ab8 from `hd0'.
   error: ../../grub-core/disk/i386/pc/biosdisk.c:546:failure reading sector 0x9190 from `hd0'.
   error: ../../grub-core/disk/i386/pc/biosdisk.c:546:failure reading sector

8. There is an error message in /var/log/messages:
   kernel: NFS: __nfs4_reclaim_open_state: Lock reclaim failed!

Thanks
Liang Cong

On Thu, Dec 09, 2021 at 05:54:15PM +0800, Liang Cong wrote:
Dear developers:
I found one issue during a regular test and I could not confirm whether it is a libvirt/qemu issue, an NFS client issue, or not an issue at all, so could you help check it? Below are the reproduction steps:

1. There is an NFS server with an exports file like:
   /nfs *(async,rw,no_root_squash)
2. The host machine soft-mounts the NFS export:
   mount nfs_server_ip:/nfs /var/lib/libvirt/images/nfs -o v4,soft
3. Start a guest VM with a disk XML like below:
   <disk type='file' device='disk'>
     <driver name='qemu' type='qcow2'/>
     <source file='/var/lib/libvirt/images/nfs/RHEL-8.6.0-20211102.1-x86_64.qcow2' index='1'/>
     <backingStore/>
     <target dev='vda' bus='virtio'/>
     <alias name='virtio-disk0'/>
   </disk>
4. During the guest VM boot, apply an iptables rule to drop the connection to the NFS server:
   iptables -A OUTPUT -d nfs_server_ip -p tcp --dport 2049 -j DROP
5. Wait until the error appears in /var/log/messages:
   kernel: nfs: server nfs_server_ip not responding, timed out
6. Delete the iptables rule to restore the connection to the NFS server:
   iptables -D OUTPUT -d nfs_server_ip -p tcp --dport 2049 -j DROP
7. Check the guest VM: the boot process shows errors and cannot recover.
   error: ../../grub-core/disk/i386/pc/biosdisk.c:546:failure reading sector 0x7ab8 from `hd0'.
   error: ../../grub-core/disk/i386/pc/biosdisk.c:546:failure reading sector 0x9190 from `hd0'.
   error: ../../grub-core/disk/i386/pc/biosdisk.c:546:failure reading sector
So this shows that I/O errors have been sent from the host to the guest. This means two things:

- The host has reported I/O errors to QEMU
- QEMU is configured to report I/O errors to the guest (rerror/werror attributes for the disk config)

I expect the first point there is a result of you using 'soft' for the NFS mount - try it again with 'hard'.

The alternative for 'rerror/werror' is to pause the guest, allowing the host problem to be solved, whereupon you unpause the guest.

Overall this behaviour just looks like a result of your config choices.

Regards,
Daniel

--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
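For reference, libvirt exposes QEMU's werror/rerror settings through the error_policy (write errors) and rerror_policy (read errors) attributes on the disk <driver> element. A minimal sketch, reusing the disk XML from the report; with 'stop' the guest is paused on an I/O error instead of the error being propagated into the guest:

```xml
<disk type='file' device='disk'>
  <!-- error_policy covers write errors, rerror_policy covers read errors.
       'stop' pauses the guest on I/O error so the host-side NFS outage
       can be repaired and the guest resumed; 'report' (the behaviour
       seen in this thread) forwards the error to the guest. -->
  <driver name='qemu' type='qcow2' error_policy='stop' rerror_policy='stop'/>
  <source file='/var/lib/libvirt/images/nfs/RHEL-8.6.0-20211102.1-x86_64.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>
```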

Hi Daniel,

Thanks for your reply. I tried the NFS hard mount, and got the same behavior as with the soft mount. But in /var/log/messages I got an NFS server recovery message, which is not printed when mounting in soft mode:

Dec 14 02:12:47 test-1 kernel: nfs: server ip not responding, still trying
Dec 14 02:13:39 test-1 kernel: nfs: server ip not responding, timed out
*Dec 14 02:14:34 test-1 kernel: nfs: server ip OK*
Dec 14 02:14:34 test-1 kernel: NFS: __nfs4_reclaim_open_state: Lock reclaim failed!

According to my understanding, the VM boot process will not recover to normal (the VM stays in the running state, never paused) until the guest VM is restarted. So it is not an issue of libvirt or qemu; it is just the correct behavior given the NFS connection timeout, right?

Thanks,
Liang Cong

On Thu, Dec 9, 2021 at 6:03 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Thu, Dec 09, 2021 at 05:54:15PM +0800, Liang Cong wrote:
Dear developers:
I found one issue during a regular test and I could not confirm whether it is a libvirt/qemu issue, an NFS client issue, or not an issue at all, so could you help check it? Below are the reproduction steps:

1. There is an NFS server with an exports file like:
   /nfs *(async,rw,no_root_squash)
2. The host machine soft-mounts the NFS export:
   mount nfs_server_ip:/nfs /var/lib/libvirt/images/nfs -o v4,soft
3. Start a guest VM with a disk XML like below:
   <disk type='file' device='disk'>
     <driver name='qemu' type='qcow2'/>
     <source file='/var/lib/libvirt/images/nfs/RHEL-8.6.0-20211102.1-x86_64.qcow2' index='1'/>
     <backingStore/>
     <target dev='vda' bus='virtio'/>
     <alias name='virtio-disk0'/>
   </disk>
4. During the guest VM boot, apply an iptables rule to drop the connection to the NFS server:
   iptables -A OUTPUT -d nfs_server_ip -p tcp --dport 2049 -j DROP
5. Wait until the error appears in /var/log/messages:
   kernel: nfs: server nfs_server_ip not responding, timed out
6. Delete the iptables rule to restore the connection to the NFS server:
   iptables -D OUTPUT -d nfs_server_ip -p tcp --dport 2049 -j DROP
7. Check the guest VM: the boot process shows errors and cannot recover.
   error: ../../grub-core/disk/i386/pc/biosdisk.c:546:failure reading sector 0x7ab8 from `hd0'.
   error: ../../grub-core/disk/i386/pc/biosdisk.c:546:failure reading sector 0x9190 from `hd0'.
   error: ../../grub-core/disk/i386/pc/biosdisk.c:546:failure reading sector
So this shows that I/O errors have been sent from the host to the guest.
This means two things:
- The host has reported I/O errors to QEMU
- QEMU is configured to report I/O errors to the guest (rerror/werror attributes for the disk config)
I expect the first point there is a result of you using 'soft' for the NFS mount - try it again with 'hard'.
The alternative for 'rerror/werror' is to pause the guest, allowing the host problem to be solved whereupon you unpause the guest.
Overall this behaviour just looks like a result of your config choices.
Regards,
Daniel

--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
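The pause/unpause alternative mentioned above would look roughly like the following (a sketch; the domain name "vm1" is a placeholder, and it assumes the disk was configured with error_policy='stop' so the guest pauses instead of seeing the error):

```shell
# With error_policy='stop', the guest pauses itself when the I/O error
# hits the virtual disk instead of receiving it.
virsh domstate --reason vm1

# Fix the host-side problem, e.g. remove the iptables DROP rule:
iptables -D OUTPUT -d nfs_server_ip -p tcp --dport 2049 -j DROP

# Resume the guest; the pending I/O is retried rather than failed.
virsh resume vm1
```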

On Tue, Dec 14, 2021 at 03:35:42PM +0800, Liang Cong wrote:
Hi Daniel,
Thanks for your reply. I tried the NFS hard mount, and got the same behavior as with the soft mount. But in /var/log/messages I got an NFS server recovery message, which is not printed when mounting in soft mode:

Dec 14 02:12:47 test-1 kernel: nfs: server ip not responding, still trying
Dec 14 02:13:39 test-1 kernel: nfs: server ip not responding, timed out
*Dec 14 02:14:34 test-1 kernel: nfs: server ip OK*
Dec 14 02:14:34 test-1 kernel: NFS: __nfs4_reclaim_open_state: Lock reclaim failed!

According to my understanding, the VM boot process will not recover to normal (the VM stays in the running state, never paused) until the guest VM is restarted. So it is not an issue of libvirt or qemu; it is just the correct behavior given the NFS connection timeout, right?
With a 'hard' mount I would not expect QEMU/the guest to see any errors at all, though your messages here about 'Lock reclaim failed' are a little concerning, as they suggest something is not working right with NFS. This is beyond my knowledge of NFS though.

Regards,
Daniel

--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
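For comparison with the 'soft' mount in the original report, a hard mount of the same export would look like the following sketch. The timeo/retrans values are illustrative assumptions, not values from this thread; 'hard' makes the NFS client retry indefinitely rather than returning EIO to QEMU after the timeout:

```shell
# hard: block and retry I/O until the server responds, instead of
# failing with an error after the retransmit limit (soft behaviour).
# timeo is in tenths of a second; the values below are only examples.
mount nfs_server_ip:/nfs /var/lib/libvirt/images/nfs \
      -o v4,hard,timeo=600,retrans=2
```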
participants (2)
- Daniel P. Berrangé
- Liang Cong