[libvirt] Xen and out of memory stuff, blocks libvirtd

[2008-10-08 19:14:05 13001] ERROR (XendCheckpoint:157) Save failed on domain klant1_monetdb (17) - resuming.
Traceback (most recent call last):
  File "usr/lib64/python2.5/site-packages/xen/xend/XendCheckpoint.py", line 125, in save
    forkHelper(cmd, fd, saveInputHandler, False)
  File "usr/lib64/python2.5/site-packages/xen/xend/XendCheckpoint.py", line 358, in forkHelper
    child = xPopen3(cmd, True, -1, [fd, xc.handle()])
  File "usr/lib64/python2.5/site-packages/xen/util/xpopen.py", line 100, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
[2008-10-08 19:14:05 13001] DEBUG (XendDomainInfo:2456) XendDomainInfo.resumeDomain(17)

After this point, I am unable to contact the libvirtd daemon.

Stefan

On Wed, Oct 08, 2008 at 07:32:21PM +0200, Stefan de Konink wrote:
> After this point, I am unable to contact the libvirtd daemon.

Well, your host OS has run out of memory, so pretty much everything will be fubar at this point. If it can't even fork(), then there's not much libvirt can do until this is resolved. libvirt tries to handle OOM scenarios, but there are some places we might have missed. Even so, the best libvirt can do is to drop all further requests as gracefully as possible.

Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
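To make the "drop requests gracefully" idea concrete, here is a minimal Python sketch of catching an out-of-memory fork() failure and turning it into a rejected request instead of an unhandled traceback. It is not libvirt's or xend's actual code: `fork_or_report` and `fail_with_enomem` are hypothetical names, and the failure is simulated rather than produced by a real memory shortage.

```python
import errno
import os


def fork_or_report(fork=os.fork):
    """Try to fork; return (pid, None) on success or (None, reason) on failure.

    `fork` is injectable so the failure path can be exercised without
    actually exhausting host memory.
    """
    try:
        return fork(), None
    except OSError as exc:
        if exc.errno == errno.ENOMEM:
            # Same error as in the xend traceback: no memory for the child.
            return None, "out of memory: request dropped"
        return None, "fork failed: " + os.strerror(exc.errno)


def fail_with_enomem():
    """Hypothetical stand-in for os.fork() on a host that is out of memory."""
    raise OSError(errno.ENOMEM, os.strerror(errno.ENOMEM))


if __name__ == "__main__":
    pid, reason = fork_or_report(fail_with_enomem)
    print(pid, reason)
```

With the real os.fork as the argument, a returned pid of 0 would mean the code is running in the child, so a real caller would still have to branch on that.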

Daniel P. Berrange wrote:
> Well, your host OS has run out of memory, so pretty much everything will be fubar at this point. If it can't even fork(), then there's not much libvirt can do until this is resolved. libvirt tries to handle OOM scenarios, but there are some places we might have missed. Even so, the best libvirt can do is to drop all further requests as gracefully as possible.

Not quite right: the host OS still has 32 MB of memory at this point. But I agree that strange things happen, even with later saves. If there is anything that is able to remove that painful xend stuff, I'm happy to give it a shot.

Stefan

On Wed, Oct 08, 2008 at 07:57:08PM +0200, Stefan de Konink wrote:
> Not quite right: the host OS still has 32 MB of memory at this point. But I agree that strange things happen, even with later saves. If there is anything that is able to remove that painful xend stuff, I'm happy to give it a shot.

The host had 32 MB at the time you measured it. That's not the same as the available memory at the time the kernel tried to run fork(). When fork() failed and Xen aborted the restoration attempt, it will have freed up a bunch of resources being used for the restore.

This is nothing to do with XenD - the error shows fork() failed due to lack of OS memory. Whether running XenD or not, if the kernel runs out of memory you're doomed. You need more RAM allocated to your host.

Daniel
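The point about measurement timing can be illustrated with a small Python sketch that samples free memory immediately before and after an operation, rather than some time later. It assumes a Linux host exposing /proc/meminfo. The helper names (`parse_meminfo`, `mem_free_kb`, `run_with_snapshots`) are made up for this example, and even the "before" sample is only an approximation of what the kernel saw at the instant of the fork() call.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def mem_free_kb(path="/proc/meminfo"):
    """Return MemFree in kB right now, or None if the field is missing."""
    with open(path) as handle:
        return parse_meminfo(handle.read()).get("MemFree")


def run_with_snapshots(operation, sample=mem_free_kb):
    """Run operation() and return (before_kb, after_kb, error).

    `error` is the OSError raised by the operation, or None on success.
    """
    before = sample()
    error = None
    try:
        operation()
    except OSError as exc:
        error = exc
    after = sample()
    return before, after, error
```

If the "after" figure is noticeably higher than the "before" figure on a failed run, that matches the explanation above: the aborted operation released its resources before anyone looked.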

Daniel P. Berrange wrote:
> This is nothing to do with XenD - the error shows fork() failed due to lack of OS memory. Whether running XenD or not, if the kernel runs out of memory you're doomed. You need more RAM allocated to your host.

If XenD eats 90% of the available memory (at that time 256 MB), I am seriously doomed as well.

Stefan

On Wed, Oct 08, 2008 at 09:29:29PM +0200, Stefan de Konink wrote:
> If XenD eats 90% of the available memory (at that time 256 MB), I am seriously doomed as well.

Whatever the cause, it's not a libvirt bug. If you think it's wrong that XenD is eating 90% of your RAM, then speak to the Xen developers - we can't help fix XenD here.

Daniel

Daniel P. Berrange wrote:
> Whatever the cause, it's not a libvirt bug. If you think it's wrong that XenD is eating 90% of your RAM, then speak to the Xen developers - we can't help fix XenD here.

Yes, you can, by providing a great technology preview for the alternate technique where XenD is not required anymore. But I'll be patient.

Stefan

On Thu, Oct 09, 2008 at 03:32:30AM +0200, Stefan de Konink wrote:
> Yes, you can, by providing a great technology preview for the alternate technique where XenD is not required anymore.

Again, that's not something we're doing in the context of the libvirt project. Gerd is working on this with the upstream QEMU community.

Daniel
participants (2)
- Daniel P. Berrange
- Stefan de Konink