[libvirt-users] virDomainMemoryPeek: bad behavior under workload

Greetings,

I am working on a platform for analysis automation. I need to run several Virtual Environments concurrently and record information about their behavior.

I wrote some months ago about the capability of reading the memory during the Environment's execution (in paused state). What I need is the complete linear memory image, byte per byte, nothing special; I will feed this output to tools and parsers like Volatility to extract value from it.

I looked around and the only way to get the memory in such a form is using the QEMU monitor command `pmemsave`. I am using libvirt through its Python bindings, and virDomainQemuMonitorCommand does not seem to be exposed by the API, so, as suggested in some mails I read on the mailing list, I switched to virDomainMemoryPeek.

Using this function takes up to 14-16 seconds to read 512 MB of memory with the 64 KB limitation and 2-3 seconds with the 1 MB one; but the most annoying thing is that I can't run several environments concurrently, as the function keeps failing. Here's the typical output:

      File "/home/nox/workspace/NOX/src/NOX/hooks.py", line 134, in trigger
        hook.trigger(event)
      File "/home/nox/workspace/NOX/src/NOX/hooks.py", line 33, in trigger
        self.handlers[event]()
      File "/home/nox/workspace/NOX/hooks/volatility.py", line 81, in memory_dump
        for block in Memory(self.ctx):
      File "/home/see/workspace/NOX/src/NOX/lib/libtools.py", line 179, in next
        libvirt.VIR_MEMORY_PHYSICAL)
      File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1759, in memoryPeek
        ret = libvirtmod.virDomainMemoryPeek(self._o, start, size, flags)
    SystemError: error return without exception set

I can't run more than 3 environments concurrently on a Xeon Quad with 8 GB of memory. I guess the RPC reply times out because the system is under heavy load, but I'm not sure, as the error output is quite obscure.

Is there any solution to this issue? Is it possible to raise the RPC reply timeout value so that, even if slowly, I eventually get the memory dump? If through virsh I use the QEMU `pmemsave` command, I get the memory dump in less than one second; is there any way to obtain the same performance?

Thanks anyway for making libvirt the great tool it is!

NoxDaFox

On Fri, Aug 31, 2012 at 03:23:18PM +0300, NoxDaFox wrote:
Greetings,
I am working on a platform for analysis automation. I need to run several Virtual Environments concurrently and record information about their behavior.
I wrote some months ago about the capability of reading the memory during the Environment's execution (in paused state). What I need is the complete linear memory image, byte per byte, nothing special; I will feed this output to tools and parsers like Volatility to extract value from it.
If you want the complete memory image, perhaps you can just run the virDomainCoreDump() command, with the VIR_DUMP_MEMORY_ONLY flag (though this flag only works on very recent QEMU).
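For reference, a minimal sketch of what that call might look like from the Python bindings; the domain name, dump path and error handling are illustrative assumptions, not taken from the thread:

    import libvirt

    # Connect to the local QEMU driver and look up the guest by name
    # ("analysis-vm-1" is a hypothetical domain name used for illustration).
    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("analysis-vm-1")

    try:
        # Write only the guest memory (no device state) to a file on the host.
        # VIR_DUMP_MEMORY_ONLY needs libvirt >= 0.9.13 and a QEMU new enough
        # to support the corresponding monitor command.
        dom.coreDump("/var/tmp/analysis-vm-1.dump", libvirt.VIR_DUMP_MEMORY_ONLY)
    except libvirt.libvirtError as e:
        # Older QEMU builds report the flag as unsupported here.
        print("memory-only dump failed: %s" % e)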
I looked around and the only way to get the memory in such a form is using the QEMU monitor command `pmemsave`. I am using libvirt through its Python bindings, and virDomainQemuMonitorCommand does not seem to be exposed by the API, so, as suggested in some mails I read on the mailing list, I switched to virDomainMemoryPeek.
Using this function takes up to 14-16 seconds to read 512 MB of memory with the 64 KB limitation and 2-3 seconds with the 1 MB one; but the most annoying thing is that I can't run several environments concurrently, as the function keeps failing.
FYI, the virDomainMemoryPeek command was not really designed with scalability in mind; in particular, it was not intended for dumping the entirety of guest memory. Its use case was tools like the virt-dmesg command, where you just want to peek at a handful of small memory regions.
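That intended use looks roughly like the following with the Python bindings; the domain name and guest-physical address are illustrative assumptions:

    import libvirt

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("analysis-vm-1")   # hypothetical domain name

    # Peek at a single 4 KB page of guest physical memory; tools such as
    # virt-dmesg issue a handful of these small reads rather than walking
    # the whole address space.
    page = dom.memoryPeek(0x1000, 4096, libvirt.VIR_MEMORY_PHYSICAL)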
Here's the typical output:
File "/home/nox/workspace/NOX/src/NOX/hooks.py", line 134, in trigger hook.trigger(event) File "/home/nox/workspace/NOX/src/NOX/hooks.py", line 33, in trigger self.handlers[event]() File "/home/nox/workspace/NOX/hooks/volatility.py", line 81, in memory_dump for block in Memory(self.ctx): File "/home/see/workspace/NOX/src/NOX/lib/libtools.py", line 179, in next libvirt.VIR_MEMORY_PHYSICAL) File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1759, in memoryPeek ret = libvirtmod.virDomainMemoryPeek(self._o, start, size, flags) SystemError: error return without exception set
Hmm, that's a peculiar message to see - I can't find anything in the libvirt code that uses that particular message, so I'm not sure what has gone wrong here.
I can't run more than 3 environments concurrently on a Xeon Quad with 8 GB of memory.
I guess the RPC reply times out because the system is under heavy load, but I'm not sure, as the error output is quite obscure. Is there any solution to this issue? Is it possible to raise the RPC reply timeout value so that, even if slowly, I eventually get the memory dump?
For the memory peek API, we invoke a QEMU monitor command - this should not time out at all, unless you are trying to invoke other monitor commands against the same QEMU process concurrently.
If through virsh I use the QEMU `pmemsave` command, I get the memory dump in less than one second; is there any way to obtain the same performance?
If virsh works properly, then this suggests the problem is somewhere in the Python code, either libvirt's Python binding or your app's usage of it.

Daniel

On Fri, Aug 31, 2012 at 08:09:46AM -0700, Daniel P. Berrange wrote:
On Fri, Aug 31, 2012 at 03:23:18PM +0300, NoxDaFox wrote:
Here's the typical output:
File "/home/nox/workspace/NOX/src/NOX/hooks.py", line 134, in trigger hook.trigger(event) File "/home/nox/workspace/NOX/src/NOX/hooks.py", line 33, in trigger self.handlers[event]() File "/home/nox/workspace/NOX/hooks/volatility.py", line 81, in memory_dump for block in Memory(self.ctx): File "/home/see/workspace/NOX/src/NOX/lib/libtools.py", line 179, in next libvirt.VIR_MEMORY_PHYSICAL) File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1759, in memoryPeek ret = libvirtmod.virDomainMemoryPeek(self._o, start, size, flags) SystemError: error return without exception set
Hmm, that's a peculiar message to see - I can't find anything in the libvirt code that uses that particular message, so I'm not sure what has gone wrong here.
Oh, I think this might be a Python message. Our C binding does:

        LIBVIRT_BEGIN_ALLOW_THREADS;
        c_retval = virDomainMemoryPeek(domain, start, size, buf, flags);
        LIBVIRT_END_ALLOW_THREADS;

        if (c_retval < 0)
            goto cleanup;

        py_retval = PyString_FromStringAndSize(buf, size);

    cleanup:
        VIR_FREE(buf);
        return py_retval;
    }

In the 'c_retval < 0' check, I think we should have been doing:

        if (c_retval < 0) {
            py_retval = VIR_PY_NONE;
            goto cleanup;
        }

so we actually return Python's idea of None, rather than C's NULL, at which point you'd probably see the real error message from libvirt/QEMU.

Daniel

On 31/08/12 18:20, Daniel P. Berrange wrote:
On Fri, Aug 31, 2012 at 08:09:46AM -0700, Daniel P. Berrange wrote:
On Fri, Aug 31, 2012 at 03:23:18PM +0300, NoxDaFox wrote:
Here's the typical output:

      File "/home/nox/workspace/NOX/src/NOX/hooks.py", line 134, in trigger
        hook.trigger(event)
      File "/home/nox/workspace/NOX/src/NOX/hooks.py", line 33, in trigger
        self.handlers[event]()
      File "/home/nox/workspace/NOX/hooks/volatility.py", line 81, in memory_dump
        for block in Memory(self.ctx):
      File "/home/see/workspace/NOX/src/NOX/lib/libtools.py", line 179, in next
        libvirt.VIR_MEMORY_PHYSICAL)
      File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1759, in memoryPeek
        ret = libvirtmod.virDomainMemoryPeek(self._o, start, size, flags)
    SystemError: error return without exception set

Hmm, that's a peculiar message to see - I can't find anything in the libvirt code that uses that particular message, so I'm not sure what has gone wrong here.
Oh, I think this might be a Python message. Our C binding does:

        LIBVIRT_BEGIN_ALLOW_THREADS;
        c_retval = virDomainMemoryPeek(domain, start, size, buf, flags);
        LIBVIRT_END_ALLOW_THREADS;

        if (c_retval < 0)
            goto cleanup;

        py_retval = PyString_FromStringAndSize(buf, size);

    cleanup:
        VIR_FREE(buf);
        return py_retval;
    }

In the 'c_retval < 0' check, I think we should have been doing:

        if (c_retval < 0) {
            py_retval = VIR_PY_NONE;
            goto cleanup;
        }

so we actually return Python's idea of None, rather than C's NULL, at which point you'd probably see the real error message from libvirt/QEMU.
Daniel

I looked more deeply into the code and I agree with you about the reason for that weird behavior when the error shows. What I don't get is why the error shows at all. I ran more tests on different platforms and the behavior seems more random than expected.
What I'm doing is iterating over the memory, repeatedly calling the memoryPeek() command on memory blocks of 64 KB, as libvirt's RPC driver is limited to this size. Everything happens from separate processes, each peeking the memory of a separate virtual machine that stays in the suspended state during the whole operation; this means that whatever was running inside the guests is not executing at that moment, and neither is the guest itself. Do you know which version of QEMU supports dump-guest-memory?

NoxDaFox
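A minimal sketch of that kind of chunked read over a single suspended domain; the domain name, output path and the assumption that guest RAM is mapped linearly from address 0 are illustrative, not taken from the thread:

    import libvirt

    BLOCK = 64 * 1024  # per-call limit imposed by libvirt's RPC driver

    def iter_physical_memory(dom, total_size, block=BLOCK):
        """Yield the guest's physical memory in block-sized chunks."""
        offset = 0
        while offset < total_size:
            size = min(block, total_size - offset)
            yield dom.memoryPeek(offset, size, libvirt.VIR_MEMORY_PHYSICAL)
            offset += size

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("analysis-vm-1")   # hypothetical domain name
    dom.suspend()                              # keep the guest paused while reading

    # maxMemory() is reported in KiB; treating guest RAM as one linear range
    # starting at 0 is a simplification of real physical address layouts.
    total = dom.maxMemory() * 1024

    with open("/var/tmp/analysis-vm-1.raw", "wb") as out:
        for chunk in iter_physical_memory(dom, total):
            out.write(chunk)

    dom.resume()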

On 31/08/12 18:09, Daniel P. Berrange wrote:
On Fri, Aug 31, 2012 at 03:23:18PM +0300, NoxDaFox wrote:
Greetings,
I am working on a platform for analysis automation. I need to run several Virtual Environments concurrently and record information about their behavior.
I wrote some months ago about the capability of reading the memory during the Environment's execution (in paused state). What I need is the complete linear memory image, byte per byte, nothing special; I will feed this output to tools and parsers like Volatility to extract value from it.
If you want the complete memory image, perhaps you can just run the virDomainCoreDump() command, with the VIR_DUMP_MEMORY_ONLY flag (though this flag only works on very recent QEMU).
I'm working on Debian Wheezy; the flag you're talking about is in libvirt 0.9.13, which is in experimental now. Do you know if it's going to be included in the next stable release? I'm using libvirt 0.9.13 and QEMU 1.1.1 (from experimental as well) and the result is an error saying the feature is not supported by QEMU; do I need an even more recent version?
I looked around and the only way to get the memory in such a form is using the QEMU monitor command `pmemsave`. I am using libvirt through its Python bindings, and virDomainQemuMonitorCommand does not seem to be exposed by the API, so, as suggested in some mails I read on the mailing list, I switched to virDomainMemoryPeek.
Using this function takes up to 14-16 seconds to read 512 MB of memory with the 64 KB limitation and 2-3 seconds with the 1 MB one; but the most annoying thing is that I can't run several environments concurrently, as the function keeps failing.
FYI, the virDomainMemoryPeek command was not really designed with scalability in mind; in particular, it was not intended for dumping the entirety of guest memory. Its use case was tools like the virt-dmesg command, where you just want to peek at a handful of small memory regions.
What I built is a simple Python iterator that allows iterating through disk and memory in chunks of 64 KB or 1 MB, depending on the libvirt version. Users may need to read the entire memory before finding what they're looking for. Do you think it would be feasible to rework this command to allow more scalability, at least?
Here's the typical output:
File "/home/nox/workspace/NOX/src/NOX/hooks.py", line 134, in trigger hook.trigger(event) File "/home/nox/workspace/NOX/src/NOX/hooks.py", line 33, in trigger self.handlers[event]() File "/home/nox/workspace/NOX/hooks/volatility.py", line 81, in memory_dump for block in Memory(self.ctx): File "/home/see/workspace/NOX/src/NOX/lib/libtools.py", line 179, in next libvirt.VIR_MEMORY_PHYSICAL) File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1759, in memoryPeek ret = libvirtmod.virDomainMemoryPeek(self._o, start, size, flags) SystemError: error return without exception set Hmm, that's a peculiar message to see - I can't find anywhere in the libvirt code that uses that particular messages, so I'm not sure what has gone wrong here.
It is a Python message; it's triggered once a C method returns NULL.
I can't run more than 3 environments concurrently on a Xeon Quad with 8 GB of memory.
I guess the RPC reply times out because the system is under heavy load, but I'm not sure, as the error output is quite obscure. Is there any solution to this issue? Is it possible to raise the RPC reply timeout value so that, even if slowly, I eventually get the memory dump?
For the memory peek API, we invoke a QEMU monitor command - this should not time out at all, unless you are trying to invoke other monitor commands against the same QEMU process concurrently.
So there's some other problem; I ran several tests and it seems to happen randomly, independently of which block of memory is being read. It is enough to read memory from 3 different running guests at the same time to see this issue.
If through virsh I use the QEMU `pmemsave` command, I get the memory dump in less than one second; is there any way to obtain the same performance?
If virsh works properly, then this suggests the problem is somewhere in the Python code, either libvirt's Python binding or your app's usage of it.
The virsh monitor command uses an unexposed API method: virDomainQemuMonitorCommand. This method is not available in the Python API, nor is it documented in the libvirt documentation; I assume it is for internal use only. I can use this strategy as a workaround, but spawning a virsh process just to run this command, while I already have access to resources such as the connection to the hypervisor and the Domain, doesn't seem a clever solution to me.

NoxDaFox
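For completeness, a rough sketch of the virsh-based workaround described above, driving the human monitor protocol from Python; the domain name, dump size and output path are illustrative assumptions:

    import subprocess

    DOMAIN = "analysis-vm-1"          # hypothetical domain name
    SIZE = 512 * 1024 * 1024          # bytes of guest RAM to dump
    OUT = "/var/tmp/%s.raw" % DOMAIN  # file is written by QEMU on the host

    # pmemsave takes <start address> <size> <file name>; --hmp tells virsh
    # to pass the string through as a human monitor command.
    subprocess.check_call([
        "virsh", "qemu-monitor-command", "--hmp", DOMAIN,
        "pmemsave 0 %d %s" % (SIZE, OUT),
    ])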
participants (2)
- Daniel P. Berrange
- NoxDaFox