[libvirt] Re: kernel summit topic - 'containers end-game'

Monday, 6 July 2009

Quoting Daniel Lezcano (dlezcano(a)fr.ibm.com):
...
 Serge E. Hallyn wrote: ...
...
 Checkpoint:
 	- The initiator of the checkpoint initializes the barrier and sends a
 SIGCKPT signal to all the checkpointable tasks, which jump to the
 handler and block on the barrier.

 	- When all these tasks reach this barrier, the initiator of the
 checkpoint dumps the system wide resources (memory, sysv ipc, struct  
 files, etc ...).

 	- When this is done, the tasks are released and they store their  
 process wide resources (semundo, file descriptor, etc ...) to a  
 current->ckpt_restart buffer and then set the status of the operation  
 and block on the barrier.

 	- The initiator of the checkpoint then collects all this information
 and dumps it.
Do you envision all of the dumping being done in kernel or by userspace?

...

...
 	- Finally the initiator of the checkpoint releases the tasks.
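
A minimal user-space sketch of the checkpoint sequence above, simulated with Python threads rather than kernel tasks. Everything in it is an assumption made for illustration: the function name run_checkpoint, the dictionary standing in for current->ckpt_restart, the placeholder resource values, and the use of starting a thread in place of sending SIGCKPT. It only shows the ordering the barrier enforces, not how a kernel implementation would look.

```python
import threading


def run_checkpoint(n_tasks=3):
    """Simulate the barrier-based checkpoint ordering (illustrative only)."""
    # One extra party for the initiator itself.
    barrier = threading.Barrier(n_tasks + 1)
    ckpt_restart = {}  # stands in for each task's current->ckpt_restart buffer
    status = {}        # per-task status of the operation
    statefile = []     # stands in for the dumped image

    def task(tid):
        # Body of the hypothetical SIGCKPT handler.
        barrier.wait()  # block until every task has reached the barrier
        barrier.wait()  # stay blocked while system-wide resources are dumped
        # Released: store process-wide resources (placeholder values).
        ckpt_restart[tid] = {"fds": [0, 1, 2], "semundo": []}
        status[tid] = "ok"
        barrier.wait()  # block again so the initiator can collect
        barrier.wait()  # final release by the initiator

    threads = [threading.Thread(target=task, args=(i,)) for i in range(n_tasks)]
    for t in threads:
        t.start()       # stands in for sending SIGCKPT

    barrier.wait()      # all tasks are now inside the handler
    statefile.append(("system", ["memory", "sysv ipc", "struct files"]))
    barrier.wait()      # release tasks to store their own resources
    barrier.wait()      # all tasks have stored and set their status
    for tid in sorted(ckpt_restart):
        statefile.append(("task", tid, status[tid], ckpt_restart[tid]))
    barrier.wait()      # finally release the tasks

    for t in threads:
        t.join()
    return statefile
```

The doubled barrier.wait() calls in the task are deliberate: the first marks arrival, the second keeps the task blocked until the initiator has finished its own step.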

 Restart:
 	- The user executes the statefile, which spawns the process tree, and
 all the processes block on the barrier.

 	- The initiator of the restart restores the system wide resources
 and fills the restarted processes' current->ckpt_restart buffer.
Same question about restore...

...
 	- The initiator sends a SIGRESTART to all the tasks and unblocks
 them.

 	- all the tasks restore their process wide resources from the
 current->ckpt_restart buffer.

 	- all the tasks write their status and block on the barrier

 	- the initiator of the restart releases the tasks, which will return to
 the execution context they had when they were checkpointed.
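
The restart sequence above can be sketched the same way, again as a user-space simulation with Python threads. The name run_restart, the layout of the statefile dictionary, and the use of a barrier release in place of SIGRESTART are all hypothetical; the sketch only illustrates the ordering of the steps.

```python
import threading


def run_restart(statefile):
    """Simulate the barrier-based restart ordering (illustrative only)."""
    task_states = statefile["tasks"]
    # One extra party for the initiator itself.
    barrier = threading.Barrier(len(task_states) + 1)
    ckpt_restart = {}  # stands in for each task's current->ckpt_restart buffer
    system = {}        # restored system-wide resources
    restored = {}      # what each task restored for itself
    status = {}

    def task(tid):
        barrier.wait()  # spawned task blocks until "SIGRESTART"
        # Restore process-wide resources from the task's own buffer.
        restored[tid] = dict(ckpt_restart[tid])
        status[tid] = "ok"
        barrier.wait()  # status written, block on the barrier
        barrier.wait()  # final release by the initiator

    # "Executing the statefile" spawns the process tree.
    threads = [threading.Thread(target=task, args=(tid,)) for tid in task_states]
    for t in threads:
        t.start()

    # Tasks are blocked; restore system-wide state and fill the buffers.
    system.update(statefile["system"])
    for tid, state in task_states.items():
        ckpt_restart[tid] = state

    barrier.wait()      # stands in for SIGRESTART plus unblocking the tasks
    barrier.wait()      # every task has restored and written its status
    ok = all(s == "ok" for s in status.values())
    barrier.wait()      # release the tasks to resume execution

    for t in threads:
        t.join()
    return {"ok": ok, "system": system, "restored": restored, "status": status}
```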

 This approach is different from what you are doing but I am pretty sure
 most of the code is re-usable. I see different advantages of this approach:

  - because the process resources are checkpointed / restarted from  
 current, it would be easy to reuse some syscalls code (from the kernel  
 POV) and that would reduce the code duplication and maintenance overhead.

  - the approach is more fine grained as we can implement piece by piece  
 the checkpoint / restart.

  - as the statefile is in the ELF format, gdb could be used to debug a
 statefile as a core file
Note btw that Dave has found that a checkpoint is faster than a core-dump
at the moment :)  That's not entirely an aside - I need to reread your
email a few times and really process your suggestion, but given that some
users want to dump hundreds of gigabytes of memory, not slowing down the
checkpoint is a big consideration.

...
  - as each process checkpoints / restarts itself, most of the
 execution context is stored in the stack, which is checkpointed /
 restarted with the memory, so when returning from the signal handler,
 the process returns to the right context. That is less complicated and
 more generic than externally checkpointing the execution context of a
 frozen task, which would be potentially different for the restart.

 I hope Serge you can present this approach as an alternative to the
 current patchset __if__ this one is not acceptable.
I'll try to understand it better than I do right now - I don't think
it's for discussing at ksummit, but definitely if we have a mini-summit
or during the next round of discussions (during or immediately after
the ckpt-v17 publish).

thanks,
-serge
