
On Wed, 13 Jan 2016 17:53:16 +0000 "Daniel P. Berrange" <berrange@redhat.com> wrote:
On Wed, Jan 13, 2016 at 05:51:34PM +0100, Martin Kletzander wrote:
On Wed, Jan 13, 2016 at 07:29:46AM -0500, John Ferlan wrote:
Reposting my cgroup fixes series:
http://www.redhat.com/archives/libvir-list/2016-January/msg00236.html
partially because I originally forgot to CC the author (Henning Schild) of the original series for which these patch fix a couple of issues discovered during regression testing (virt-test memtune failures in Red Hat regression environment), but also to bring them up to date with the top of libvirt git.
NB: I did send Henning the changes after the fact, but my resend using the same message-id skills so that replies are left in the onlist series are lacking. Henning has looked at the first patch - with a response here:
http://www.redhat.com/archives/libvir-list/2016-January/msg00443.html
Finally, I think these changes should go into 1.3.1 since that's when the regression was introduced.
It would be nice to have them in, I really tried reviewing them, but I can't wrap my head around last two of them. Maybe because I'm already late for an appointment I have.
So unfortunately I have to leave you without the review for those two as I really need to go, but anyone else feel free to continue. And even re-check my reviews for 1 and 2 if you want. It would be a pity not to fix a regression when we could.
I agree we need to get these into the next release, but please hold off from merging them. I want to re-examine Henning's original patches in more detail, as I have a bad feeling we might need to simply revert all of them and start again.
Until when do these patches need to be reviewed? The 1/4 is obvious but the other ones need a closer look. I can just say that they do not seem right. I pulled virCgroupAddTask out of virCgroupNewMachine* and that should be done. But it seems to me that the way virCgroupAddTask was called contained important error handling that should remain in virCgroupNewMachine. If we only have a couple of days to the next release i would suggest reverting my changes to give us time to figure the whole thing out. I will have limited time to look into that until the end of the month. But i certainly want those changes merged, or something that helps solve/mitigate the realtime problems of the current implementation. My changes address only a fraction of the problem. My suggestion for a proper solution would be using the exclusive flags of cgroups, which will require a totally different cgroup layout. Something like a "machine.slice_excl" next to "machine.slice" with the current structure replicated in there. Or a "top-level" exclusive cgroup per instance. So far i only looked at the implications for cpu_exclusive, where such a setup can work and will properly "protect" the reserved cpus from accidential use. Regards, Henning