Daniel P. Berrangé <berrange(a)redhat.com> writes:
On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:
> A good starting point on this journey is supporting the new mapped-ram
> capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
> assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm not
> sure how to detect if a saved image is in mapped-ram format vs the existing,
> sequential stream format.
Yes, we'll need to support 'mapped-ram', so that's a good first step.
A question is whether we make that feature mandatory for all save images,
or implied by another feature (parallel save), or a directly controllable
feature with opt-in.
The former breaks back compat with existing libvirt, while the latter two
options are net new and so have no compat implications.
In terms of actual data blocks written on disk, mapped-ram should be the
same size as, or smaller than, the existing format.
In terms of logical file size, however, mapped-ram will almost always be
larger.
This is because mapped-ram will result in a file whose logical size matches
the guest RAM size, plus some header overhead, while being sparse so not
all blocks are written.
If tools handling save images aren't sparse-aware this could come across
as a surprise and even be considered a regression.
Mapped ram is needed for parallel saves since it lets each thread write
to a specific region of the file.
Mapped ram is good for non-parallel saves too though, because the mapping
of RAM into the file is aligned suitably to allow for O_DIRECT to be used.
Currently libvirt has to tunnel over its iohelper to futz with the alignment
needed for O_DIRECT. This makes mapped-ram desirable to use in general, but
back compat hurts...
Note that QEMU doesn't support O_DIRECT without multifd.
From mapped-ram patch series v4:
- Dropped support for direct-io with fixed-ram _without_ multifd. This
is something I said I would do for this version, but I had to drop
it because performance is really bad. I think the single-threaded
precopy code cannot cope with the extra latency/synchronicity of
O_DIRECT.