On 02/23/2017 11:48 AM, Daniel P. Berrange wrote:
On Thu, Feb 23, 2017 at 11:16:17AM +0100, Michal Privoznik wrote:
> On 02/23/2017 11:02 AM, Daniel P. Berrange wrote:
>> On Thu, Feb 23, 2017 at 10:02:48AM +0100, Michal Privoznik wrote:
>>> So, majority of the code is just ready as-is. Well, with one
>>> slight change: differentiate between dimm and nvdimm in places
>>> like device alias generation, generating the command line and so
>>> on.
>>>
>>> Speaking of the command line, we also need to append 'nvdimm=on'
>>> to the '-machine' argument so that the nvdimm feature is
>>> advertised in the ACPI tables properly.
>>>
>>> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
>>
>>
>>> + if (prealloc &&
>>> + virJSONValueObjectAdd(props,
>>> + "b:prealloc", true,
>>> + NULL) < 0)
>>> + goto cleanup;
>>
>> As discussed on IRC, using prealloc with QEMU causes it to memset()
>> the first byte of every page of memory to 0. With NVDIMM this is
>> obviously corrupting application data stored in the NVDIMM.
>>
>> Obviously this needs fixing in QEMU, but I would think that libvirt
>> needs to block use of prealloc==true when running against existing
>> broken QEMU versions, otherwise users are going to be rather upset
>> to see their data corrupted on every boot
>
> They are, but should we work around broken QEMUs? And if so - how do we
> detect whether the qemu we are dealing with is broken or already fixed
> in that respect?
Most likely we'll just have todo a version number check I think, since
fixing this isn't going to involve any externally visible change to
QEMU.
What about backports then? If some distro decides to backport your fix
posted on the qemu list, this check here would prevent them from running
fixed qemu. Also, I don't think we want to work around qemu bugs.
> On the other hand, we can just not set prealloc true for nvdimm. That
> will have one downside though - after qemu mmaps() the file, kernel is
> not forced to create a private copy of the pages.
If the guest has requested prealloc, then I don't think we can ignore
it for NVDIMM, because its a clear violation of the memory guarantee
we're claiming to provide apps. Thus I think we've no option but to
report an error if we see prealloc + broken QEMU
Do you mean user instead of guest? Because it's user who requests
prealloc (well, in theory it is user; libvirt does not allow users to
request prealloc but rather has some rules worked in that request it
instead).
Michal