On Thu, Feb 23, 2017 at 02:25:13PM +0100, Michal Privoznik wrote:
On 02/23/2017 02:16 PM, Daniel P. Berrange wrote:
> On Thu, Feb 23, 2017 at 02:07:27PM +0100, Michal Privoznik wrote:
>> On 02/23/2017 11:48 AM, Daniel P. Berrange wrote:
>>> On Thu, Feb 23, 2017 at 11:16:17AM +0100, Michal Privoznik wrote:
>>>> On 02/23/2017 11:02 AM, Daniel P. Berrange wrote:
>>>>> On Thu, Feb 23, 2017 at 10:02:48AM +0100, Michal Privoznik wrote:
>>>>>> So, majority of the code is just ready as-is. Well, with one
>>>>>> slight change: differentiate between dimm and nvdimm in places
>>>>>> like device alias generation, generating the command line and
so
>>>>>> on.
>>>>>>
>>>>>> Speaking of the command line, we also need to append
'nvdimm=on'
>>>>>> to the '-machine' argument so that the nvdimm feature
is
>>>>>> advertised in the ACPI tables properly.
>>>>>>
>>>>>> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
>>>>>
>>>>>
>>>>>> + if (prealloc &&
>>>>>> + virJSONValueObjectAdd(props,
>>>>>> + "b:prealloc",
true,
>>>>>> + NULL) < 0)
>>>>>> + goto cleanup;
>>>>>
>>>>> As discussed on IRC, using prealloc with QEMU causes it to memset()
>>>>> the first byte of every page of memory to 0. With NVDIMM this is
>>>>> obviously corrupting application data stored in the NVDIMM.
>>>>>
>>>>> Obviously this needs fixing in QEMU, but I would think that libvirt
>>>>> needs to block use of prealloc==true when running against existing
>>>>> broken QEMU versions, otherwise users are going to be rather upset
>>>>> to see their data corrupted on every boot
>>>>
>>>> They are, but should we work around broken QEMUs? And if so - how do we
>>>> detect whether the qemu we are dealing with is broken or already fixed
>>>> in that respect?
>>>
>>> Most likely we'll just have todo a version number check I think, since
>>> fixing this isn't going to involve any externally visible change to
>>> QEMU.
>>
>> What about backports then? If some distro decides to backport your fix
>> posted on the qemu list, this check here would prevent them from running
>> fixed qemu. Also, I don't think we want to work around qemu bugs.
>
> If the distro backports the QEMU fix, then they would have to adjust
> their libvirt to relax the version check too. Obviously we try to
> avoid this kind of scenario, but sometimes there's no other option,
> and this looks like one of those times.
Okay, fair enough.
>
>>>> On the other hand, we can just not set prealloc true for nvdimm. That
>>>> will have one downside though - after qemu mmaps() the file, kernel is
>>>> not forced to create a private copy of the pages.
>>>
>>> If the guest has requested prealloc, then I don't think we can ignore
>>> it for NVDIMM, because its a clear violation of the memory guarantee
>>> we're claiming to provide apps. Thus I think we've no option but to
>>> report an error if we see prealloc + broken QEMU
>>
>> Do you mean user instead of guest? Because it's user who requests
>> prealloc (well, in theory it is user; libvirt does not allow users to
>> request prealloc but rather has some rules worked in that request it
>> instead).
>
> Sorry yes, i meant user. eg if they set
>
> <memoryBacking>
> <allocation mode="immediate"/>
> </memoryBacking>
>
> I guess this is fairly new syntax so probably few people/apps are using
> it. IIRC, if this is not present in XML, then we just automagically
> assume allocation=immediate if using huge pages.
>
> So I guess we could say, if allocation=immediate is *not* in the guest
> XML, then NVDIMMs are assumed to have allocation=ondemand by default.
>
> It'd be inconsistent behaviour, but not wrong.
>
> That way we'd only need to raise an error if the user had explicitly
> set allocation=immediate
Frankly, I'd rather be consistent AND bug-free at the same time. Can't
we make NVDIMM work with just fixed QEMUs? At the level code, the nvdimm
capability would be set if nvdimm is found in a list of devices provided
by qemu AND if the qemu version is at least 2.8 (assuming your fix makes
it into the release).
Yep that's a valid approach to take assuming QEMU get a fix done in a
timely manner, so that people aren't pressuring us to support the buggy
QEMU. Lets aim todo that until someone complains :-)
Regards,
Daniel
--
|:
http://berrange.com -o-
http://www.flickr.com/photos/dberrange/ :|
|:
http://libvirt.org -o-
http://virt-manager.org :|
|:
http://entangle-photo.org -o-
http://search.cpan.org/~danberr/ :|