Daniel P. Berrangé <berrange(a)redhat.com> writes:
On Thu, Jul 02, 2020 at 01:21:15PM +0200, Milan Zamazal wrote:
> Hi,
>
> I've met two situations with NVDIMM support in libvirt where I'm not
> sure all the parties (libvirt & I) do the things correctly.
>
> The first problem is with memory alignment and size changes. In
> addition to the size changes applied to NVDIMMs by QEMU, libvirt also
> makes some NVDIMM size changes for better alignments, in
> qemuDomainMemoryDeviceAlignSize. This can lead to the size being
> rounded up, exceeding the size of the backing device and QEMU failing to
> start the VM for that reason (I've experienced that actually). I work
> with emulated NVDIMM devices, not bare-metal hardware, so one might
> argue that in practice the device sizes should already be aligned, but
> I'm not sure it must be always the case considering labels or whatever
> else the user decides to set up. And I still don't feel very
> comfortable that I have to account for two internal size adjustments
> (libvirt & QEMU) to the `size' value I specify, with the ultimate goal
> of getting the VM started and having the NVDIMM aligned properly to make
> (non-NVDIMM) memory hot plug work. Is the size alignment performed
> by libvirt, especially rounding up, completely correct for NVDIMMs?
The comment on the function says QEMU aligns to "page size", which
is something that can vary depending not only on architecture, but
also on the build config options for the kernel on that architecture.
E.g. aarch64 has a different page size in RHEL than in other distros
because of a different choice of page size in the kernel config.
Libvirt rounds up to 1 MB,

Actually 2 MB, at least in my case, apparently in
qemuDomainGetMemoryModuleSizeAlignment. But it's just a detail.

essentially so that the size works no matter what architecture or
build options were used. I think this is quite compelling as I don't
think mgmt apps are likely to care enough about non-x86 architectures
to pick the right rounded sizes.
If we're enforcing this 1 MB rounding though, we really should be
documenting it clearly, so that apps can pick the right backing file
size. I think we dropped the ball on docs.
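
To make the failure mode concrete, here is a minimal sketch of the
rounding arithmetic discussed above. It is not libvirt's code: the helper
names are hypothetical, sizes are in bytes for simplicity, and the 2 MiB
alignment is only the value reported in this thread, so it may differ by
architecture or version.

```python
MiB = 1024 * 1024


def align_up(size, alignment):
    """Round size up to the next multiple of alignment."""
    return -(-size // alignment) * alignment


def align_down(size, alignment):
    """Round size down to the previous multiple of alignment."""
    return (size // alignment) * alignment


def fits_backing(requested, backing, alignment=2 * MiB):
    """True if the rounded-up size still fits in the backing device.

    The 2 MiB default is an assumption taken from the thread.
    """
    return align_up(requested, alignment) <= backing


# Hypothetical backing file whose size is not a multiple of 2 MiB.
backing = 1025 * MiB

# Requesting the full backing size gets rounded up past the backing size.
print(fits_backing(backing, backing))

# Requesting an already aligned size avoids the round-up.
safe_size = align_down(backing, 2 * MiB)
print(fits_backing(safe_size, backing))
```

Under these assumptions, an application would pick a `size` that is
already aligned and no larger than the backing device, rather than the raw
backing size.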
Yes, OK. I also wonder how exactly label size is counted in. It's
added to the aligned value in qemuDomainNVDimmAlignSizePseries with the
argument that label size is mandatory on ppc. But it's also permitted
on other architectures and I can't see a similar adjustment for them. I
think QEMU handles it fine in either case (by subtracting label size
from the overall size and aligning the result down) and I guess the
special handling of ppc in libvirt is just not to waste 256 MB
unnecessarily. Still, all the size shuffling scares me and I can only
hope that I compute my target sizes for the domain XML correctly to make
everything work well...
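
A rough sketch of the label-size arithmetic described above, purely as an
illustration. The function names are hypothetical, sizes are in bytes, the
256 MiB alignment is the figure mentioned for ppc in this thread, and
aligning the data area down before adding the label back is an assumption
about what qemuDomainNVDimmAlignSizePseries does, not a quote of it.

```python
KiB = 1024
MiB = 1024 * KiB


def align_down(size, alignment):
    """Round size down to the previous multiple of alignment."""
    return (size // alignment) * alignment


def guest_visible_size(total, label, alignment):
    """Subtract the label area from the total, then align the rest down.

    This mirrors the behaviour the thread attributes to QEMU.
    """
    return align_down(total - label, alignment)


def total_with_label(total, label, alignment=256 * MiB):
    """Align the data area, then add the label size on top.

    Assumption: the data area is aligned down; only the "label added to
    the aligned value" part comes from the thread.
    """
    return align_down(total - label, alignment) + label


label = 128 * KiB  # hypothetical label size

# An aligned total loses a whole alignment unit once the label is carved out.
print(guest_visible_size(1024 * MiB, label, 256 * MiB) // MiB)

# Adding the label on top of an aligned data area keeps the full data size.
total = total_with_label(1024 * MiB + label, label)
print(guest_visible_size(total, label, 256 * MiB) // MiB)
```

The first case shows why specifying a nicely aligned total together with a
label would cost a full 256 MiB step of usable memory, which matches the
guess above about why ppc gets special handling.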
> The second problem is that a VM fails to start with a backing NVDIMM in
> devdax mode due to SELinux preventing access to the /dev/dax* device (it
> doesn't happen with any other NVDIMM modes). Who should be responsible
> for handling the SELinux label appropriately in that case? libvirt, the
> system administrator, anybody else? Using <seclabel> in NVDIMM's source
> doesn't seem to be accepted by the domain XML schema.
The expectation is that out of the box SELinux will "just work". So
anything that is broken is a bug in either libvirt or selinux policy.
There is no expectation/requirement to use <seclabel> unless you want
to set up non-default behaviour, which isn't the case here.
IOW this sounds like a genuine bug.
OK, I'll try to find out what the problem is and where exactly it lies.
Thank you for the clarifications,
Milan