On Mon, Sep 30, 2019 at 10:55:00AM +0200, Martin Kletzander wrote:
On Fri, Sep 27, 2019 at 01:52:25PM +0100, Daniel P. Berrangé wrote:
> Add a check reporting if any CPU vulnerabilities have not been mitigated
> by the kernel. It further reports whether it is safe to use Intel SMT
> for KVM guests or not, as several of the vulnerabilities are dangerous
> when combined with SMT and KVM, even if mitigations are in effect.
>
> eg on a host with mitigations, but unsafe SMT still enabled:
>
> Checking CPU hardware vulnerability mitigation...PASS
> Checking CPU hardware vulnerability SMT safety...FAIL
>
> Signed-off-by: Daniel P. Berrangé <berrange(a)redhat.com>
> ---
> libvirt.spec.in | 1 +
> tools/Makefile.am | 1 +
> .../rules/linux-cpu-hardware-flaws.yaml | 165 ++++++++++++++++++
> 3 files changed, 167 insertions(+)
> create mode 100644 tools/host-validate/rules/linux-cpu-hardware-flaws.yaml
>
> diff --git a/libvirt.spec.in b/libvirt.spec.in
> index f336296a08..8aa226798a 100644
> --- a/libvirt.spec.in
> +++ b/libvirt.spec.in
> @@ -1901,6 +1901,7 @@ exit 0
> %{_datadir}/libvirt/host-validate/linux-acpi.yaml
> %{_datadir}/libvirt/host-validate/linux-cgroups.yaml
> %{_datadir}/libvirt/host-validate/linux-cpu.yaml
> +%{_datadir}/libvirt/host-validate/linux-cpu-hardware-flaws.yaml
> %{_datadir}/libvirt/host-validate/linux-devices.yaml
> %{_datadir}/libvirt/host-validate/linux-iommu.yaml
> %{_datadir}/libvirt/host-validate/linux-namespaces.yaml
> diff --git a/tools/Makefile.am b/tools/Makefile.am
> index 728de475a2..907b0195c2 100644
> --- a/tools/Makefile.am
> +++ b/tools/Makefile.am
> @@ -173,6 +173,7 @@ virt_host_validate_rules_DATA = \
> $(srcdir)/host-validate/rules/linux-acpi.yaml \
> $(srcdir)/host-validate/rules/linux-cgroups.yaml \
> $(srcdir)/host-validate/rules/linux-cpu.yaml \
> + $(srcdir)/host-validate/rules/linux-cpu-hardware-flaws.yaml \
> $(srcdir)/host-validate/rules/linux-devices.yaml \
> $(srcdir)/host-validate/rules/linux-iommu.yaml \
> $(srcdir)/host-validate/rules/linux-namespaces.yaml \
> diff --git a/tools/host-validate/rules/linux-cpu-hardware-flaws.yaml
b/tools/host-validate/rules/linux-cpu-hardware-flaws.yaml
> new file mode 100644
> index 0000000000..6a243df96d
> --- /dev/null
> +++ b/tools/host-validate/rules/linux-cpu-hardware-flaws.yaml
> @@ -0,0 +1,165 @@
> +#
> +# Define facts related to CPU hardware vulnerabilities
> +#
> +
> +facts:
> +- name: cpu.vulnerability.meltdown
> + filter:
> + fact:
> + name: os.kernel
> + value: Linux
> + value:
> + file:
> + path: /sys/devices/system/cpu/vulnerabilities/meltdown
> + ignoreMissing: true
> + parse:
> + scalar:
> + regex: (\w+)
> + match: 1
> +- name: cpu.vulnerability.spectre_v1
> + filter:
> + fact:
> + name: os.kernel
> + value: Linux
> + value:
> + file:
> + path: /sys/devices/system/cpu/vulnerabilities/spectre_v1
> + ignoreMissing: true
> + parse:
> + scalar:
> + regex: (\w+)
> + match: 1
> +- name: cpu.vulnerability.spectre_v2
> + filter:
> + fact:
> + name: os.kernel
> + value: Linux
> + value:
> + file:
> + path: /sys/devices/system/cpu/vulnerabilities/spectre_v2
> + ignoreMissing: true
> + parse:
> + scalar:
> + regex: (\w+)
> + match: 1
> +- name: cpu.vulnerability.spec_store_bypass
> + filter:
> + fact:
> + name: os.kernel
> + value: Linux
> + value:
> + file:
> + path: /sys/devices/system/cpu/vulnerabilities/spec_store_bypass
> + ignoreMissing: true
> + parse:
> + scalar:
> + regex: (\w+)
> + match: 1
> +- name: cpu.vulnerability.mds
> + filter:
> + fact:
> + name: os.kernel
> + value: Linux
> + value:
> + file:
> + path: /sys/devices/system/cpu/vulnerabilities/mds
> + ignoreMissing: true
> + parse:
> + scalar:
> + regex: (\w+)
> + match: 1
> +- name: cpu.vulnerability.mds_smt
> + filter:
> + fact:
> + name: os.kernel
> + value: Linux
> + value:
> + file:
> + path: /sys/devices/system/cpu/vulnerabilities/mds
> + ignoreMissing: true
> + parse:
> + scalar:
> + regex: SMT (\w+)
> + match: 1
> +- name: cpu.vulnerability.l1tf
> + filter:
> + fact:
> + name: os.kernel
> + value: Linux
> + value:
> + file:
> + path: /sys/devices/system/cpu/vulnerabilities/l1tf
> + ignoreMissing: true
> + parse:
> + scalar:
> + regex: (\w+)
> + match: 1
> +- name: cpu.vulnerability.l1tf_smt
> + filter:
> + fact:
> + name: os.kernel
> + value: Linux
> + value:
> + file:
> + path: /sys/devices/system/cpu/vulnerabilities/l1tf
> + ignoreMissing: true
> + parse:
> + scalar:
> + regex: SMT (\w+)
> + match: 1
Given the fact that most of these could just be virFileReadValueUint() it does
not even make it easier to read or write the code.
Errr, virFileReadValueUint reads a single integer from a file.
This is extracting a single *word* from a specific field in a
file.
Doing this in code would involve doing a regex match the same as
here.
Every time someone will want to add a new check or a fact they will
need to find
a similar one, copy-paste it, change it and hope for the best. This introduces
yet another "language" on top of the two you are adding already. I really do
not see any benefit in this.
Taking this particular example.
- name: cpu.vulnerability.l1tf_smt
filter:
fact:
name: os.kernel
value: Linux
value:
file:
path: /sys/devices/system/cpu/vulnerabilities/l1tf
ignoreMissing: true
parse:
scalar:
regex: SMT (\w+)
match: 1
It could be approximately equiv to code
if runtime.GOOS == "Linux" {
return "", nil
}
data, err =
ioutils.ReadFile("/sys/devices/system/cpu/vulnerabilities/l1tf")
if err != nil {
if os.IsNotExist(err) {
return "", nil
}
return "", err
}
match, err := regexp.Find(`SMT (\w+)`, data)
if err != nil {
return "", err
}
err = SetFact("cpu.vulnerability.l1tf_smt", match)
if err != nil {
return "", err
}
The code does look more familiar, but it is certainly not simpler.
Most notabley there are alot of error handling cases here that need
to be dealt with, even when written in Go. Doing this in C with its
string handling support involves even more error handling. This
is a significant maintainence burden that is not well handled in
a consistent manner across the code.
This is a fairly simple check example, case where we want to extract
data from a more more complex file formats result in much more code
complexity.
In the pure code approach, we have mixed up code related to reading
files, running commands, parsing data formats, setting facts, printing
messages, skipping execution when certain other conditions applied.
Hidden in this is the information about what feature we are actually
checking for.
A data driven approach to this problem means that we get clear separation
between distinct parts/jobs.
The engine written in code concerns itself with generic facilities for
reading files, running commands, parsing data, and all the error handling
and data processing.
Thereafter the interesting checks is simply a declarative expression of
what data we wish to extract for each fature, and declarative expression
of dependancies between features.
This engine code has greater complexity than what exists today for sure,
but the key thing is that this is mostly a one time upfront code, and is
the least interesting part of the tool. What is most interest is the set
of features being checked for and the relationships between the features.
Having the declarative part separated from the functional part is where
the value arises.
- Changes to the functional code apply consistently across all
the checks made. For example, an improvement to error handling
to regex processing was done once and applied to all the checks
that used regex. Or consistently handling when reading files
that are potentially missing on systems.
- New checks can be introduced in a more reliable and consistent
manner. These CPU vulnerability checks are an example of this
in practice. We merely have to express what files we're interested
in and how to match the relevant data in them. No need to worry
about reading files, compiling & running regexs, making sure the
error handling was done in the same way as other existing checks.
- The declarative data can be processed algorithmically. For example
given the depedancies expressed between the facts, I saw that it
was not neccessary to define all the facts in the same data file.
It was possible split them across many files, load those files in
any order, and algorithmically determine exactly which subset of
checks need to be executed & in what order.
- Having separated data from the code it is obviously possible to
extend this without introducing any code changes. This is possible
to use from outside libvirt. For example there are usage scenarios
which may not be supported by certain versions of QEMU. The QEMU
package can drop in some facts which will be picked up by the
virt-host-validate tool. Alternatively going up the stack, an
app like OpenStack which uses libvirt may wish to restrict what
they use, or wish to check extra virt host features interesting
to their app's usage of libvirt. Again their package can now
drop in extra facts.
- New features can be built ontop of the declarative data. At first
I just set out to replicate the existing tool's output as is. Once
that was done, it was obvious that we could trivially add a new
feature allowing us to print the raw informationm about each
fact that was checked, as an alternative to the human targetted
message strings. IOW we got a machine readable output format
essentially for free by virtue of using declarative data.
- The implementation can be rewritten again at will. If the original
C code had this split of data vs processing logic, this conversion
of langauge would have been even more compelling, as none of othe
declarative data would have needed changing. I'm not expecting
another rewrite, but this capability was already valuable, between
the v1 and v2 posting of this patch I changed from XML to YAML for
the data format. This conversion was entirely automated, because
it could be algorithmically processed.
These are all typical & expected benefits that arise from separating
functional logic from declarative data. In the long term this is a
clear win, & as I mentioned I would have done this sooner had it not
been for our use of C.
If I was to pick a new feature we could benefit from, I would much
rather prefer
having an opt-in for report-home of HW features and usage for some very rough
anonymous statistics.
I don't think we want to get involved in reporting user data back to any
server, as it is a data privacy minefield. This tool can be used by other
existing reporting tools though - eg Red Hat distros often include the
'sosreport' tool that bundles stuff up for reporting customer tickets.
Regards,
Daniel
--
|:
https://berrange.com -o-
https://www.flickr.com/photos/dberrange :|
|:
https://libvirt.org -o-
https://fstop138.berrange.com :|
|:
https://entangle-photo.org -o-
https://www.instagram.com/dberrange :|