Re: [libvirt] [PATCH v10 2/4] domain: Add optional 'tls' attribute for TCP chardev

20 Oct 2016

      [...]
...
...
...
...
+    <p>
+      <span class="since">Since 2.4.0,</span> the optional attribute
+      <code>tls</code> can be used to control whether a serial chardev
+      TCP communication channel would utilize a hypervisor configured
+      TLS X.509 certificate environment in order to encrypt the data
+      channel. For the QEMU hypervisor, usage of a TLS environment can
+      be controlled on the host by the <code>chardev_tls</code> and
+      <code>chardev_tls_x509_cert_dir</code> or
+      <code>default_tls_x509_cert_dir</code> settings in the file
+      /etc/libvirt/qemu.conf. If <code>chardev_tls</code> is enabled,
+      then unless the <code>tls</code> attribute is set to "no", libvirt
+      will use the host configured TLS environment.
+      If <code>chardev_tls</code> is disabled, but the <code>tls</code>
+      attribute is set to "yes", then libvirt will attempt to use the
+      host TLS environment if either the <code>chardev_tls_x509_cert_dir</code>
+      or <code>default_tls_x509_cert_dir</code> TLS directory structure exists.
+    </p>
Nice, this is a good description of how to use the *tls* attribute.
BTW (regarding your followup reply):
The 4 "consumers" of virDomainChrSourceDefParseXML (where this would be
parsed) refer to this as a "serial chardev"
This is a generic function that parses the source for a lot of different device
types.
Shortcutting in my mind to <source mode='{connect|bind}' host='%s'
service='%s'/> which is the VIR_DOMAIN_DEVICE_CHR, smartcard, rng, and
redirdev. And yes, VIR_DOMAIN_DEVICE_CHR has paths for parallels,
serials, consoles, and channels that are defined using <%s type='tcp'>.
...
...
The 'virDomainChrDefParseXML' comments have a list of "<serial ..." XML
types. The location of the above description is describing a <serial
type="tcp"> definition.
Well, the comment has that list, but the function is used to parse all character
devices, not only serial character devices.  TLS encryption can also be used for
these types: parallel, channel, and console.
My continual battle against under-documentation.  The code is not self
documenting in all cases...
...
...
The 'smartcard' discussion for a 'passthrough' device that would use
this code says "Rather than having the hypervisor directly communicate
with the host, it is possible to tunnel all requests through a secondary
character device to a third-party provider (which may in turn be talking
to a smartcard or using three certificate files). In this mode of
operation, an additional attribute type is required, matching one of the
supported serial device types, to describe the host side of the tunnel;..."
This comment is wrong; it should say "supported character device types".  This
attribute tells what interface is presented to the host.  Check this part of the
documentation for all character devices:
<http://libvirt.org/formatdomain.html#elementsConsole>
there is this sentence:
"The interface presented to the host is given in the type attribute of the
  top-level element. The host interface is configured by the source element."
So it refers to all host interfaces:
<http://libvirt.org/formatdomain.html#elementsCharHostInterface>
...
The 'rng' discussion for backend that would use this code says "This
backend connects to a source using the EGD protocol. The source is
specified as a character device. Refer to character device host
interface for more information. ..."
This is correct, there is no reference to serial character device.
...
Redirdev says "An additional attribute type is required, matching one of
the supported serial device types, to describe the host side of the
tunnel; type='tcp' or type='spicevmc' ..."
This is the same case as smartcard device, this is wrong.
...
So the long and short of it is, IMO it's a serial chardev device.
Semantically it could be claimed otherwise, but the parsing proves
otherwise as does the existing documentation of "Host interface"
character devices.
I prefer to keep it described as is. It's only ever used, parsed, etc.
when <devices>... <serial type="tcp">... <source mode='connect'..."
If anything, the description should become more restrictive to indicate
that the option shouldn't be used for smartcards, rngs, and redirdevs,
but I'll save that discussion for patch 3.
Based on the documentation it may appear that it should be a serial chardev,
but that's misleading and it should refer only to the "host interfaces of
character devices".
I cannot begin to describe how many times I've scrolled up and down
through that discussion and thought how does anyone get this stuff
correct... Trial and error I suppose.

In any case, it seems of the 3 the rng is the most correct and the other
two should get patches in order to be more correct. Not sure I can do it
justice. It would seem to me that smartcard and redirdev should use the
pointer to the elementsCharHostInterface.

Still for the purposes of supported 'elementsCharHostInterface' when
being used for specific smartcard, rng, and redirdev entries that are
using "type='tcp'", only the <source mode='{connect|bind}' .../> would
"appear to me" to apply as the "style" one would use in order to use
TLS. That style just happens to have examples that list <serial
type="tcp"> <source mode=... />.

Hence why I see this as "a serial chardev TCP" or a "host interface
serial chardev TCP". There's got to be some means to describe it that
focuses the attention on the <source ...> and not the "<%s type="tcp">"
that I focused on.

[...]
...
...
...
...
bool
-qemuDomainSupportTLSChardevTCP(virQEMUDriverConfigPtr cfg)
+qemuDomainSupportTLSChardevTCP(virQEMUDriverConfigPtr cfg,
+                               const virDomainChrSourceDef *dev)
 {
-    if (cfg->chardevTLS)
+    if (cfg->chardevTLS && dev->data.tcp.haveTLS != VIR_TRISTATE_BOOL_NO)
+        return true;
+    if (!cfg->chardevTLS && dev->data.tcp.haveTLS == VIR_TRISTATE_BOOL_YES &&
+        virFileExists(cfg->chardevTLSx509certdir))
         return true;
     return false;
 }
So this function lets you decide whether we should try to set up *tls* for
a chardev or not.  It works, but I have a few issues with it.
First, I don't like that libvirt would try to do something smart and not
even tell the user about the result.  This will silently ignore the *tls*
attribute if no certificate is found.  If the *tls* attribute is set
to "yes" in the XML and there is no certificate file to use, we shouldn't start
that domain and should report an error to the user.
This is a boolean function - so printing an error here isn't right.
Well, my comment implies not using a boolean function.
The callers were boolean checks, hence the generation of a boolean function.
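
One way to picture the non-boolean alternative being debated here is a check with three outcomes: set up TLS, skip TLS, or refuse to start the domain. The following is only a sketch under stated assumptions. The enum, struct, and function names are hypothetical stand-ins rather than libvirt API, `cert_dir_exists` replaces the `virFileExists()` call from the patch, and the choice to verify the directory for every explicit tls='yes' is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the tristate 'tls' attribute (absent/yes/no). */
typedef enum { TLS_ABSENT = 0, TLS_YES, TLS_NO } TlsAttr;

/* Hypothetical view of the host configuration from qemu.conf. */
typedef struct {
    bool chardev_tls;      /* host-wide chardev_tls setting */
    bool cert_dir_exists;  /* whether the x509 cert directory was found */
} HostCfg;

/*
 * Returns 1 when TLS should be set up, 0 when it should not,
 * and -1 when tls='yes' was requested but cannot be honoured.
 */
static int chardev_tls_check(const HostCfg *cfg, TlsAttr attr)
{
    if (attr == TLS_NO)
        return 0;

    if (attr == TLS_YES) {
        if (!cfg->cert_dir_exists) {
            /* Report instead of silently ignoring the attribute. */
            fprintf(stderr,
                    "tls='yes' requested but no TLS certificate "
                    "directory was found\n");
            return -1;
        }
        return 1;
    }

    /* Attribute absent: follow the host-wide setting. */
    return cfg->chardev_tls ? 1 : 0;
}
```

A caller would treat a negative return as a reason to fail domain startup, which is what distinguishes this shape from the boolean helper in the patch.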
...
...
Adding something to post parse processing is possible, but there's an
impact based on whether we're setting a default value for haveTLS
(something I disagree with doing).
Beyond that would a check go in qemuDomainDeviceDefPostParse or
qemuDomainDeviceDefValidate? It's never crystal clear to me which should
be used when just reading the code. Although since I see this as a "new"
and "optional" value, I'd lean toward PostParse. Then there's dealing
with the parseFlags that it's impacted by other decisions.
I could also rationalize that someone adding "tls='yes'" to their
chardev would "know" what they're doing because they read the
documentation. How else would they know to have this very specific
combination (unless of course 'something' set things up that way based
on the assumption of how a domain is currently running).
Again, IMO 'haveTLS' is new and optional. The only indication that a
domain is using TLS was handled via 'tlscreds'. IIRC, the "reason" that
'tlscreds' exists and how "tls-creds" gets added to the command line is
because the JSON code processing for a chardev is buried in
qemu_monitor_json.c and fishing for a host configuration option at that
time wasn't viable.
...
Secondly, this way we don't reflect the current state of a live domain in the
live XML.  This was probably lost during the discussion, but in general, if
there is an attribute that can affect a running domain, we should reflect the
current state using that attribute.  I know there are some cases where we
probably don't do that, and they should be fixed.
No it wasn't lost - I considered it as I see you've seen from the cover.
And you're probably right regarding attributes that trigger usage of
some qemu option that we don't specifically save in the status XML.
That's a different rat hole.
...
I figured out that we cannot simply use haveTLS = cfg->chardevTLS, but we
can set haveTLS based on cfg->chardevTLS.  The whole purpose of
qemuProcessPrepareDomain() is to prepare the domain definition so that
qemuBuildCommandLine() doesn't have to check other places to enable some
feature and not update the live definition.  If the *tls* attribute is
properly set in the live definition, then it will be saved to the status XML and
there is no need to do anything for qemuProcessReconnect.
I disagree on setting haveTLS during qemuProcessReconnect based upon
chardevTLS. The 'haveTLS' is an optional attribute and by setting a
value I believe we end up making an assumption.
I wrote that there is no need to do anything for qemuProcessReconnect.
I misinterpreted, but I also still have in my mind the previous
discussion on this.
...
...
If *anything* was to be done it would be based solely on whether
"tls-creds" is set on the command line of the reconnected domain.
However, that too has a similar problem about setting a value for an
optional attribute based upon the assumption that we know better.
Again, the only indication that 'tls-creds' is on the command line was
from the 'tlscreds' boolean that was set because the host configuration
information was available. A domain that is running and has tls-creds
will continue to have it. Altering that domain's configuration file
because we add a new optional *configuration* value has no bearing on
the *status* XML. When/if a configuration XML is updated, it's not
checking that 'tlscreds' value to determine that at some point in
history the domain used TLS because the host was configured that way.
The whole point of having the 'tls' attribute in the live XML is to ensure that
when libvirtd is restarted the attribute is still present in the XML, because it
will be saved to the status XML and loaded again from the status XML.  There is no
need to do any magic by parsing the qemu command line; we have the status XML for
that purpose, to store all information about the domain.
Let's see, you see the "tls={'yes'|'no'}" as essentially replacing the
chardev_tls qemu.conf variable.

I don't see it that way. The whole purpose of an optional property is
just that: to be optional. I should be able to choose to add it or not.
If it's there and I remove it, but then on every restart libvirt
replaces it - that seems to go against the idea of being optional. If it
never existed and then it shows up as soon as I start the domain; is
that right?

If I'm comparing pure migration XML, wouldn't the target (newer) system now
have the field defined (thus creating different XML)? Does that migrate
safely back to the previous version (2.3.0)? It would have a field that
the old system wouldn't know what to do with.
...
...
Let's consider the bool function from above and this automagic setting
being requested. Let's say we reconnect to a domain, find the
'tls-creds' set on the command line, and set 'haveTLS=YES' based solely
on that. Let's say at some point in time after, someone edits their
qemu.conf file and sets 'chardev_tls=0' (or comments out the
'chardev_tls=1'). In their mind, they've now disabled chardev TLS for
their host and any domain they will run in the future. They stop the
domain they knew was running using TLS before and restart it expecting
that it won't use TLS anymore, but on restart they discover that in our
infinite wisdom we have set the optional "tls='yes'" property for that
chardev on that domain. Now if we 'error' out on that start like you
request above, then that means they will have to edit their domain and
remove the seemingly optional property.
Next, let's assume they read the documentation and found that they can
disable the qemu.conf value, but still have the domain chardev value if
they set the "tls='yes'" property as long as they have their valid TLS
directory configuration. In this case, they have made the conscious
decision how they want their domain configured based on what they know
is configured on the host.
TBH: I take this single patch as a "feature request" add-on to the
original feature request that I believe in the long run won't be used. I
could be wrong, but it's a feeling.
Furthermore, the purpose of any optional attribute is just that. It's
optional based on some host wide setting. It's up to the consumer to
decide how they should proceed, not the software to make that decision
for them.
I guess that I'm unable to describe exactly what I mean so I'm attaching two
patches, one is for introducing TLS attribute and the second one is to make
sure that migration to libvirt-2.3.0 will work.
And I think the provided code proves that point. While "technically"
still optional in the schema, the change in qemuProcessPrepareDomain
forces it to be set as soon as a domain is started. Again, a bistate !=
tristate. We don't track the difference between undefined or defined,
but set to 0 other than in the return values from qemu.conf parsing. I
already had to remove a boolean that tracked that from a prior review.

I think it should be strictly optional and that's where we differ. I see
no reason to change the domain xml unless as a consumer that's what you
want to do - be able to control which domains will have the setting.
What else would be the purpose of a host wide setting to go with a
domain optional setting?

Finally, if your idea is accepted, that means for any configuration with
chardev_tls=0 (either because it's commented or set that way), every
domain that starts will be updated to have this new attribute
"tls='no'". Then one day, I read up on this wonderful new feature and
modify my qemu.conf file to set chardev_tls=1 and set up the TLS
environment properly. I go to start my domain, but wait it's not using
it. Closer inspection finds, someone put "tls='no'" into my domain... To
me that's not right.  And I won't necessarily know unless I know to look
at the cmdline of the started domain to find that 'tls-creds' or I in
some way "track" when TLS is being used.
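
The scenario above can be traced with a small model. This is a hypothetical, self-contained sketch: none of the names are libvirt functions. `prepare_fill_default` models the proposal of resolving an absent `tls` attribute from `chardev_tls` when the domain starts, and the sketch assumes the resolved value survives a stop and a later start, which is the premise of the objection rather than something the patches are confirmed to do.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the tristate 'tls' attribute. */
typedef enum { TLS_ABSENT = 0, TLS_YES, TLS_NO } TlsAttr;

/* Modelled proposal: an absent attribute is resolved from the host
 * setting at start time and stored in the definition. */
static TlsAttr prepare_fill_default(TlsAttr attr, bool chardev_tls)
{
    if (attr == TLS_ABSENT)
        return chardev_tls ? TLS_YES : TLS_NO;
    return attr;
}

/* Strictly optional alternative: the definition is left untouched. */
static TlsAttr prepare_keep_optional(TlsAttr attr, bool chardev_tls)
{
    (void)chardev_tls;  /* the host setting is consulted later, not stored */
    return attr;
}

/* Whether TLS ends up in use for a start, ignoring certificate checks. */
static bool uses_tls(TlsAttr attr, bool chardev_tls)
{
    if (attr == TLS_YES)
        return true;
    if (attr == TLS_NO)
        return false;
    return chardev_tls;  /* absent: follow the host-wide setting */
}
```

With `chardev_tls=0` on the first start, the first model stores `TLS_NO`, so a later start with `chardev_tls=1` still runs without TLS. The second model keeps the attribute absent, so the same later start picks up the host setting.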

John

John Ferlan