[libvirt] [PATCH v2] nodedev: Document the udevEventHandleThread
by John Ferlan
Commit cdbe1332 neglected to document the API. So let's add some
details about the algorithm and why it was used to help future
readers understand the issues encountered. Based largely off a
description provided by Erik Skultety <eskultet(a)redhat.com>.
NB: Management of the processing udev device notification is a
delicate balance between the udev process, the scheduler, and when
exactly the data from/for the socket is received. The balance is
particularly important for environments when multiple devices are
added into the system more or less simultaneously such as is done
for mdev. In these cases scheduler blocking on the udev recv() call
is more noticeable. It's expected that future devices will follow
similar algorithms. Even though the algorithm does present some
challenges for older OS's (such as Centos 6), trying to rewrite
the algorithm to fit both models would be more complex and involve
pulling the monitor object out of the private data lockable object
and would need to be guarded by a separate lock. Devising such an
algorithm to work around issues with older OS's at the expense
of more modern OS algorithms in newer event processing code may
result in unexpected issues, so the choice is to encourage use
of newer OS's with newer udev event processing code.
Signed-off-by: John Ferlan <jferlan(a)redhat.com>
---
v1: https://www.redhat.com/archives/libvir-list/2018-October/msg01053.html
Changes are from code review with some minor tweaks/editing as I
went along. Mistakes are my own!
src/node_device/node_device_udev.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c
index 22897591de..f2c2299d4d 100644
--- a/src/node_device/node_device_udev.c
+++ b/src/node_device/node_device_udev.c
@@ -1591,6 +1591,26 @@ udevEventMonitorSanityCheck(udevEventDataPtr priv,
}
+/**
+ * udevEventHandleThread
+ * @opaque: unused
+ *
+ * Thread to handle the udevEventHandleCallback processing when udev
+ * tells us there's a device change for us (add, modify, delete, etc).
+ *
+ * Once notified there is data to be processed, the actual @device
+ * data retrieval by libudev may be delayed due to how threads are
+ * scheduled. In fact, the event loop could be scheduled earlier than
+ * the handler thread, thus potentially emitting the very same event
+ * the handler thread is currently trying to process, simply because
+ * the data hadn't been retrieved from the socket.
+ *
+ * NB: Usage of event based socket algorithm causes some issues with
+ * older platforms such as CentOS 6 in which the data socket is opened
+ * without the NONBLOCK bit set. That can be avoided by moving the reset
+ * priv->dataReady to just after the udev_monitor_receive_device; however,
+ * scheduler issues would still come into play.
+ */
static void
udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
{
@@ -1637,6 +1657,9 @@ udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
return;
}
+ /* Trying to move the reset of the @priv->dataReady flag to
+ * after the udev_monitor_receive_device wouldn't help much
+ * due to event mgmt and scheduler timing. */
virObjectLock(priv);
priv->dataReady = false;
virObjectUnlock(priv);
@@ -1646,6 +1669,11 @@ udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
udevHandleOneDevice(device);
udev_device_unref(device);
+
+ /* Instead of waiting for the next event after processing @device
+ * data, let's keep reading from the udev monitor and only wait
+ * for the next event once either a EAGAIN or a EWOULDBLOCK error
+ * is encountered. */
}
}
--
2.17.2
6 years
[libvirt] [jenkins-ci PATCH v2 0/3] Add libvirt-ocaml
by Pino Toscano
Now that the libvirt-ocaml repository is fixed, let's build it in CI.
Changes from v1 to v2:
- split according to Andrea's hints
- removed --with-libvirt as argument for configure, no more needed
after upstream changes
Pino Toscano (3):
guests: add mappings for OCaml components
guests: pull dependencies for libvirt-ocaml
Add libvirt-ocaml project
guests/host_vars/libvirt-centos-7/main.yml | 1 +
guests/host_vars/libvirt-debian-9/main.yml | 1 +
guests/host_vars/libvirt-debian-sid/main.yml | 1 +
guests/host_vars/libvirt-fedora-27/main.yml | 1 +
guests/host_vars/libvirt-fedora-28/main.yml | 1 +
.../host_vars/libvirt-fedora-rawhide/main.yml | 1 +
guests/host_vars/libvirt-freebsd-10/main.yml | 1 +
guests/host_vars/libvirt-freebsd-11/main.yml | 1 +
.../libvirt-freebsd-current/main.yml | 1 +
guests/host_vars/libvirt-ubuntu-16/main.yml | 1 +
guests/host_vars/libvirt-ubuntu-18/main.yml | 1 +
guests/playbooks/build/jobs/defaults.yml | 2 ++
.../build/projects/libvirt-ocaml.yml | 27 +++++++++++++++++++
guests/vars/mappings.yml | 6 +++++
guests/vars/projects/libvirt-ocaml.yml | 5 ++++
jobs/defaults.yaml | 2 ++
projects/libvirt-ocaml.yaml | 23 ++++++++++++++++
17 files changed, 76 insertions(+)
create mode 100644 guests/playbooks/build/projects/libvirt-ocaml.yml
create mode 100644 guests/vars/projects/libvirt-ocaml.yml
create mode 100644 projects/libvirt-ocaml.yaml
--
2.17.2
6 years
[libvirt] [PATCH] qemu: Restore lost shutdown reason
by John Ferlan
When qemuProcessReconnectHelper was introduced (commit d38897a5d)
reconnection failure used VIR_DOMAIN_SHUTOFF_FAILED; however, that
was changed in commit bda2f17d to either VIR_DOMAIN_SHUTOFF_CRASHED
or VIR_DOMAIN_SHUTOFF_UNKNOWN.
When QEMU_CAPS_NO_SHUTDOWN checking was removed in commit fe35b1ad6
the conditional state was just left at VIR_DOMAIN_SHUTOFF_CRASHED.
Restore the logic adjustment using the same conditions as command
line building and alter the comment to describe the reasoning.
Signed-off-by: John Ferlan <jferlan(a)redhat.com>
---
This was "discovered" while reviewing Nikolay's patch related
to adding "DAEMON" as the shutdown reason:
https://www.redhat.com/archives/libvir-list/2018-October/msg00493.html
While working through the iterations of that - figured I'd at
least post this.
src/qemu/qemu_process.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 3955eda17c..4a39111d51 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -7985,11 +7985,16 @@ qemuProcessReconnect(void *opaque)
if (virDomainObjIsActive(obj)) {
/* We can't get the monitor back, so must kill the VM
* to remove danger of it ending up running twice if
- * user tries to start it again later
- * If we couldn't get the monitor since QEMU supports
- * no-shutdown, we can safely say that the domain
- * crashed ... */
- state = VIR_DOMAIN_SHUTOFF_CRASHED;
+ * user tries to start it again later.
+ *
+ * If we cannot get to the monitor when the QEMU command
+ * line used -no-shutdown, then we can safely say that the
+ * domain crashed; otherwise we don't really know. */
+ if (priv->monJSON && priv->allowReboot == VIR_TRISTATE_BOOL_YES)
+ state = VIR_DOMAIN_SHUTOFF_CRASHED;
+ else
+ state = VIR_DOMAIN_SHUTOFF_UNKNOWN;
+
/* If BeginJob failed, we jumped here without a job, let's hope another
* thread didn't have a chance to start playing with the domain yet
* (it's all we can do anyway).
--
2.17.2
6 years
[libvirt] [PATCH] nodedev: Document the udevEventHandleThread
by John Ferlan
Add details of the algorithm and why it was used to help future
readers understand the issues encountered. Based largely off a
description provided by Erik Skultety <eskultet(a)redhat.com>.
Signed-off-by: John Ferlan <jferlan(a)redhat.com>
---
see: https://www.redhat.com/archives/libvir-list/2018-October/msg00915.html
src/node_device/node_device_udev.c | 49 ++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c
index 22897591de..8ca81e65ad 100644
--- a/src/node_device/node_device_udev.c
+++ b/src/node_device/node_device_udev.c
@@ -1591,6 +1591,53 @@ udevEventMonitorSanityCheck(udevEventDataPtr priv,
}
+/**
+ * udevEventHandleThread
+ * @opaque: unused
+ *
+ * Thread to handle the udevEventHandleCallback processing when udev
+ * tells us there's a device change for us (add, modify, delete, etc).
+ *
+ * Management of the processing udev device notification is a delicate
+ * balance between the udev process, the scheduler, and when exactly the
+ * data from/for the socket is received. The udev processing code notifies
+ * the udevEventHandleCallback that it has data; however, the actual recv()
+ * done in the udev code to fill in the @device structure can be delayed
+ * due to how threads are scheduled.
+ *
+ * As it turns out, the event loop could be scheduled (and it in fact
+ * was) earlier than the handler thread. What that essentially means is
+ * that by the time the thread actually handled the event and read the
+ * data from the monitor, the event loop fired the very same event, simply
+ * because the data hadn't been retrieved from the socket at that point yet.
+ *
+ * Thus this algorithm is that once udevEventHandleCallback is notified
+ * there is data, this loop will process as much data as possible until
+ * udev_monitor_receive_device fails to get the @device. This may be
+ * because we've processed everything or because the scheduling thread
+ * hasn't been able to populate data in the socket. Once we're done
+ * processing everything we can, wait on the next event/notification.
+ * Although this may incur a slight delay if the socket data wasn't
+ * populated, the event/notifications are fairly consistent so there's
+ * no need to use virCondWaitUntil.
+ *
+ * NB: Usage of event based socket algorithm causes some issues with
+ * older platforms such as CentOS 6 in which the data socket is opened
+ * without the NONBLOCK bit set. Still even if the reset of priv->dataReady
+ * were moved to just after the udev_monitor_receive_device in order to
+ * avoid the NONBLOCK issue, the scheduler would still come into play
+ * especially for environments when multiple devices are added into the
+ * system. Even those environments would still be blocked on the udev
+ * socket recv() call. The algorithm for handling that scenario would
+ * be more complex and involve pulling the monitor object out of the
+ * private data lockable object and would need to be guarded by a
+ * separate lock. Devising algorithms to work around issues with older
+ * OS's at the expense of more modern OS algorithms in newer event
+ * processing code may result in unexpected issues, so the choice is
+ * to encourage use of newer OS's with newer udev event processing code.
+ * Newer devices, such as mdevs make heavy usage of event processing
+ * and it's expected future devices would follow that model.
+ */
static void
udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
{
@@ -1637,6 +1684,8 @@ udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
return;
}
+ /* See the above description why this needs to be here
+ * and not after the udev_monitor_receive_device. */
virObjectLock(priv);
priv->dataReady = false;
virObjectUnlock(priv);
--
2.17.2
6 years
[libvirt] [PATCH] qemu: Report warning if QoS is set for vhostuser interface
by Michal Privoznik
https://bugzilla.redhat.com/show_bug.cgi?id=1524230
Because of historical reasons, we are not denying starting a
domain which has QoS set for unsupported type of device. We do
report just a warning instead. And even though we perhaps used to
do so for vhostuser it got lost somewhere. Bring it back.
Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
---
src/qemu/qemu_command.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 6e3ff67660..489e8bc689 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -8246,6 +8246,12 @@ qemuBuildVhostuserCommandLine(virQEMUDriverPtr driver,
goto cleanup;
}
+ if (virDomainNetGetActualBandwidth(net)) {
+ VIR_WARN("setting bandwidth on interfaces of "
+ "type '%s' is not implemented yet",
+ virDomainNetTypeToString(VIR_DOMAIN_NET_TYPE_VHOSTUSER));
+ }
+
switch ((virDomainChrType)net->data.vhostuser->type) {
case VIR_DOMAIN_CHR_TYPE_UNIX:
if (!(chardev = qemuBuildChrChardevStr(logManager, secManager,
--
2.18.1
6 years
[libvirt] [PATCH libvirt-python] event-test.py: Report ERROR events
by Philipp Hahn
VIR_DOMAIN_EVENT_ID_IO_ERROR and VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON
callbacks receive the same 'action' parameter, so also translate that
numeric action to a descriptive text for the first callback.
Signed-off-by: Philipp Hahn <hahn(a)univention.de>
---
examples/event-test.py | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/examples/event-test.py b/examples/event-test.py
index dabf4b0..709277b 100755
--- a/examples/event-test.py
+++ b/examples/event-test.py
@@ -526,8 +526,8 @@ def myDomainEventWatchdogCallback(conn, dom, action, opaque):
def myDomainEventIOErrorCallback(conn, dom, srcpath, devalias, action, opaque):
- print("myDomainEventIOErrorCallback: Domain %s(%s) %s %s %d" % (
- dom.name(), dom.ID(), srcpath, devalias, action))
+ print("myDomainEventIOErrorCallback: Domain %s(%s) %s %s %s" % (
+ dom.name(), dom.ID(), srcpath, devalias, ERROR_EVENTS[action]))
def myDomainEventIOErrorReasonCallback(conn, dom, srcpath, devalias, action, reason, opaque):
--
2.11.0
6 years
[libvirt] [PATCH libvirt-python] event-test.py: Fix ERROR event
by Philipp Hahn
ERROR_EVENTS translates the numeric 'action' argument to a description,
not the 'reason' argument which already contains a descriptive string
like 'enospc'.
> Traceback (most recent call last):
> File "/usr/lib/python2.7/dist-packages/libvirt.py", line 4661, in _dispatchDomainEventIOErrorReasonCallback
> reason, opaque)
> File "libvirt-python/examples/event-test.py", line 536, in myDomainEventIOErrorReasonCallback
> dom.name(), dom.ID(), srcpath, devalias, action, ERROR_EVENTS[reason]))
> File "libvirt-python/examples/event-test.py", line 474, in __getitem__
> data = self.args[item]
> TypeError: tuple indices must be integers, not str
Fixes: f5928c6711654f1496707ca77f626b3192843d57
Signed-off-by: Philipp Hahn <hahn(a)univention.de>
---
examples/event-test.py | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/examples/event-test.py b/examples/event-test.py
index eeb2777..dabf4b0 100755
--- a/examples/event-test.py
+++ b/examples/event-test.py
@@ -531,8 +531,8 @@ def myDomainEventIOErrorCallback(conn, dom, srcpath, devalias, action, opaque):
def myDomainEventIOErrorReasonCallback(conn, dom, srcpath, devalias, action, reason, opaque):
- print("myDomainEventIOErrorReasonCallback: Domain %s(%s) %s %s %d %s" % (
- dom.name(), dom.ID(), srcpath, devalias, action, ERROR_EVENTS[reason]))
+ print("myDomainEventIOErrorReasonCallback: Domain %s(%s) %s %s %s %s" % (
+ dom.name(), dom.ID(), srcpath, devalias, ERROR_EVENTS[action], reason))
def myDomainEventGraphicsCallback(conn, dom, phase, localAddr, remoteAddr, authScheme, subject, opaque):
--
2.11.0
6 years
[libvirt] [PATCH v2] vhost-user: define conventions for vhost-user backends
by Marc-André Lureau
As discussed during "[PATCH v4 00/29] vhost-user for input & GPU"
review, let's define a common set of backend conventions to help with
management layer implementation, and interoperability.
v2:
- drop --pidfile
- add some notes about daemonizing & stdin/out/err
Cc: libvir-list(a)redhat.com
Cc: Gerd Hoffmann <kraxel(a)redhat.com>
Cc: Daniel P. Berrangé <berrange(a)redhat.com>
Cc: Changpeng Liu <changpeng.liu(a)intel.com>
Cc: Dr. David Alan Gilbert <dgilbert(a)redhat.com>
Cc: Felipe Franciosi <felipe(a)nutanix.com>
Cc: Gonglei <arei.gonglei(a)huawei.com>
Cc: Maxime Coquelin <maxime.coquelin(a)redhat.com>
Cc: Michael S. Tsirkin <mst(a)redhat.com>
Cc: Victor Kaplansky <victork(a)redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau(a)redhat.com>
---
docs/interop/vhost-user.txt | 109 +++++++++++++++++++++++++++++++++++-
1 file changed, 107 insertions(+), 2 deletions(-)
diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index ba5e37d714..339b335e9c 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -17,8 +17,13 @@ The protocol defines 2 sides of the communication, master and slave. Master is
the application that shares its virtqueues, in our case QEMU. Slave is the
consumer of the virtqueues.
-In the current implementation QEMU is the Master, and the Slave is intended to
-be a software Ethernet switch running in user space, such as Snabbswitch.
+In the current implementation QEMU is the Master, and the Slave is the
+external process consuming the virtio queues, for example a software
+Ethernet switch running in user space, such as Snabbswitch, or a block
+device backend processing read & write to a virtual disk. In order to
+facilitate interoperability between various backend implementations,
+it is recommended to follow the "Backend program conventions"
+described in this document.
Master and slave can be either a client (i.e. connecting) or server (listening)
in the socket communication.
@@ -859,3 +864,103 @@ resilient for selective requests.
For the message types that already solicit a reply from the client, the
presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
no behavioural change. (See the 'Communication' section for details.)
+
+Backend program conventions
+---------------------------
+
+vhost-user backends provide various services and they may need to be
+configured manually depending on the use case. However, it is a good
+idea to follow the conventions listed here when possible. Users, QEMU
+or libvirt, can then rely on some common behaviour to avoid
+heterogenous configuration and management of the backend program and
+facilitate interoperability.
+
+In order to be discoverable, default vhost-user backends should be
+located under "/usr/libexec", and be named "vhost-user-$device" where
+"$device" is the device name in lower-case following the name listed
+in the Linux virtio_ids.h header (ex: the VIRTIO_ID_RPROC_SERIAL
+backend would be named "vhost-user-rproc-serial").
+
+Mechanisms to list, and to select among alternatives implementations
+or modify the default backend are not described at this point (a
+distribution may use update-alternatives, for example, to list and to
+pick a different default backend).
+
+The backend program must not daemonize itself, but it may be
+daemonized by the management layer. It may also have a restricted
+access to the system.
+
+File descriptors 0, 1 and 2 will exist, and have regular
+stdin/stdout/stderr usage (they may be redirected to /dev/null by the
+management layer, or to a log handler).
+
+The backend program must end (as quickly and cleanly as possible) when
+the SIGTERM signal is received. Eventually, it may be SIGKILL by the
+management layer after a few seconds.
+
+The following command line options have an expected behaviour. They
+are mandatory, unless explicitly said differently:
+
+* --socket-path=PATH
+
+This option specify the location of the vhost-user Unix domain socket.
+It is incompatible with --fd.
+
+* --fd=FDNUM
+
+When this argument is given, the backend program is started with the
+vhost-user socket as file descriptor FDNUM. It is incompatible with
+--socket-path.
+
+* --print-capabilities
+
+Output to stdout a line-seperated list of backend capabilities, and
+then exit successfully. Other options and arguments should be ignored,
+and the backend program should not perform its normal function.
+
+At the time of writing, there are no common capabilities. Some
+device-specific capabilities are listed in the respective sections. By
+convention, device-specific capabilities are prefixed by their device
+name.
+
+vhost-user-input program conventions
+------------------------------------
+
+Capabilities:
+
+input-evdev-path
+
+ The --evdev-path command line option is supported.
+
+input-no-grab
+
+ The --no-grab command line option is supported.
+
+* --evdev-path=PATH (optional)
+
+Specify the linux input device.
+
+* --no-grab (optional)
+
+Do no request exclusive access to the input device.
+
+vhost-user-gpu program conventions
+----------------------------------
+
+Capabilities:
+
+gpu-render-node
+
+ The --render-node command line option is supported.
+
+gpu-virgl
+
+ The --virgl command line option is supported.
+
+* --render-node=PATH (optional)
+
+Specify the GPU DRM render node.
+
+* --virgl (optional)
+
+Enable virgl rendering support.
--
2.19.0
6 years