[libvirt] [PATCH v2] qemu: Connect to guest agent iff needed

https://bugzilla.redhat.com/show_bug.cgi?id=1293351 I've came across a bit of a silly scenario, but long story short: after domain was resumed, the virDomainSetTime() API hung for 5 seconds after which it failed with an error. Problem was, that there is no guest agent installed in the domain. But this got me thinking and digging into the code. So even though we do listen to VSERPORT_CHANGE events and connect and disconnect ourselves to guest agent, we still do connect to the guest agent at both domain startup and resume. This is a bit silly as it produces the delay - basically, because we connect to the guest agent, priv->agent is not NULL. Therefore qemuDomainAgentAvailable() will return true. And the place where we hang? Well, historically, when there was no VSERPORT_CHANGE event, we used a dummy ping command to the guest which has 5 seconds timeout. And it's still there and effective. So there's where the delay comes from. What's the resolution? Well, I'm introducing new capability QEMU_CAPS_VSERPORT_CHANGE that is enabled iff qemu is capable of the event. And, during domain startup, reconnect after resume and attach we do connect to the agent socket iff the capability is NOT set. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_domain.c | 11 ++++------- src/qemu/qemu_domain.h | 2 +- src/qemu/qemu_driver.c | 2 +- src/qemu/qemu_process.c | 25 +++++++++++++++++++------ src/qemu/qemu_process.h | 4 +++- tests/qemucapabilitiesdata/caps_2.1.1-1.caps | 1 + tests/qemucapabilitiesdata/caps_2.4.0-1.caps | 1 + tests/qemucapabilitiesdata/caps_2.5.0-1.caps | 1 + 10 files changed, 34 insertions(+), 16 deletions(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 6e5d203..9e11af9 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -308,6 +308,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST, "virtio-tablet", /* 205 */ "virtio-input-host", + "vserport-change-event", ); @@ -1479,6 +1480,7 @@ struct virQEMUCapsStringFlags virQEMUCapsEvents[] = { { "SPICE_MIGRATE_COMPLETED", QEMU_CAPS_SEAMLESS_MIGRATION }, { "DEVICE_DELETED", QEMU_CAPS_DEVICE_DEL_EVENT }, { "MIGRATION", QEMU_CAPS_MIGRATION_EVENT }, + { "VSERPORT_CHANGE", QEMU_CAPS_VSERPORT_CHANGE }, }; struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 61d6379..983faf6 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -335,6 +335,7 @@ typedef enum { /* 205 */ QEMU_CAPS_VIRTIO_TABLET, /* -device virtio-tablet-{device,pci} */ QEMU_CAPS_VIRTIO_INPUT_HOST, /* -device virtio-input-host-{device,pci} */ + QEMU_CAPS_VSERPORT_CHANGE, /* VSERPORT_CHANGE event */ QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index bb8d47f..1c546f8 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -3650,10 +3650,9 @@ qemuDomainSupportsBlockJobs(virDomainObjPtr vm, * Returns the pointer to the channel definition that is used to access the * guest agent if the agent is configured or NULL otherwise. */ -virDomainChrSourceDefPtr +virDomainChrDefPtr qemuFindAgentConfig(virDomainDefPtr def) { - virDomainChrSourceDefPtr config = NULL; size_t i; for (i = 0; i < def->nchannels; i++) { @@ -3662,13 +3661,11 @@ qemuFindAgentConfig(virDomainDefPtr def) if (channel->targetType != VIR_DOMAIN_CHR_CHANNEL_TARGET_TYPE_VIRTIO) continue; - if (STREQ_NULLABLE(channel->target.name, "org.qemu.guest_agent.0")) { - config = &channel->source; - break; - } + if (STREQ_NULLABLE(channel->target.name, "org.qemu.guest_agent.0")) + return channel; } - return config; + return NULL; } diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index cff48d7..ca88f8a 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -487,7 +487,7 @@ int qemuDomainAlignMemorySizes(virDomainDefPtr def); void qemuDomainMemoryDeviceAlignSize(virDomainDefPtr def, virDomainMemoryDefPtr mem); -virDomainChrSourceDefPtr qemuFindAgentConfig(virDomainDefPtr def); +virDomainChrDefPtr qemuFindAgentConfig(virDomainDefPtr def); bool qemuDomainMachineIsQ35(const virDomainDef *def); bool qemuDomainMachineIsI440FX(const virDomainDef *def); diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index e8ba3a6..fc00073 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -4486,7 +4486,7 @@ processSerialChangedEvent(virQEMUDriverPtr driver, if (STREQ_NULLABLE(dev.data.chr->target.name, "org.qemu.guest_agent.0")) { if (newstate == VIR_DOMAIN_CHR_DEVICE_STATE_CONNECTED) { if (!priv->agent) { - if ((rc = qemuConnectAgent(driver, vm)) == -2) + if ((rc = qemuConnectAgent(driver, vm, true)) == -2) goto endjob; if (rc < 0) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index f274068..3aca227 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -198,16 +198,29 @@ static qemuAgentCallbacks agentCallbacks = { int -qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm) +qemuConnectAgent(virQEMUDriverPtr driver, + virDomainObjPtr vm, + bool force) { qemuDomainObjPrivatePtr priv = vm->privateData; int ret = -1; qemuAgentPtr agent = NULL; - virDomainChrSourceDefPtr config = qemuFindAgentConfig(vm->def); + virDomainChrDefPtr config = qemuFindAgentConfig(vm->def); if (!config) return 0; + if (!force && + virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_VSERPORT_CHANGE) && + config->state == VIR_DOMAIN_CHR_DEVICE_STATE_DISCONNECTED) { + /* With newer qemus capable of VSERPORT_CHANGE event, we are connecting and + * disconnecting to the guest agent as it connects or disconnects to the + * channel within the guest. So, it's important to connect here iff we are + * running qemu not capable of the event or we are called from the event + * callback (@force == true). */ + return 0; + } + if (virSecurityManagerSetDaemonSocketLabel(driver->securityManager, vm->def) < 0) { VIR_ERROR(_("Failed to set security context for agent for %s"), @@ -223,7 +236,7 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm) virObjectUnlock(vm); agent = qemuAgentOpen(vm, - config, + &config->source, &agentCallbacks); virObjectLock(vm); @@ -3525,7 +3538,7 @@ qemuProcessReconnect(void *opaque) goto error; /* Failure to connect to agent shouldn't be fatal */ - if ((ret = qemuConnectAgent(driver, obj)) < 0) { + if ((ret = qemuConnectAgent(driver, obj, false)) < 0) { if (ret == -2) goto error; @@ -4950,7 +4963,7 @@ qemuProcessLaunch(virConnectPtr conn, goto cleanup; /* Failure to connect to agent shouldn't be fatal */ - if ((rv = qemuConnectAgent(driver, vm)) < 0) { + if ((rv = qemuConnectAgent(driver, vm, false)) < 0) { if (rv == -2) goto cleanup; @@ -5665,7 +5678,7 @@ int qemuProcessAttach(virConnectPtr conn ATTRIBUTE_UNUSED, goto error; /* Failure to connect to agent shouldn't be fatal */ - if ((ret = qemuConnectAgent(driver, vm)) < 0) { + if ((ret = qemuConnectAgent(driver, vm, false)) < 0) { if (ret == -2) goto error; diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h index 85e3a06..555181f 100644 --- a/src/qemu/qemu_process.h +++ b/src/qemu/qemu_process.h @@ -151,6 +151,8 @@ int qemuProcessSPICEAllocatePorts(virQEMUDriverPtr driver, virDomainDiskDefPtr qemuProcessFindDomainDiskByAlias(virDomainObjPtr vm, const char *alias); -int qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm); +int qemuConnectAgent(virQEMUDriverPtr driver, + virDomainObjPtr vm, + bool force); #endif /* __QEMU_PROCESS_H__ */ diff --git a/tests/qemucapabilitiesdata/caps_2.1.1-1.caps b/tests/qemucapabilitiesdata/caps_2.1.1-1.caps index 1098dcf..332b85a 100644 --- a/tests/qemucapabilitiesdata/caps_2.1.1-1.caps +++ b/tests/qemucapabilitiesdata/caps_2.1.1-1.caps @@ -159,4 +159,5 @@ <flag name='rtl8139'/> <flag name='e1000'/> <flag name='virtio-net'/> + <flag name='vserport-change-event'/> </qemuCaps> diff --git a/tests/qemucapabilitiesdata/caps_2.4.0-1.caps b/tests/qemucapabilitiesdata/caps_2.4.0-1.caps index d67a48d..1a5fe81 100644 --- a/tests/qemucapabilitiesdata/caps_2.4.0-1.caps +++ b/tests/qemucapabilitiesdata/caps_2.4.0-1.caps @@ -167,4 +167,5 @@ <flag name='virtio-mouse'/> <flag name='virtio-tablet'/> <flag name='virtio-input-host'/> + <flag name='vserport-change-event'/> </qemuCaps> diff --git a/tests/qemucapabilitiesdata/caps_2.5.0-1.caps b/tests/qemucapabilitiesdata/caps_2.5.0-1.caps index f4f3673..8b3586a 100644 --- a/tests/qemucapabilitiesdata/caps_2.5.0-1.caps +++ b/tests/qemucapabilitiesdata/caps_2.5.0-1.caps @@ -168,4 +168,5 @@ <flag name='virtio-mouse'/> <flag name='virtio-tablet'/> <flag name='virtio-input-host'/> + <flag name='vserport-change-event'/> </qemuCaps> -- 2.4.10

On Tue, Dec 22, 2015 at 15:41:16 +0100, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1293351
I've came across a bit of a silly scenario, but long story short: after domain was resumed, the virDomainSetTime() API hung for 5 seconds after which it failed with an error. Problem was, that there is no guest agent installed in the domain. But this got me thinking and digging into the code. So even though we do listen to VSERPORT_CHANGE events and connect and disconnect ourselves to guest agent, we still do connect to the guest agent at both domain startup and resume. This is a bit silly as it produces the delay - basically, because we connect to the guest agent, priv->agent is not NULL. Therefore qemuDomainAgentAvailable() will return true. And the place where we hang? Well, historically, when there was no VSERPORT_CHANGE event, we used a dummy ping command to the guest which has 5 seconds timeout. And it's still there and effective. So there's where the delay comes from. What's the resolution? Well, I'm introducing new capability QEMU_CAPS_VSERPORT_CHANGE that is enabled iff qemu is capable of the event. And, during domain startup, reconnect after resume and attach we do connect to the agent socket iff the capability is NOT set.
I'd welcome a less verbose commit message containig the technical details and omitting the story around.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_domain.c | 11 ++++------- src/qemu/qemu_domain.h | 2 +- src/qemu/qemu_driver.c | 2 +- src/qemu/qemu_process.c | 25 +++++++++++++++++++------ src/qemu/qemu_process.h | 4 +++- tests/qemucapabilitiesdata/caps_2.1.1-1.caps | 1 + tests/qemucapabilitiesdata/caps_2.4.0-1.caps | 1 + tests/qemucapabilitiesdata/caps_2.5.0-1.caps | 1 + 10 files changed, 34 insertions(+), 16 deletions(-)
[...]
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index f274068..3aca227 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -198,16 +198,29 @@ static qemuAgentCallbacks agentCallbacks = {
int -qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm) +qemuConnectAgent(virQEMUDriverPtr driver, + virDomainObjPtr vm, + bool force) { qemuDomainObjPrivatePtr priv = vm->privateData; int ret = -1; qemuAgentPtr agent = NULL; - virDomainChrSourceDefPtr config = qemuFindAgentConfig(vm->def); + virDomainChrDefPtr config = qemuFindAgentConfig(vm->def);
if (!config) return 0;
+ if (!force && + virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_VSERPORT_CHANGE) &&
AFAIK the capability bit check won't be necessary here, since the event was introduced precisely at the same time as the ability to query the state via monitor and thus config->state will be _DISCONNECTED only if that exists.
+ config->state == VIR_DOMAIN_CHR_DEVICE_STATE_DISCONNECTED) { + /* With newer qemus capable of VSERPORT_CHANGE event, we are connecting and + * disconnecting to the guest agent as it connects or disconnects to the + * channel within the guest. So, it's important to connect here iff we are + * running qemu not capable of the event or we are called from the event + * callback (@force == true). */
So basically this gets called with @force true just from the callback. But in the callback you call this function if and only if the serial port state is _CONNECTED. All other callers call this with unknown port state. The @force argument thus doesn't make much sense.
+ return 0; + } + if (virSecurityManagerSetDaemonSocketLabel(driver->securityManager, vm->def) < 0) { VIR_ERROR(_("Failed to set security context for agent for %s"),
Peter

On 04.01.2016 16:28, Peter Krempa wrote:
On Tue, Dec 22, 2015 at 15:41:16 +0100, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1293351
I've came across a bit of a silly scenario, but long story short: after domain was resumed, the virDomainSetTime() API hung for 5 seconds after which it failed with an error. Problem was, that there is no guest agent installed in the domain. But this got me thinking and digging into the code. So even though we do listen to VSERPORT_CHANGE events and connect and disconnect ourselves to guest agent, we still do connect to the guest agent at both domain startup and resume. This is a bit silly as it produces the delay - basically, because we connect to the guest agent, priv->agent is not NULL. Therefore qemuDomainAgentAvailable() will return true. And the place where we hang? Well, historically, when there was no VSERPORT_CHANGE event, we used a dummy ping command to the guest which has 5 seconds timeout. And it's still there and effective. So there's where the delay comes from. What's the resolution? Well, I'm introducing new capability QEMU_CAPS_VSERPORT_CHANGE that is enabled iff qemu is capable of the event. And, during domain startup, reconnect after resume and attach we do connect to the agent socket iff the capability is NOT set.
I'd welcome a less verbose commit message containig the technical details and omitting the story around.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_domain.c | 11 ++++------- src/qemu/qemu_domain.h | 2 +- src/qemu/qemu_driver.c | 2 +- src/qemu/qemu_process.c | 25 +++++++++++++++++++------ src/qemu/qemu_process.h | 4 +++- tests/qemucapabilitiesdata/caps_2.1.1-1.caps | 1 + tests/qemucapabilitiesdata/caps_2.4.0-1.caps | 1 + tests/qemucapabilitiesdata/caps_2.5.0-1.caps | 1 + 10 files changed, 34 insertions(+), 16 deletions(-)
[...]
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index f274068..3aca227 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -198,16 +198,29 @@ static qemuAgentCallbacks agentCallbacks = {
int -qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm) +qemuConnectAgent(virQEMUDriverPtr driver, + virDomainObjPtr vm, + bool force) { qemuDomainObjPrivatePtr priv = vm->privateData; int ret = -1; qemuAgentPtr agent = NULL; - virDomainChrSourceDefPtr config = qemuFindAgentConfig(vm->def); + virDomainChrDefPtr config = qemuFindAgentConfig(vm->def);
if (!config) return 0;
+ if (!force && + virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_VSERPORT_CHANGE) &&
AFAIK the capability bit check won't be necessary here, since the event was introduced precisely at the same time as the ability to query the state via monitor and thus config->state will be _DISCONNECTED only if that exists.
Okay if you say so.
+ config->state == VIR_DOMAIN_CHR_DEVICE_STATE_DISCONNECTED) { + /* With newer qemus capable of VSERPORT_CHANGE event, we are connecting and + * disconnecting to the guest agent as it connects or disconnects to the + * channel within the guest. So, it's important to connect here iff we are + * running qemu not capable of the event or we are called from the event + * callback (@force == true). */
So basically this gets called with @force true just from the callback. But in the callback you call this function if and only if the serial port state is _CONNECTED. All other callers call this with unknown port state. The @force argument thus doesn't make much sense.
Not true. We set the @config->state to _CONNECTED if and only if connecting to guest agent succeeded. Prior that @config->state is left untouched, i.e. is in _DISCONNECTED or _DEFAULT state. Moreover, I realized we will need to format @config->state into domain XML more often - e.g. during migration - we want to connect on the other side iff we are connected on the source. I have v3 patches ready. Michal

On Tue, Jan 05, 2016 at 10:23:59 +0100, Michal Privoznik wrote:
On 04.01.2016 16:28, Peter Krempa wrote:
On Tue, Dec 22, 2015 at 15:41:16 +0100, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1293351
I've came across a bit of a silly scenario, but long story short: after domain was resumed, the virDomainSetTime() API hung for 5 seconds after which it failed with an error. Problem was, that there is no guest agent installed in the domain. But this got me thinking and digging into the code. So even though we do listen to VSERPORT_CHANGE events and connect and disconnect ourselves to guest agent, we still do connect to the guest agent at both domain startup and resume. This is a bit silly as it produces the delay - basically, because we connect to the guest agent, priv->agent is not NULL. Therefore qemuDomainAgentAvailable() will return true. And the place where we hang? Well, historically, when there was no VSERPORT_CHANGE event, we used a dummy ping command to the guest which has 5 seconds timeout. And it's still there and effective. So there's where the delay comes from. What's the resolution? Well, I'm introducing new capability QEMU_CAPS_VSERPORT_CHANGE that is enabled iff qemu is capable of the event. And, during domain startup, reconnect after resume and attach we do connect to the agent socket iff the capability is NOT set.
[...]
+ config->state == VIR_DOMAIN_CHR_DEVICE_STATE_DISCONNECTED) { + /* With newer qemus capable of VSERPORT_CHANGE event, we are connecting and + * disconnecting to the guest agent as it connects or disconnects to the + * channel within the guest. So, it's important to connect here iff we are + * running qemu not capable of the event or we are called from the event + * callback (@force == true). */
So basically this gets called with @force true just from the callback. But in the callback you call this function if and only if the serial port state is _CONNECTED. All other callers call this with unknown port state. The @force argument thus doesn't make much sense.
Not true. We set the @config->state to _CONNECTED if and only if connecting to guest agent succeeded. Prior that @config->state is left untouched, i.e. is in _DISCONNECTED or _DEFAULT state.
Okay, I had to actually look at the code at this point. So the above statement is not entirely true either. There is a bug in the code that would not set the state to _CONNECTED if qemuConnectAgent failed. Otherwise the state is always updated and for non-agent VIRTIO channels it even doesn't have a condition that would allow this to fail. As the channel state doesn't necessarily have to denote that the agent socket is connected due to various reasons the state should always be updated.
Moreover, I realized we will need to format @config->state into domain XML more often - e.g. during migration - we want to connect on the other side iff we are connected on the source. I have v3 patches ready.
I don't think so. This state can be re-queried at the destination of the migration once the migration is complete and thus it does not need to be transported. The only reason why the virtio channel state is parsed from the XML is to report correct transitions in the event. Other than that I don't think that any situation would require us to actually remember or transport the original state since it can always be queried and acted upon at the point where it's needed.
Michal

On Tue, Jan 05, 2016 at 10:34:16 +0100, Peter Krempa wrote:
On Tue, Jan 05, 2016 at 10:23:59 +0100, Michal Privoznik wrote:
On 04.01.2016 16:28, Peter Krempa wrote:
On Tue, Dec 22, 2015 at 15:41:16 +0100, Michal Privoznik wrote:
...
Okay, I had to actually look at the code at this point. So the above statement is not entirely true either. There is a bug in the code that would not set the state to _CONNECTED if qemuConnectAgent failed. Otherwise the state is always updated and for non-agent VIRTIO channels it even doesn't have a condition that would allow this to fail.
As the channel state doesn't necessarily have to denote that the agent socket is connected due to various reasons the state should always be updated.
Moreover, I realized we will need to format @config->state into domain XML more often - e.g. during migration - we want to connect on the other side iff we are connected on the source. I have v3 patches ready.
I don't think so. This state can be re-queried at the destination of the migration once the migration is complete and thus it does not need to be transported. The only reason why the virtio channel state is parsed from the XML is to report correct transitions in the event.
So I just figured out that we should do the correct tranistion event even after migration, so yes, the port state should be formatted also into the migratable XML. The problem there is that the migration might transfer the VM to a qemu instance that doesn't support this event which will then render a lot of this stuff not working properly though so the migration code will need to correctly handle the state after that.
participants (2)
-
Michal Privoznik
-
Peter Krempa