[PATCH] Revert "report error when virProcessGetStatInfo() is unable to parse data"

This reverts commit 938382b60ae5bd1f83b5cb09e1ce68b9a88f679a. Turns out, the commit did more harm than good. It changed semantics on some public APIs. For instance, while qemuDomainGetInfo() previously did not returned an error it does now. While the calls to virProcessGetStatInfo() is guarded with virDomainObjIsActive() it doesn't necessarily mean that QEMU's PID is still alive. QEMU might be gone but we just haven't realized it (e.g. because the eof handler thread is waiting for a job). Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2041610 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/ch/ch_driver.c | 2 ++ src/qemu/qemu_driver.c | 7 ++++++- src/util/virprocess.c | 8 ++------ 3 files changed, 10 insertions(+), 7 deletions(-) diff --git a/src/ch/ch_driver.c b/src/ch/ch_driver.c index 3cbc668489..53e0872207 100644 --- a/src/ch/ch_driver.c +++ b/src/ch/ch_driver.c @@ -1073,6 +1073,8 @@ chDomainHelperGetVcpus(virDomainObj *vm, if (virProcessGetStatInfo(&vcpuinfo->cpuTime, &vcpuinfo->cpu, NULL, vm->pid, vcpupid) < 0) { + virReportSystemError(errno, "%s", + _("cannot get vCPU placement & pCPU time")); return -1; } } diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index e150b86cef..373cd62536 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1357,6 +1357,8 @@ qemuDomainHelperGetVcpus(virDomainObj *vm, if (virProcessGetStatInfo(&vcpuinfo->cpuTime, &vcpuinfo->cpu, NULL, vm->pid, vcpupid) < 0) { + virReportSystemError(errno, "%s", + _("cannot get vCPU placement & pCPU time")); return -1; } } @@ -2517,6 +2519,8 @@ qemuDomainGetInfo(virDomainPtr dom, if (virDomainObjIsActive(vm)) { if (virProcessGetStatInfo(&(info->cpuTime), NULL, NULL, vm->pid, 0) < 0) { + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("cannot read cputime for domain")); goto cleanup; } } @@ -10524,7 +10528,8 @@ qemuDomainMemoryStatsInternal(virQEMUDriver *driver, } if (virProcessGetStatInfo(NULL, NULL, &rss, vm->pid, 0) < 0) { - virResetLastError(); + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("cannot get RSS for domain")); } else { stats[ret].tag = VIR_DOMAIN_MEMORY_STAT_RSS; stats[ret].val = rss; diff --git a/src/util/virprocess.c b/src/util/virprocess.c index 85d8c8e747..b559a4257e 100644 --- a/src/util/virprocess.c +++ b/src/util/virprocess.c @@ -1784,10 +1784,7 @@ virProcessGetStatInfo(unsigned long long *cpuTime, virStrToLong_ullp(proc_stat[VIR_PROCESS_STAT_STIME], NULL, 10, &systime) < 0 || virStrToLong_l(proc_stat[VIR_PROCESS_STAT_RSS], NULL, 10, &rss) < 0 || virStrToLong_i(proc_stat[VIR_PROCESS_STAT_PROCESSOR], NULL, 10, &cpu) < 0) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("cannot parse process status data for pid '%d/%d'"), - (int) pid, (int) tid); - return -1; + VIR_WARN("cannot parse process status data"); } /* We got jiffies @@ -1884,8 +1881,7 @@ virProcessGetStatInfo(unsigned long long *cpuTime G_GNUC_UNUSED, pid_t pid G_GNUC_UNUSED, pid_t tid G_GNUC_UNUSED) { - virReportSystemError(ENOSYS, "%s", - _("Process statistics data is not supported on this platform")); + errno = ENOSYS; return -1; } -- 2.34.1

On Tue, Jan 18, 2022 at 5:15 PM Michal Privoznik <mprivozn@redhat.com> wrote:
This reverts commit 938382b60ae5bd1f83b5cb09e1ce68b9a88f679a.
Turns out, the commit did more harm than good. It changed semantics on some public APIs. For instance, while qemuDomainGetInfo() previously did not returned an error it does now. While the calls to virProcessGetStatInfo() is guarded with virDomainObjIsActive() it doesn't necessarily mean that QEMU's PID is still alive. QEMU might be gone but we just haven't realized it (e.g. because the eof handler thread is waiting for a job).
Doh!
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2041610 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/ch/ch_driver.c | 2 ++ src/qemu/qemu_driver.c | 7 ++++++- src/util/virprocess.c | 8 ++------ 3 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/src/ch/ch_driver.c b/src/ch/ch_driver.c index 3cbc668489..53e0872207 100644 --- a/src/ch/ch_driver.c +++ b/src/ch/ch_driver.c @@ -1073,6 +1073,8 @@ chDomainHelperGetVcpus(virDomainObj *vm, if (virProcessGetStatInfo(&vcpuinfo->cpuTime, &vcpuinfo->cpu, NULL, vm->pid, vcpupid) < 0) { + virReportSystemError(errno, "%s", + _("cannot get vCPU placement & pCPU time")); return -1; } } diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index e150b86cef..373cd62536 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1357,6 +1357,8 @@ qemuDomainHelperGetVcpus(virDomainObj *vm, if (virProcessGetStatInfo(&vcpuinfo->cpuTime, &vcpuinfo->cpu, NULL, vm->pid, vcpupid) < 0) { + virReportSystemError(errno, "%s", + _("cannot get vCPU placement & pCPU time")); return -1; } } @@ -2517,6 +2519,8 @@ qemuDomainGetInfo(virDomainPtr dom, if (virDomainObjIsActive(vm)) { if (virProcessGetStatInfo(&(info->cpuTime), NULL, NULL, vm->pid, 0) < 0) { + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("cannot read cputime for domain")); goto cleanup; } } @@ -10524,7 +10528,8 @@ qemuDomainMemoryStatsInternal(virQEMUDriver *driver, }
if (virProcessGetStatInfo(NULL, NULL, &rss, vm->pid, 0) < 0) { - virResetLastError(); + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("cannot get RSS for domain")); } else { stats[ret].tag = VIR_DOMAIN_MEMORY_STAT_RSS; stats[ret].val = rss; diff --git a/src/util/virprocess.c b/src/util/virprocess.c index 85d8c8e747..b559a4257e 100644 --- a/src/util/virprocess.c +++ b/src/util/virprocess.c @@ -1784,10 +1784,7 @@ virProcessGetStatInfo(unsigned long long *cpuTime, virStrToLong_ullp(proc_stat[VIR_PROCESS_STAT_STIME], NULL, 10, &systime) < 0 || virStrToLong_l(proc_stat[VIR_PROCESS_STAT_RSS], NULL, 10, &rss) < 0 || virStrToLong_i(proc_stat[VIR_PROCESS_STAT_PROCESSOR], NULL, 10, &cpu) < 0) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("cannot parse process status data for pid '%d/%d'"), - (int) pid, (int) tid); - return -1; + VIR_WARN("cannot parse process status data"); }
/* We got jiffies @@ -1884,8 +1881,7 @@ virProcessGetStatInfo(unsigned long long *cpuTime G_GNUC_UNUSED, pid_t pid G_GNUC_UNUSED, pid_t tid G_GNUC_UNUSED) { - virReportSystemError(ENOSYS, "%s", - _("Process statistics data is not supported on this platform")); + errno = ENOSYS; return -1; }
-- 2.34.1

On Tue, Jan 18, 2022 at 12:44:51PM +0100, Michal Privoznik wrote:
+++ b/src/util/virprocess.c @@ -1784,10 +1784,7 @@ virProcessGetStatInfo(unsigned long long *cpuTime, virStrToLong_ullp(proc_stat[VIR_PROCESS_STAT_STIME], NULL, 10, &systime) < 0 || virStrToLong_l(proc_stat[VIR_PROCESS_STAT_RSS], NULL, 10, &rss) < 0 || virStrToLong_i(proc_stat[VIR_PROCESS_STAT_PROCESSOR], NULL, 10, &cpu) < 0) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("cannot parse process status data for pid '%d/%d'"), - (int) pid, (int) tid); - return -1; + VIR_WARN("cannot parse process status data");
Shame to lose the improved error/warning message. Perhaps it could be reintroduced separately. Reviewed-by: Andrea Bolognani <abologna@redhat.com> -- Andrea Bolognani / Red Hat / Virtualization

On Thu, 20 Jan 2022, Andrea Bolognani wrote:
On Tue, Jan 18, 2022 at 12:44:51PM +0100, Michal Privoznik wrote:
+++ b/src/util/virprocess.c @@ -1784,10 +1784,7 @@ virProcessGetStatInfo(unsigned long long *cpuTime, virStrToLong_ullp(proc_stat[VIR_PROCESS_STAT_STIME], NULL, 10, &systime) < 0 || virStrToLong_l(proc_stat[VIR_PROCESS_STAT_RSS], NULL, 10, &rss) < 0 || virStrToLong_i(proc_stat[VIR_PROCESS_STAT_PROCESSOR], NULL, 10, &cpu) < 0) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("cannot parse process status data for pid '%d/%d'"), - (int) pid, (int) tid); - return -1; + VIR_WARN("cannot parse process status data");
Shame to lose the improved error/warning message. Perhaps it could be reintroduced separately.
Functions in general are not coded around success path only. Most well written functions have a success path and an error path. In case of error, they should be able to report warning/errors. If raising an error from a function causes a breakage of an external api, then that is an architectural problem imho. Instead of reverting the error/warning, we should try to fix the larger problem at hand in the longer term.

On 1/20/22 16:31, Ani Sinha wrote:
On Thu, 20 Jan 2022, Andrea Bolognani wrote:
On Tue, Jan 18, 2022 at 12:44:51PM +0100, Michal Privoznik wrote:
+++ b/src/util/virprocess.c @@ -1784,10 +1784,7 @@ virProcessGetStatInfo(unsigned long long *cpuTime, virStrToLong_ullp(proc_stat[VIR_PROCESS_STAT_STIME], NULL, 10, &systime) < 0 || virStrToLong_l(proc_stat[VIR_PROCESS_STAT_RSS], NULL, 10, &rss) < 0 || virStrToLong_i(proc_stat[VIR_PROCESS_STAT_PROCESSOR], NULL, 10, &cpu) < 0) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("cannot parse process status data for pid '%d/%d'"), - (int) pid, (int) tid); - return -1; + VIR_WARN("cannot parse process status data");
Shame to lose the improved error/warning message. Perhaps it could be reintroduced separately.
Functions in general are not coded around success path only. Most well written functions have a success path and an error path. In case of error, they should be able to report warning/errors. If raising an error from a function causes a breakage of an external api, then that is an architectural problem imho. Instead of reverting the error/warning, we should try to fix the larger problem at hand in the longer term.
Well, until we get there we should fix the upstream so that we don't have another broken release. Michal

On Thu, Jan 20, 2022 at 21:14 Michal Prívozník <mprivozn@redhat.com> wrote:
On 1/20/22 16:31, Ani Sinha wrote:
On Thu, 20 Jan 2022, Andrea Bolognani wrote:
On Tue, Jan 18, 2022 at 12:44:51PM +0100, Michal Privoznik wrote:
+++ b/src/util/virprocess.c @@ -1784,10 +1784,7 @@ virProcessGetStatInfo(unsigned long long
*cpuTime,
virStrToLong_ullp(proc_stat[VIR_PROCESS_STAT_STIME], NULL,
10, &systime) < 0 ||
virStrToLong_l(proc_stat[VIR_PROCESS_STAT_RSS], NULL, 10,
&rss) < 0 ||
virStrToLong_i(proc_stat[VIR_PROCESS_STAT_PROCESSOR], NULL,
10, &cpu) < 0) {
- virReportError(VIR_ERR_INTERNAL_ERROR, - _("cannot parse process status data for pid '%d/%d'"), - (int) pid, (int) tid); - return -1; + VIR_WARN("cannot parse process status data");
Shame to lose the improved error/warning message. Perhaps it could be reintroduced separately.
Functions in general are not coded around success path only. Most well written functions have a success path and an error path. In case of error, they should be able to report warning/errors. If raising an error from a function causes a breakage of an external api, then that is an architectural problem imho. Instead of reverting the error/warning, we should try to fix the larger problem at hand in the longer term.
Well, until we get there we should fix the upstream so that we don't have another broken release.
AKA kicking the can one more time 🙃

On 1/20/22 16:48, Ani Sinha wrote:
AKA kicking the can one more time 🙃
Well, I should have been more careful and not merge the patch in the first place. Changing API behavior is something we should never do. Looking at the code closer, it looks like all callers of this function would need to ignore the reported error so that their behavior is not changed. At this point, does it make sense to report an error in the function? Michal

On Thu, Jan 20, 2022 at 21:29 Michal Prívozník <mprivozn@redhat.com> wrote:
On 1/20/22 16:48, Ani Sinha wrote:
AKA kicking the can one more time 🙃
Well, I should have been more careful and not merge the patch in the first place. Changing API behavior is something we should never do.
Looking at the code closer, it looks like all callers of this function would need to ignore the reported error so that their behavior is not changed. At this point, does it make sense to report an error in the function?
The callers can decide what do with the error raised by the function. We should not write functions that cannot fail.

On 1/20/22 18:23, Ani Sinha wrote:
On Thu, Jan 20, 2022 at 21:29 Michal Prívozník <mprivozn@redhat.com <mailto:mprivozn@redhat.com>> wrote:
On 1/20/22 16:48, Ani Sinha wrote: > >
> > AKA kicking the can one more time 🙃
Well, I should have been more careful and not merge the patch in the first place. Changing API behavior is something we should never do.
Looking at the code closer, it looks like all callers of this function would need to ignore the reported error so that their behavior is not changed. At this point, does it make sense to report an error in the function?
The callers can decide what do with the error raised by the function. We should not write functions that cannot fail.
But that's not what the commit does, is it. It changed some public APIs from best effort to fail early. Therefore, it was reverted until we can come up with proper fix. Libvirt's promise and value is in stability of its APIs. We want users to update libvirt without having to rewrite their app (or even rebuild it = ABI stability). And the commit broke that promise. It's only fair that it is reverted until proper solution is found. Michal

On Fri, 21 Jan 2022, Michal Prívozník wrote:
On 1/20/22 18:23, Ani Sinha wrote:
On Thu, Jan 20, 2022 at 21:29 Michal Prívozník <mprivozn@redhat.com <mailto:mprivozn@redhat.com>> wrote:
On 1/20/22 16:48, Ani Sinha wrote: > >
> > AKA kicking the can one more time 🙃
Well, I should have been more careful and not merge the patch in the first place. Changing API behavior is something we should never do.
Looking at the code closer, it looks like all callers of this function would need to ignore the reported error so that their behavior is not changed. At this point, does it make sense to report an error in the function?
The callers can decide what do with the error raised by the function. We should not write functions that cannot fail.
But that's not what the commit does, is it. It changed some public APIs from best effort to fail early.
That is the side effect and I consider it as a bug. Imagine we add some more code into that function tomorrow and there is an error path. Now because of this bug, we will have to ignore the error condition and not report it. How ridiculous is that? We should have kept the patch as is and fix the callers so that the public APIs were not broken.
Libvirt's promise and value is in stability of its APIs. We want users to update libvirt without having to rewrite their app (or even rebuild it = ABI stability). And the commit broke that promise. It's only fair that it is reverted until proper solution is found.
I have no argument with the above.

On Fri, Jan 21, 2022 at 07:38:35PM +0530, Ani Sinha wrote:
On Fri, 21 Jan 2022, Michal Prívozník wrote:
On 1/20/22 18:23, Ani Sinha wrote:
On Thu, Jan 20, 2022 at 21:29 Michal Prívozník <mprivozn@redhat.com <mailto:mprivozn@redhat.com>> wrote:
On 1/20/22 16:48, Ani Sinha wrote: > >
> > AKA kicking the can one more time 🙃
Well, I should have been more careful and not merge the patch in the first place. Changing API behavior is something we should never do.
Looking at the code closer, it looks like all callers of this function would need to ignore the reported error so that their behavior is not changed. At this point, does it make sense to report an error in the function?
The callers can decide what do with the error raised by the function. We should not write functions that cannot fail.
But that's not what the commit does, is it. It changed some public APIs from best effort to fail early.
That is the side effect and I consider it as a bug. Imagine we add some more code into that function tomorrow and there is an error path. Now because of this bug, we will have to ignore the error condition and not report it. How ridiculous is that?
We should have kept the patch as is and fix the callers so that the public APIs were not broken.
That is not the libvirt approach. Our #1 priority is public API compatibility, everything else is subservient to that. So when we have a regression we fix that in the quickest possible way. Simply reverting the broken patch was the quickest option here. Figuring out a correct long term solution can follow up later Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Fri, 21 Jan 2022, Daniel P. Berrangé wrote:
On Fri, Jan 21, 2022 at 07:38:35PM +0530, Ani Sinha wrote:
On Fri, 21 Jan 2022, Michal Prívozník wrote:
On 1/20/22 18:23, Ani Sinha wrote:
On Thu, Jan 20, 2022 at 21:29 Michal Prívozník <mprivozn@redhat.com <mailto:mprivozn@redhat.com>> wrote:
On 1/20/22 16:48, Ani Sinha wrote: > >
> > AKA kicking the can one more time 🙃
Well, I should have been more careful and not merge the patch in the first place. Changing API behavior is something we should never do.
Looking at the code closer, it looks like all callers of this function would need to ignore the reported error so that their behavior is not changed. At this point, does it make sense to report an error in the function?
The callers can decide what do with the error raised by the function. We should not write functions that cannot fail.
But that's not what the commit does, is it. It changed some public APIs from best effort to fail early.
That is the side effect and I consider it as a bug. Imagine we add some more code into that function tomorrow and there is an error path. Now because of this bug, we will have to ignore the error condition and not report it. How ridiculous is that?
We should have kept the patch as is and fix the callers so that the public APIs were not broken.
That is not the libvirt approach. Our #1 priority is public API compatibility, everything else is subservient to that. So when we have a regression we fix that in the quickest possible way. Simply reverting the broken patch>
(a) I take exception to the fact that the patch reverted was "broken". (b) I have sent a patch adding a comment to that function as to why we cannot report any error there. Wider people can message the comment in any way they like. It would remind us to fix the code in the future and at the same time prevent others from wasting their time going down the path I took.

On Fri, Jan 21, 2022 at 08:16:01PM +0530, Ani Sinha wrote:
On Fri, 21 Jan 2022, Daniel P. Berrangé wrote:
On Fri, Jan 21, 2022 at 07:38:35PM +0530, Ani Sinha wrote:
On Fri, 21 Jan 2022, Michal Prívozník wrote:
On 1/20/22 18:23, Ani Sinha wrote:
On Thu, Jan 20, 2022 at 21:29 Michal Prívozník <mprivozn@redhat.com <mailto:mprivozn@redhat.com>> wrote:
On 1/20/22 16:48, Ani Sinha wrote: > >
> > AKA kicking the can one more time 🙃
Well, I should have been more careful and not merge the patch in the first place. Changing API behavior is something we should never do.
Looking at the code closer, it looks like all callers of this function would need to ignore the reported error so that their behavior is not changed. At this point, does it make sense to report an error in the function?
The callers can decide what do with the error raised by the function. We should not write functions that cannot fail.
But that's not what the commit does, is it. It changed some public APIs from best effort to fail early.
That is the side effect and I consider it as a bug. Imagine we add some more code into that function tomorrow and there is an error path. Now because of this bug, we will have to ignore the error condition and not report it. How ridiculous is that?
We should have kept the patch as is and fix the callers so that the public APIs were not broken.
That is not the libvirt approach. Our #1 priority is public API compatibility, everything else is subservient to that. So when we have a regression we fix that in the quickest possible way. Simply reverting the broken patch>
(a) I take exception to the fact that the patch reverted was "broken".
Did you actually read the bug report in the first message in this series ? An end user hit a behaviour regression breaking the installation process, and was caused by the patch that's been reverted. I'm not saying the patch didn't have a good motiviation behind what it was trying to achieve, but the end result was a unfortunately a regresion in the public API in libvirt that impacted our users. That clearly qualfies for being called 'broken'. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Fri, Jan 21, 2022 at 20:33 Daniel P. Berrangé <berrange@redhat.com> wrote:
On Fri, Jan 21, 2022 at 08:16:01PM +0530, Ani Sinha wrote:
On Fri, 21 Jan 2022, Daniel P. Berrangé wrote:
On Fri, Jan 21, 2022 at 07:38:35PM +0530, Ani Sinha wrote:
On Fri, 21 Jan 2022, Michal Prívozník wrote:
On 1/20/22 18:23, Ani Sinha wrote:
On Thu, Jan 20, 2022 at 21:29 Michal Prívozník <
<mailto:mprivozn@redhat.com>> wrote:
On 1/20/22 16:48, Ani Sinha wrote: > >
> > AKA kicking the can one more time 🙃
Well, I should have been more careful and not merge the
first place. Changing API behavior is something we should
never do.
Looking at the code closer, it looks like all callers of
would need to ignore the reported error so that their
behavior is not
changed. At this point, does it make sense to report an
error in the
function?
The callers can decide what do with the error raised by the
function. We
should not write functions that cannot fail.
But that's not what the commit does, is it. It changed some public APIs from best effort to fail early.
That is the side effect and I consider it as a bug. Imagine we add some more code into that function tomorrow and there is an error path. Now because of this bug, we will have to ignore the error condition and not report it. How ridiculous is that?
We should have kept the patch as is and fix the callers so that the
mprivozn@redhat.com patch in the this function public
APIs were not broken.
That is not the libvirt approach. Our #1 priority is public API compatibility, everything else is subservient to that. So when we have a regression we fix that in the quickest possible way. Simply reverting the broken patch>
(a) I take exception to the fact that the patch reverted was "broken".
Did you actually read the bug report in the first message in this series ? An end user hit a behaviour regression breaking the installation process, and was caused by the patch that's been reverted.
No I did not read the bug report because it required me to create an account and then login. I do not have my redhat BZ account credentials handy.

On Fri, Jan 21, 2022 at 08:39:47PM +0530, Ani Sinha wrote:
On Fri, Jan 21, 2022 at 20:33 Daniel P. Berrangé <berrange@redhat.com> wrote:
On Fri, Jan 21, 2022 at 08:16:01PM +0530, Ani Sinha wrote:
On Fri, 21 Jan 2022, Daniel P. Berrangé wrote:
On Fri, Jan 21, 2022 at 07:38:35PM +0530, Ani Sinha wrote:
On Fri, 21 Jan 2022, Michal Prívozník wrote:
On 1/20/22 18:23, Ani Sinha wrote: > > > On Thu, Jan 20, 2022 at 21:29 Michal Prívozník <
> <mailto:mprivozn@redhat.com>> wrote: > > On 1/20/22 16:48, Ani Sinha wrote: > > > > > > > > > AKA kicking the can one more time 🙃 > > Well, I should have been more careful and not merge the
> first place. Changing API behavior is something we should never do. > > Looking at the code closer, it looks like all callers of
> would need to ignore the reported error so that their behavior is not > changed. At this point, does it make sense to report an error in the > function? > > > The callers can decide what do with the error raised by the function. We > should not write functions that cannot fail. >
But that's not what the commit does, is it. It changed some public APIs from best effort to fail early.
That is the side effect and I consider it as a bug. Imagine we add some more code into that function tomorrow and there is an error path. Now because of this bug, we will have to ignore the error condition and not report it. How ridiculous is that?
We should have kept the patch as is and fix the callers so that the
mprivozn@redhat.com patch in the this function public
APIs were not broken.
That is not the libvirt approach. Our #1 priority is public API compatibility, everything else is subservient to that. So when we have a regression we fix that in the quickest possible way. Simply reverting the broken patch>
(a) I take exception to the fact that the patch reverted was "broken".
Did you actually read the bug report in the first message in this series ? An end user hit a behaviour regression breaking the installation process, and was caused by the patch that's been reverted.
No I did not read the bug report because it required me to create an account and then login. I do not have my redhat BZ account credentials handy.
Urgh, sorry I missed that the BZ was private. Private BZs are not supposed to be listed in commit messages for precisely the reason that people can't see them. The commit message should have showed the problem listed in the private BZ description instead. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Fri, Jan 21, 2022 at 8:56 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Fri, Jan 21, 2022 at 20:33 Daniel P. Berrangé <berrange@redhat.com> wrote:
On Fri, Jan 21, 2022 at 08:16:01PM +0530, Ani Sinha wrote:
On Fri, 21 Jan 2022, Daniel P. Berrangé wrote:
On Fri, Jan 21, 2022 at 07:38:35PM +0530, Ani Sinha wrote:
On Fri, 21 Jan 2022, Michal Prívozník wrote:
> On 1/20/22 18:23, Ani Sinha wrote: > > > > > > On Thu, Jan 20, 2022 at 21:29 Michal Prívozník <
> > <mailto:mprivozn@redhat.com>> wrote: > > > > On 1/20/22 16:48, Ani Sinha wrote: > > > > > > > > > > > > > > AKA kicking the can one more time 🙃 > > > > Well, I should have been more careful and not merge the
> > first place. Changing API behavior is something we should never do. > > > > Looking at the code closer, it looks like all callers of
mprivozn@redhat.com patch in the this function
> > would need to ignore the reported error so that their behavior is not > > changed. At this point, does it make sense to report an error in the > > function? > > > > > > The callers can decide what do with the error raised by the function. We > > should not write functions that cannot fail. > > > > But that's not what the commit does, is it. It changed some
APIs
> from best effort to fail early.
That is the side effect and I consider it as a bug. Imagine we add some more code into that function tomorrow and there is an error
because of this bug, we will have to ignore the error condition and not report it. How ridiculous is that?
We should have kept the patch as is and fix the callers so that
On Fri, Jan 21, 2022 at 08:39:47PM +0530, Ani Sinha wrote: public path. Now the
public
APIs were not broken.
That is not the libvirt approach. Our #1 priority is public API compatibility, everything else is subservient to that. So when we have a regression we fix that in the quickest possible way. Simply reverting the broken patch>
(a) I take exception to the fact that the patch reverted was "broken".
Did you actually read the bug report in the first message in this series ? An end user hit a behaviour regression breaking the installation process, and was caused by the patch that's been reverted.
No I did not read the bug report because it required me to create an account and then login. I do not have my redhat BZ account credentials handy.
Urgh, sorry I missed that the BZ was private. Private BZs are not supposed to be listed in commit messages for precisely the reason that people can't see them. The commit message should have showed the problem listed in the private BZ description instead.
Turns out even if I logged into BZ with my old account credentials ( I couldn't find the password, so reset it) I still can't access the bug info: You are not authorized to access bug #2041610. Most likely the bug has been restricted for internal development processes and we cannot grant access. If you are a Red Hat customer with an active subscription, please visit the Red Hat Customer Portal <https://access.redhat.com/> for assistance with your issue If you are a Fedora Project user and require assistance, please consider using one of the mailing lists <https://fedoraproject.org/wiki/Communicate> we host for the Fedora Project.

On 1/21/22 16:26, Daniel P. Berrangé wrote:
Urgh, sorry I missed that the BZ was private. Private BZs are not supposed to be listed in commit messages for precisely the reason that people can't see them. The commit message should have showed the problem listed in the private BZ description instead.
Yeah, I did not realize BZ was private either. Nevertheless, the commit message does show the reason: while previously qemuDomainGetInfo() would not return an error it does so after the commit in question. This is reproducible (for instance) via virt-install, which reports: ERROR internal error: cannot parse process status data for pid '1234/0' after guest installation has completed and guest reboots. Michal
participants (5)
-
Andrea Bolognani
-
Ani Sinha
-
Daniel P. Berrangé
-
Michal Privoznik
-
Michal Prívozník