[PATCH 0/3] leaseshelper: Wait to acquire PID file

Patch 1/3 is for demonstration purposes only. The rest patches actual problem.I was also thinking of making the PID file per bridge, i.e. include bridge name in the name (e.g. /var/run/virbr0_leaseshelper.pid) to decrease pressure on the single PID file. However, DHCP transactions are not that frequent IMO so it's not really a bottle neck nor performance issue. Michal Prívozník (3): *** DO NOT APPLY UPSTREAM *** leaseshelper: Wait to acquire PID file leaseshelper: Report more errors src/network/leaseshelper.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) -- 2.26.2

This is a demo of the problem. Set up two NATed networks with dnsmasq and start a domain with two interfaces, one from each network. Only one network will report DHCP lease in 'virsh net-dhcp-leases'. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/network/leaseshelper.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/network/leaseshelper.c b/src/network/leaseshelper.c index a1780ca4e4..02759f2314 100644 --- a/src/network/leaseshelper.c +++ b/src/network/leaseshelper.c @@ -172,6 +172,8 @@ main(int argc, char **argv) case VIR_LEASE_ACTION_ADD: case VIR_LEASE_ACTION_OLD: /* Create new lease */ + sleep(60); + if (virLeaseNew(&lease_new, mac, clientid, ip, hostname, iaid, server_duid) < 0) goto cleanup; /* Custom ipv6 leases *will not* be created if the env-var DNSMASQ_MAC -- 2.26.2

On Mon, Jun 15, 2020 at 13:32:34 +0200, Michal Privoznik wrote:
This is a demo of the problem. Set up two NATed networks with dnsmasq and start a domain with two interfaces, one from each network. Only one network will report DHCP lease in 'virsh net-dhcp-leases'.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
I suggest you make use of the actual useful uses of a sign-off and that is to omit it for patches which are not meant to be pushed upstream. The pre-push hook will reject accidentally pushing it.

On 6/15/20 2:06 PM, Peter Krempa wrote:
On Mon, Jun 15, 2020 at 13:32:34 +0200, Michal Privoznik wrote:
This is a demo of the problem. Set up two NATed networks with dnsmasq and start a domain with two interfaces, one from each network. Only one network will report DHCP lease in 'virsh net-dhcp-leases'.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
I suggest you make use of the actual useful uses of a sign-off and that is to omit it for patches which are not meant to be pushed upstream. The pre-push hook will reject accidentally pushing it.
Ah, sure; It's just that I have this alias 'cs' which does commit + sign off and I am so used to commit using that. BTW: did we migrate these pre-commit hooks to gitlab too? Michal

On Mon, Jun 15, 2020 at 02:46:42PM +0200, Michal Privoznik wrote:
On 6/15/20 2:06 PM, Peter Krempa wrote:
On Mon, Jun 15, 2020 at 13:32:34 +0200, Michal Privoznik wrote:
This is a demo of the problem. Set up two NATed networks with dnsmasq and start a domain with two interfaces, one from each network. Only one network will report DHCP lease in 'virsh net-dhcp-leases'.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
I suggest you make use of the actual useful uses of a sign-off and that is to omit it for patches which are not meant to be pushed upstream. The pre-push hook will reject accidentally pushing it.
Ah, sure; It's just that I have this alias 'cs' which does commit + sign off and I am so used to commit using that.
BTW: did we migrate these pre-commit hooks to gitlab too?
Yes & no. You can't install arbitrary hooks server side. Instead we have a regex filter that checks for the word "Signed-off-by" in the commit message which applies when pushing directly to git. The CI job checks this for code that will go in via merge requests. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On a DHCP transaction, dnsmasq runs our leases helper which updates corresponding JSON files. While one dnsmasq won't run the leaseshelper in parallel, two dnsmasqs (from two distinct networks) might. To avoid corrupting JSON file, the leaseshelper acquires PID file first. Well, the way it's acquiring it is not ideal - it calls virPidFileAcquirePath(wait = false); which means, that either it acquires the PID file instantly or returns an error and does not touch the JSON at all. This in turn means that there might be a leases record missing. With wait = true, this won't happen. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1840307 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/network/leaseshelper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/network/leaseshelper.c b/src/network/leaseshelper.c index 02759f2314..c4258cae4e 100644 --- a/src/network/leaseshelper.c +++ b/src/network/leaseshelper.c @@ -160,7 +160,7 @@ main(int argc, char **argv) pid_file = g_strdup(RUNSTATEDIR "/leaseshelper.pid"); /* Try to claim the pidfile, exiting if we can't */ - if ((pid_file_fd = virPidFileAcquirePath(pid_file, false, getpid())) < 0) + if ((pid_file_fd = virPidFileAcquirePath(pid_file, true, getpid())) < 0) goto cleanup; /* Since interfaces can be hot plugged, we need to make sure that the -- 2.26.2

On Mon, Jun 15, 2020 at 01:32:35PM +0200, Michal Privoznik wrote:
On a DHCP transaction, dnsmasq runs our leases helper which updates corresponding JSON files. While one dnsmasq won't run the leaseshelper in parallel, two dnsmasqs (from two distinct networks) might. To avoid corrupting JSON file, the leaseshelper acquires PID file first. Well, the way it's acquiring it is not ideal - it calls virPidFileAcquirePath(wait = false); which means, that either it acquires the PID file instantly or returns an error and does not touch the JSON at all. This in turn means that there might be a leases record missing. With wait = true, this won't happen.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1840307
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/network/leaseshelper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/network/leaseshelper.c b/src/network/leaseshelper.c index 02759f2314..c4258cae4e 100644 --- a/src/network/leaseshelper.c +++ b/src/network/leaseshelper.c @@ -160,7 +160,7 @@ main(int argc, char **argv) pid_file = g_strdup(RUNSTATEDIR "/leaseshelper.pid");
/* Try to claim the pidfile, exiting if we can't */ - if ((pid_file_fd = virPidFileAcquirePath(pid_file, false, getpid())) < 0) + if ((pid_file_fd = virPidFileAcquirePath(pid_file, true, getpid())) < 0) goto cleanup;
/* Since interfaces can be hot plugged, we need to make sure that the
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Some functions or code paths that may fail don't report error (e.g. when acquiring PID file fails) leading to a silent quit of the leaseshelper. This makes it super hard for us and users to debug what is happening. Fortunately, dnsmasq captures both stdout and stderr so we can write an error message there. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/network/leaseshelper.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/src/network/leaseshelper.c b/src/network/leaseshelper.c index c4258cae4e..947f838126 100644 --- a/src/network/leaseshelper.c +++ b/src/network/leaseshelper.c @@ -131,8 +131,10 @@ main(int argc, char **argv) * events for expired leases. So, libvirtd sets another env var for this * purpose */ if (!interface && - !(interface = getenv("VIR_BRIDGE_NAME"))) - goto cleanup; + !(interface = getenv("VIR_BRIDGE_NAME"))) { + fprintf(stderr, _("interface not set\n")); + exit(EXIT_FAILURE); + } ip = argv[3]; mac = argv[2]; @@ -160,13 +162,21 @@ main(int argc, char **argv) pid_file = g_strdup(RUNSTATEDIR "/leaseshelper.pid"); /* Try to claim the pidfile, exiting if we can't */ - if ((pid_file_fd = virPidFileAcquirePath(pid_file, true, getpid())) < 0) + if ((pid_file_fd = virPidFileAcquirePath(pid_file, true, getpid())) < 0) { + fprintf(stderr, + _("Unable to acquire PID file: %s\n errno=%d"), + pid_file, errno); goto cleanup; + } /* Since interfaces can be hot plugged, we need to make sure that the * corresponding custom lease file exists. If not, 'touch' it */ - if (virFileTouch(custom_lease_file, 0644) < 0) + if (virFileTouch(custom_lease_file, 0644) < 0) { + fprintf(stderr, + _("Unable to create: %s\n errno=%d"), + custom_lease_file, errno); goto cleanup; + } switch ((enum virLeaseActionFlags) action) { case VIR_LEASE_ACTION_ADD: -- 2.26.2

On Mon, Jun 15, 2020 at 01:32:36PM +0200, Michal Privoznik wrote:
Some functions or code paths that may fail don't report error (e.g. when acquiring PID file fails) leading to a silent quit of the leaseshelper. This makes it super hard for us and users to debug what is happening. Fortunately, dnsmasq captures both stdout and stderr so we can write an error message there.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/network/leaseshelper.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 6/15/20 1:32 PM, Michal Privoznik wrote:
Patch 1/3 is for demonstration purposes only. The rest patches actual problem.I was also thinking of making the PID file per bridge, i.e. include bridge name in the name (e.g. /var/run/virbr0_leaseshelper.pid) to decrease pressure on the single PID file. However, DHCP transactions are not that frequent IMO so it's not really a bottle neck nor performance issue.
Michal Prívozník (3): *** DO NOT APPLY UPSTREAM *** leaseshelper: Wait to acquire PID file leaseshelper: Report more errors
src/network/leaseshelper.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-)
Ping? We are getting close to the freeze and while this is a bugfix and could go in, we are getting close to the release too. Michal
participants (3)
-
Daniel P. Berrangé
-
Michal Privoznik
-
Peter Krempa