On Fri, Jul 08, 2011 at 04:33:28PM -0600, Eric Blake wrote:
On 07/07/2011 08:17 AM, Daniel P. Berrange wrote:
> From: "Daniel P. Berrange" <berrange(a)redhat.com>
>
> If libvirtd crashes then the pidfile may not be cleaned up,
> making a later restart fail, even though the original process
> no longer exists.
>
> Instead of simply using file creation as the test for successful
> pidfile acquisition, actually take out a lock on the pidfile.
>
> To avoid races, after locking the pidfile, verify that the
> inode hasn't changed.
>
> Also make sure the unprivileged libvirtd now acquires the
> pidfile, instead of relying on the UNIX socket to ensure
> mutual exclusion
Cool idea.
> -static int daemonWritePidFile(const char *pidFile, const char *argv0)
> -{
> +static int
> +daemonAcquirePidFile(const char *argv0, const char *pidFile) {
Why'd you hoist the { to the previous line? Our convention has been
(slowly) leaning towards a function body starting on column 1.
> int fd;
> - FILE *fh;
> char ebuf[1024];
> + unsigned long long pid = (unsigned long long)getpid();
See my comments in an earlier thread about our existing assumptions that
pid_t fits in int. In fact, I would be okay with:
verify(sizeof(pid_t) <= sizeof(int));
int pid = getpid();
char pidstr[INT_BUFSIZE_BOUND(pid)];
...
snprintf(pidstr, sizeof(pidstr), "%u", pid);
> + char pidstr[INT_BUFSIZE_BOUND(pid)];
>
> if (pidFile[0] == '\0')
> return 0;
>
> - if ((fd = open(pidFile, O_WRONLY|O_CREAT|O_EXCL, 0644)) < 0) {
> - VIR_ERROR(_("Failed to open pid file '%s' : %s"),
> - pidFile, virStrerror(errno, ebuf, sizeof ebuf));
> - return -1;
> - }
> + while (1) {
> + struct stat a, b;
> + if ((fd = open(pidFile, O_WRONLY|O_CREAT, 0644)) < 0) {
> + VIR_ERROR(_("%s: Failed to open pid file '%s' :
%s"),
> + argv0, pidFile, virStrerror(errno, ebuf, sizeof ebuf));
> + return -1;
> + }
> +
> + if (fstat(fd, &b) < 0) {
> + VIR_ERROR(_("%s: Pid file '%s' disappeared: %s"),
Misleading error message. fstat can indeed fail (although such failures
are rare), but not because a file disappeared - after all, you have an
fd open to the file.
> - if (fprintf(fh, "%lu\n", (unsigned long)getpid()) < 0) {
> - VIR_ERROR(_("%s: Failed to write to pid file '%s' :
%s"),
> - argv0, pidFile, virStrerror(errno, ebuf, sizeof ebuf));
> - VIR_FORCE_FCLOSE(fh);
> - return -1;
> - }
> + snprintf(pidstr, sizeof(pidstr), "%llu", pid);
>
> - if (VIR_FCLOSE(fh) == EOF) {
> - VIR_ERROR(_("%s: Failed to close pid file '%s' : %s"),
> - argv0, pidFile, virStrerror(errno, ebuf, sizeof ebuf));
> - return -1;
> + if (safewrite(fd, pidstr, strlen(pidstr)) < 0) {
Nice conversion from FILE* to write().
> @@ -1436,6 +1453,12 @@ int main(int argc, char **argv) {
> umask(old_umask);
> }
>
> + /* Try to claim the pidfile, existing if we can't */
s/existing/exiting/
> + if ((pid_file_fd = daemonAcquirePidFile(argv[0], pid_file)) < 0) {
> + ret = VIR_DAEMON_ERR_PIDFILE;
> + goto cleanup;
> + }
> +
> if (!(srv = virNetServerNew(config->min_workers,
> config->max_workers,
> config->max_clients,
> @@ -1570,8 +1593,10 @@ cleanup:
> }
> VIR_FORCE_CLOSE(statuswrite);
> }
> - if (pid_file)
> - unlink (pid_file);
> + if (pid_file_fd != -1) {
> + unlink(pid_file);
> + VIR_FORCE_CLOSE(pid_file_fd);
Swap these two lines. Not that flock (or even libvirtd) works on mingw,
but mingw doesn't like unlink() on an open fd.
If you swap those two lines you open up a (small) race condition
in the locking scheme where you could unlink a pidfile that has now
been claimed by someone else. By unlinking while we still have it
open, we know we still hold the fcntl() lock.
Daniel
--
|:
http://berrange.com -o-
http://www.flickr.com/photos/dberrange/ :|
|:
http://libvirt.org -o-
http://virt-manager.org :|
|:
http://autobuild.org -o-
http://search.cpan.org/~danberr/ :|
|:
http://entangle-photo.org -o-
http://live.gnome.org/gtk-vnc :|