[PATCH 0/1] qemu_tpm: Get swtpm pid without binary validation

Hi all and Happy New Year! My name is Vasiliy, I am an engineer at SUSE. I was playing around with TPM in libvirt and trying to enable it in KubeVirt. With the emulator I was always getting "swtpm failed to start" internal error. After debugging the issue I found that the problem was not actually with starting the emulator but rather with retrieving the PID. The code in libvirt currently verifies that /proc/[pid]/exe points to the correct swtpm binary. In my case an attempt to dereference the symlink from procfs always resulted in EACCES. Eventually I found this issue [1]. It appears that libvirt needs CAP_SYS_PTRACE otherwise it will not be able to access the exe link (even if run as root). This can also be observed with the following reproducer: $ docker run -it --rm --security-opt apparmor:unconfined --security-opt seccomp:unconfined busybox / # adduser -D test / # su - test ~ $ sleep infinity & ~ $ exit / # stat /proc/$(pidof sleep)/exe File: stat: /proc/10/exe: cannot read link: Permission denied Size: 0 Blocks: 0 IO Block: 1024 symbolic link Device: 6eh/110d Inode: 187271 Links: 1 Access: (0777/lrwxrwxrwx) Uid: ( 1000/ test) Gid: ( 1000/ test) Access: 2022-01-03 06:52:39.480790247 +0000 Modify: 2022-01-03 06:52:39.480790247 +0000 Change: 2022-01-03 06:52:39.480790247 +0000 $ docker run -it --rm --security-opt apparmor:unconfined --security-opt seccomp:unconfined --cap-add sys_ptrace busybox / # adduser -D test / # su - test ~ $ sleep infinity & ~ $ exit / # stat /proc/$(pidof sleep)/exe File: '/proc/10/exe' -> '/bin/sleep' Size: 0 Blocks: 0 IO Block: 1024 symbolic link Device: 6eh/110d Inode: 195011 Links: 1 Access: (0777/lrwxrwxrwx) Uid: ( 1000/ test) Gid: ( 1000/ test) Access: 2022-01-03 07:13:28.003224653 +0000 Modify: 2022-01-03 07:13:28.003224653 +0000 Change: 2022-01-03 07:13:28.003224653 +0000 I tried to adapt the function that retrieves swtpm PID so it also covers the usecase when libvirt is run in a container without ptrace capability. The patch solved the issue for me and I verified that the error is no more reproducible. So I wanted to propose that solution to handle the issue. Or maybe someone can suggest a better alternative which would be more suitable? Would appreciate any feedback. Thanks. [1] https://github.com/moby/moby/issues/40713 Vasiliy Ulyanov (1): qemu_tpm: Get swtpm pid without binary validation src/qemu/qemu_tpm.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) -- 2.34.1

Access to /proc/[pid]/exe may be restricted in certain environments (e.g. in containers) and any attempt to stat(2) or readlink(2) the file will result in 'permission denied' error if the calling process does not have CAP_SYS_PTRACE capability. According to proc(5) manpage: Permission to dereference or read (readlink(2)) this symbolic link is governed by a ptrace access mode PTRACE_MODE_READ_FSCREDS check; see ptrace(2). If the first call to virPidFileReadPathIfAlive fails with EACCES try to call it one more time without specifyng swtpm binary path in order to avoid dereferencing the symlink. Signed-off-by: Vasiliy Ulyanov <vulyanov@suse.de> --- src/qemu/qemu_tpm.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_tpm.c b/src/qemu/qemu_tpm.c index 7e7b01768e..9c80e15e9b 100644 --- a/src/qemu/qemu_tpm.c +++ b/src/qemu/qemu_tpm.c @@ -261,10 +261,17 @@ qemuTPMEmulatorGetPid(const char *swtpmStateDir, g_autofree char *swtpm = virTPMGetSwtpm(); g_autofree char *pidfile = qemuTPMEmulatorCreatePidFilename(swtpmStateDir, shortName); + int rc; + if (!pidfile) return -1; - if (virPidFileReadPathIfAlive(pidfile, pid, swtpm) < 0) + rc = virPidFileReadPathIfAlive(pidfile, pid, swtpm); + /* If access to /proc/[pid]/exe is restricted then skip the validation of + * swtpm binary. */ + if (rc < 0 && virLastErrorIsSystemErrno(EACCES)) + rc = virPidFileReadPathIfAlive(pidfile, pid, NULL); + if (rc < 0) return -1; return 0; -- 2.34.1

On 1/3/22 08:56, Vasiliy Ulyanov wrote:
Access to /proc/[pid]/exe may be restricted in certain environments (e.g. in containers) and any attempt to stat(2) or readlink(2) the file will result in 'permission denied' error if the calling process does not have CAP_SYS_PTRACE capability. According to proc(5) manpage:
Permission to dereference or read (readlink(2)) this symbolic link is governed by a ptrace access mode PTRACE_MODE_READ_FSCREDS check; see ptrace(2).
If the first call to virPidFileReadPathIfAlive fails with EACCES try to call it one more time without specifyng swtpm binary path in order to avoid dereferencing the symlink.
Signed-off-by: Vasiliy Ulyanov <vulyanov@suse.de> --- src/qemu/qemu_tpm.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_tpm.c b/src/qemu/qemu_tpm.c index 7e7b01768e..9c80e15e9b 100644 --- a/src/qemu/qemu_tpm.c +++ b/src/qemu/qemu_tpm.c @@ -261,10 +261,17 @@ qemuTPMEmulatorGetPid(const char *swtpmStateDir, g_autofree char *swtpm = virTPMGetSwtpm(); g_autofree char *pidfile = qemuTPMEmulatorCreatePidFilename(swtpmStateDir, shortName); + int rc; + if (!pidfile) return -1;
- if (virPidFileReadPathIfAlive(pidfile, pid, swtpm) < 0) + rc = virPidFileReadPathIfAlive(pidfile, pid, swtpm); + /* If access to /proc/[pid]/exe is restricted then skip the validation of + * swtpm binary. */ + if (rc < 0 && virLastErrorIsSystemErrno(EACCES)) + rc = virPidFileReadPathIfAlive(pidfile, pid, NULL); + if (rc < 0) return -1;
return 0;
Firstly, this fixes the problem only for swtpm case, but there is one other place where the same problem can happen (qemuVhostUserGPUGetPid()).
From conceptual POV, what may now happen is that when PID wrapped we may wrongly detect another process as swtpm.
I'm wondering whether we can ditch the @binPath argument of virPidFileReadPathIfAlive() completely. A pid file should be locked by the process owning it [1], therefore what we could do is read given file and try to lock it. If we succeeded it means that the process is dead and we read a stale PID. If locking fails then the process is still running and the PID we read is valid. 1: This is how libvirt treats pidfiles (see virPidFileAcquirePath()). Question is whether other tools which use their own pidfiles do the same. For instance, swtpm is given '--pid file=/some/path' argument but I haven't checked whether it behaves as expected. Also, to make things worse there are at least two types of file locks on Linux and both are independent of each other, so question then is what type of lock does given tool use. But this could all be mitigated by using virCommandSetPidFile() when spawning the command instead of passing arbitrary --pid arguments. Michal

Hi Michal, On 11/01/2022 11:00, Michal Prívozník wrote:
Firstly, this fixes the problem only for swtpm case, but there is one other place where the same problem can happen (qemuVhostUserGPUGetPid()).
Sure, I can also try to fix the vhost gpu case. Was mostly focused on tpm hence did not look at other places.
From conceptual POV, what may now happen is that when PID wrapped we may wrongly detect another process as swtpm.
Right, valid point...
I'm wondering whether we can ditch the @binPath argument of virPidFileReadPathIfAlive() completely. A pid file should be locked by the process owning it [1], therefore what we could do is read given file and try to lock it. If we succeeded it means that the process is dead and we read a stale PID. If locking fails then the process is still running and the PID we read is valid.
1: This is how libvirt treats pidfiles (see virPidFileAcquirePath()). Question is whether other tools which use their own pidfiles do the same. For instance, swtpm is given '--pid file=/some/path' argument but I haven't checked whether it behaves as expected. Also, to make things worse there are at least two types of file locks on Linux and both are independent of each other, so question then is what type of lock does given tool use.
Unfortunately swtpm does not lock the pid file at all :(
But this could all be mitigated by using virCommandSetPidFile() when spawning the command instead of passing arbitrary --pid arguments.
So I can drop --pid and --daemon args from swtpm cmd and add virCommandSetPidFile(cmd, pidfile); virCommandDaemonize(cmd); Then in virPidFileReadPathIfAlive() I can add a lock check. Hm... Nice, that seems like a good way to go. -- Vasily Ulyanov <vulyanov@suse.de> Software Engineer, SUSE Labs Core

On 1/12/22 08:51, Vasily Ulyanov wrote:
Hi Michal,
So I can drop --pid and --daemon args from swtpm cmd and add
virCommandSetPidFile(cmd, pidfile); virCommandDaemonize(cmd);
Then in virPidFileReadPathIfAlive() I can add a lock check. Hm... Nice, that seems like a good way to go.
Correct. That's common pattern. However, be aware that using virCommandDaemonize() makes the subsequent virCommandRun() return almost instantly, even before the actual binary that was intended to run is executed (if scheduler aligns things "well"). Therefore, checking for the child process is a bit trickier. But the pidfile is written before virCommandRun() returns, which is good. Moreover, it's not possible to use virCommandSetErrorBuffer() nor virCommandSetOutputBuffer() with virCommandDaemonize(). However, we have examples of how to use the pattern you suggested in our code. For instance qemuProcessStartManagedPRDaemon() or qemuSlirpStart(). You can find more by grepping for virCommandDaemonize(). Let me know if you need any help. Michal
participants (3)
-
Michal Prívozník
-
Vasiliy Ulyanov
-
Vasily Ulyanov