On 4/28/22 3:01 PM, Daniel P. Berrangé wrote:
On Wed, Apr 27, 2022 at 11:13:20PM +0200, Claudio Fontana wrote:
> this is in preparation for a minor refactoring of the copy
> function itself out of runIO().
>
> Signed-off-by: Claudio Fontana <cfontana(a)suse.de>
> ---
> src/util/iohelper.c | 95 +++++++++++++++++++++++++--------------------
> 1 file changed, 53 insertions(+), 42 deletions(-)
>
> diff --git a/src/util/iohelper.c b/src/util/iohelper.c
> index 2c91bf4f93..c13746a547 100644
> --- a/src/util/iohelper.c
> +++ b/src/util/iohelper.c
> @@ -45,6 +45,16 @@
> # define O_DIRECT 0
> #endif
>
> +struct runIOParams {
> + bool isBlockDev;
> + bool isDirect;
> + bool isWrite;
> + int fdin;
> + const char *fdinname;
> + int fdout;
> + const char *fdoutname;
> +};
> +
> static int
> runIO(const char *path, int fd, int oflags)
> {
> @@ -53,13 +63,9 @@ runIO(const char *path, int fd, int oflags)
> size_t buflen = 1024*1024;
> intptr_t alignMask = 64*1024 - 1;
> int ret = -1;
> - int fdin, fdout;
> - const char *fdinname, *fdoutname;
> - unsigned long long total = 0;
> - bool direct = O_DIRECT && ((oflags & O_DIRECT) != 0);
> - off_t end = 0;
> + off_t total = 0;
> struct stat sb;
> - bool isBlockDev = false;
> + struct runIOParams p;
>
> #if WITH_POSIX_MEMALIGN
> if (posix_memalign(&base, alignMask + 1, buflen))
> @@ -77,34 +83,23 @@ runIO(const char *path, int fd, int oflags)
> fd, path);
> goto cleanup;
> }
> - isBlockDev = S_ISBLK(sb.st_mode);
> + p.isBlockDev = S_ISBLK(sb.st_mode);
> + p.isDirect = O_DIRECT && (oflags & O_DIRECT);
>
> switch (oflags & O_ACCMODE) {
> case O_RDONLY:
> - fdin = fd;
> - fdinname = path;
> - fdout = STDOUT_FILENO;
> - fdoutname = "stdout";
> - /* To make the implementation simpler, we give up on any
> - * attempt to use O_DIRECT in a non-trivial manner. */
> - if (!isBlockDev && direct && ((end = lseek(fd, 0, SEEK_CUR))
!= 0)) {
> - virReportSystemError(end < 0 ? errno : EINVAL, "%s",
> - _("O_DIRECT read needs entire seekable
file"));
> - goto cleanup;
> - }
> + p.isWrite = false;
> + p.fdin = fd;
> + p.fdinname = path;
> + p.fdout = STDOUT_FILENO;
> + p.fdoutname = "stdout";
> break;
> case O_WRONLY:
> - fdin = STDIN_FILENO;
> - fdinname = "stdin";
> - fdout = fd;
> - fdoutname = path;
> - /* To make the implementation simpler, we give up on any
> - * attempt to use O_DIRECT in a non-trivial manner. */
> - if (!isBlockDev && direct && (end = lseek(fd, 0, SEEK_END))
!= 0) {
> - virReportSystemError(end < 0 ? errno : EINVAL, "%s",
> - _("O_DIRECT write needs empty seekable
file"));
> - goto cleanup;
> - }
> + p.isWrite = true;
> + p.fdin = STDIN_FILENO;
> + p.fdinname = "stdin";
> + p.fdout = fd;
> + p.fdoutname = path;
> break;
>
> case O_RDWR:
> @@ -114,6 +109,22 @@ runIO(const char *path, int fd, int oflags)
> (oflags & O_ACCMODE));
> goto cleanup;
> }
> + /* To make the implementation simpler, we give up on any
> + * attempt to use O_DIRECT in a non-trivial manner. */
> + if (!p.isBlockDev && p.isDirect) {
> + off_t off;
> + if (p.isWrite) {
> + if ((off = lseek(fd, 0, SEEK_END)) != 0) {
> + virReportSystemError(off < 0 ? errno : EINVAL, "%s",
> + _("O_DIRECT write needs empty seekable
file"));
> + goto cleanup;
> + }
> + } else if ((off = lseek(fd, 0, SEEK_CUR)) != 0) {
> + virReportSystemError(off < 0 ? errno : EINVAL, "%s",
> + _("O_DIRECT read needs entire seekable
file"));
> + goto cleanup;
> + }
I'm puzzelled about the current code doing SEEK_END in one case
and SEEK_CUR in the other case. I think this method should simply
use SEEK_CUR only as a sanity check for being ablt to use O_DIRECT.
I think the answer is in the original commit:
commit 12291656b135ed788e41dadbd2d15e613ddea9b5
Author: Eric Blake <eblake(a)redhat.com>
Date: Tue Jul 12 08:35:05 2011 -0600
save: let iohelper work on O_DIRECT fds
Required for a coming patch where iohelper will operate on O_DIRECT
fds. There, the user-space memory must be aligned to file system
boundaries (at least 512, but using page-aligned works better, and
some file systems prefer 64k). Made tougher by the fact that
VIR_ALLOC won't work on void *, but posix_memalign won't work on
char * and isn't available everywhere.
This patch makes some simplifying assumptions - namely, output
to an O_DIRECT fd will only be attempted on an empty seekable
file (hence, no need to worry about preserving existing data
on a partial block, and ftruncate will work to undo the effects
of having to round up the size of the last block written), and
input from an O_DIRECT fd will only be attempted on a complete
seekable file with the only possible short read at EOF.
----
Here the relevant part being "empty seekable file".
The check not only ensures the file is seekable, but also double checks that it is empty.
I can add a comment to make it explicit.
If we actually need to position the file offset, then the caller
should do a SEEK_END itself.
> + }
>
> while (1) {
> ssize_t got;
> @@ -124,16 +135,16 @@ runIO(const char *path, int fd, int oflags)
> * writes will be aligned.
> * In other cases using saferead reduces number of syscalls.
> */
> - if (fdin == fd && direct) {
> - if ((got = read(fdin, buf, buflen)) < 0 &&
> + if (!p.isWrite && p.isDirect) {
> + if ((got = read(p.fdin, buf, buflen)) < 0 &&
> errno == EINTR)
> continue;
> } else {
> - got = saferead(fdin, buf, buflen);
> + got = saferead(p.fdin, buf, buflen);
> }
The distinction for read vs saferead is for correctness with O_DIRECT.
We only want to use read on the plain file / blockdev, and saferead
on the socket/pipe.
Yes that's clear.
We already call fstat to let us figure out the fd type, so can use
that decide whether to use read vs saferead.
This ties into my comment on the later patch about simplifying the
parameters passed in, to remove the distinction of read vs write.
As mentioned elsewhere I don't think it can be simplified quite that way,
but I think there is an alternative that could work, will propose in the next spin.
Thanks
Claudio