[libvirt] PATCH [0/3] Volume zeroing

The following patches implement overwriting a volume with zeros when the volume is deleted. The zeroing happens before the delete, so it works for storage backends that don't support actually deleting volumes as well as the ones that do. The intent is that any future VM assigned that volume will not be able to recover any data belonging to the previous VM. It is not intended to prevent attackers with physical access to the medium from recovering data--it simply writes a single pass of zeros over the medium. If the filesystem containing the volume supports the fiemap ioctl and the volume is a sparse file, the volume zeroing code attempts to use fiemap to locate the mapped extents. It does not attempt to zero a sparse file if it cannot use fiemap. Such an operation could take an essentially unbounded amount of time. Since the volume is being deleted, zeroing has less value in the context of backends that support delete, but does provide value with storage backends that do not zero volumes if they are deleted and recreated. Dave

Allocates a buffer containing a struct with variable sized array as the last member, useful for some ioctl and other APIs that return data in this way. --- src/util/memory.h | 18 ++++++++++++++++++ 1 files changed, 18 insertions(+), 0 deletions(-) diff --git a/src/util/memory.h b/src/util/memory.h index fc9e6c1..e7effd6 100644 --- a/src/util/memory.h +++ b/src/util/memory.h @@ -76,6 +76,24 @@ void virFree(void *ptrptr); #define VIR_ALLOC_N(ptr, count) virAllocN(&(ptr), sizeof(*(ptr)), (count)) /** + * VIR_ALLOC_VAR: + * @ptr: pointer to hold address of allocated memory + * @type: element type of trailing array + * @count: number of array elements to allocate + * + * Allocate sizeof(*ptr) bytes plus an array of 'count' elements, each + * sizeof('type'). This sort of allocation is useful for receiving + * the data of certain ioctls and other APIs which return a struct in + * which the last element is an array of undefined length. The caller + * of this type of API is expected to know the length of the array + * that will be returned and allocate a suitable buffer to contain the + * returned data. + * + * Returns -1 on failure, 0 on success + */ +#define VIR_ALLOC_VAR(ptr, type, count) virAlloc(&(ptr), sizeof(*(ptr)) + (sizeof(type) * count)) + +/** * VIR_REALLOC_N: * @ptr: pointer to hold address of allocated memory * @count: number of elements to allocate -- 1.6.5.5

David Allan wrote:
Allocates a buffer containing a struct with variable sized array as the last member, useful for some ioctl and other APIs that return data in this way. --- src/util/memory.h | 18 ++++++++++++++++++ 1 files changed, 18 insertions(+), 0 deletions(-)
diff --git a/src/util/memory.h b/src/util/memory.h index fc9e6c1..e7effd6 100644 --- a/src/util/memory.h +++ b/src/util/memory.h @@ -76,6 +76,24 @@ void virFree(void *ptrptr); #define VIR_ALLOC_N(ptr, count) virAllocN(&(ptr), sizeof(*(ptr)), (count))
/** + * VIR_ALLOC_VAR: + * @ptr: pointer to hold address of allocated memory + * @type: element type of trailing array + * @count: number of array elements to allocate + * + * Allocate sizeof(*ptr) bytes plus an array of 'count' elements, each + * sizeof('type'). This sort of allocation is useful for receiving + * the data of certain ioctls and other APIs which return a struct in + * which the last element is an array of undefined length. The caller + * of this type of API is expected to know the length of the array + * that will be returned and allocate a suitable buffer to contain the + * returned data. + * + * Returns -1 on failure, 0 on success
Hi Dave, This sounds like what at least gcc calls flexible array members, so you might want to let the macro name and/or description reflect that. Incidentally, there's an autoconf macro to detect whether a C compiler supports them: -- Macro: AC_C_FLEXIBLE_ARRAY_MEMBER If the C compiler supports flexible array members, define `FLEXIBLE_ARRAY_MEMBER' to nothing; otherwise define it to 1. That way, a declaration like this: struct s { size_t n_vals; double val[FLEXIBLE_ARRAY_MEMBER]; }; ...
+ */ +#define VIR_ALLOC_VAR(ptr, type, count) virAlloc(&(ptr), sizeof(*(ptr)) + (sizeof(type) * count)) + +/** * VIR_REALLOC_N: * @ptr: pointer to hold address of allocated memory * @count: number of elements to allocate
This new allocator should detect overflow, similarly to how virReallocN does, in case the product would overflow size_t: int virReallocN(void *ptrptr, size_t size, size_t count) { ... if (xalloc_oversized(count, size)) { errno = ENOMEM; return -1; } Finally, if you manage to leave it as a macro, you'll want to parenthesize "count" on the RHS. Otherwise, passing e.g., "n + 1" as the count will allocate too little space for any type with size larger than 1.

--- configure.ac | 6 ++++++ src/Makefile.am | 7 +++++++ 2 files changed, 13 insertions(+), 0 deletions(-) diff --git a/configure.ac b/configure.ac index 29c6396..68786c2 100644 --- a/configure.ac +++ b/configure.ac @@ -1438,6 +1438,12 @@ AM_CONDITIONAL([WITH_STORAGE_DISK], [test "$with_storage_disk" = "yes"]) AC_SUBST([LIBPARTED_CFLAGS]) AC_SUBST([LIBPARTED_LIBS]) +AC_CHECK_HEADER([linux/fiemap.h],[ + have_fiemap=yes + AC_DEFINE([HAVE_FIEMAP],[],[fiemap is available]) + ],[]) +AM_CONDITIONAL([HAVE_FIEMAP], [test "$have_fiemap" = "yes"]) + dnl dnl check for libcurl (ESX) dnl diff --git a/src/Makefile.am b/src/Makefile.am index 3232256..92b1b7c 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -263,6 +263,9 @@ STORAGE_DRIVER_MPATH_SOURCES = \ STORAGE_DRIVER_DISK_SOURCES = \ storage/storage_backend_disk.h storage/storage_backend_disk.c +STORAGE_DRIVER_FIEMAP_SOURCES = \ + storage/storage_driver_fiemap.c + STORAGE_HELPER_DISK_SOURCES = \ storage/parthelper.c @@ -661,6 +664,10 @@ if WITH_STORAGE_DISK libvirt_driver_storage_la_SOURCES += $(STORAGE_DRIVER_DISK_SOURCES) endif +if HAVE_FIEMAP +libvirt_driver_storage_la_SOURCES += $(STORAGE_DRIVER_FIEMAP_SOURCES) +endif + if WITH_NODE_DEVICES # Needed to keep automake quiet about conditionals if WITH_DRIVER_MODULES -- 1.6.5.5

* If the appropriate flag is specified to vol delete, zero out the volume before deleting it. * If the volume is a sparse file and the fiemap ioctl is available, use fiemap to locate the volume's extents. --- src/storage/storage_driver.c | 125 +++++++++++++++++++++++++++++++++ src/storage/storage_driver.h | 20 +++++ src/storage/storage_driver_fiemap.c | 132 +++++++++++++++++++++++++++++++++++ 3 files changed, 277 insertions(+), 0 deletions(-) create mode 100644 src/storage/storage_driver_fiemap.c diff --git a/src/storage/storage_driver.c b/src/storage/storage_driver.c index 6b1045a..661c412 100644 --- a/src/storage/storage_driver.c +++ b/src/storage/storage_driver.c @@ -26,6 +26,9 @@ #include <stdio.h> #include <unistd.h> #include <sys/types.h> +#include <sys/param.h> +#include <fcntl.h> + #if HAVE_PWD_H #include <pwd.h> #endif @@ -1518,6 +1521,118 @@ cleanup: return ret; } + +int +storageZeroExtent(virStorageVolDefPtr vol, + struct stat *st, + int fd, + size_t extent_start, + size_t extent_length, + char *writebuf, + size_t *bytes_zeroed) +{ + int ret = -1, written; + size_t remaining, write_size; + char errbuf[64]; + + VIR_DEBUG("extent logical start: %zu len: %zu ", + extent_start, extent_length); + + if (lseek(fd, extent_start, SEEK_SET) < 0) { + VIR_ERROR("Failed to seek to position %zu in volume " + "with path '%s': '%s'", + extent_start, vol->target.path, + virStrerror(errno, errbuf, sizeof(errbuf))); + goto out; + } + + remaining = extent_length; + while (remaining > 0) { + + write_size = (st->st_blksize < remaining) ? st->st_blksize : remaining; + written = safewrite(fd, writebuf, write_size); + if (written < 0) { + VIR_ERROR("Failed to write to storage volume with path '%s': '%s' " + "(attempted to write %d bytes)", + vol->target.path, + virStrerror(errno, errbuf, sizeof(errbuf)), + write_size); + goto out; + } + + *bytes_zeroed += written; + remaining -= written; + } + + VIR_DEBUG("Wrote %zu bytes to volume with path '%s'", + *bytes_zeroed, vol->target.path); + + ret = 0; + +out: + return ret; +} + + +static int +storageVolumeZeroOut(virStorageVolDefPtr vol) +{ + int ret = -1, fd = -1; + char errbuf[64]; + struct stat st; + char *writebuf; + size_t bytes_zeroed = 0; + + VIR_DEBUG("Zeroing out volume with path '%s'", vol->target.path); + + fd = open(vol->target.path, O_RDWR); + if (fd == -1) { + VIR_ERROR("Failed to open storage volume with path '%s': '%s'", + vol->target.path, + virStrerror(errno, errbuf, sizeof(errbuf))); + goto out; + } + + memset(&st, 0, sizeof(st)); + + if (fstat(fd, &st) == -1) { + VIR_ERROR("Failed to stat storage volume with path '%s': '%s'", + vol->target.path, + virStrerror(errno, errbuf, sizeof(errbuf))); + goto out; + } + + if (VIR_ALLOC_N(writebuf, st.st_blksize) != 0) { + virReportOOMError(); + goto out; + } + + /* Don't attempt to brute force a sparse regular file; doing so + * could take an essentially unbounded amount of time. The user + * can always delete and recreate the file to zero it. */ + if (S_ISREG(st.st_mode) && st.st_blocks < (st.st_size / DEV_BSIZE)) { + ret = storageZeroByFiemap(vol, &st, fd, writebuf); + } else { + ret = storageZeroExtent(vol, + &st, + fd, + 0, + vol->allocation, + writebuf, + &bytes_zeroed); + } + +out: + VIR_FREE(writebuf); + + if (fd != -1) { + close(fd); + } + + return ret; +} + + static int storageVolumeDelete(virStorageVolPtr obj, unsigned int flags) { @@ -1563,6 +1678,16 @@ storageVolumeDelete(virStorageVolPtr obj, goto cleanup; } + /* Even if the backend doesn't support volume deletion, we can + * still zero it out; indeed, if the backend does support volume + * deletion, it's almost certain to be faster to delete & recreate + * a volume than it is to zero it out. */ + if (flags & VIR_STORAGE_VOL_DELETE_ZEROED) { + if (storageVolumeZeroOut(vol) == -1) { + goto cleanup; + } + } + if (!backend->deleteVol) { virStorageReportError(VIR_ERR_NO_SUPPORT, "%s", _("storage pool does not support vol deletion")); diff --git a/src/storage/storage_driver.h b/src/storage/storage_driver.h index 500ea02..cb837f8 100644 --- a/src/storage/storage_driver.h +++ b/src/storage/storage_driver.h @@ -28,4 +28,24 @@ int storageRegister(void); +#ifdef HAVE_FIEMAP +#define storageZeroByFiemap storageZeroByFiemapInternal +int +storageZeroByFiemapInternal(virStorageVolDefPtr vol, + struct stat *st, + int fd, + char *writebuf); +#else // #ifdef HAVE_FIEMAP +#define storageZeroByFiemap(vol, st, fd, writebuf) -1 +#endif // #ifdef HAVE_FIEMAP + +int +storageZeroExtent(virStorageVolDefPtr vol, + struct stat *st, + int fd, + size_t extent_start, + size_t extent_length, + char *writebuf, + size_t *bytes_zeroed); + #endif /* __VIR_STORAGE_DRIVER_H__ */ diff --git a/src/storage/storage_driver_fiemap.c b/src/storage/storage_driver_fiemap.c new file mode 100644 index 0000000..1984254 --- /dev/null +++ b/src/storage/storage_driver_fiemap.c @@ -0,0 +1,132 @@ +/* + * storage_driver_fiemap.c: fiemap specific code for storage driver + * + * Copyright (C) 2010 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Author: Dave Allan <dallan@redhat.com> + */ +#include <config.h> + +#include <sys/ioctl.h> +#include <linux/fiemap.h> +#include <linux/fs.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <unistd.h> +#include <inttypes.h> + +#include "virterror_internal.h" +#include "storage_driver.h" +#include "memory.h" +#include "logging.h" + +#define VIR_FROM_THIS VIR_FROM_STORAGE + +static int +storageGetNumExtents(virStorageVolDefPtr vol, + int fd, + int *num_extents) +{ + int ret = -1; + char errbuf[64]; + struct fiemap map; + + map.fm_start = 0; + map.fm_length = FIEMAP_MAX_OFFSET; + map.fm_extent_count = 0; + map.fm_flags = FIEMAP_FLAG_SYNC; + + ret = ioctl(fd, FS_IOC_FIEMAP, &map); + + if (ret != 0) { + VIR_ERROR("fiemap failed: '%s' (returned %d) " + "Flags: 0x%"PRIx32" volume path: '%s'", + strerror_r(errno, errbuf, sizeof(errbuf)), + ret, map.fm_flags, vol->target.path); + goto out; + } + + *num_extents = map.fm_mapped_extents; + + VIR_DEBUG("Volume with path '%s' has %d extents", + vol->target.path, *num_extents); + + ret = 0; + +out: + return ret; +} + + +int +storageZeroByFiemapInternal(virStorageVolDefPtr vol, + struct stat *st, + int fd, + char *writebuf ATTRIBUTE_UNUSED) +{ + int ret = -1, num_extents = 0, i; + size_t bytes_zeroed, total_zeroed = 0; + struct fiemap *fiemap_data = NULL; + + if (storageGetNumExtents(vol, fd, &num_extents) != 0) { + goto out; + } + + if (VIR_ALLOC_VAR(fiemap_data, struct fiemap_extent, num_extents) != 0) { + virReportOOMError(); + goto out; + } + + fiemap_data->fm_start = 0; + fiemap_data->fm_length = FIEMAP_MAX_OFFSET; + fiemap_data->fm_extent_count = num_extents; + fiemap_data->fm_flags = FIEMAP_FLAG_SYNC; + + ret = ioctl(fd, FS_IOC_FIEMAP, fiemap_data); + if (ret == EBADR) { + VIR_ERROR("fiemap ioctl returned %d (Flags: 0x%"PRIx32")", + ret, fiemap_data->fm_flags); + goto out; + } + + VIR_DEBUG("extent count: %"PRIu32" mapped_extents: %"PRIu32, + fiemap_data->fm_extent_count, fiemap_data->fm_mapped_extents); + + for (i = 0; i < fiemap_data->fm_mapped_extents; i++) { + + bytes_zeroed = 0; + if (storageZeroExtent(vol, + st, + fd, + fiemap_data->fm_extents[i].fe_logical, + fiemap_data->fm_extents[i].fe_length, + writebuf, + &bytes_zeroed) != 0) { + goto out; + } + + total_zeroed += bytes_zeroed; + + } + + ret = 0; + +out: + VIR_FREE(fiemap_data); + + return ret; +} -- 1.6.5.5

On 02/15/2010 10:29 PM, David Allan wrote:
* If the volume is a sparse file and the fiemap ioctl is available, use fiemap to locate the volume's extents.
What about for a sparse file doing just ftruncate (fd, 0); ftruncate (fd, st.st_size); ? It's already sparse, it doesn't hurt to make it _more_ sparse. Paolo

On 02/15/2010 04:47 PM, Paolo Bonzini wrote:
On 02/15/2010 10:29 PM, David Allan wrote:
* If the volume is a sparse file and the fiemap ioctl is available, use fiemap to locate the volume's extents.
What about for a sparse file doing just
ftruncate (fd, 0); ftruncate (fd, st.st_size);
?
It's already sparse, it doesn't hurt to make it _more_ sparse.
Paolo
That's a good point. The only thing that makes me hesitate is that I'm not certain that the file is guaranteed to contain zeros following the second call to ftruncate, and I'm having trouble coming up with the relevant spec at the moment. This question might be academic in any case, since the volume zeroing is part of volume deletion, so we're talking about the case of a volume represented by a regular file on a backend that doesn't support volume delete. I'm not sure such a case exists. If that's so, we could just skip the whole zeroing operation entirely for regular files. Dave

According to Dave Allan on 2/15/2010 9:08 PM:
ftruncate (fd, 0); ftruncate (fd, st.st_size);
That's a good point. The only thing that makes me hesitate is that I'm not certain that the file is guaranteed to contain zeros following the second call to ftruncate, and I'm having trouble coming up with the relevant spec at the moment.
For regular files, POSIX guarantees that extending a file via ftruncate will give zeros following the previous length. http://www.opengroup.org/onlinepubs/9699919799/functions/ftruncate.html "If the file previously was smaller than this size, ftruncate() shall increase the size of the file. If the file size is increased, the extended area shall appear as if it were zero-filled." But for all other file types (such as block devices), the results are unspecified by POSIX. -- Don't work too hard, make some time for fun as well! Eric Blake ebb9@byu.net

On 02/15/2010 11:29 PM, Eric Blake wrote:
According to Dave Allan on 2/15/2010 9:08 PM:
ftruncate (fd, 0); ftruncate (fd, st.st_size);
That's a good point. The only thing that makes me hesitate is that I'm not certain that the file is guaranteed to contain zeros following the second call to ftruncate, and I'm having trouble coming up with the relevant spec at the moment.
For regular files, POSIX guarantees that extending a file via ftruncate will give zeros following the previous length. http://www.opengroup.org/onlinepubs/9699919799/functions/ftruncate.html "If the file previously was smaller than this size, ftruncate() shall increase the size of the file. If the file size is increased, the extended area shall appear as if it were zero-filled."
But for all other file types (such as block devices), the results are unspecified by POSIX.
That's ok, I was only asking about behavior with regular files here. Thanks for the pointer--that's exactly what I was looking for. Dave

On 02/16/2010 05:08 AM, Dave Allan wrote:
What about for a sparse file doing just
ftruncate (fd, 0); ftruncate (fd, st.st_size);
That's a good point. The only thing that makes me hesitate is that I'm not certain that the file is guaranteed to contain zeros following the second call to ftruncate,
I think since it's sparse, it must be a regular file? If so, it would be fine as Eric pointed out.
a volume represented by a regular file on a backend that doesn't support volume delete. If that's so, we could just skip the whole zeroing operation entirely for regular files.
Looks like it... Paolo

On Mon, Feb 15, 2010 at 10:47:33PM +0100, Paolo Bonzini wrote:
On 02/15/2010 10:29 PM, David Allan wrote:
* If the volume is a sparse file and the fiemap ioctl is available, use fiemap to locate the volume's extents.
What about for a sparse file doing just
ftruncate (fd, 0); ftruncate (fd, st.st_size);
?
It's already sparse, it doesn't hurt to make it _more_ sparse.
The idea of zeroing upon delete, is that we want the currently allocated extents to be overwritten with zeros. If we truncate to 0 size, then extend to original size I imagine that would easily allow the filesystem to give you a new set of extents filled with zeros, leaving the old extents with data intact as unused space on the FS. Also, we need to make sure that this code works with physical block devices, as well as plain files. You can't truncate a block device so we'll need to write zeros in that case anyway. Regards, Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On 02/16/2010 12:33 PM, Daniel P. Berrange wrote:
The idea of zeroing upon delete, is that we want the currently allocated extents to be overwritten with zeros. If we truncate to 0 size, then extend to original size I imagine that would easily allow the filesystem to give you a new set of extents filled with zeros, leaving the old extents with data intact as unused space on the FS.
Yeah, as long as you use regular files as images you're safe, but you'd expose the old data if you destroyed the partition on which the file resided and used the partition as storage for a new guest. But then in this scenario (delete file system partition and give it out as raw) you could expose information not only about other/old guests, but also about the host. So for partitions it can be even more important to provide an option to zero the partition _before_ giving it out. Currently you cannot do that with libvirt.
Also, we need to make sure that this code works with physical block devices, as well as plain files. You can't truncate a block device so we'll need to write zeros in that case anyway.
A block device won't ever be sparse---I was talking about the sparse case only. Paolo

On Tue, Feb 16, 2010 at 01:31:58PM +0100, Paolo Bonzini wrote:
On 02/16/2010 12:33 PM, Daniel P. Berrange wrote:
The idea of zeroing upon delete, is that we want the currently allocated extents to be overwritten with zeros. If we truncate to 0 size, then extend to original size I imagine that would easily allow the filesystem to give you a new set of extents filled with zeros, leaving the old extents with data intact as unused space on the FS.
Yeah, as long as you use regular files as images you're safe, but you'd expose the old data if you destroyed the partition on which the file resided and used the partition as storage for a new guest.
But then in this scenario (delete file system partition and give it out as raw) you could expose information not only about other/old guests, but also about the host. So for partitions it can be even more important to provide an option to zero the partition _before_ giving it out. Currently you cannot do that with libvirt.
There is an unused 'flags' parameter in virStorageVolCreate(), where we could/should add a VIR_STORAGE_VOL_CREATE_ZEROED parameter too for that scenario ANother option would be to add an explicit virStorageVolZero() API to allow a volume to be wiped independently of create/delete operations. Regards, Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On 02/16/2010 07:36 AM, Daniel P. Berrange wrote:
On Tue, Feb 16, 2010 at 01:31:58PM +0100, Paolo Bonzini wrote:
On 02/16/2010 12:33 PM, Daniel P. Berrange wrote:
The idea of zeroing upon delete, is that we want the currently allocated extents to be overwritten with zeros. If we truncate to 0 size, then extend to original size I imagine that would easily allow the filesystem to give you a new set of extents filled with zeros, leaving the old extents with data intact as unused space on the FS.
Yeah, as long as you use regular files as images you're safe, but you'd expose the old data if you destroyed the partition on which the file resided and used the partition as storage for a new guest.
But then in this scenario (delete file system partition and give it out as raw) you could expose information not only about other/old guests, but also about the host. So for partitions it can be even more important to provide an option to zero the partition _before_ giving it out. Currently you cannot do that with libvirt.
There is an unused 'flags' parameter in virStorageVolCreate(), where we could/should add a VIR_STORAGE_VOL_CREATE_ZEROED parameter too for that scenario
ANother option would be to add an explicit virStorageVolZero() API to allow a volume to be wiped independently of create/delete operations.
I like this approach. Since the code is already written for pretty much every permutation: regular non-sparse file, regular sparse file, block device, adding the new API isn't a lot of work. That gives users the greatest flexibility for choosing the time of the zero operation, which is important since it's time consuming. Dave

On 02/16/2010 05:31 PM, Dave Allan wrote:
ANother option would be to add an explicit virStorageVolZero() API to allow a volume to be wiped independently of create/delete operations.
I like this approach. Since the code is already written for pretty much every permutation: regular non-sparse file, regular sparse file, block device, adding the new API isn't a lot of work. That gives users the greatest flexibility for choosing the time of the zero operation, which is important since it's time consuming.
Yeah, that'd be a nice feature. Paolo

On 02/16/2010 06:33 AM, Daniel P. Berrange wrote:
On Mon, Feb 15, 2010 at 10:47:33PM +0100, Paolo Bonzini wrote:
On 02/15/2010 10:29 PM, David Allan wrote:
* If the volume is a sparse file and the fiemap ioctl is available, use fiemap to locate the volume's extents.
What about for a sparse file doing just
ftruncate (fd, 0); ftruncate (fd, st.st_size);
?
It's already sparse, it doesn't hurt to make it _more_ sparse.
The idea of zeroing upon delete, is that we want the currently allocated extents to be overwritten with zeros. If we truncate to 0 size, then extend to original size I imagine that would easily allow the filesystem to give you a new set of extents filled with zeros, leaving the old extents with data intact as unused space on the FS.
Again, just to be clear, I'm only speaking of regular files here: the situation you just described seems ok to me, since we're not trying to protect against attackers on the host. As long as we guarantee that any read to the file will return zeros following the zero operation, I think we're ok. Since the section of the POSIX spec that Eric pointed to makes that guarantee with ftrucate, that's a great option for sparse files. Can you think of any situation in which we'd want to do a complete overwrite of a sparse file? At this point, I'm not seeing any. fiemap is a nice tool, but I think we can dispense with it.
Also, we need to make sure that this code works with physical block devices, as well as plain files. You can't truncate a block device so we'll need to write zeros in that case anyway.
That's what it currently does, so we're ok there. Dave
participants (6)
-
Daniel P. Berrange
-
Dave Allan
-
David Allan
-
Eric Blake
-
Jim Meyering
-
Paolo Bonzini