
13.04.2018 00:35, John Snow wrote:
> On 04/12/2018 08:26 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 1. It looks unsafe to use an NBD server plus backup(sync=none) on the
>> same node; synchronization is needed, as in block/replication, which
>> uses backup_wait_for_overlapping_requests, backup_cow_request_begin
>> and backup_cow_request_end. We have a filter driver for this, not yet
>> in upstream.
>
> Is it the case that blockdev-backup sync=none can race with read
> requests on the NBD server? I.e. can we get temporarily inconsistent
> data before the COW completes? Can you elaborate?

I'm not sure, but it looks possible:

1. an NBD read starts, finds a hole in the temporary image, decides to
   read from the active image (or even starts that read) and yields;
2. the guest writes to the same area (COW happens, but it doesn't help);
3. the read from point (1) resumes and returns invalid data, already
   updated by (2).

The similar place in block/replication, which also uses backup(sync=none),
is protected from exactly this situation.
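For reference, this is roughly the protection block/replication applies
around its reads (paraphrased; the helper names are the ones from
include/block/block_backup.h mentioned above, while the byte-based wrapper
function here is only a simplified illustration, not the upstream code
verbatim):

#include "qemu/osdep.h"
#include "block/block_int.h"
#include "block/block_backup.h"

/*
 * Simplified sketch of how block/replication serializes a read against
 * the backup(sync=none) job's copy-on-write: the read is bracketed so it
 * cannot interleave with a COW on the same range.
 */
static int coroutine_fn read_protected_from_cow(BlockDriverState *bs,
                                                BlockJob *backup_job,
                                                int64_t offset,
                                                uint64_t bytes,
                                                QEMUIOVector *qiov)
{
    CowRequest req;
    int ret;

    /* Wait for any in-flight COW overlapping [offset, offset + bytes). */
    backup_wait_for_overlapping_requests(backup_job, offset, bytes);
    /* Register our own request, so that a new COW on this range waits
     * for us instead of completing in the middle of our read. */
    backup_cow_request_begin(&req, backup_job, offset, bytes);

    ret = bdrv_co_preadv(bs->file, offset, bytes, qiov, 0);

    backup_cow_request_end(&req);
    return ret;
}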
>> 2. If we use a filter driver anyway, it may be better not to use backup
>> at all, and do everything needed in the filter driver.
>
> If blockdev-backup sync=none isn't sufficient to get the semantics we
> want, it may indeed be more appropriate to just leave the entire task
> to a new filter node.
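The not-yet-upstream filter driver itself is not shown in the thread; the
following is a purely hypothetical sketch of the "do everything in the
filter" idea, using the same QEMU-internal headers as the sketch above.
All names here (CbwState, cbw_co_pwritev, cbw_snapshot_read,
range_already_copied, copy_range_to_target) are invented for illustration,
and a single coroutine mutex stands in for real per-range request tracking:

/*
 * Hypothetical copy-before-write filter above the active disk.  'file' is
 * the active disk, 'target' is the temporary (fleecing) image.
 */
typedef struct CbwState {
    BdrvChild *target;   /* temporary image that holds the snapshot data */
    CoMutex lock;        /* serializes guest-write COW vs. fleecing reads */
} CbwState;

/* Guest write path: preserve the old contents, then let the write through. */
static int coroutine_fn cbw_co_pwritev(BlockDriverState *bs, uint64_t offset,
                                       uint64_t bytes, QEMUIOVector *qiov,
                                       int flags)
{
    CbwState *s = bs->opaque;
    int ret = 0;

    qemu_co_mutex_lock(&s->lock);
    if (!range_already_copied(s, offset, bytes)) {
        ret = copy_range_to_target(bs, s->target, offset, bytes);
    }
    if (ret == 0) {
        ret = bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
    }
    qemu_co_mutex_unlock(&s->lock);
    return ret;
}

/* Read on behalf of the fleecing export: ranges already copied by a COW
 * come from the target, untouched ranges still live on the active disk.
 * Holding the lock for the whole read closes the race described above. */
static int coroutine_fn cbw_snapshot_read(BlockDriverState *bs,
                                          uint64_t offset, uint64_t bytes,
                                          QEMUIOVector *qiov)
{
    CbwState *s = bs->opaque;
    int ret;

    qemu_co_mutex_lock(&s->lock);
    if (range_already_copied(s, offset, bytes)) {
        ret = bdrv_co_preadv(s->target, offset, bytes, qiov, 0);
    } else {
        ret = bdrv_co_preadv(bs->file, offset, bytes, qiov, 0);
    }
    qemu_co_mutex_unlock(&s->lock);
    return ret;
}

A real driver would only serialize overlapping ranges (as the backup job's
CowRequest list does) rather than take one global lock, but the effect is
the same: a fleecing read and a guest write to the same area can no longer
interleave.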
>> 3. It may be interesting to implement something like READ_ONCE for NBD,
>> meaning that we will never read those clusters again. After such a
>> command, we would not need to copy the corresponding clusters to the
>> temporary image when the guest writes to them (since we know the client
>> has already read them and is not going to read them again).
>
> That would be a very interesting optimization indeed; but I don't think
> we have any kind of infrastructure for such things currently. It's
> almost like a TRIM on which regions need to perform COW for the
> BlockSnapshot.
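Continuing the same hypothetical sketch, such a READ_ONCE-style hint could
be modelled as a discard on the fleecing export, handled by the filter as
"this range no longer needs copy-on-write"; mark_range_done() is again an
invented helper:

/*
 * Hypothetical continuation of the sketch above: treat a discard arriving
 * from the backup client as "I have read this range and will never read
 * it again", so later guest writes to it can skip the copy-on-write.
 * cbw_co_pwritev() would consult the same state before copying anything.
 */
static int coroutine_fn cbw_snapshot_discard(BlockDriverState *bs,
                                             uint64_t offset, uint64_t bytes)
{
    CbwState *s = bs->opaque;

    qemu_co_mutex_lock(&s->lock);
    /* Drop the range from the snapshot view: no further COW needed here. */
    mark_range_done(s, offset, bytes);
    qemu_co_mutex_unlock(&s->lock);

    /* Data already copied into the temporary image could additionally be
     * discarded here to give the space back. */
    return 0;
}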
Hmm, READ+TRIM could be used for that too, and the trim could be
implemented naturally in the special filter driver.

--
Best regards,
Vladimir