Re: [Libvir] Re; virDomainBlockStats

Mmm, Ok .. I've pretty much spent the last week getting Xen 3.2 to function "properly", built entirely from raw Xen sources using Xen's documentation. AIO is still as flaky as hell: one time through the loop the instance seems to work, the second time around it won't start and neither will anything else. I have file:-based instances running just fine .. my guess is the unload problem is still there, but the kernel logging is gone and they've created a critical system lock-up in its place.

Is there any documentation to the effect that "file" is bad? The official Xen documentation lists "file" as the standard and makes no mention of "aio". Currently I'm mounting my images on Gluster filesystems and I'm getting 70Mb/sec, so I'm not unhappy with the performance, and I've had no issues recently with crashes ... more information would be useful if you have it. I'm happy to accept that aio is better in principle, but as far as I can see, although it may work for you, I've now tried it in a number of different configurations on a number of machines and it's simply unusable.

tia
Gareth.

----- Original Message -----
From: "Daniel P. Berrange" <berrange@redhat.com>
To: "Gareth Bult" <gareth@encryptec.net>
Sent: 19 January 2008 17:50:01 (GMT) Europe/London
Subject: Re: [Libvir] Re; virDomainBlockStats

On Sat, Jan 19, 2008 at 05:18:58PM +0000, Gareth Bult wrote:
Mmm,
Interesting ..
First off, xentop doesn't display block device stats for tap:aio-based systems, but it does for file. Second, tap:aio generated kernel oopses when you shut down a DomU.
Not exactly what I'd call mainstream (!)
That must be a flaw in Ubuntu's kernels. tap:aio works flawlessly in Fedora / RHEL and is the only supported option, because file: has catastrophic data loss issues during host crashes.

Dan.
--
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Sat, Jan 26, 2008 at 11:29:54PM +0000, Gareth Bult wrote:
Is there any documentation to the effect that "file" is bad?? The official XEN documentation lists "file" as the standard and makes no mention of "aio".
The upstream documentation leaves a lot to be desired. First of all I should say there is a difference between file: with PV vs file: with HVM. file: with HVM has no problem, because all IO is handled by QEMU. It is only file: with PV guests that is flawed. This is because it uses loopback devices to access the file. Loop devices cache data writes in memory for an undefined amount of time, even if the guest requests a sync to disk. The result is that if the host OS crashes, your guest disks can lose huge amounts of data that they thought were already synced to disk. Not even a journalling FS helps, because there are no write-ordering guarantees. tap:aio: addresses all these issues by doing Direct IO + Async IO, to allow you to have multiple outstanding I/O operations, to avoid the host OS buffer cache, and to provide a firm guarantee that the data is on disk when the guest asks for it to be.
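For concreteness, the two disk specifications being contrasted look like this in a Xen guest config file (a sketch; the image path is illustrative, not taken from the thread):

```
# Loopback-backed disk: the form Dan warns about for PV guests,
# because loop devices buffer writes in host memory.
disk = ['file:/var/lib/xen/images/guest.img,xvda,w']

# blktap-backed disk using O_DIRECT + async I/O: the form he recommends.
disk = ['tap:aio:/var/lib/xen/images/guest.img,xvda,w']
```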
Dan.

Ok, question; does tap:aio work on networked filesystems? My testing says not. (I've tried on local filesystems and seem to have it partially working.)

Xen without a good underlying cluster filesystem is, to say the least, "limited". If you can't use AIO in this sort of environment, it also becomes limited .. and if libvirt requires this, it means libvirt can't be used on large-scale roll-outs on (certain?) network filesystems. (I'm using Gluster.)

Interestingly (looking at the Xen list re; performance problems) I'm getting > 65Mb/sec on my DomUs vs. 70Mb/sec on my host nodes' Dom0 ... and it's proving to be reliable atm ..

It'd be interesting to know why Xen only documents "file", given the critical nature / apparent flaws in the driver.

fyi; I run root fs's off DomUs but store data on separately mounted shared filesystems .. (more efficient when it comes to fail-over / filesystem self-heal in the event of a filesystem node outage.)

Is there "no" way of forcing a loopback interface to flush itself?

Gareth.

----- Original Message -----
From: "Daniel P. Berrange" <berrange@redhat.com>
To: "Gareth Bult" <gareth@encryptec.net>
Cc: "libvir-list" <libvir-list@redhat.com>
Sent: 27 January 2008 02:14:26 (GMT) Europe/London
Subject: Re: [Libvir] Re; virDomainBlockStats

On Sat, Jan 26, 2008 at 11:29:54PM +0000, Gareth Bult wrote:
Is there any documentation to the effect that "file" is bad? The official Xen documentation lists "file" as the standard and makes no mention of "aio".

The upstream documentation leaves a lot to be desired. [...]

Dan.

On Sun, Jan 27, 2008 at 02:53:46PM +0000, Gareth Bult wrote:
Ok, question;
Does TAP:AIO work on networked filesystems .. my testing says not.
It does on Fedora / RHEL at least - it should work on any FS that supports direct IO & async IO.
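Dan's criterion ("any FS that supports direct IO & async IO") can be probed directly. The sketch below checks only the O_DIRECT half of it, on Linux; the function is illustrative and not part of libvirt or Xen, and the probe path is arbitrary:

```python
import mmap
import os

def supports_direct_io(path):
    """Return True if the filesystem holding `path` accepts an O_DIRECT
    write - a necessary condition for tap:aio to work there."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o600)
    except (OSError, AttributeError):
        # Open refused (e.g. tmpfs rejects O_DIRECT), or a platform
        # without os.O_DIRECT at all.
        return False
    try:
        # O_DIRECT requires sector-aligned buffers; an anonymous mmap
        # is page-aligned, which satisfies that.
        buf = mmap.mmap(-1, 4096)
        os.write(fd, buf)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        try:
            os.unlink(path)
        except OSError:
            pass
```

Running it against a file on the network mount in question would show whether the direct-IO half of the requirement is met; kernel AIO support would still need checking separately.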
XEN without a good underlying cluster filesystem is to say the least "limited". If you can't use AIO in this sort of environment, it also becomes limited .. and if libvirt requires this, it means libvirt can't be used on large-scale roll-outs on (certain?) network filesystems.
We need to fix libvirt so that stats work with file: based devices too. We just need to figure out the logic required in order to generate the correct device ID.
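The "correct device ID" mentioned here is the Linux device number, (major << 8) | minor, that Xen keys block devices on. A hypothetical sketch of that mapping (the major numbers are standard Linux values, but the function is illustrative, not libvirt's actual code; hdc/hdd, which live on IDE major 22, are ignored for brevity):

```python
def xen_block_devid(name):
    """Map a guest device name like 'xvda' or 'hdb' to the Linux
    device number ((major << 8) | minor) used as the Xen device ID."""
    # prefix -> (major, minors reserved per whole disk)
    majors = {"xvd": (202, 16), "sd": (8, 16), "hd": (3, 64)}
    for prefix, (major, minors_per_disk) in majors.items():
        # Only whole-disk names (one trailing letter) handled here.
        if name.startswith(prefix) and len(name) == len(prefix) + 1:
            index = ord(name[len(prefix)]) - ord("a")
            return (major << 8) | (index * minors_per_disk)
    raise ValueError("unrecognised device name: %r" % name)
```

Under this scheme "xvda" maps to 51712 and "xvdb" to 51728, regardless of whether the backend is file: or tap:aio:, which is why stats lookup by ID ought to be fixable for both.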
Interestingly (looking at the Xen list re; performance problems) I'm getting > 65Mb/sec on my DomU's .vs. 70Mb/sec on my host nodes Dom0 ... and it's proving to be reliable atm ..
You ought to be able to get within 5% of host performance when using either phy: (for real block devs) or tap:aio:. NB make sure you do *not* use sparse files - fully allocate the file if you want good performance. Sparse files have terrible performance characteristics, as blocks are allocated on demand, which causes metadata syncs on the underlying host FS and serializes all I/O operations.
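The sparse-vs-fully-allocated distinction is easy to demonstrate. A small sketch (hypothetical paths and sizes) of the two ways an image file can be created; only the first forces real block allocation up front:

```python
import os

ONE_MIB = 1 << 20

def make_preallocated(path, size_bytes):
    # Write real zero blocks: every block is allocated now, so the
    # host FS never has to allocate on demand while the guest runs.
    with open(path, "wb") as f:
        for _ in range(size_bytes // ONE_MIB):
            f.write(b"\0" * ONE_MIB)

def make_sparse(path, size_bytes):
    # What Dan advises against: the apparent size is right, but no
    # blocks are allocated until first write.
    with open(path, "wb") as f:
        f.truncate(size_bytes)

def allocated_bytes(path):
    # st_blocks is counted in 512-byte units, independent of FS block size.
    return os.stat(path).st_blocks * 512
```

On most filesystems allocated_bytes() reports roughly the full size for the preallocated file and near zero for the sparse one, even though os.path.getsize() is identical for both.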
Be interesting to know why Xen only documents "file" given the critical nature / apparent flaws in the driver.
Xen documentation leaves a lot to be desired :-(
Is there "no" way of forcing a loopback interface to flush itself?
Unfortunately not - it's inherent in the implementation of loop devices.

Dan.

Mmm, I will have to experiment to see what happens by forcing failures and measuring the damage.

fyi; I'm pretty much 100% sure aio does not work on "gluster" .. indeed, it seems this may be my "main" problem with aio, as now I've moved some testing kit onto local partitions I am getting some better results. That said, I'm getting of the order of within 5% of native speed over Gluster, which seems to be approaching that of local partitions .. (using "file:" .. AND all filesystems were created as sparse ..) [not really an issue as we *need* a cluster fs for migration]

It sounds like I might have to hack the loopback driver and force a regular sync out of it .. however, testing first ..

Also (!) our "accidental" policy of storing critical data (DBs etc.) outside of the DomUs on raw network filesystems has probably been rather fortunate ... :)

Note; my hack to libvirt to work with file: seems to be 100% across my particular platform ...

Thanks,
Gareth.

----- Original Message -----
From: "Daniel P. Berrange" <berrange@redhat.com>
To: "Gareth Bult" <gareth@encryptec.net>
Cc: "libvir-list" <libvir-list@redhat.com>
Sent: 27 January 2008 17:03:09 (GMT) Europe/London
Subject: Re: [Libvir] Re; virDomainBlockStats

On Sun, Jan 27, 2008 at 02:53:46PM +0000, Gareth Bult wrote:
Ok, question; does tap:aio work on networked filesystems? My testing says not.

It does on Fedora / RHEL at least - it should work on any FS that supports direct IO & async IO. [...]

Dan.
participants (2)
-
Daniel P. Berrange
-
Gareth Bult