On Tue, Feb 22, 2011 at 10:24:26AM -0500, Vivek Goyal wrote:
[..]
- I don't see any throttling messages. They are prefixed by "throtl". So
it seems all this IO is happening in the root group. I believe it belongs
to the unthrottled VM. So to me it looks like the system was already in
bad shape even before the throttled VMs were started.
You are tracing /dev/sdb and not /dev/vdisks/kernel3 etc., hence
I don't see the throttle messages. So that's fine.
- So it sounds more and more like a CFQ issue which happens in conjunction
with throttling. I will try to reproduce it.
I tried a lot but I can't reproduce the issue, so now I shall have to rely
on data from you.
- I need a little more info about how you captured the blktrace. Did you
start blktrace, then start dd in parallel in all three VMs, and the
system froze immediately, with these being the only logs you see on
the console?
Can you please apply the attached patch? It just makes the CFQ output a
little more verbose. Then run the test again and capture the trace:
- Start the trace on /dev/sdb
- Start the dd jobs in virt machines
- Wait for system to hang
- Press CTRL-C
- Make sure there were no lost events; otherwise increase the size and
number of buffers.
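The capture steps above, as a rough shell sketch (this assumes the
blktrace/blkparse tools; the output basename and buffer sizes here are
arbitrary, adjust them to your setup):

```shell
# Start tracing /dev/sdb in the background. -b is the per-CPU buffer
# size in KiB and -n the number of buffers; bump these if blktrace
# reports dropped events when it stops.
blktrace -d /dev/sdb -b 1024 -n 8 -o sdb-trace &

# ... start the dd jobs in the virt machines, wait for the hang ...

# Press CTRL-C (or kill blktrace) to stop tracing, check the per-CPU
# summary it prints for dropped events, then parse the binary output:
blkparse -i sdb-trace > sdb-trace.txt
```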
Can you also open tracing in another window and trace one of the
throttled dm devices, say /dev/vdisks/kernel3, following the same
procedure as above, so that the two traces run in parallel.
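The second-window trace would look the same, just against the dm node
(blktrace works on any block device, so the throttled dm node can be
traced directly; the output basename is again arbitrary):

```shell
# Trace one of the throttled dm devices in parallel with the sdb trace.
blktrace -d /dev/vdisks/kernel3 -b 1024 -n 8 -o kernel3-trace &

# After stopping both traces, the throttling layer's messages can be
# picked out of the parsed output by their "throtl" prefix:
blkparse -i kernel3-trace | grep throtl
```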
Thanks
Vivek
---
block/cfq-iosched.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
Index: linux-2.6/block/cfq-iosched.c
===================================================================
--- linux-2.6.orig/block/cfq-iosched.c 2011-02-22 13:23:25.000000000 -0500
+++ linux-2.6/block/cfq-iosched.c 2011-02-22 14:01:21.515363676 -0500
@@ -498,7 +498,7 @@ static inline bool cfq_bio_sync(struct b
static inline void cfq_schedule_dispatch(struct cfq_data *cfqd)
{
if (cfqd->busy_queues) {
- cfq_log(cfqd, "schedule dispatch");
+ cfq_log(cfqd, "schedule dispatch: busy_queues=%d rq_queued=%d rq_in_driver=%d", cfqd->busy_queues, cfqd->rq_queued, cfqd->rq_in_driver);
kblockd_schedule_work(cfqd->queue, &cfqd->unplug_work);
}
}
@@ -2229,6 +2229,8 @@ static struct cfq_queue *cfq_select_queu
{
struct cfq_queue *cfqq, *new_cfqq = NULL;
+ cfq_log(cfqd, "select_queue: busy_queues=%d rq_queued=%d rq_in_driver=%d", cfqd->busy_queues, cfqd->rq_queued, cfqd->rq_in_driver);
+
cfqq = cfqd->active_queue;
if (!cfqq)
goto new_queue;
@@ -2499,8 +2501,10 @@ static int cfq_dispatch_requests(struct
return cfq_forced_dispatch(cfqd);
cfqq = cfq_select_queue(cfqd);
- if (!cfqq)
+ if (!cfqq) {
+ cfq_log(cfqd, "select: no cfqq selected");
return 0;
+ }
/*
* Dispatch a request from this cfqq, if it is allowed
@@ -3359,7 +3363,7 @@ static void cfq_insert_request(struct re
struct cfq_data *cfqd = q->elevator->elevator_data;
struct cfq_queue *cfqq = RQ_CFQQ(rq);
- cfq_log_cfqq(cfqd, cfqq, "insert_request");
+ cfq_log_cfqq(cfqd, "insert_request: busy_queues=%d rq_queued=%d rq_in_driver=%d", cfqd->busy_queues, cfqd->rq_queued, cfqd->rq_in_driver);
cfq_init_prio_data(cfqq, RQ_CIC(rq)->ioc);
rq_set_fifo_time(rq, jiffies + cfqd->cfq_fifo_expire[rq_is_sync(rq)]);