I'm using libvirt under Debian 12 (libvirt 9.0.0-4+deb12u2 with qemu
7.2+dfsg-7+deb12u12).
I have a VM using SR-IOV, and configured it with a failover macvtap
interface so I could live migrate it. However, there is a significant
delay at the end of the migration, resulting in a lot of lost traffic.
If I only have the macvtap interface, migration completes immediately
at the end of the memory transfer with no loss of traffic.
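For context, the two interfaces are paired with libvirt's <teaming>
elements. A minimal sketch of the relevant domain XML (the MAC address,
source device, and PCI address are placeholders; only the
ua-sr-iov-backup alias matches my setup, as seen in the logs below):

  <!-- persistent failover interface (macvtap); kept across migration -->
  <interface type='direct'>
    <mac address='52:54:00:11:22:33'/>
    <source dev='eno1' mode='bridge'/>
    <model type='virtio'/>
    <teaming type='persistent'/>
    <alias name='ua-sr-iov-backup'/>
  </interface>
  <!-- transient SR-IOV VF; unplugged before migration, must share the
       MAC of its persistent partner -->
  <interface type='hostdev' managed='yes'>
    <mac address='52:54:00:11:22:33'/>
    <source>
      <address type='pci' domain='0x0000' bus='0x3b' slot='0x02'
               function='0x0'/>
    </source>
    <teaming type='transient' persistent='ua-sr-iov-backup'/>
  </interface>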
I enabled debug logging, and found the following. On the source system,
it logs that the migration is paused for the switchover:
2025-04-30 01:08:12.526+0000: 1696180: debug :
qemuMigrationAnyCompleted:1957 : Migration paused before switchover
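(By debug logging I mean libvirtd's own log filters, along these lines
in /etc/libvirt/libvirtd.conf — a sketch, not necessarily the exact
values I used:

  log_filters="1:qemu 1:libvirt"
  log_outputs="1:file:/var/log/libvirt/libvirtd.log"
)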
At that point, for almost a minute, the source system just keeps
printing the same statistics:
2025-04-30 01:08:12.923+0000: 1696272: info :
qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_REPLY:
mon=0x7f8fdc0ad2f0 reply={"return": {"expected-downtime": 300, "status":
"device", "setup-time": 297, "total-time": 26107, "ram": {"total":
137452265472, "postcopy-requests": 0, "dirty-sync-count": 3,
"multifd-bytes": 2821784576, "pages-per-second": 297855,
"downtime-bytes": 13208, "page-size": 4096, "remaining": 0,
"postcopy-bytes": 0, "mbps": 9786.9158461538464, "transferred":
3117658825, "dirty-sync-missed-zero-copy": 0, "precopy-bytes":
295861041, "duplicate": 32874480, "dirty-pages-rate": 56, "skipped": 0,
"normal-bytes": 2804301824, "normal": 684644}}, "id": "libvirt-577"}
[...]
2025-04-30 01:09:06.290+0000: 1696272: info :
qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_REPLY:
mon=0x7f8fdc0ad2f0 reply={"return": {"expected-downtime": 300, "status":
"device", "setup-time": 297, "total-time": 79474, "ram": {"total":
137452265472, "postcopy-requests": 0, "dirty-sync-count": 3,
"multifd-bytes": 2821784576, "pages-per-second": 297855,
"downtime-bytes": 13208, "page-size": 4096, "remaining": 0,
"postcopy-bytes": 0, "mbps": 9786.9158461538464, "transferred":
3117658825, "dirty-sync-missed-zero-copy": 0, "precopy-bytes":
295861041, "duplicate": 32874480, "dirty-pages-rate": 56, "skipped": 0,
"normal-bytes": 2804301824, "normal": 684644}}, "id": "libvirt-629"}
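These replies are just libvirt polling query-migrate; the same view is
available by hand with, e.g. (domain name is a placeholder):

  virsh qemu-monitor-command mydomain --pretty \
    '{"execute": "query-migrate"}'

Note that "status" stays at "device" the whole time, i.e. qemu is
sitting in the device-state transfer phase after RAM has finished.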
until finally it completes:
2025-04-30 01:09:06.327+0000: 1696272: info :
qemuMonitorJSONIOProcessLine:203 : QEMU_MONITOR_RECV_EVENT:
mon=0x7f8fdc0ad2f0 event={"timestamp": {"seconds": 1745975346,
"microseconds": 327382}, "event": "MIGRATION", "data": {"status":
"completed"}}
On the destination side, it logs a FAILOVER_NEGOTIATED event for the
network link:
2025-04-30 01:08:12.923+0000: 1384503: info :
qemuMonitorJSONIOProcessLine:203 : QEMU_MONITOR_RECV_EVENT:
mon=0x7fc7900ab2f0 event={"timestamp": {"seconds": 1745975292,
"microseconds": 922783}, "event": "FAILOVER_NEGOTIATED", "data":
{"device-id": "ua-sr-iov-backup"}}
Then nothing is logged for about a minute, until it reports completion:
2025-04-30 01:09:06.328+0000: 1384503: debug :
qemuMonitorJSONIOProcessLine:189 : Line [{"timestamp": {"seconds":
1745975346, "microseconds": 327991}, "event": "MIGRATION", "data":
{"status": "completed"}}]
Any thoughts on what is going on here to cause this delay? It's clearly
somehow related to the SR-IOV component of the migration.
What model of SR-IOV card are you using? And what is the XML
configuration for the SR-IOV interface?
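For example, the <interface> sections from the output of (domain name
is a placeholder):

  virsh dumpxml mydomain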
The additional downtime could be caused by VFIO migration. The downtime
of VFIO migration has been improved in newer QEMU and kernel releases.
You can try to update to these versions or above to see if the downtime
is improved.
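To check the versions currently in use on each host (the qemu binary
name may differ by architecture):

  libvirtd --version
  qemu-system-x86_64 --version
  uname -r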
Thanks much…