Hanging migration leaving domain in paused state

Hi,

We are running libvirt 8.0.0, and sometimes a live migration cannot finish because the guest dirties memory too fast. We implemented a monitor that increases the max downtime when it observes "Data Remaining" going up. However, we found a strange sequence of events in the monitor logs that ends with a paused domain on the destination hypervisor.

The monitor sees "Data Remaining" going up and increases the max downtime, up to 20 seconds. The odd part is that after a while it starts reporting both "Data Remaining" and "Data Total" as 0, while the migration job is still unfinished:

"Migration in progress - DataTotal: 85904728064, DataRemaining: 22201458688, TimeElapsed: 20005, MaxDowntime: 500, DirtyRate: 0"
"Migration in progress - DataTotal: 85904728064, DataRemaining: 43801825280, TimeElapsed: 10005, MaxDowntime: 500, DirtyRate: 0"
"Migration in progress - DataTotal: 85904728064, DataRemaining: 52382912512, TimeElapsed: 10004, MaxDowntime: 500, DirtyRate: 0" (DataRemaining goes up, we start increasing max downtime)
"Migration in progress - DataTotal: 85904728064, DataRemaining: 4219596800, TimeElapsed: 40004, MaxDowntime: 1500, DirtyRate: 0" (Last poll where we see the job info)

After which the monitor logs:

"Migration in progress - DataTotal: 0, DataRemaining: 0, TimeElapsed: 40004, MaxDowntime: 13500, DirtyRate: 0"

The domain keeps running on the source hypervisor the whole time, but there is a domain on the destination hypervisor that has been paused since startup.

We are trying to understand what might have happened:
- Is this a known issue with live migrating guests that have high memory activity, or with the way we interact with libvirt?
- What is the recommended way to ensure that a started live migration always runs to completion if we don't care about downtime?

We appreciate any help here.

Yangchen Ye
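For readers following along, here is a minimal Python sketch of the kind of escalation policy described above. It is not the actual monitor from this report. The names `JobSample`, `has_memory_stats`, `next_max_downtime` and `is_stalled` are hypothetical, the 1000 ms step is an assumption (the log only shows the values 500, 1500 and 13500), and the 20000 ms cap mirrors the 20-second limit mentioned in the message. The sketch treats a poll where DataTotal and DataRemaining are both 0 as "no counters available" rather than as progress, so it neither escalates nor relaxes on such a poll.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobSample:
    """One poll of the migration job, mirroring the fields in the monitor log."""
    data_total: int       # bytes
    data_remaining: int   # bytes
    time_elapsed_ms: int  # milliseconds


def has_memory_stats(sample: JobSample) -> bool:
    """Return False when the poll carries no transfer counters (both are 0)."""
    return not (sample.data_total == 0 and sample.data_remaining == 0)


def next_max_downtime(prev: Optional[JobSample], cur: JobSample,
                      downtime_ms: int, step_ms: int = 1000,
                      cap_ms: int = 20000) -> int:
    """Return the max downtime (ms) to apply after this poll.

    The value is raised by step_ms only when DataRemaining grew between two
    polls that both carried real counters; otherwise it is left unchanged.
    step_ms and cap_ms are assumed values, not taken from libvirt.
    """
    if prev is None or not has_memory_stats(prev) or not has_memory_stats(cur):
        return downtime_ms
    if cur.data_remaining > prev.data_remaining:
        return min(downtime_ms + step_ms, cap_ms)
    return downtime_ms


def is_stalled(samples: list, min_polls: int = 3) -> bool:
    """Hypothetical rule: the last min_polls polls all lacked counters."""
    recent = samples[-min_polls:]
    return len(recent) == min_polls and not any(has_memory_stats(s) for s in recent)
```

With the libvirt Python bindings, each sample would presumably be filled from `dom.jobStats()` and the result applied with `dom.migrateSetMaxDowntime(ms, 0)`. A check like `is_stalled` only flags the all-zero pattern reported here for a closer look; it does not explain why the counters are 0.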

Sorry, I was mistaken in the previous email. The log lines that show data remaining are from other VMs. For the hanging one, the migration job seems to always report Data Total and Data Remaining as 0.

Best,
Yangchen Ye

On Mon, Jul 21, 2025 at 12:49 PM Yangchen Ye <eikasia30@gmail.com> wrote:
Hi,
We are running libvirt 8.0.0, and sometimes live migration could not finish (because the guest is dirtying the memory too fast). We implemented a monitor that increases max downtime when it observed that "Data Remaining" bumps up. But we found a strange sequence of events from the monitor, which leads to a paused domain on the destination hypervisor:
The monitor sees Data Remaining bumping up and increases max downtime up to 20 seconds, but weird thing is that after a period of time, it started reporting "Data Remaining" and "Data Total" is both 0, but the migration job is still unfinished:
"Migration in progress - DataTotal: 85904728064, DataRemaining: 22201458688, TimeElapsed: 20005, MaxDowntime: 500, DirtyRate: 0"
"Migration in progress - DataTotal: 85904728064, DataRemaining: 43801825280, TimeElapsed: 10005, MaxDowntime: 500, DirtyRate: 0"
"Migration in progress - DataTotal: 85904728064, DataRemaining: 52382912512, TimeElapsed: 10004, MaxDowntime: 500, DirtyRate: 0" (DataRemaining bumps up, we start increasing max downtime)
"Migration in progress - DataTotal: 85904728064, DataRemaining: 4219596800, TimeElapsed: 40004, MaxDowntime: 1500, DirtyRate: 0" (Last poll where we see the job info)
After which monitor logs
"Migration in progress - DataTotal: 0, DataRemaining: 0, TimeElapsed: 40004, MaxDowntime: 13500, DirtyRate: 0"
The domain is always running on the source hypervisor but there is a paused domain on the destination hypervisor which is paused at start up.
Trying to understand what might have happened:
- Is this a known issue for live migrating high memory activity guests and the way we interact with libvirt?
- What is the recommended way to ensure that a started live migration always run to completion if we don't care about downtime?
Appreciate any help here
Yangchen Ye