[libvirt] race condition in getting VNC port for libvirt
 
Hi,

We are encountering a race condition when allocating VNC ports at VM startup. On a very powerful hypervisor, if we start more than one VM concurrently, some VMs may fail to start because of a VNC port conflict.

We searched the libvirt bugs and found that someone had reported the problem and that a fix was provided, but the fix does not resolve the problem:

http://osdir.com/ml/libvir-list/2010-05/msg00530.html
https://build.opensuse.org/package/view_file?file=vnc-race-3.patch&package=libvirt&project=Virtualization:openSUSE11.3

The bitmap test-and-set operation is not atomic, so there is still a time window in which the same VNC port could be reserved for two callers.

Your help is highly appreciated!

Thanks,
Guangya Liu
Cloud Developer
Platform Computing
direct: +86-29-87607400-333
www.platform.com <http://www.platform.com/>
 
On Sun, Nov 07, 2010 at 08:53:32PM +0800, Guangya Liu wrote:
> Hi,
> We are encountering a race condition when allocating VNC ports at VM startup.
> On a very powerful hypervisor, if we start more than one VM concurrently, some VMs may fail to start because of a VNC port conflict.
> We searched the libvirt bugs and found that someone had reported the problem and that a fix was provided, but the fix does not resolve the problem.
> http://osdir.com/ml/libvir-list/2010-05/msg00530.html
> https://build.opensuse.org/package/view_file?file=vnc-race-3.patch&package=libvirt&project=Virtualization:openSUSE11.3

This patch is already merged in the current GIT repos.

> The bitmap test-and-set operation is not atomic, so there is still a time window in which the same VNC port could be reserved for two callers.

The virBitmap APIs for getting & setting don't need to be atomic, because they are only called when the QEMU driver mutex is held, which ensures serialization of VM startup.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
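To illustrate why a non-atomic test-and-set is safe when every caller holds the same mutex, here is a minimal sketch in C. It is not libvirt's code: `struct port_pool`, `port_pool_reserve`, the port range, and the use of a plain `bool` array in place of a bitmap are all assumptions made for illustration, with a pthread mutex standing in for the QEMU driver mutex.

```c
#include <pthread.h>
#include <stdbool.h>

#define PORT_MIN   5900  /* assumed start of the VNC port range */
#define PORT_COUNT 64    /* assumed range size, for illustration only */

/* Hypothetical stand-in for the driver state; not libvirt's real struct. */
struct port_pool {
    pthread_mutex_t lock;   /* plays the role of the driver mutex */
    bool used[PORT_COUNT];  /* plays the role of the reservation bitmap */
};

void port_pool_init(struct port_pool *pool)
{
    pthread_mutex_init(&pool->lock, NULL);
    for (int i = 0; i < PORT_COUNT; i++)
        pool->used[i] = false;
}

/* Returns a reserved port, or -1 if the whole range is taken. */
int port_pool_reserve(struct port_pool *pool)
{
    int port = -1;

    pthread_mutex_lock(&pool->lock);
    for (int i = 0; i < PORT_COUNT; i++) {
        if (!pool->used[i]) {      /* test ... */
            pool->used[i] = true;  /* ... and set, both under the lock */
            port = PORT_MIN + i;
            break;
        }
    }
    pthread_mutex_unlock(&pool->lock);
    return port;
}

/* Marks a previously reserved port as free again. */
void port_pool_release(struct port_pool *pool, int port)
{
    pthread_mutex_lock(&pool->lock);
    if (port >= PORT_MIN && port < PORT_MIN + PORT_COUNT)
        pool->used[port - PORT_MIN] = false;
    pthread_mutex_unlock(&pool->lock);
}
```

Because the test and the set happen inside one critical section, two threads calling `port_pool_reserve` at the same time cannot both observe the same slot as free.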
 
Thanks Daniel for the info.

But I still have a question: since the call to qemudDomainCreate is already synchronized, why do we need to introduce a bitmap to resolve the VNC port conflict error? If qemudDomainCreate is synchronized, then there should not be any problem with the VNC port.

Looking forward to your reply.

static virDomainPtr qemudDomainCreate(virConnectPtr conn, const char *xml,
                                      unsigned int flags) {
    struct qemud_driver *driver = conn->privateData;
    virDomainDefPtr def;
    virDomainObjPtr vm = NULL;
    virDomainPtr dom = NULL;
    virDomainEventPtr event = NULL;

    virCheckFlags(VIR_DOMAIN_START_PAUSED, NULL);

    qemuDriverLock(driver);   <==

Thanks,
Guangya

-----Original Message-----
From: Daniel P. Berrange [mailto:berrange@redhat.com]
Sent: Tuesday, November 09, 2010 9:47 PM
To: Guangya Liu
Cc: libvir-list@redhat.com
Subject: Re: [libvirt] race condition in getting VNC port for libvirt
 
On Wed, Nov 10, 2010 at 01:21:24AM +0800, Guangya Liu wrote:
> Thanks Daniel for the info.
> But I still have a question: since the call to qemudDomainCreate is already synchronized, why do we need to introduce a bitmap to resolve the VNC port conflict error? If qemudDomainCreate is synchronized, then there should not be any problem with the VNC port.

Previously, libvirt would call bind() on port numbers until it found a free one, then close() the socket and launch QEMU, telling it to open that port again. Thus there is a race between libvirt finding a free port with bind()+close() for the first VM and QEMU actually opening it, during which libvirt could start looking for a port for a second VM. The libvirt driver lock doesn't solve that, because the code involves a separate QEMU process, which is outside the context of the lock.

With the bitmap code, libvirt now records that it assigned this port number to a VM, so when libvirt goes to start a second VM it won't reuse this port, even if the first QEMU hasn't yet got around to opening it.

Regards,
Daniel
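The difference between the two approaches can be sketched in C as follows. This is an illustration under stated assumptions, not the libvirt implementation: `port_is_free`, `find_port_probe_only`, and `find_port_reserved` are hypothetical names, the probe binds to the loopback address only, and a `bool` array stands in for the bitmap. The caller is assumed to hold a lock around each call, as the driver mutex does in the discussion above.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns true if the port could be bound at the moment of the call. */
static bool port_is_free(int port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return false;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons((unsigned short)port);

    bool ok = bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0;
    close(fd);  /* the port is free again from here on */
    return ok;
}

/* Probe-only search: nothing remembers the result, so a second call made
 * before the first consumer opens the port returns the same number. */
int find_port_probe_only(int first, int last)
{
    for (int p = first; p <= last; p++)
        if (port_is_free(p))
            return p;
    return -1;
}

/* Probe plus reservation: reserved[] must have (last - first + 1) entries.
 * A port handed out once is skipped on later calls, even if nothing is
 * listening on it yet. */
int find_port_reserved(bool *reserved, int first, int last)
{
    for (int p = first; p <= last; p++) {
        if (reserved[p - first])
            continue;
        if (port_is_free(p)) {
            reserved[p - first] = true;
            return p;
        }
    }
    return -1;
}
```

Calling `find_port_probe_only` twice in a row, with no process opening the port in between, yields the same port both times, which is the window described above. Two consecutive calls to `find_port_reserved` with the same `reserved` array yield different ports.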
participants (2)
- Daniel P. Berrange
- Guangya Liu