[libvirt] Libvir JNA report SIGSEGV

Hi,

I am trying to verify the libvirt JNA bindings with several concurrent connections, but I have run into some problems. Here is my example code:

    // test class imports org.libvirt.Connect, org.libvirt.ConnectAuthDefault, org.libvirt.LibvirtException
    public static void testcase1() throws LibvirtException {
        // connect to the first hypervisor
        Connect conn = new Connect("esx://10.74.125.68:443/?no_verify=1&transport=https",
                new ConnectAuthDefault(), 0);
        System.out.println(conn.getVersion());

        // connect to the second hypervisor
        Connect conn1 = new Connect("esx://10.74.125.90:443/?no_verify=1&transport=https",
                new ConnectAuthDefault(), 0);
        System.out.println(conn1.getVersion());

        while (true) {
            int[] array = new int[100000000]; // ~400 MB per iteration, so the GC runs often
            long version = conn.getVersion();
            long version1 = conn1.getVersion();
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

When I add the line "int[] array = new int[100000000]", the following error is generated very quickly:

    # An unexpected error has been detected by Java Runtime Environment:
    #
    # SIGSEGV (0xb) at pc=0x0000003f9b07046e, pid=30049, tid=1109510464
    #
    # Java VM: OpenJDK 64-Bit Server VM (1.6.0-b09 mixed mode linux-amd64)
    # Problematic frame:
    # C [libc.so.6+0x7046e]
    #
    # An error report file with more information is saved as:

I have written similar code in C, shown below, and it works well:

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <libvirt/libvirt.h>

    static void virXenBasic_TC001(void)
    {
        virConnectPtr conn = NULL;
        virConnectPtr conn1 = NULL;
        unsigned long version = 0;
        unsigned long version1 = 0;
        char *hostname = NULL;

        conn = virConnectOpenAuth("esx://10.74.125.21/?no_verify=1", virConnectAuthPtrDefault, 0);
        if (conn == NULL) {
            fprintf(stderr, "Failed to open connection to esx://10.74.125.21\n");
            return;
        }

        conn1 = virConnectOpenAuth("esx://192.168.119.40/?no_verify=1", virConnectAuthPtrDefault, 0);
        if (conn1 == NULL) {
            fprintf(stderr, "Failed to open connection to esx://192.168.119.40\n");
            virConnectClose(conn);
            return;
        }

        while (true) {
            hostname = malloc(sizeof(char) * 100000000);
            virConnectGetVersion(conn, &version);
            virConnectGetVersion(conn1, &version1); /* query the second connection */
            free(hostname);
            sleep(1);
        }
    }

B.R.
Benjamin Wang

Reply from Daniel Veillard to Benjamin Wang (gendwang)'s message of Wed, Sep 05, 2012 at 08:59:07AM +0000:
Maybe you need to increase the stack or memory size of your Java process or something; that doesn't look related to libvirt at all, in my opinion. Well, maybe the bindings fail somewhere when checking for an allocation error, but is it in JNA?

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel@veillard.com  | Rpmfind RPM search engine  http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

Hi,

The problem only occurs in the JNA part; the pure C libvirt works well. Even if I create only one connection, outside of the loop, the problem still happens. The following is the simplest way to reproduce it:

    public static void testcase1() throws LibvirtException {
        // connect to the hypervisor
        Connect conn = new Connect("esx://10.74.125.69:443/?no_verify=1&transport=https",
                new ConnectAuthDefault(), 0);
        while (true) {
            int[] array = new int[100000000];
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

B.R.
Benjamin Wang

-----Original Message-----
From: Daniel Veillard [mailto:veillard@redhat.com]
Sent: September 6, 2012 15:49
To: Benjamin Wang (gendwang)
Cc: libvir-list@redhat.com; Yang Zhou (yangzho)
Subject: Re: [libvirt] Libvir JNA report SIGSEGV

Reply from Daniel Veillard to Benjamin Wang (gendwang)'s message of Thu, Sep 06, 2012 at 07:53:24AM +0000:
Then it's a Java bug. The loop doesn't call or use libvirt in any way. If it crashes in the loop, then to me it's Java crashing!

Daniel

Hi,

Actually, I also ran another test, shown below. When I comment out the "new Connect" call, the program works well, so the problem is related to the libvirt JNA bindings. If I manually run garbage collection in this program, it still works well; but if I run garbage collection in the previous program, it crashes. I guess the problem is caused by the ConnectAuth callback: when garbage collection runs, the callback memory is moved.

    public static void testcase1() throws LibvirtException {
        while (true) {
            int[] array = new int[100000000];
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

B.R.
Benjamin Wang
-----Original Message-----
From: Daniel Veillard [mailto:veillard@redhat.com]
Sent: September 6, 2012 16:53
To: Benjamin Wang (gendwang)
Cc: libvir-list@redhat.com
Subject: Re: [libvirt] Libvir JNA report SIGSEGV
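The observation that the crash appears only once the collector runs fits a double free, which the end of this thread confirms: one owner frees a buffer right away, and the second free happens much later, when the garbage collector finalizes the object that allocated it. The C sketch below simulates that sequence. `struct tracked_buf`, `tracked_alloc`, `tracked_release`, `driver_connect`, `binding_finalize` and `simulate` are hypothetical stand-ins, not libvirt or JNA APIs, and the second release is only counted rather than passed to `free()`, where it would be undefined behaviour (typically a SIGSEGV inside libc, as in the crash report above).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle around a heap buffer. A second release is counted
 * instead of calling free() twice, which would be undefined behaviour. */
struct tracked_buf {
    char *data;
    int released; /* set once the memory has actually been freed */
};

static int double_frees = 0;

static int tracked_alloc(struct tracked_buf *b, const char *text) {
    b->data = malloc(strlen(text) + 1);
    b->released = 0;
    if (b->data == NULL)
        return -1;
    strcpy(b->data, text);
    return 0;
}

static void tracked_release(struct tracked_buf *b) {
    if (b->released) {
        double_frees++; /* a second owner tries to free the same memory */
        return;
    }
    free(b->data);
    b->released = 1;
}

/* Stand-in for the driver: it uses the credential and then frees it,
 * like the VIR_FREE(username) call quoted later in the thread. */
static void driver_connect(struct tracked_buf *username) {
    /* ... log in with username->data ... */
    tracked_release(username);
}

/* Stand-in for the garbage collector finalizing the binding's buffer. */
static void binding_finalize(struct tracked_buf *username) {
    tracked_release(username);
}

/* Runs one connect/GC cycle and returns the number of double frees seen. */
int simulate(void) {
    struct tracked_buf username;
    double_frees = 0;
    if (tracked_alloc(&username, "root") != 0) /* the binding allocates */
        return -1;
    driver_connect(&username);   /* first free: nothing fails yet ... */
    binding_finalize(&username); /* ... until the collector frees it again */
    return double_frees;
}
```

Calling `simulate()` reports one double free per cycle. This matches the reports above: the program runs normally after the connect call, and the failure is delayed until something forces a collection, such as the large `int[]` allocation or a manual GC.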

Reply from Daniel Veillard to Benjamin Wang (gendwang)'s message of Thu, Sep 06, 2012 at 09:06:14AM +0000:
Okay, maybe some memory needs to be pinned in some way. I take patches!

Daniel

Hi,

I have looked into the code for several days, but I haven't found the root cause. Because the problem occurs even if I only call "new Connect", it should be related to Connect.java or ConnectAuthDefault.java. Could you take a quick look at the issue and give me some pointers? Then I can try to fix it. Thanks!

B.R.
Benjamin Wang

-----Original Message-----
From: Daniel Veillard [mailto:veillard@redhat.com]
Sent: September 6, 2012 19:05
To: Benjamin Wang (gendwang)
Cc: libvir-list@redhat.com
Subject: Re: [libvirt] Libvir JNA report SIGSEGV

Hi,

I have located the problem. The root cause is that the esxConnectToHost/esxConnectToVCenter functions defined in esx_driver.c collect the username and password. JNA allocates the memory for the username and password, but esxConnectToHost/esxConnectToVCenter then free that Java-allocated memory themselves:

    VIR_FREE(username);
    VIR_FREE(unescapedPassword);

When the JVM later runs the GC, it crashes because of the double free. If I comment out these two lines, the system works well, but I don't think that is a good solution. What is your opinion?

B.R.
Benjamin Wang

-----Original Message-----
From: Benjamin Wang (gendwang)
Sent: September 6, 2012 21:43
To: 'veillard@redhat.com'
Cc: libvir-list@redhat.com
Subject: RE: [libvirt] Libvir JNA report SIGSEGV
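One way to avoid the double free without deleting the VIR_FREE calls is for the authentication callback to hand the driver its own copy allocated with malloc(), so that the driver's free is the only free of that copy, while the binding keeps and later releases its own buffer. The C sketch below assumes the driver takes ownership of whatever string the callback returns. `struct credential`, `fill_credential` and `driver_consume` are hypothetical, simplified stand-ins (the real structure is libvirt's virConnectCredential); they are not taken from the libvirt or libvirt-java sources.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-in for libvirt's virConnectCredential:
 * only the fields this sketch needs. */
struct credential {
    const char *prompt;
    char *result;           /* filled by the callback, freed by the driver */
    unsigned int resultlen;
};

/* Callback side: copy the answer into malloc()ed memory that the driver
 * may free. The caller's own buffer (e.g. JNA-managed memory) is never
 * handed to the driver, so the caller remains its only owner. */
static int fill_credential(struct credential *cred, const char *answer) {
    size_t len = strlen(answer);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, answer, len + 1); /* includes the terminating NUL */
    cred->result = copy;
    cred->resultlen = (unsigned int)len;
    return 0;
}

/* Driver side: after the callback returns, the driver owns the copy and
 * frees it, mirroring VIR_FREE(username), which also resets the pointer. */
static void driver_consume(struct credential *cred) {
    free(cred->result);
    cred->result = NULL;
}
```

The design choice is that each buffer has exactly one owner: the driver frees the copy it was given, and the caller's original string is released only by whoever allocated it (for the Java side, the garbage collector).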
participants (2)
- Benjamin Wang (gendwang)
- Daniel Veillard