[libvirt] Entering freeze for libvirt-1.0.3

I have just tagged git and pushed the tarball. The rpms for F17 are following too at the usual place: ftp://libvirt.org/libvirt/ I gave a try to the set of rpms and this seems to work fine for relatively simple tests, but of course more testing and checking on other architectures is needed ! If everything goes fine, I will make an rc2 in a couple of days and then we can decide if a final release on Friday is fine or if we need to postpone to next week, Daniel -- Daniel Veillard | Open Source and Standards, Red Hat veillard@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | virtualization library http://libvirt.org/

On 2013年02月25日 19:52, Daniel Veillard wrote:
I have just tagged git and pushed the tarball. The rpms for F17 are following too at the usual place: ftp://libvirt.org/libvirt/
I gave a try to the set of rpms and this seems to work fine for relatively simple tests, but of course more testing and checking on other architectures is needed ! If everything goes fine, I will make an rc2 in a couple of days and then we can decide if a final release on Friday is fine or if we need to postpone to next week,
There are too few changes in this release for pp64. Testing it by building source for ppc64, it works well. Thanks. --Li
Daniel

On Tue, Feb 26, 2013 at 11:10:35AM +0800, Li Zhang wrote:
On 2013年02月25日 19:52, Daniel Veillard wrote:
I have just tagged git and pushed the tarball. The rpms for F17 are following too at the usual place: ftp://libvirt.org/libvirt/
I gave a try to the set of rpms and this seems to work fine for relatively simple tests, but of course more testing and checking on other architectures is needed ! If everything goes fine, I will make an rc2 in a couple of days and then we can decide if a final release on Friday is fine or if we need to postpone to next week,
There are too few changes in this release for pp64.
By "too few" do you mean that we are missing anything important ?
Testing it by building source for ppc64, it works well.
Cool, thanks for testing :-) Daniel -- Daniel Veillard | Open Source and Standards, Red Hat veillard@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | virtualization library http://libvirt.org/

On 2013年02月26日 12:51, Daniel Veillard wrote:
On Tue, Feb 26, 2013 at 11:10:35AM +0800, Li Zhang wrote:
On 2013年02月25日 19:52, Daniel Veillard wrote:
I have just tagged git and pushed the tarball. The rpms for F17 are following too at the usual place: ftp://libvirt.org/libvirt/
I gave a try to the set of rpms and this seems to work fine for relatively simple tests, but of course more testing and checking on other architectures is needed ! If everything goes fine, I will make an rc2 in a couple of days and then we can decide if a final release on Friday is fine or if we need to postpone to next week, There are too few changes in this release for pp64. By "too few" do you mean that we are missing anything important ? I have sent out 2 patches which are not accepted for this release.
One patch is to fix one bug for ppc64 which is: [PATCHv2 1/1] Optimize machine option to set more options with it. Without this patch, USB controllers doesn't work well with VGA enabled for ppc64. And testing with VDSM, it will always report errors when using VGA.
Testing it by building source for ppc64, it works well. Cool, thanks for testing :-)
Daniel

On Tue, Feb 26, 2013 at 01:03:17PM +0800, Li Zhang wrote:
On 2013年02月26日 12:51, Daniel Veillard wrote:
On Tue, Feb 26, 2013 at 11:10:35AM +0800, Li Zhang wrote:
On 2013年02月25日 19:52, Daniel Veillard wrote:
I have just tagged git and pushed the tarball. The rpms for F17 are following too at the usual place: ftp://libvirt.org/libvirt/
I gave a try to the set of rpms and this seems to work fine for relatively simple tests, but of course more testing and checking on other architectures is needed ! If everything goes fine, I will make an rc2 in a couple of days and then we can decide if a final release on Friday is fine or if we need to postpone to next week, There are too few changes in this release for pp64. By "too few" do you mean that we are missing anything important ? I have sent out 2 patches which are not accepted for this release.
One patch is to fix one bug for ppc64 which is: [PATCHv2 1/1] Optimize machine option to set more options with it. Without this patch, USB controllers doesn't work well with VGA enabled for ppc64. And testing with VDSM, it will always report errors when using VGA.
Hum, i just read your patch but i don't have enough knowledge of qemu parameters to know if there is large side effect to your patch. Also you remove code which is about core and replace it with something completely different. As is and with the amount of details you provided I can't make any jugement of how bad or good those changes are. One thing is sure, bug fixes can go in, they are fine during freeze ! Daniel -- Daniel Veillard | Open Source and Standards, Red Hat veillard@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | virtualization library http://libvirt.org/

On 2013年02月26日 14:04, Daniel Veillard wrote:
On Tue, Feb 26, 2013 at 01:03:17PM +0800, Li Zhang wrote:
On 2013年02月26日 12:51, Daniel Veillard wrote:
On Tue, Feb 26, 2013 at 11:10:35AM +0800, Li Zhang wrote:
On 2013年02月25日 19:52, Daniel Veillard wrote:
I have just tagged git and pushed the tarball. The rpms for F17 are following too at the usual place: ftp://libvirt.org/libvirt/
I gave a try to the set of rpms and this seems to work fine for relatively simple tests, but of course more testing and checking on other architectures is needed ! If everything goes fine, I will make an rc2 in a couple of days and then we can decide if a final release on Friday is fine or if we need to postpone to next week, There are too few changes in this release for pp64. By "too few" do you mean that we are missing anything important ? I have sent out 2 patches which are not accepted for this release.
One patch is to fix one bug for ppc64 which is: [PATCHv2 1/1] Optimize machine option to set more options with it. Without this patch, USB controllers doesn't work well with VGA enabled for ppc64. And testing with VDSM, it will always report errors when using VGA. Hum, i just read your patch but i don't have enough knowledge of qemu parameters to know if there is large side effect to your patch. Also you remove code which is about core and replace it with something completely different. Thanks so much for your reviewing.
In qemu, it provides some options for some kind of devices or hardware by -machine option. I think its purpose is to provide the configurations for machines to compare with -device xxx. There will be more and more options to be set by -machine according to QEMU's development. So, in my patch, -machine is used to be set more options not only dump_core with newer QEMU versions
As is and with the amount of details you provided I can't make any jugement of how bad or good those changes are. One thing is sure, bug fixes can go in, they are fine during freeze ! Got it, thanks. :) I will wait for more comments from mailing list.
Daniel

On 02/25/2013 12:52 PM, Daniel Veillard wrote:
I have just tagged git and pushed the tarball. The rpms for F17 are following too at the usual place: ftp://libvirt.org/libvirt/
I gave a try to the set of rpms and this seems to work fine for relatively simple tests, but of course more testing and checking on other architectures is needed ! If everything goes fine, I will make an rc2 in a couple of days and then we can decide if a final release on Friday is fine or if we need to postpone to next week,
Daniel
Can we please revert the following commits before rc2; 24aa7f8d11054b7b2e643cf3cd5c80a199764af0 (S390: Documentation for CCW address type) 0bbbd42c30543d8251536c2fa11166834c886ada (S390: domain_conf support for CCW) this is for two reasons: 1. They don't make sense without the rest of the series which is still pending review (and probably needs a rebase meanwhile) and the result already shows up on libvirt.org 2. I received feedback from someone more platform-savvy than I am, that one of the XML attribute names (schid) is misleading Reverting gives me the opportunity to provide a new series on top of 1.0.3 which can be hopefully applied in time for the next release. -- Mit freundlichen Grüßen/Kind Regards Viktor Mihajlovski IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294

On 02/26/2013 03:16 AM, Viktor Mihajlovski wrote:
Can we please revert the following commits before rc2;
24aa7f8d11054b7b2e643cf3cd5c80a199764af0 (S390: Documentation for CCW address type) 0bbbd42c30543d8251536c2fa11166834c886ada (S390: domain_conf support for CCW)
this is for two reasons: 1. They don't make sense without the rest of the series which is still pending review (and probably needs a rebase meanwhile) and the result already shows up on libvirt.org 2. I received feedback from someone more platform-savvy than I am, that one of the XML attribute names (schid) is misleading
Yes, avoiding the trap of releasing XML early, only to force ourselves into supporting that design forever, seems like a reasonable reason to revert while waiting for the more complete design to get a thorough review.
Reverting gives me the opportunity to provide a new series on top of 1.0.3 which can be hopefully applied in time for the next release.
Thanks for bringing up the question; I've reverted those patches as requested. It also lends a bit more weight to the question about DHCP <option>, which is another XML addition where we are not sure if the final design is ready for 1.0.3. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 02/27/2013 12:31 AM, Eric Blake wrote:
Yes, avoiding the trap of releasing XML early, only to force ourselves into supporting that design forever, seems like a reasonable reason to revert while waiting for the more complete design to get a thorough review.
Thanks, Eric. -- Mit freundlichen Grüßen/Kind Regards Viktor Mihajlovski IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294

I have just tagged git and pushed the tarball. The rpms for F17 are at the usual place too: ftp://libvirt.org/libvirt/ I gave a try to the set of rpms and this seems to work fine for relatively simple tests. Please give it more testing and let's keep change to git to bug fixes to minimize risks for 1.0.3. If everything goes well I will push 1.0.3 on Friday, thanks ! Daniel -- Daniel Veillard | Open Source and Standards, Red Hat veillard@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | virtualization library http://libvirt.org/

On 27.02.2013 12:20, Daniel Veillard wrote:
I have just tagged git and pushed the tarball. The rpms for F17 are at the usual place too: ftp://libvirt.org/libvirt/
I gave a try to the set of rpms and this seems to work fine for relatively simple tests. Please give it more testing and let's keep change to git to bug fixes to minimize risks for 1.0.3. If everything goes well I will push 1.0.3 on Friday,
thanks !
Daniel
Maybe we want to postpone the release a bit; I was trying to find out how drop of qemu driver lock speeds up parallel startup of multiple domains, however - it is not working. I've created small test program - test4.c and run it against current HEAD. It takes arguments - domain names - which it spawns a separate thread per each domain and repeatedly starts and destroys domains. See code for more. Then, I've created 20 copies of diskless domain (attached as well): test1..test20 and run my test4 program over them: ./test4 $(for ((i=1; i<21; i++)); do echo -n "test$i "; done) I had to tweak max_client_requests=20 to really run those 20 requests in parallel. However, what I am getting instead of nice measuring of speedup is: [9722] Starting test1 (round 20) ...[9725] Starting test4 (round 20) ...[9724] Starting test3 (round 20) ...[9723] Starting test2 (round 20) ...[9726] Starting test5 (round 20) ...[9727] Starting test6 (round 20) ...[9728] Starting test7 (round 20) ...[9729] Starting test8 (round 20) ...[9730] Starting test9 (round 20) ...[9731] Starting test10 (round 20) ...[9732] Starting test11 (round 20) ...[9733] Starting test12 (round 20) ...[9734] Starting test13 (round 20) ...[9735] Starting test14 (round 20) ...[9736] Starting test15 (round 20) ...[9737] Starting test16 (round 20) ...[9738] Starting test17 (round 20) ...[9741] Starting test20 (round 20) ...[9739] Starting test18 (round 20) ...[9740] Starting test19 (round 20) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test12libvir: error : An error occurred, but the cause is unknown Unable to start domain test14libvir: error : An error occurred, but the cause is unknown Unable to start domain test7libvir: error : An error occurred, but the cause is unknown Unable to start domain test1libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test12[9733] Destroying test12 ...[9728] Destroying test7 ...[9735] Destroying test14 ...[9733] Starting test12 (round 19) ...libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test7[9728] Starting test7 (round 19) ...libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test14[9735] Starting test14 (round 19) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test18libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test1[9722] Destroying test1 ...[9722] Starting test1 (round 19) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test8libvir: error : An error occurred, but the cause is unknown Unable to start domain test15[9725] Done etc. Unfortunately, I caught this after freeze. Michal

On Wed, Feb 27, 2013 at 03:26:24PM +0100, Michal Privoznik wrote:
On 27.02.2013 12:20, Daniel Veillard wrote:
I have just tagged git and pushed the tarball. The rpms for F17 are at the usual place too: ftp://libvirt.org/libvirt/
I gave a try to the set of rpms and this seems to work fine for relatively simple tests. Please give it more testing and let's keep change to git to bug fixes to minimize risks for 1.0.3. If everything goes well I will push 1.0.3 on Friday,
thanks !
Daniel
Maybe we want to postpone the release a bit; I was trying to find out how drop of qemu driver lock speeds up parallel startup of multiple domains, however - it is not working.
I've created small test program - test4.c and run it against current HEAD. It takes arguments - domain names - which it spawns a separate thread per each domain and repeatedly starts and destroys domains. See code for more.
Then, I've created 20 copies of diskless domain (attached as well): test1..test20 and run my test4 program over them:
./test4 $(for ((i=1; i<21; i++)); do echo -n "test$i "; done)
I had to tweak max_client_requests=20 to really run those 20 requests in parallel.
However, what I am getting instead of nice measuring of speedup is:
[9722] Starting test1 (round 20) ...[9725] Starting test4 (round 20) ...[9724] Starting test3 (round 20) ...[9723] Starting test2 (round 20) ...[9726] Starting test5 (round 20) ...[9727] Starting test6 (round 20) ...[9728] Starting test7 (round 20) ...[9729] Starting test8 (round 20) ...[9730] Starting test9 (round 20) ...[9731] Starting test10 (round 20) ...[9732] Starting test11 (round 20) ...[9733] Starting test12 (round 20) ...[9734] Starting test13 (round 20) ...[9735] Starting test14 (round 20) ...[9736] Starting test15 (round 20) ...[9737] Starting test16 (round 20) ...[9738] Starting test17 (round 20) ...[9741] Starting test20 (round 20) ...[9739] Starting test18 (round 20) ...[9740] Starting test19 (round 20) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test12libvir: error : An error occurred, but the cause is unknown Unable to start domain test14libvir: error : An error occurred, but the cause is unknown Unable to start domain test7libvir: error : An error occurred, but the cause is unknown Unable to start domain test1libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test12[9733] Destroying test12 ...[9728] Destroying test7 ...[9735] Destroying test14 ...[9733] Starting test12 (round 19) ...libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test7[9728] Starting test7 (round 19) ...libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test14[9735] Starting test14 (round 19) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test18libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test1[9722] Destroying test1 ...[9722] Starting test1 (round 19) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test8libvir: error : An error occurred, but the cause is unknown Unable to start domain test15[9725] Done
etc.
Unfortunately, I caught this after freeze.
No problem, freeze is there to find bugs, you found a serious bug, let's fix it before the release. [...]
//sleep(20); sleep(1);
does sleep(20); there avoid the problem ? There is a failure in guest start but we don't get a proper description There is no urging deadline, except well for the pending patches waiting for the end of the freeze, if we can fix this by Monday, then that would be IMHO just fine. If on Monday we still don't have a solution let's discuss the status of the analysis, okay ? Daniel -- Daniel Veillard | Open Source and Standards, Red Hat veillard@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | virtualization library http://libvirt.org/

On 27.02.2013 15:47, Daniel Veillard wrote:
On Wed, Feb 27, 2013 at 03:26:24PM +0100, Michal Privoznik wrote: ....
Just a side note. Running git-bisect gets me to this: a9e97e0c3057dfaefc7217b7db7fbba001cf8f0b is the first bad commit commit a9e97e0c3057dfaefc7217b7db7fbba001cf8f0b Author: Daniel P. Berrange <berrange@redhat.com> Date: Wed Feb 6 18:17:20 2013 +0000 Remove qemuDriverLock from almost everywhere With the majority of fields in the virQEMUDriverPtr struct now immutable or self-locking, there is no need for practically any methods to be using the QEMU driver lock. Only a handful of helper APIs in qemu_conf.c now need it But I think it just expose the bug we had earlier since 2 startup processes are not serialized now. Michal

On Wed, Feb 27, 2013 at 03:26:24PM +0100, Michal Privoznik wrote:
On 27.02.2013 12:20, Daniel Veillard wrote:
I have just tagged git and pushed the tarball. The rpms for F17 are at the usual place too: ftp://libvirt.org/libvirt/
I gave a try to the set of rpms and this seems to work fine for relatively simple tests. Please give it more testing and let's keep change to git to bug fixes to minimize risks for 1.0.3. If everything goes well I will push 1.0.3 on Friday,
thanks !
Daniel
Maybe we want to postpone the release a bit; I was trying to find out how drop of qemu driver lock speeds up parallel startup of multiple domains, however - it is not working.
I've created small test program - test4.c and run it against current HEAD. It takes arguments - domain names - which it spawns a separate thread per each domain and repeatedly starts and destroys domains. See code for more.
Then, I've created 20 copies of diskless domain (attached as well): test1..test20 and run my test4 program over them:
./test4 $(for ((i=1; i<21; i++)); do echo -n "test$i "; done)
I had to tweak max_client_requests=20 to really run those 20 requests in parallel.
However, what I am getting instead of nice measuring of speedup is:
[9722] Starting test1 (round 20) ...[9725] Starting test4 (round 20) ...[9724] Starting test3 (round 20) ...[9723] Starting test2 (round 20) ...[9726] Starting test5 (round 20) ...[9727] Starting test6 (round 20) ...[9728] Starting test7 (round 20) ...[9729] Starting test8 (round 20) ...[9730] Starting test9 (round 20) ...[9731] Starting test10 (round 20) ...[9732] Starting test11 (round 20) ...[9733] Starting test12 (round 20) ...[9734] Starting test13 (round 20) ...[9735] Starting test14 (round 20) ...[9736] Starting test15 (round 20) ...[9737] Starting test16 (round 20) ...[9738] Starting test17 (round 20) ...[9741] Starting test20 (round 20) ...[9739] Starting test18 (round 20) ...[9740] Starting test19 (round 20) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test12libvir: error : An error occurred, but the cause is unknown Unable to start domain test14libvir: error : An error occurred, but the cause is unknown Unable to start domain test7libvir: error : An error occurred, but the cause is unknown Unable to start domain test1libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test12[9733] Destroying test12 ...[9728] Destroying test7 ...[9735] Destroying test14 ...[9733] Starting test12 (round 19) ...libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test7[9728] Starting test7 (round 19) ...libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test14[9735] Starting test14 (round 19) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test18libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test1[9722] Destroying test1 ...[9722] Starting test1 (round 19) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test8libvir: error : An error occurred, but the cause is unknown Unable to start domain test15[9725] Done
I have my own parallel start/destroy test case that I ran during development of the threading code to test it. It caught quite a few bugs, and I had it run 8 threads for approximately 80,000 start/stops in total. I'm not seeing any errors running it with current GIT though. I've also tested your demo program with 8 threads / VMs, and it succeeeded without any problems. Appending my demo program - you don't need to do any setup to run it, it will auto-create the storage & use transient guests Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 02/27/13 15:26, Michal Privoznik wrote:
On 27.02.2013 12:20, Daniel Veillard wrote:
I have just tagged git and pushed the tarball. The rpms for F17 are at the usual place too: ftp://libvirt.org/libvirt/
I gave a try to the set of rpms and this seems to work fine for relatively simple tests. Please give it more testing and let's keep change to git to bug fixes to minimize risks for 1.0.3. If everything goes well I will push 1.0.3 on Friday,
thanks !
Daniel
Maybe we want to postpone the release a bit; I was trying to find out how drop of qemu driver lock speeds up parallel startup of multiple domains, however - it is not working.
I've created small test program - test4.c and run it against current HEAD. It takes arguments - domain names - which it spawns a separate thread per each domain and repeatedly starts and destroys domains. See code for more.
Then, I've created 20 copies of diskless domain (attached as well): test1..test20 and run my test4 program over them:
./test4 $(for ((i=1; i<21; i++)); do echo -n "test$i "; done)
I had to tweak max_client_requests=20 to really run those 20 requests in parallel.
However, what I am getting instead of nice measuring of speedup is:
[9722] Starting test1 (round 20) ...[9725] Starting test4 (round 20) ...[9724] Starting test3 (round 20) ...[9723] Starting test2 (round 20) ...[9726] Starting test5 (round 20) ...[9727] Starting test6 (round 20) ...[9728] Starting test7 (round 20) ...[9729] Starting test8 (round 20) ...[9730] Starting test9 (round 20) ...[9731] Starting test10 (round 20) ...[9732] Starting test11 (round 20) ...[9733] Starting test12 (round 20) ...[9734] Starting test13 (round 20) ...[9735] Starting test14 (round 20) ...[9736] Starting test15 (round 20) ...[9737] Starting test16 (round 20) ...[9738] Starting test17 (round 20) ...[9741] Starting test20 (round 20) ...[9739] Starting test18 (round 20) ...[9740] Starting test19 (round 20) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test12libvir: error : An error occurred, but the cause is unknown Unable to start domain test14libvir: error : An error occurred, but the cause is unknown Unable to start domain test7libvir: error : An error occurred, but the cause is unknown Unable to start domain test1libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test12[9733] Destroying test12 ...[9728] Destroying test7 ...[9735] Destroying test14 ...[9733] Starting test12 (round 19) ...libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test7[9728] Starting test7 (round 19) ...libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test14[9735] Starting test14 (round 19) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test18libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test1[9722] Destroying test1 ...[9722] Starting test1 (round 19) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test8libvir: error : An error occurred, but the cause is unknown Unable to start domain test15[9725] Done
I'm hitting this issue when I start the guest with the network interface in the original XML. If I delete it or change it to a regular (non-vepa) then everything works well. So the impact of this problem isn't that critical as originally reported. Peter

On Thu, Feb 28, 2013 at 11:00:28AM +0100, Peter Krempa wrote:
On 02/27/13 15:26, Michal Privoznik wrote:
On 27.02.2013 12:20, Daniel Veillard wrote:
I have just tagged git and pushed the tarball. The rpms for F17 are at the usual place too: ftp://libvirt.org/libvirt/
I gave a try to the set of rpms and this seems to work fine for relatively simple tests. Please give it more testing and let's keep change to git to bug fixes to minimize risks for 1.0.3. If everything goes well I will push 1.0.3 on Friday,
thanks !
Daniel
Maybe we want to postpone the release a bit; I was trying to find out how drop of qemu driver lock speeds up parallel startup of multiple domains, however - it is not working.
I've created small test program - test4.c and run it against current HEAD. It takes arguments - domain names - which it spawns a separate thread per each domain and repeatedly starts and destroys domains. See code for more.
Then, I've created 20 copies of diskless domain (attached as well): test1..test20 and run my test4 program over them:
./test4 $(for ((i=1; i<21; i++)); do echo -n "test$i "; done)
I had to tweak max_client_requests=20 to really run those 20 requests in parallel.
However, what I am getting instead of nice measuring of speedup is:
[9722] Starting test1 (round 20) ...[9725] Starting test4 (round 20) ...[9724] Starting test3 (round 20) ...[9723] Starting test2 (round 20) ...[9726] Starting test5 (round 20) ...[9727] Starting test6 (round 20) ...[9728] Starting test7 (round 20) ...[9729] Starting test8 (round 20) ...[9730] Starting test9 (round 20) ...[9731] Starting test10 (round 20) ...[9732] Starting test11 (round 20) ...[9733] Starting test12 (round 20) ...[9734] Starting test13 (round 20) ...[9735] Starting test14 (round 20) ...[9736] Starting test15 (round 20) ...[9737] Starting test16 (round 20) ...[9738] Starting test17 (round 20) ...[9741] Starting test20 (round 20) ...[9739] Starting test18 (round 20) ...[9740] Starting test19 (round 20) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test12libvir: error : An error occurred, but the cause is unknown Unable to start domain test14libvir: error : An error occurred, but the cause is unknown Unable to start domain test7libvir: error : An error occurred, but the cause is unknown Unable to start domain test1libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test12[9733] Destroying test12 ...[9728] Destroying test7 ...[9735] Destroying test14 ...[9733] Starting test12 (round 19) ...libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test7[9728] Starting test7 (round 19) ...libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test14[9735] Starting test14 (round 19) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test18libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test1[9722] Destroying test1 ...[9722] Starting test1 (round 19) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test8libvir: error : An error occurred, but the cause is unknown Unable to start domain test15[9725] Done
I'm hitting this issue when I start the guest with the network interface in the original XML. If I delete it or change it to a regular (non-vepa) then everything works well. So the impact of this problem isn't that critical as originally reported.
That is still critical enough that we must fix it before release IMHO Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Wed, Feb 27, 2013 at 03:26:24PM +0100, Michal Privoznik wrote:
On 27.02.2013 12:20, Daniel Veillard wrote:
I have just tagged git and pushed the tarball. The rpms for F17 are at the usual place too: ftp://libvirt.org/libvirt/
I gave a try to the set of rpms and this seems to work fine for relatively simple tests. Please give it more testing and let's keep change to git to bug fixes to minimize risks for 1.0.3. If everything goes well I will push 1.0.3 on Friday,
thanks !
Daniel
Maybe we want to postpone the release a bit; I was trying to find out how drop of qemu driver lock speeds up parallel startup of multiple domains, however - it is not working.
I've created small test program - test4.c and run it against current HEAD. It takes arguments - domain names - which it spawns a separate thread per each domain and repeatedly starts and destroys domains. See code for more.
Then, I've created 20 copies of diskless domain (attached as well): test1..test20 and run my test4 program over them:
./test4 $(for ((i=1; i<21; i++)); do echo -n "test$i "; done)
I had to tweak max_client_requests=20 to really run those 20 requests in parallel.
However, what I am getting instead of nice measuring of speedup is:
[9722] Starting test1 (round 20) ...[9725] Starting test4 (round 20) ...[9724] Starting test3 (round 20) ...[9723] Starting test2 (round 20) ...[9726] Starting test5 (round 20) ...[9727] Starting test6 (round 20) ...[9728] Starting test7 (round 20) ...[9729] Starting test8 (round 20) ...[9730] Starting test9 (round 20) ...[9731] Starting test10 (round 20) ...[9732] Starting test11 (round 20) ...[9733] Starting test12 (round 20) ...[9734] Starting test13 (round 20) ...[9735] Starting test14 (round 20) ...[9736] Starting test15 (round 20) ...[9737] Starting test16 (round 20) ...[9738] Starting test17 (round 20) ...[9741] Starting test20 (round 20) ...[9739] Starting test18 (round 20) ...[9740] Starting test19 (round 20) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test12libvir: error : An error occurred, but the cause is unknown Unable to start domain test14libvir: error : An error occurred, but the cause is unknown Unable to start domain test7libvir: error : An error occurred, but the cause is unknown Unable to start domain test1libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test12[9733] Destroying test12 ...[9728] Destroying test7 ...[9735] Destroying test14 ...[9733] Starting test12 (round 19) ...libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test7[9728] Starting test7 (round 19) ...libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test14[9735] Starting test14 (round 19) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test18libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test1[9722] Destroying test1 ...[9722] Starting test1 (round 19) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test8libvir: error : An error occurred, but the cause is unknown Unable to start domain test15[9725] Done
Do you have corresponding logs from libvirtd, and / or is there anything in /var/log/libvirt/qemu/$GUEST.log Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 28.02.2013 13:42, Daniel P. Berrange wrote:
On Wed, Feb 27, 2013 at 03:26:24PM +0100, Michal Privoznik wrote:
On 27.02.2013 12:20, Daniel Veillard wrote:
I have just tagged git and pushed the tarball. The rpms for F17 are at the usual place too: ftp://libvirt.org/libvirt/
I gave a try to the set of rpms and this seems to work fine for relatively simple tests. Please give it more testing and let's keep change to git to bug fixes to minimize risks for 1.0.3. If everything goes well I will push 1.0.3 on Friday,
thanks !
Daniel
Maybe we want to postpone the release a bit; I was trying to find out how drop of qemu driver lock speeds up parallel startup of multiple domains, however - it is not working.
I've created small test program - test4.c and run it against current HEAD. It takes arguments - domain names - which it spawns a separate thread per each domain and repeatedly starts and destroys domains. See code for more.
Then, I've created 20 copies of diskless domain (attached as well): test1..test20 and run my test4 program over them:
./test4 $(for ((i=1; i<21; i++)); do echo -n "test$i "; done)
I had to tweak max_client_requests=20 to really run those 20 requests in parallel.
However, what I am getting instead of nice measuring of speedup is:
[9722] Starting test1 (round 20) ...[9725] Starting test4 (round 20) ...[9724] Starting test3 (round 20) ...[9723] Starting test2 (round 20) ...[9726] Starting test5 (round 20) ...[9727] Starting test6 (round 20) ...[9728] Starting test7 (round 20) ...[9729] Starting test8 (round 20) ...[9730] Starting test9 (round 20) ...[9731] Starting test10 (round 20) ...[9732] Starting test11 (round 20) ...[9733] Starting test12 (round 20) ...[9734] Starting test13 (round 20) ...[9735] Starting test14 (round 20) ...[9736] Starting test15 (round 20) ...[9737] Starting test16 (round 20) ...[9738] Starting test17 (round 20) ...[9741] Starting test20 (round 20) ...[9739] Starting test18 (round 20) ...[9740] Starting test19 (round 20) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test12libvir: error : An error occurred, but the cause is unknown Unable to start domain test14libvir: error : An error occurred, but the cause is unknown Unable to start domain test7libvir: error : An error occurred, but the cause is unknown Unable to start domain test1libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test12[9733] Destroying test12 ...[9728] Destroying test7 ...[9735] Destroying test14 ...[9733] Starting test12 (round 19) ...libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test7[9728] Starting test7 (round 19) ...libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test14[9735] Starting test14 (round 19) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test18libvir: QEMU Driver error : Requested operation is not valid: domain is not running Unable to destroy domain test1[9722] Destroying test1 ...[9722] Starting test1 (round 19) ...libvir: error : An error occurred, but the cause is unknown Unable to start domain test8libvir: error : An error occurred, but the cause is unknown Unable to start domain test15[9725] Done
Do you have corresponding logs from libvirtd, and / or is there anything in /var/log/libvirt/qemu/$GUEST.log
Daniel
Jirka uploaded them: http://people.redhat.com/jdenemar/libvirt/ Michal

On 27/02/2013, at 11:20 AM, Daniel Veillard wrote:
I have just tagged git and pushed the tarball. The rpms for F17 are at the usual place too: ftp://libvirt.org/libvirt/
I gave a try to the set of rpms and this seems to work fine for relatively simple tests. Please give it more testing and let's keep change to git to bug fixes to minimize risks for 1.0.3. If everything goes well I will push 1.0.3 on Friday,
Seems to compile ok on OSX 10.7, and some basic virsh stuff is also working. Not tested in any depth though. ;> + Justin -- Open Source and Standards @ Red Hat twitter.com/realjustinclift

Hey, On Wed, Feb 27, 2013 at 07:20:56PM +0800, Daniel Veillard wrote:
I have just tagged git and pushed the tarball. The rpms for F17 are at the usual place too: ftp://libvirt.org/libvirt/
I gave a try to the set of rpms and this seems to work fine for relatively simple tests. Please give it more testing and let's keep change to git to bug fixes to minimize risks for 1.0.3. If everything goes well I will push 1.0.3 on Friday,
I've found a pretty bad issue with rc2/git head and Boxes, creating new boxes fails with 2013-03-02 14:04:22.028+0000: 15681: error : qemuDomainCheckDiskPresence:1837 : cannot access file '/home/teuf/isos/msdn/win7/fr_windows_7_home_premium_n_with_sp1_x64_dvd_u_676833.iso': Opération non permise Looking more into it, this seems to be caused by commit f506a4c115c44003455cb956861836a46425f97b Date: Thu Jan 31 13:18:45 2013 -0500 util: make virSetUIDGID a NOP only when uid or gid is -1 qemuDomainCheckDiskPresence calls virFileAccessibleAs with (user, group) being (0, 0) as Boxes is using qemu:///session (these are set to 0 by virQEMUDriverConfigNew in the session case). virFileAccessibleAs calls virSetUIDGID(0, 0) which used to be a noop before the commit above, but is now trying to call setreuid/setregid, which fails. I tried reverting this patch, but then probing qemu binaries during libvirtd startup fails so not a good workaround :) The patch below works for me, but it needs careful review as this is code I don't know at all and I've only lightly tested it. This issue is a 1.0.3 blocker as far as Boxes is concerned. Thanks, Christophe From 6474eb9dc04dcaa29450116ddfb76aefdaffd4f6 Mon Sep 17 00:00:00 2001 From: Christophe Fergeau <cfergeau@redhat.com> Date: Sat, 2 Mar 2013 15:19:47 +0100 Subject: [PATCH] qemu: Use -1 as unpriviledged uid/gid Commit f506a4c1 changed virSetUIDGID() to be a noop when uid/gid are -1, while it used to be a noop when they are <= 0. The changes in this commit broke creating new VMs in GNOME Boxes as qemuDomainCheckDiskPresence gets called during domain creation/startup, which in turn calls virFileAccessibleAs which fails after calling virSetUIDGID(0, 0) (Boxes uses session libvirtd). This commit changes virQEMUDriverConfigNew to use -1 as the unpriviledged uid/gid, and adjusts one code path that expected 0 in this case. I've also looked at the various places where cfg->user is used, and they all seem to handle -1 correctly. --- src/qemu/qemu_conf.c | 4 ++-- src/qemu/qemu_domain.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c index f3b09cf..3ef3499 100644 --- a/src/qemu/qemu_conf.c +++ b/src/qemu/qemu_conf.c @@ -129,8 +129,8 @@ virQEMUDriverConfigPtr virQEMUDriverConfigNew(bool privileged) if (virGetGroupID(QEMU_GROUP, &cfg->group) < 0) goto error; } else { - cfg->user = 0; - cfg->group = 0; + cfg->user = (uid_t)-1; + cfg->group = (gid_t)-1; } cfg->dynamicOwnership = privileged; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 0e56596..1ecc8fa 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1301,8 +1301,8 @@ void qemuDomainObjCheckTaint(virQEMUDriverPtr driver, if (cfg->privileged && (!cfg->clearEmulatorCapabilities || - cfg->user == 0 || - cfg->group == 0)) + cfg->user == (uid_t)-1 || + cfg->group == (gid_t)-1)) qemuDomainObjTaint(driver, obj, VIR_DOMAIN_TAINT_HIGH_PRIVILEGES, logFD); if (obj->def->namespaceData) { -- 1.8.1.2

On Sat, Mar 02, 2013 at 03:30:38PM +0100, Christophe Fergeau wrote:
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 0e56596..1ecc8fa 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1301,8 +1301,8 @@ void qemuDomainObjCheckTaint(virQEMUDriverPtr driver,
if (cfg->privileged && (!cfg->clearEmulatorCapabilities || - cfg->user == 0 || - cfg->group == 0)) + cfg->user == (uid_t)-1 || + cfg->group == (gid_t)-1))
This hunk actually seems unneeded, maybe even harmful as this 0 check only triggers on system libvirtd, while this patch sets user/group to be -1 in the session (unpriviledged) libvirtd case. Christophe

On Sat, Mar 2, 2013 at 8:30 AM, Christophe Fergeau <cfergeau@redhat.com> wrote:
Hey,
On Wed, Feb 27, 2013 at 07:20:56PM +0800, Daniel Veillard wrote:
I have just tagged git and pushed the tarball. The rpms for F17 are at the usual place too: ftp://libvirt.org/libvirt/
I gave a try to the set of rpms and this seems to work fine for relatively simple tests. Please give it more testing and let's keep change to git to bug fixes to minimize risks for 1.0.3. If everything goes well I will push 1.0.3 on Friday,
I've found a pretty bad issue with rc2/git head and Boxes, creating new boxes fails with 2013-03-02 14:04:22.028+0000: 15681: error : qemuDomainCheckDiskPresence:1837 : cannot access file '/home/teuf/isos/msdn/win7/fr_windows_7_home_premium_n_with_sp1_x64_dvd_u_676833.iso': Opération non permise
Looking more into it, this seems to be caused by
commit f506a4c115c44003455cb956861836a46425f97b Date: Thu Jan 31 13:18:45 2013 -0500
util: make virSetUIDGID a NOP only when uid or gid is -1
qemuDomainCheckDiskPresence calls virFileAccessibleAs with (user, group) being (0, 0) as Boxes is using qemu:///session (these are set to 0 by virQEMUDriverConfigNew in the session case).
virFileAccessibleAs calls virSetUIDGID(0, 0) which used to be a noop before the commit above, but is now trying to call setreuid/setregid, which fails.
I tried reverting this patch, but then probing qemu binaries during libvirtd startup fails so not a good workaround :)
The patch below works for me, but it needs careful review as this is code I don't know at all and I've only lightly tested it. This issue is a 1.0.3 blocker as far as Boxes is concerned.
Thanks,
Christophe
From 6474eb9dc04dcaa29450116ddfb76aefdaffd4f6 Mon Sep 17 00:00:00 2001 From: Christophe Fergeau <cfergeau@redhat.com> Date: Sat, 2 Mar 2013 15:19:47 +0100 Subject: [PATCH] qemu: Use -1 as unpriviledged uid/gid
Commit f506a4c1 changed virSetUIDGID() to be a noop when uid/gid are -1, while it used to be a noop when they are <= 0.
The changes in this commit broke creating new VMs in GNOME Boxes as qemuDomainCheckDiskPresence gets called during domain creation/startup, which in turn calls virFileAccessibleAs which fails after calling virSetUIDGID(0, 0) (Boxes uses session libvirtd).
This commit changes virQEMUDriverConfigNew to use -1 as the unpriviledged uid/gid, and adjusts one code path that expected 0 in this case. I've also looked at the various places where cfg->user is used, and they all seem to handle -1 correctly. --- src/qemu/qemu_conf.c | 4 ++-- src/qemu/qemu_domain.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c index f3b09cf..3ef3499 100644 --- a/src/qemu/qemu_conf.c +++ b/src/qemu/qemu_conf.c @@ -129,8 +129,8 @@ virQEMUDriverConfigPtr virQEMUDriverConfigNew(bool privileged) if (virGetGroupID(QEMU_GROUP, &cfg->group) < 0) goto error; } else { - cfg->user = 0; - cfg->group = 0; + cfg->user = (uid_t)-1; + cfg->group = (gid_t)-1; } cfg->dynamicOwnership = privileged;
I'll agree this change should fix it from a code inspection. So ACK this hunk. Really starting to think we need some tests for this. Given the late phase in the 1.0.3 release cycle (post freeze), do you have a specific test case I can use to verify this?
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 0e56596..1ecc8fa 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1301,8 +1301,8 @@ void qemuDomainObjCheckTaint(virQEMUDriverPtr driver,
if (cfg->privileged && (!cfg->clearEmulatorCapabilities || - cfg->user == 0 || - cfg->group == 0)) + cfg->user == (uid_t)-1 || + cfg->group == (gid_t)-1)) qemuDomainObjTaint(driver, obj, VIR_DOMAIN_TAINT_HIGH_PRIVILEGES, logFD);
if (obj->def->namespaceData) { -- 1.8.1.2
You self NACK'd this part. But I'll agree with the NACK here. This will actually break the checks here for the taint. In fact this code chunk shows that using 0 for user & group to mean don't care was wrong. -- Doug Goldstein

On Sat, Mar 02, 2013 at 12:30:10PM -0600, Doug Goldstein wrote:
On Sat, Mar 2, 2013 at 8:30 AM, Christophe Fergeau <cfergeau@redhat.com> wrote:
diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c index f3b09cf..3ef3499 100644 --- a/src/qemu/qemu_conf.c +++ b/src/qemu/qemu_conf.c @@ -129,8 +129,8 @@ virQEMUDriverConfigPtr virQEMUDriverConfigNew(bool privileged) if (virGetGroupID(QEMU_GROUP, &cfg->group) < 0) goto error; } else { - cfg->user = 0; - cfg->group = 0; + cfg->user = (uid_t)-1; + cfg->group = (gid_t)-1; } cfg->dynamicOwnership = privileged;
I'll agree this change should fix it from a code inspection. So ACK this hunk. Really starting to think we need some tests for this. Given the late phase in the 1.0.3 release cycle (post freeze), do you have a specific test case I can use to verify this?
My test case was running GNOME Boxes and seeing it fail ;) I've just extracted a more minimal test: <domain type='kvm' id='3'> <name>win7-test</name> <uuid>3e660fc9-fe67-950b-019f-1aab94e4abb1</uuid> <title>Microsoft Windows 7 2</title> <memory unit='KiB'>2097152</memory> <os> <type arch='x86_64' machine='pc-1.2'>hvm</type> <boot dev='cdrom'/> </os> <devices> <disk type='file' device='cdrom'> <driver name='qemu' type='raw'/> <source file='/home/teuf/isos/msdn/win7/fr_windows_7_home_premium_n_with_sp1_x64_dvd_u_676833.iso' startupPolicy='mandatory'/> <target dev='hdc' bus='ide'/> <readonly/> </disk> </devices> </domain> which fails to start without this patch, and starts correctly with it. Most of this is boilerplate (some parts are probably even optional), the important bit is the <source file='someiso.iso' startupPolicy='mandatory'/> Christophe

On 03/02/2013 01:30 PM, Doug Goldstein wrote:
On Sat, Mar 2, 2013 at 8:30 AM, Christophe Fergeau <cfergeau@redhat.com> wrote:
Hey,
On Wed, Feb 27, 2013 at 07:20:56PM +0800, Daniel Veillard wrote:
I have just tagged git and pushed the tarball. The rpms for F17 are at the usual place too: ftp://libvirt.org/libvirt/
I gave a try to the set of rpms and this seems to work fine for relatively simple tests. Please give it more testing and let's keep change to git to bug fixes to minimize risks for 1.0.3. If everything goes well I will push 1.0.3 on Friday, I've found a pretty bad issue with rc2/git head and Boxes, creating new boxes fails with 2013-03-02 14:04:22.028+0000: 15681: error : qemuDomainCheckDiskPresence:1837 : cannot access file '/home/teuf/isos/msdn/win7/fr_windows_7_home_premium_n_with_sp1_x64_dvd_u_676833.iso': Opération non permise
Looking more into it, this seems to be caused by
commit f506a4c115c44003455cb956861836a46425f97b Date: Thu Jan 31 13:18:45 2013 -0500
util: make virSetUIDGID a NOP only when uid or gid is -1
qemuDomainCheckDiskPresence calls virFileAccessibleAs with (user, group) being (0, 0) as Boxes is using qemu:///session (these are set to 0 by virQEMUDriverConfigNew in the session case).
virFileAccessibleAs calls virSetUIDGID(0, 0) which used to be a noop before the commit above, but is now trying to call setreuid/setregid, which fails.
I tried reverting this patch, but then probing qemu binaries during libvirtd startup fails so not a good workaround :)
The patch below works for me, but it needs careful review as this is code I don't know at all and I've only lightly tested it. This issue is a 1.0.3 blocker as far as Boxes is concerned.
Thanks,
Christophe
From 6474eb9dc04dcaa29450116ddfb76aefdaffd4f6 Mon Sep 17 00:00:00 2001 From: Christophe Fergeau <cfergeau@redhat.com> Date: Sat, 2 Mar 2013 15:19:47 +0100 Subject: [PATCH] qemu: Use -1 as unpriviledged uid/gid
Commit f506a4c1 changed virSetUIDGID() to be a noop when uid/gid are -1, while it used to be a noop when they are <= 0.
The changes in this commit broke creating new VMs in GNOME Boxes as qemuDomainCheckDiskPresence gets called during domain creation/startup, which in turn calls virFileAccessibleAs which fails after calling virSetUIDGID(0, 0) (Boxes uses session libvirtd).
This commit changes virQEMUDriverConfigNew to use -1 as the unpriviledged uid/gid, and adjusts one code path that expected 0 in this case. I've also looked at the various places where cfg->user is used, and they all seem to handle -1 correctly. --- src/qemu/qemu_conf.c | 4 ++-- src/qemu/qemu_domain.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c index f3b09cf..3ef3499 100644 --- a/src/qemu/qemu_conf.c +++ b/src/qemu/qemu_conf.c @@ -129,8 +129,8 @@ virQEMUDriverConfigPtr virQEMUDriverConfigNew(bool privileged) if (virGetGroupID(QEMU_GROUP, &cfg->group) < 0) goto error; } else { - cfg->user = 0; - cfg->group = 0; + cfg->user = (uid_t)-1; + cfg->group = (gid_t)-1; } cfg->dynamicOwnership = privileged;
I'll agree this change should fix it from a code inspection. So ACK this hunk. Really starting to think we need some tests for this. Given the late phase in the 1.0.3 release cycle (post freeze), do you have a specific test case I can use to verify this?
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 0e56596..1ecc8fa 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1301,8 +1301,8 @@ void qemuDomainObjCheckTaint(virQEMUDriverPtr driver,
if (cfg->privileged && (!cfg->clearEmulatorCapabilities || - cfg->user == 0 || - cfg->group == 0)) + cfg->user == (uid_t)-1 || + cfg->group == (gid_t)-1)) qemuDomainObjTaint(driver, obj, VIR_DOMAIN_TAINT_HIGH_PRIVILEGES, logFD);
if (obj->def->namespaceData) { -- 1.8.1.2 You self NACK'd this part. But I'll agree with the NACK here. This will actually break the checks here for the taint. In fact this code chunk shows that using 0 for user & group to mean don't care was wrong.
As the author of the patch that uncovered/caused the breakage, I'll second the ACK on the first hunk and NACK on the second.

On Sat, Mar 02, 2013 at 04:54:34PM -0500, Laine Stump wrote:
As the author of the patch that uncovered/caused the breakage, I'll second the ACK on the first hunk and NACK on the second.
I've now pushed this first hunk to master with a slightly improved commit log. Thanks Doug and Laine for the quick review! Christophe

On 03/02/2013 09:30 AM, Christophe Fergeau wrote:
Hey,
On Wed, Feb 27, 2013 at 07:20:56PM +0800, Daniel Veillard wrote:
I have just tagged git and pushed the tarball. The rpms for F17 are at the usual place too: ftp://libvirt.org/libvirt/
I gave a try to the set of rpms and this seems to work fine for relatively simple tests. Please give it more testing and let's keep change to git to bug fixes to minimize risks for 1.0.3. If everything goes well I will push 1.0.3 on Friday, I've found a pretty bad issue with rc2/git head and Boxes, creating new boxes fails with 2013-03-02 14:04:22.028+0000: 15681: error : qemuDomainCheckDiskPresence:1837 : cannot access file '/home/teuf/isos/msdn/win7/fr_windows_7_home_premium_n_with_sp1_x64_dvd_u_676833.iso': Opération non permise
Looking more into it, this seems to be caused by
commit f506a4c115c44003455cb956861836a46425f97b Date: Thu Jan 31 13:18:45 2013 -0500
util: make virSetUIDGID a NOP only when uid or gid is -1
Yep, I had originally been careful to maintain the previous "0 means ignore" behavior, but was convinced that, even if it temporarily uncovered some bugs, it was worth it in the long run to do it correctly. I had expected there would be some fallout that wasn't uncovered immediately (I thought it had all been revealed earlier when Doug and Guido found/fixed a similar problem with session mode libvirt, though. Sigh.)
participants (11)
-
Christophe Fergeau
-
Daniel P. Berrange
-
Daniel Veillard
-
Doug Goldstein
-
Eric Blake
-
Justin Clift
-
Laine Stump
-
Li Zhang
-
Michal Privoznik
-
Peter Krempa
-
Viktor Mihajlovski