[libvirt] Setting devid for emulated NICs (Xen 4.3.1 / libvirt 1.2.0) using libxl driver

Using virt-manager "hypervisor default" type: <interface type='bridge'> <mac address='00:16:3e:5e:09:9d'/> <source bridge='br0'/> <script path='vif-bridge'/> </interface> This causes the qemu call to have "-net none" which removes PXE boot abilities. A linux kernel has network through the xen pv-driver. Changing in virt-manager to "e1000" type: <interface type='bridge'> <mac address='00:16:3e:5e:09:9d'/> <source bridge='br0'/> <script path='vif-bridge'/> <model type='e1000'/> </interface> This currently (Xen 4.3.1 + libvirt 1.2.0) fails and the qemu arguments in the log look suspicious: -device e1000,id=nic-1,netdev=net-1,mac=00:16:3e:5e:09:9d -netdev type=tap,id=net-1,ifname=vif1.-1-emu,script=no,downscript=no Looking through git I found ba64b97134a6129a48684f22f31be92c3b6eef96 libxl: Allow libxl to set NIC devid which removes a line that sets the devid of new NICs. I assume the comment says (sorry I always seem to get confused reading it) that devid should be auto-set by the libxl library provided by Xen. But I could not find where that would be done. Even the xl command implementation sets it explicitly. libxl_device_nic_init sets it to -1 which would explain the wrong qemu arguments above. For testing I re-added the following after the libxlMakeNic call: --- libvirt-1.2.0.orig/src/libxl/libxl_conf.c 2013-12-11 17:04:17.000000000 +0 +++ libvirt-1.2.0/src/libxl/libxl_conf.c 2013-12-16 19:08:46.830016646 +0 @@ -907,6 +907,8 @@ libxlMakeNicList(virDomainDefPtr def, l for (i = 0; i < nnics; i++) { if (libxlMakeNic(l_nics[i], &x_nics[i])) goto error; + if (x_nics[i].devid < 0) + x_nics[i].devid = i; } d_config->nics = x_nics; And with that I get a working PXE boot when I set the device type in the config. (Side note that I think not using a type defaults to rtl8139 in the old xend driver. Maybe this should be the same for the xl driver). I am not sure how the right fix for it should look like as I am unsure which part was supposed to set the devid. Just right now it does not seem to be done. -Stefan

On Tue, 2013-12-17 at 17:34 +0100, Stefan Bader wrote:
Using virt-manager "hypervisor default" type:
<interface type='bridge'> <mac address='00:16:3e:5e:09:9d'/> <source bridge='br0'/> <script path='vif-bridge'/> </interface>
This causes the qemu call to have "-net none" which removes PXE boot abilities. A linux kernel has network through the xen pv-driver.
Changing in virt-manager to "e1000" type:
<interface type='bridge'> <mac address='00:16:3e:5e:09:9d'/> <source bridge='br0'/> <script path='vif-bridge'/> <model type='e1000'/> </interface>
This currently (Xen 4.3.1 + libvirt 1.2.0) fails and the qemu arguments in the log look suspicious: -device e1000,id=nic-1,netdev=net-1,mac=00:16:3e:5e:09:9d -netdev type=tap,id=net-1,ifname=vif1.-1-emu,script=no,downscript=no
Looking through git I found ba64b97134a6129a48684f22f31be92c3b6eef96 libxl: Allow libxl to set NIC devid
Is that a libvirt commit? Not seeing it in xen.git. Might this libxl fix be relevant: commit 5420f26507fc5c9853eb1076401a8658d72669da Author: Jim Fehlig <jfehlig@suse.com> Date: Fri Jan 11 12:22:26 2013 +0000 libxl: Set vfb and vkb devid if not done so by the caller Other devices set a sensible devid if the caller has not done so. Do the same for vfb and vkb. While at it, factor out the common code used to determine a sensible devid, so it can be used by other libxl__device_*_add functions. Signed-off-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com> and a follow up in dfeccbeaa. Although the comment implies that nic's were already correctly assigning a devid if the caller specified -1, so I don't know why it doesn't work for you :-(
which removes a line that sets the devid of new NICs. I assume the comment says (sorry I always seem to get confused reading it) that devid should be auto-set by the libxl library provided by Xen.
Correct.
But I could not find where that would be done.
I expect the above commits have pointed you to the right line, but the code should be quite near the top of libxl__device_nic_add in libxl.c
Even the xl command implementation sets it explicitly.
The caller is also allowed to do this if it cares what number is used.
libxl_device_nic_init sets it to -1 which would explain the wrong qemu arguments above.
Right, libxl_device_nic_init sets everything to "the defaults please" and __device_nic_add will instantiate any defaults which the application didn't supply.
For testing I re-added the following after the libxlMakeNic call:
--- libvirt-1.2.0.orig/src/libxl/libxl_conf.c 2013-12-11 17:04:17.000000000 +0 +++ libvirt-1.2.0/src/libxl/libxl_conf.c 2013-12-16 19:08:46.830016646 +0 @@ -907,6 +907,8 @@ libxlMakeNicList(virDomainDefPtr def, l for (i = 0; i < nnics; i++) { if (libxlMakeNic(l_nics[i], &x_nics[i])) goto error; + if (x_nics[i].devid < 0) + x_nics[i].devid = i; }
d_config->nics = x_nics;
And with that I get a working PXE boot when I set the device type in the config. (Side note that I think not using a type defaults to rtl8139 in the old xend driver. Maybe this should be the same for the xl driver). I am not sure how the right fix for it should look like as I am unsure which part was supposed to set the devid. Just right now it does not seem to be done.
-Stefan

On 17.12.2013 17:58, Ian Campbell wrote:
On Tue, 2013-12-17 at 17:34 +0100, Stefan Bader wrote:
Using virt-manager "hypervisor default" type:
<interface type='bridge'> <mac address='00:16:3e:5e:09:9d'/> <source bridge='br0'/> <script path='vif-bridge'/> </interface>
This causes the qemu call to have "-net none" which removes PXE boot abilities. A linux kernel has network through the xen pv-driver.
Changing in virt-manager to "e1000" type:
<interface type='bridge'> <mac address='00:16:3e:5e:09:9d'/> <source bridge='br0'/> <script path='vif-bridge'/> <model type='e1000'/> </interface>
This currently (Xen 4.3.1 + libvirt 1.2.0) fails and the qemu arguments in the log look suspicious: -device e1000,id=nic-1,netdev=net-1,mac=00:16:3e:5e:09:9d -netdev type=tap,id=net-1,ifname=vif1.-1-emu,script=no,downscript=no
Looking through git I found ba64b97134a6129a48684f22f31be92c3b6eef96 libxl: Allow libxl to set NIC devid
Is that a libvirt commit? Not seeing it in xen.git.
Oh, sorry, yes it is.
Might this libxl fix be relevant: commit 5420f26507fc5c9853eb1076401a8658d72669da Author: Jim Fehlig <jfehlig@suse.com> Date: Fri Jan 11 12:22:26 2013 +0000
libxl: Set vfb and vkb devid if not done so by the caller
Other devices set a sensible devid if the caller has not done so. Do the same for vfb and vkb. While at it, factor out the common code used to determine a sensible devid, so it can be used by other libxl__device_*_add functions.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
and a follow up in dfeccbeaa. Although the comment implies that nic's were already correctly assigning a devid if the caller specified -1, so I don't know why it doesn't work for you :-(
Ok, yes, the commit above indeed changes libxl__device_nic_add to call libxl__device_nextid for the devid... Just how is this actually called. Maybe not sufficient but "git grep libxl__device_nic_add" in the xen code only shows the definition and a declaration in libxl_internal.h to me...
which removes a line that sets the devid of new NICs. I assume the comment says (sorry I always seem to get confused reading it) that devid should be auto-set by the libxl library provided by Xen.
Correct.
But I could not find where that would be done.
I expect the above commits have pointed you to the right line, but the code should be quite near the top of libxl__device_nic_add in libxl.c
Even the xl command implementation sets it explicitly.
The caller is also allowed to do this if it cares what number is used.
libxl_device_nic_init sets it to -1 which would explain the wrong qemu arguments above.
Right, libxl_device_nic_init sets everything to "the defaults please" and __device_nic_add will instantiate any defaults which the application didn't supply.
For testing I re-added the following after the libxlMakeNic call:
--- libvirt-1.2.0.orig/src/libxl/libxl_conf.c 2013-12-11 17:04:17.000000000 +0 +++ libvirt-1.2.0/src/libxl/libxl_conf.c 2013-12-16 19:08:46.830016646 +0 @@ -907,6 +907,8 @@ libxlMakeNicList(virDomainDefPtr def, l for (i = 0; i < nnics; i++) { if (libxlMakeNic(l_nics[i], &x_nics[i])) goto error; + if (x_nics[i].devid < 0) + x_nics[i].devid = i; }
d_config->nics = x_nics;
And with that I get a working PXE boot when I set the device type in the config. (Side note that I think not using a type defaults to rtl8139 in the old xend driver. Maybe this should be the same for the xl driver). I am not sure how the right fix for it should look like as I am unsure which part was supposed to set the devid. Just right now it does not seem to be done.
-Stefan
_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel

On Tue, 2013-12-17 at 18:32 +0100, Stefan Bader wrote:
Might this libxl fix be relevant: commit 5420f26507fc5c9853eb1076401a8658d72669da Author: Jim Fehlig <jfehlig@suse.com> Date: Fri Jan 11 12:22:26 2013 +0000
libxl: Set vfb and vkb devid if not done so by the caller
Other devices set a sensible devid if the caller has not done so. Do the same for vfb and vkb. While at it, factor out the common code used to determine a sensible devid, so it can be used by other libxl__device_*_add functions.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
and a follow up in dfeccbeaa. Although the comment implies that nic's were already correctly assigning a devid if the caller specified -1, so I don't know why it doesn't work for you :-(
Ok, yes, the commit above indeed changes libxl__device_nic_add to call libxl__device_nextid for the devid... Just how is this actually called. Maybe not sufficient but "git grep libxl__device_nic_add" in the xen code only shows the definition and a declaration in libxl_internal.h to me...
I have a feeling a macro might be involved... Here we go, look for DEFINE_DEVICE_REMOVE in libxl.c. We should really add the eventual function names in comments to provide grep fodder.... Ian.

On 18.12.2013 13:27, Ian Campbell wrote:
On Tue, 2013-12-17 at 18:32 +0100, Stefan Bader wrote:
Might this libxl fix be relevant: commit 5420f26507fc5c9853eb1076401a8658d72669da Author: Jim Fehlig <jfehlig@suse.com> Date: Fri Jan 11 12:22:26 2013 +0000
libxl: Set vfb and vkb devid if not done so by the caller
Other devices set a sensible devid if the caller has not done so. Do the same for vfb and vkb. While at it, factor out the common code used to determine a sensible devid, so it can be used by other libxl__device_*_add functions.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
and a follow up in dfeccbeaa. Although the comment implies that nic's were already correctly assigning a devid if the caller specified -1, so I don't know why it doesn't work for you :-(
Ok, yes, the commit above indeed changes libxl__device_nic_add to call libxl__device_nextid for the devid... Just how is this actually called. Maybe not sufficient but "git grep libxl__device_nic_add" in the xen code only shows the definition and a declaration in libxl_internal.h to me...
I have a feeling a macro might be involved...
Here we go, look for DEFINE_DEVICE_REMOVE in libxl.c. We should really add the eventual function names in comments to provide grep fodder....
Oh duh, yeah. So in DEFINE_DEVICE_ADD a libxl_device_nic_add is created which calls to libxl__device_nic_add. When I look for the single _ version I find a call from xl_cmdimpl.c and its public declaration in libxl.h. So I guess the bug is that libvirt in the libxl driver never seems to do so. Ok, thanks a lot for digging the out the DEFINE. As nice those are to create similar functions from template, grep and tags fail to be helpful with them. -Stefan
Ian.
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Wed, 2013-12-18 at 14:12 +0100, Stefan Bader wrote:
On 18.12.2013 13:27, Ian Campbell wrote:
On Tue, 2013-12-17 at 18:32 +0100, Stefan Bader wrote:
Might this libxl fix be relevant: commit 5420f26507fc5c9853eb1076401a8658d72669da Author: Jim Fehlig <jfehlig@suse.com> Date: Fri Jan 11 12:22:26 2013 +0000
libxl: Set vfb and vkb devid if not done so by the caller
Other devices set a sensible devid if the caller has not done so. Do the same for vfb and vkb. While at it, factor out the common code used to determine a sensible devid, so it can be used by other libxl__device_*_add functions.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
and a follow up in dfeccbeaa. Although the comment implies that nic's were already correctly assigning a devid if the caller specified -1, so I don't know why it doesn't work for you :-(
Ok, yes, the commit above indeed changes libxl__device_nic_add to call libxl__device_nextid for the devid... Just how is this actually called. Maybe not sufficient but "git grep libxl__device_nic_add" in the xen code only shows the definition and a declaration in libxl_internal.h to me...
I have a feeling a macro might be involved...
Here we go, look for DEFINE_DEVICE_REMOVE in libxl.c. We should really add the eventual function names in comments to provide grep fodder....
Oh duh, yeah. So in DEFINE_DEVICE_ADD a libxl_device_nic_add is created which calls to libxl__device_nic_add. When I look for the single _ version I find a call from xl_cmdimpl.c and its public declaration in libxl.h. So I guess the bug is that libvirt in the libxl driver never seems to do so
The macro creates libxl__add_nics which adds the nics from the libxl_domain_config->nics array. I don't think libvirt needs to call libxl_device_nic_add manually unless it is hotplugging a new nic at runtime.
. Ok, thanks a lot for digging the out the DEFINE. As nice those are to create similar functions from template, grep and tags fail to be helpful with them.
-Stefan
Ian.
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 18.12.2013 14:28, Ian Campbell wrote:
On Wed, 2013-12-18 at 14:12 +0100, Stefan Bader wrote:
On 18.12.2013 13:27, Ian Campbell wrote:
On Tue, 2013-12-17 at 18:32 +0100, Stefan Bader wrote:
Might this libxl fix be relevant: commit 5420f26507fc5c9853eb1076401a8658d72669da Author: Jim Fehlig <jfehlig@suse.com> Date: Fri Jan 11 12:22:26 2013 +0000
libxl: Set vfb and vkb devid if not done so by the caller
Other devices set a sensible devid if the caller has not done so. Do the same for vfb and vkb. While at it, factor out the common code used to determine a sensible devid, so it can be used by other libxl__device_*_add functions.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
and a follow up in dfeccbeaa. Although the comment implies that nic's were already correctly assigning a devid if the caller specified -1, so I don't know why it doesn't work for you :-(
Ok, yes, the commit above indeed changes libxl__device_nic_add to call libxl__device_nextid for the devid... Just how is this actually called. Maybe not sufficient but "git grep libxl__device_nic_add" in the xen code only shows the definition and a declaration in libxl_internal.h to me...
I have a feeling a macro might be involved...
Here we go, look for DEFINE_DEVICE_REMOVE in libxl.c. We should really add the eventual function names in comments to provide grep fodder....
Oh duh, yeah. So in DEFINE_DEVICE_ADD a libxl_device_nic_add is created which calls to libxl__device_nic_add. When I look for the single _ version I find a call from xl_cmdimpl.c and its public declaration in libxl.h. So I guess the bug is that libvirt in the libxl driver never seems to do so
The macro creates libxl__add_nics which adds the nics from the libxl_domain_config->nics array. I don't think libvirt needs to call libxl_device_nic_add manually unless it is hotplugging a new nic at runtime.
Bah, this is another one, DEFINE_DEVICES_ADD in libxl_devices.c, that does that. That seems to be used from a few places. That needs more research to understand the call paths. Probably simpler starting from the entry which looks to be libxl_domain_create_new from libvirt's side.
. Ok, thanks a lot for digging the out the DEFINE. As nice those are to create similar functions from template, grep and tags fail to be helpful with them.
-Stefan
Ian.
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 18.12.2013 14:28, Ian Campbell wrote:
On Wed, 2013-12-18 at 14:12 +0100, Stefan Bader wrote:
On 18.12.2013 13:27, Ian Campbell wrote:
On Tue, 2013-12-17 at 18:32 +0100, Stefan Bader wrote:
Might this libxl fix be relevant: commit 5420f26507fc5c9853eb1076401a8658d72669da Author: Jim Fehlig <jfehlig@suse.com> Date: Fri Jan 11 12:22:26 2013 +0000
libxl: Set vfb and vkb devid if not done so by the caller
Other devices set a sensible devid if the caller has not done so. Do the same for vfb and vkb. While at it, factor out the common code used to determine a sensible devid, so it can be used by other libxl__device_*_add functions.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
and a follow up in dfeccbeaa. Although the comment implies that nic's were already correctly assigning a devid if the caller specified -1, so I don't know why it doesn't work for you :-(
Ok, yes, the commit above indeed changes libxl__device_nic_add to call libxl__device_nextid for the devid... Just how is this actually called. Maybe not sufficient but "git grep libxl__device_nic_add" in the xen code only shows the definition and a declaration in libxl_internal.h to me...
I have a feeling a macro might be involved...
Here we go, look for DEFINE_DEVICE_REMOVE in libxl.c. We should really add the eventual function names in comments to provide grep fodder....
Oh duh, yeah. So in DEFINE_DEVICE_ADD a libxl_device_nic_add is created which calls to libxl__device_nic_add. When I look for the single _ version I find a call from xl_cmdimpl.c and its public declaration in libxl.h. So I guess the bug is that libvirt in the libxl driver never seems to do so
The macro creates libxl__add_nics which adds the nics from the libxl_domain_config->nics array. I don't think libvirt needs to call libxl_device_nic_add manually unless it is hotplugging a new nic at runtime.
Hm, so I think this is the path: libxl_domain_create_new -> do_domain_create -> initiate_domain_create -> libxl__bootloader_run (HVM domain, skipping bootloader) <- domcreate_bootloader_done -> domcreate_rebuild_done <- domcreate_launch_dm -> libxl__spawn_local_dm <- domcreate_devmodel_started In libxl__spawn_local_dm, there is the following loop: for (i = 0; i < d_config->num_nics; i++) { /* We have to init the nic here, because we still haven't * called libxl_device_nic_add at this point, but qemu needs * the nic information to be complete. */ ret = libxl__device_nic_setdefault(gc, &d_config->nics[i], domid); if (ret) goto error_out; } So I think when starting the dm, the devid just is not set as setdefault does not seem to do so. I would be done in the later domcreate_devmodel_started callback but that is too late for the generated qemu arguments. -Stefan

Stefan Bader wrote:
On 18.12.2013 14:28, Ian Campbell wrote:
On Wed, 2013-12-18 at 14:12 +0100, Stefan Bader wrote:
On 18.12.2013 13:27, Ian Campbell wrote:
On Tue, 2013-12-17 at 18:32 +0100, Stefan Bader wrote:
Might this libxl fix be relevant: commit 5420f26507fc5c9853eb1076401a8658d72669da Author: Jim Fehlig <jfehlig@suse.com> Date: Fri Jan 11 12:22:26 2013 +0000
libxl: Set vfb and vkb devid if not done so by the caller
Other devices set a sensible devid if the caller has not done so. Do the same for vfb and vkb. While at it, factor out the common code used to determine a sensible devid, so it can be used by other libxl__device_*_add functions.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
and a follow up in dfeccbeaa. Although the comment implies that nic's were already correctly assigning a devid if the caller specified -1, so I don't know why it doesn't work for you :-(
Ok, yes, the commit above indeed changes libxl__device_nic_add to call libxl__device_nextid for the devid... Just how is this actually called. Maybe not sufficient but "git grep libxl__device_nic_add" in the xen code only shows the definition and a declaration in libxl_internal.h to me...
I have a feeling a macro might be involved...
Here we go, look for DEFINE_DEVICE_REMOVE in libxl.c. We should really add the eventual function names in comments to provide grep fodder....
Oh duh, yeah. So in DEFINE_DEVICE_ADD a libxl_device_nic_add is created which calls to libxl__device_nic_add. When I look for the single _ version I find a call from xl_cmdimpl.c and its public declaration in libxl.h. So I guess the bug is that libvirt in the libxl driver never seems to do so
The macro creates libxl__add_nics which adds the nics from the libxl_domain_config->nics array. I don't think libvirt needs to call libxl_device_nic_add manually unless it is hotplugging a new nic at runtime.
Hm, so I think this is the path:
libxl_domain_create_new -> do_domain_create -> initiate_domain_create -> libxl__bootloader_run (HVM domain, skipping bootloader) <- domcreate_bootloader_done -> domcreate_rebuild_done <- domcreate_launch_dm -> libxl__spawn_local_dm <- domcreate_devmodel_started
In libxl__spawn_local_dm, there is the following loop:
for (i = 0; i < d_config->num_nics; i++) { /* We have to init the nic here, because we still haven't * called libxl_device_nic_add at this point, but qemu needs * the nic information to be complete. */ ret = libxl__device_nic_setdefault(gc, &d_config->nics[i], domid); if (ret) goto error_out; }
So I think when starting the dm, the devid just is not set as setdefault does not seem to do so. I would be done in the later domcreate_devmodel_started callback but that is too late for the generated qemu arguments.
Sorry for jumping in late... I stumbled across this problem just before openSUSE13.1 released and did a quick fix in libvirt https://build.opensuse.org/package/view_file/Virtualization:openSUSE13.1/lib... I removed setting the NIC devid in the libxl driver a while back to be consistent with other devices http://libvirt.org/git/?p=libvirt.git;a=commit;h=ba64b97134a6129a48684f22f31... The quick fix was to essentially revert the above commit until I could investigate further. Thank you for now having done that investigation :). Can the devid assignment logic be moved from libxl__device_nic_add() to libxl__device_nic_setdefault()? Regards, Jim

On Wed, 2013-12-18 at 17:44 -0700, Jim Fehlig wrote:
Stefan Bader wrote:
On 18.12.2013 14:28, Ian Campbell wrote:
On Wed, 2013-12-18 at 14:12 +0100, Stefan Bader wrote:
On 18.12.2013 13:27, Ian Campbell wrote:
On Tue, 2013-12-17 at 18:32 +0100, Stefan Bader wrote:
> Might this libxl fix be relevant: > commit 5420f26507fc5c9853eb1076401a8658d72669da > Author: Jim Fehlig <jfehlig@suse.com> > Date: Fri Jan 11 12:22:26 2013 +0000 > > libxl: Set vfb and vkb devid if not done so by the caller > > Other devices set a sensible devid if the caller has not done so. > Do the same for vfb and vkb. While at it, factor out the common code > used to determine a sensible devid, so it can be used by other > libxl__device_*_add functions. > > Signed-off-by: Jim Fehlig <jfehlig@suse.com> > Acked-by: Ian Campbell <ian.campbell@citrix.com> > Committed-by: Ian Campbell <ian.campbell@citrix.com> > > and a follow up in dfeccbeaa. Although the comment implies that nic's > were already correctly assigning a devid if the caller specified -1, so > I don't know why it doesn't work for you :-( > Ok, yes, the commit above indeed changes libxl__device_nic_add to call libxl__device_nextid for the devid... Just how is this actually called. Maybe not sufficient but "git grep libxl__device_nic_add" in the xen code only shows the definition and a declaration in libxl_internal.h to me...
I have a feeling a macro might be involved...
Here we go, look for DEFINE_DEVICE_REMOVE in libxl.c. We should really add the eventual function names in comments to provide grep fodder....
Oh duh, yeah. So in DEFINE_DEVICE_ADD a libxl_device_nic_add is created which calls to libxl__device_nic_add. When I look for the single _ version I find a call from xl_cmdimpl.c and its public declaration in libxl.h. So I guess the bug is that libvirt in the libxl driver never seems to do so
The macro creates libxl__add_nics which adds the nics from the libxl_domain_config->nics array. I don't think libvirt needs to call libxl_device_nic_add manually unless it is hotplugging a new nic at runtime.
Hm, so I think this is the path:
libxl_domain_create_new -> do_domain_create -> initiate_domain_create -> libxl__bootloader_run (HVM domain, skipping bootloader) <- domcreate_bootloader_done -> domcreate_rebuild_done <- domcreate_launch_dm -> libxl__spawn_local_dm <- domcreate_devmodel_started
In libxl__spawn_local_dm, there is the following loop:
for (i = 0; i < d_config->num_nics; i++) { /* We have to init the nic here, because we still haven't * called libxl_device_nic_add at this point, but qemu needs * the nic information to be complete. */ ret = libxl__device_nic_setdefault(gc, &d_config->nics[i], domid); if (ret) goto error_out; }
So I think when starting the dm, the devid just is not set as setdefault does not seem to do so. I would be done in the later domcreate_devmodel_started callback but that is too late for the generated qemu arguments.
Sorry for jumping in late...
I stumbled across this problem just before openSUSE13.1 released and did a quick fix in libvirt
https://build.opensuse.org/package/view_file/Virtualization:openSUSE13.1/lib...
I removed setting the NIC devid in the libxl driver a while back to be consistent with other devices
http://libvirt.org/git/?p=libvirt.git;a=commit;h=ba64b97134a6129a48684f22f31...
The quick fix was to essentially revert the above commit until I could investigate further. Thank you for now having done that investigation :). Can the devid assignment logic be moved from libxl__device_nic_add() to libxl__device_nic_setdefault()?
It certainly seems like it would be more natural to do it there. I suspect it might be done this way because at setdefault time you might be walking a list of nics none of which have been created yet -- so looking in xenstore would return "devid zero is free" for every one of them? How about we: * move the init to setdefault to catch the single NIC added via hotplug case * we add somewhere early in the domain create path a call to a function which assigns devids to an entire array of devices (and do it for all the different device types). Perhaps in initiate_domain_create() after the calls to libxl__domain_create_info_setdefault and libxl__domain_build_info_setdefault but before the loop calling libxl__device_disk_setdefault for the disks. * perhaps that same function should call setdefault too, after having assigned the device, rather than it being done later in an adhoc way? Does that sound at all plausible? Ian.

On 19.12.2013 11:19, Ian Campbell wrote:
On Wed, 2013-12-18 at 17:44 -0700, Jim Fehlig wrote:
Stefan Bader wrote:
On 18.12.2013 14:28, Ian Campbell wrote:
On Wed, 2013-12-18 at 14:12 +0100, Stefan Bader wrote:
On 18.12.2013 13:27, Ian Campbell wrote:
On Tue, 2013-12-17 at 18:32 +0100, Stefan Bader wrote:
>> Might this libxl fix be relevant: >> commit 5420f26507fc5c9853eb1076401a8658d72669da >> Author: Jim Fehlig <jfehlig@suse.com> >> Date: Fri Jan 11 12:22:26 2013 +0000 >> >> libxl: Set vfb and vkb devid if not done so by the caller >> >> Other devices set a sensible devid if the caller has not done so. >> Do the same for vfb and vkb. While at it, factor out the common code >> used to determine a sensible devid, so it can be used by other >> libxl__device_*_add functions. >> >> Signed-off-by: Jim Fehlig <jfehlig@suse.com> >> Acked-by: Ian Campbell <ian.campbell@citrix.com> >> Committed-by: Ian Campbell <ian.campbell@citrix.com> >> >> and a follow up in dfeccbeaa. Although the comment implies that nic's >> were already correctly assigning a devid if the caller specified -1, so >> I don't know why it doesn't work for you :-( >> > Ok, yes, the commit above indeed changes libxl__device_nic_add to call > libxl__device_nextid for the devid... Just how is this actually called. > Maybe not sufficient but "git grep libxl__device_nic_add" in the xen code only > shows the definition and a declaration in libxl_internal.h to me... > I have a feeling a macro might be involved...
Here we go, look for DEFINE_DEVICE_REMOVE in libxl.c. We should really add the eventual function names in comments to provide grep fodder....
Oh duh, yeah. So in DEFINE_DEVICE_ADD a libxl_device_nic_add is created which calls to libxl__device_nic_add. When I look for the single _ version I find a call from xl_cmdimpl.c and its public declaration in libxl.h. So I guess the bug is that libvirt in the libxl driver never seems to do so
The macro creates libxl__add_nics which adds the nics from the libxl_domain_config->nics array. I don't think libvirt needs to call libxl_device_nic_add manually unless it is hotplugging a new nic at runtime.
Hm, so I think this is the path:
libxl_domain_create_new -> do_domain_create -> initiate_domain_create -> libxl__bootloader_run (HVM domain, skipping bootloader) <- domcreate_bootloader_done -> domcreate_rebuild_done <- domcreate_launch_dm -> libxl__spawn_local_dm <- domcreate_devmodel_started
In libxl__spawn_local_dm, there is the following loop:
for (i = 0; i < d_config->num_nics; i++) { /* We have to init the nic here, because we still haven't * called libxl_device_nic_add at this point, but qemu needs * the nic information to be complete. */ ret = libxl__device_nic_setdefault(gc, &d_config->nics[i], domid); if (ret) goto error_out; }
So I think when starting the dm, the devid just is not set as setdefault does not seem to do so. I would be done in the later domcreate_devmodel_started callback but that is too late for the generated qemu arguments.
Sorry for jumping in late...
I stumbled across this problem just before openSUSE13.1 released and did a quick fix in libvirt
https://build.opensuse.org/package/view_file/Virtualization:openSUSE13.1/lib...
I removed setting the NIC devid in the libxl driver a while back to be consistent with other devices
http://libvirt.org/git/?p=libvirt.git;a=commit;h=ba64b97134a6129a48684f22f31...
The quick fix was to essentially revert the above commit until I could investigate further. Thank you for now having done that investigation :). Can the devid assignment logic be moved from libxl__device_nic_add() to libxl__device_nic_setdefault()?
It certainly seems like it would be more natural to do it there.
I suspect it might be done this way because at setdefault time you might be walking a list of nics none of which have been created yet -- so looking in xenstore would return "devid zero is free" for every one of them?
How about we: * move the init to setdefault to catch the single NIC added via hotplug case
Init of devid? Hm, would that work as I am not sure there is a simple way of differentiating between a NIC config for a single hotplug and one that is part of a create-time array...
* we add somewhere early in the domain create path a call to a function which assigns devids to an entire array of devices (and do it for all the different device types). Perhaps in initiate_domain_create() after the calls to libxl__domain_create_info_setdefault and libxl__domain_build_info_setdefault but before the loop calling libxl__device_disk_setdefault for the disks. * perhaps that same function should call setdefault too, after having assigned the device, rather than it being done later in an adhoc way?
Does that sound at all plausible?
I wonder, well this won't help for any other device types (maybe not really needed), what about just adding the following to the existing loop in domcreate_launch_dm (just a brain dump, not even tried to compile): for (i = 0; i < d_config->num_nics; i++) { /* We have to init the nic here, because we still haven't * called libxl_device_nic_add at this point, but qemu needs * the nic information to be complete. */ ret = libxl__device_nic_setdefault(gc, &d_config->nics[i], domid); if (ret) goto error_out; + if (d_config->nics[i].devid < 0) + d_config->nics[i].devid = i; } Of course this a gain won't work well if the caller had assigned some devids but not other. Ok, maybe do the loop twice, first round sets default and picks the highest pre-assigned devid and second round makes sure any still unassigned ones are set to ++that. Oh, just while talking about setdefault. Jim, this is one of the odd things when moving from xm to xl stack from libvirt: libvirt defaults to the netfront NIC when no model is specified and sets the type. The libxl setdefault function sets the model to rtl8139 but leaves the type untouched. So setting no model in the xml config creates a domain with no emulated NIC (this does not matter after Linux is up because the emulated devices get unplugged). Just that PXE boot will not work. This gets odd because with the old xen (xm) driver, no model meant rtl8139. Sigh, and to hijack this thread even further I noticed a quite unexpected behaviour when starting a domain trhough libvirt and then try to use "xl list -l" to get config details. "xl list" shows all running domains but "xl list -l" produces something like "you have to specify a domain name". I found the origin of this to be libxl_userdata_retrieve which takes a userdate_userid as an argument. Libvirt uses "libvirt-xml" for that, while xl uses "xl". This might be intentional and the bug is just that we need a better check for not finding the userdata and then skipping those domains. On the other hand ... its after all in both cases a domain created and started through libxl... Stefan
Ian.

On Thu, 2013-12-19 at 18:06 +0100, Stefan Bader wrote:
How about we: * move the init to setdefault to catch the single NIC added via hotplug case
Init of devid?
Yes, sorry for not being clear.
Hm, would that work as I am not sure there is a simple way of differentiating between a NIC config for a single hotplug and one that is part of a create-time array...
The create time array would be initialised with devid's != -1 as part of the steps outlined in the following bullets, so setdefault woudln't touch it again.
* we add somewhere early in the domain create path a call to a function which assigns devids to an entire array of devices (and do it for all the different device types). Perhaps in initiate_domain_create() after the calls to libxl__domain_create_info_setdefault and libxl__domain_build_info_setdefault but before the loop calling libxl__device_disk_setdefault for the disks. * perhaps that same function should call setdefault too, after having assigned the device, rather than it being done later in an adhoc way?
Does that sound at all plausible?
I wonder, well this won't help for any other device types (maybe not really needed), what about just adding the following to the existing loop in domcreate_launch_dm (just a brain dump, not even tried to compile):
for (i = 0; i < d_config->num_nics; i++) { /* We have to init the nic here, because we still haven't * called libxl_device_nic_add at this point, but qemu needs * the nic information to be complete. */ ret = libxl__device_nic_setdefault(gc, &d_config->nics[i], domid); if (ret) goto error_out; + if (d_config->nics[i].devid < 0) + d_config->nics[i].devid = i; }
Of course this a gain won't work well if the caller had assigned some devids but not other.
Indeed.
Ok, maybe do the loop twice, first round sets default and picks the highest pre-assigned devid and second round makes sure any still unassigned ones are set to ++that.
That would potentially leave holes, I don't know if that matters.
Oh, just while talking about setdefault. Jim, this is one of the odd things when moving from xm to xl stack from libvirt: libvirt defaults to the netfront NIC when no model is specified and sets the type. The libxl setdefault function sets the model to rtl8139 but leaves the type untouched.
This sounds like a bug in libxl to me -- it should do something consistent I think.
So setting no model in the xml config creates a domain with no emulated NIC (this does not matter after Linux is up because the emulated devices get unplugged). Just that PXE boot will not work. This gets odd because with the old xen (xm) driver, no model meant rtl8139.
Sigh, and to hijack this thread even further I noticed a quite unexpected behaviour when starting a domain trhough libvirt and then try to use "xl list -l" to get config details. "xl list" shows all running domains but "xl list -l" produces something like "you have to specify a domain name". I found the origin of this to be libxl_userdata_retrieve which takes a userdate_userid as an argument. Libvirt uses "libvirt-xml" for that, while xl uses "xl". This might be intentional and the bug is just that we need a better check for not finding the userdata and then skipping those domains. On the other hand ... its after all in both cases a domain created and started through libxl...
I think this was discussed a few weeks ago on the list, and there were one or two separate bugs and short comings. I'm not sure which subset were actually fixed. One issue is that xl stored the guest config and then retrieves it for use in xl list -l, but libvirt != xl and therefore has no config file to save. The solution is probably for the list stuff to be based on dynamically gathering the domain info, instead of reparsing the config. Ian.,

On 19.12.2013 18:57, Ian Campbell wrote:
On Thu, 2013-12-19 at 18:06 +0100, Stefan Bader wrote:
How about we: * move the init to setdefault to catch the single NIC added via hotplug case
Init of devid?
Yes, sorry for not being clear.
Hm, would that work as I am not sure there is a simple way of differentiating between a NIC config for a single hotplug and one that is part of a create-time array...
The create time array would be initialised with devid's != -1 as part of the steps outlined in the following bullets, so setdefault woudln't touch it again.
* we add somewhere early in the domain create path a call to a function which assigns devids to an entire array of devices (and do it for all the different device types). Perhaps in initiate_domain_create() after the calls to libxl__domain_create_info_setdefault and libxl__domain_build_info_setdefault but before the loop calling libxl__device_disk_setdefault for the disks. * perhaps that same function should call setdefault too, after having assigned the device, rather than it being done later in an adhoc way?
Does that sound at all plausible?
I wonder, well this won't help for any other device types (maybe not really needed), what about just adding the following to the existing loop in domcreate_launch_dm (just a brain dump, not even tried to compile):
for (i = 0; i < d_config->num_nics; i++) { /* We have to init the nic here, because we still haven't * called libxl_device_nic_add at this point, but qemu needs * the nic information to be complete. */ ret = libxl__device_nic_setdefault(gc, &d_config->nics[i], domid); if (ret) goto error_out; + if (d_config->nics[i].devid < 0) + d_config->nics[i].devid = i; }
Of course this a gain won't work well if the caller had assigned some devids but not other.
Indeed.
Ok, maybe do the loop twice, first round sets default and picks the highest pre-assigned devid and second round makes sure any still unassigned ones are set to ++that.
That would potentially leave holes, I don't know if that matters.
Yeah, probably rather rare. This just sounded like a less intrusive change which would only address the problem that no devid is no option when starting the dm.
Oh, just while talking about setdefault. Jim, this is one of the odd things when moving from xm to xl stack from libvirt: libvirt defaults to the netfront NIC when no model is specified and sets the type. The libxl setdefault function sets the model to rtl8139 but leaves the type untouched.
This sounds like a bug in libxl to me -- it should do something consistent I think.
So setting no model in the xml config creates a domain with no emulated NIC (this does not matter after Linux is up because the emulated devices get unplugged). Just that PXE boot will not work. This gets odd because with the old xen (xm) driver, no model meant rtl8139.
Sigh, and to hijack this thread even further I noticed a quite unexpected behaviour when starting a domain trhough libvirt and then try to use "xl list -l" to get config details. "xl list" shows all running domains but "xl list -l" produces something like "you have to specify a domain name". I found the origin of this to be libxl_userdata_retrieve which takes a userdate_userid as an argument. Libvirt uses "libvirt-xml" for that, while xl uses "xl". This might be intentional and the bug is just that we need a better check for not finding the userdata and then skipping those domains. On the other hand ... its after all in both cases a domain created and started through libxl...
I think this was discussed a few weeks ago on the list, and there were one or two separate bugs and short comings. I'm not sure which subset were actually fixed.
One issue is that xl stored the guest config and then retrieves it for use in xl list -l, but libvirt != xl and therefore has no config file to save.
Right, that kind of was what I tried to say in many words. :) Oh, hm probably with the addition that xl would save configs with one key and libvirt with another. So relying on that at least results in xl not being able to show details of libvirt created domains (and the other way round). But as you said, I seem to have missed some discussion about it (which I saw Jim mentioning a link but have not followed, yet, this morning.
The solution is probably for the list stuff to be based on dynamically gathering the domain info, instead of reparsing the config.
Yes, that sounds good. It would feel like a more intuitive behavior to me (not that I think I could call myself an expert on "intuitivity").
Ian.,

On Fri, 2013-12-20 at 11:16 +0100, Stefan Bader wrote:
One issue is that xl stored the guest config and then retrieves it for use in xl list -l, but libvirt != xl and therefore has no config file to save.
Right, that kind of was what I tried to say in many words. :) Oh, hm probably with the addition that xl would save configs with one key and libvirt with another.
It's not really about the key, libvirt simply does not produce an xl compatible configuration file (nor should it) Ian.

Stefan Bader wrote:
Oh, just while talking about setdefault. Jim, this is one of the odd things when moving from xm to xl stack from libvirt: libvirt defaults to the netfront NIC when no model is specified and sets the type. The libxl setdefault function sets the model to rtl8139 but leaves the type untouched.
The xend toolstack always creates both emulated and vif devices unless 'type=netfront' is explicitly specified. As you say, the guest gets to choose what to do with them. E.g. PXE boot using the emulated device, or have the driver for the PV device unplug the emulated one. I don't think libxl supports this right?
So setting no model in the xml config creates a domain with no emulated NIC (this does not matter after Linux is up because the emulated devices get unplugged). Just that PXE boot will not work. This gets odd because with the old xen (xm) driver, no model meant rtl8139.
Sigh, and to hijack this thread even further I noticed a quite unexpected behaviour when starting a domain trhough libvirt and then try to use "xl list -l" to get config details. "xl list" shows all running domains but "xl list -l" produces something like "you have to specify a domain name". I found the origin of this to be libxl_userdata_retrieve which takes a userdate_userid as an argument. Libvirt uses "libvirt-xml" for that, while xl uses "xl". This might be intentional and the bug is just that we need a better check for not finding the userdata and then skipping those domains. On the other hand ... its after all in both cases a domain created and started through libxl...
As Ian mentioned, this has already been discussed http://lists.xen.org/archives/html/xen-devel/2013-10/msg01921.html But looks like that thread has lost some steam... Regards, Jim

On Thu, 2013-12-19 at 11:39 -0700, Jim Fehlig wrote:
Stefan Bader wrote:
Oh, just while talking about setdefault. Jim, this is one of the odd things when moving from xm to xl stack from libvirt: libvirt defaults to the netfront NIC when no model is specified and sets the type. The libxl setdefault function sets the model to rtl8139 but leaves the type untouched.
The xend toolstack always creates both emulated and vif devices unless 'type=netfront' is explicitly specified. As you say, the guest gets to choose what to do with them. E.g. PXE boot using the emulated device, or have the driver for the PV device unplug the emulated one. I don't think libxl supports this right?
It should do, in fact I thought it was the default. How are you initialising the libxl_device_nic? Type == VIF_IOEMU (which is the default for a VIF on an HVM guest) means both emulated and pv. (there were bugs in the semantics here in very early versions of libxl, but I thought they were fixed even before 4.2) I don't think there is an option to have just the emulated device -- there is always a PV VIF there even if the guest doesn't use it. Ian.

Ian Campbell wrote:
On Thu, 2013-12-19 at 11:39 -0700, Jim Fehlig wrote:
Stefan Bader wrote:
Oh, just while talking about setdefault. Jim, this is one of the odd things when moving from xm to xl stack from libvirt: libvirt defaults to the netfront NIC when no model is specified and sets the type. The libxl setdefault function sets the model to rtl8139 but leaves the type untouched.
The xend toolstack always creates both emulated and vif devices unless 'type=netfront' is explicitly specified. As you say, the guest gets to choose what to do with them. E.g. PXE boot using the emulated device, or have the driver for the PV device unplug the emulated one. I don't think libxl supports this right?
On my 4.3.1 setup, I changed the above to
if (hvm) {
x_nic->nictype = LIBXL_NIC_TYPE_VIF_IOEMU;
if (l_nic->model) {
if (VIR_STRDUP(x_nic->model, l_nic->model) < 0)
return -1;
if (STREQ(l_nic->model, "netfront"))
x_nic->nictype = LIBXL_NIC_TYPE_VIF;
}
} else {
x_nic->nictype = LIBXL_NIC_TYPE_VIF;
}
which is better initialization logic IMO. If the domain is hvm, set nictype to LIBXL_NIC_TYPE_VIF_IOEMU, unless model 'netfront' is specified. This behavior is consistent with the legacy xen driver. The change seems to work fine and resolves the PXE issue Stefan noted - as long as I initialize devid in libvirt. So we'll need the above fix in libvirt, as well as a resolution to the nic devid initialization in libxl that started this thread.
Regards,
Jim It should do, in fact I thought it was the default.
How are you initialising the libxl_device_nic?
if (l_nic->model && !STREQ(l_nic->model, "netfront")) { if (VIR_STRDUP(x_nic->model, l_nic->model) < 0) return -1; x_nic->nictype = LIBXL_NIC_TYPE_VIF_IOEMU; } else { x_nic->nictype = LIBXL_NIC_TYPE_VIF; }
Type == VIF_IOEMU (which is the default for a VIF on an HVM guest) means both emulated and pv. (there were bugs in the semantics here in very early versions of libxl, but I thought they were fixed even before 4.2)
On my 4.3.1 setup, I changed the above to if (hvm) { x_nic->nictype = LIBXL_NIC_TYPE_VIF_IOEMU; if (l_nic->model) { if (VIR_STRDUP(x_nic->model, l_nic->model) < 0) return -1; if (STREQ(l_nic->model, "netfront")) x_nic->nictype = LIBXL_NIC_TYPE_VIF; } } else { x_nic->nictype = LIBXL_NIC_TYPE_VIF; } which is better initialization logic IMO. If the domain is hvm, set nictype to LIBXL_NIC_TYPE_VIF_IOEMU, unless model 'netfront' is specified. This behavior is consistent with the legacy xen driver. The change seems to work fine and resolves the PXE issue Stefan noted - as long as I initialize devid in libvirt. So we'll need the above fix in libvirt, as well as a resolution to the nic devid initialization in libxl that started this thread. Regards, Jim

Jim Fehlig wrote:
Ian Campbell wrote:
On Thu, 2013-12-19 at 11:39 -0700, Jim Fehlig wrote:
Stefan Bader wrote:
Oh, just while talking about setdefault. Jim, this is one of the odd things when moving from xm to xl stack from libvirt: libvirt defaults to the netfront NIC when no model is specified and sets the type. The libxl setdefault function sets the model to rtl8139 but leaves the type untouched.
The xend toolstack always creates both emulated and vif devices unless 'type=netfront' is explicitly specified. As you say, the guest gets to choose what to do with them. E.g. PXE boot using the emulated device, or have the driver for the PV device unplug the emulated one. I don't think libxl supports this right?
On my 4.3.1 setup, I changed the above to
Updated the system to Xen 4.4 rc1 meanwhile...
if (hvm) {
x_nic->nictype = LIBXL_NIC_TYPE_VIF_IOEMU;
if (l_nic->model) {
if (VIR_STRDUP(x_nic->model, l_nic->model) < 0)
return -1;
if (STREQ(l_nic->model, "netfront"))
x_nic->nictype = LIBXL_NIC_TYPE_VIF;
}
} else {
x_nic->nictype = LIBXL_NIC_TYPE_VIF;
}
which is better initialization logic IMO. If the domain is hvm, set nictype to LIBXL_NIC_TYPE_VIF_IOEMU, unless model 'netfront' is specified. This behavior is consistent with the legacy xen driver. The change seems to work fine and resolves the PXE issue Stefan noted -
I've submitted a patch for the PXE issue to the libvirt list https://www.redhat.com/archives/libvir-list/2014-January/msg00208.html
as long as I initialize devid in libvirt. So we'll need the above fix in libvirt, as well as a resolution to the nic devid initialization in libxl that started this thread.
Stefan, any progress on the devid initialization? Regards, Jim
participants (3)
-
Ian Campbell
-
Jim Fehlig
-
Stefan Bader