
On Wed, 2013-12-18 at 17:44 -0700, Jim Fehlig wrote:
Stefan Bader wrote:
On 18.12.2013 14:28, Ian Campbell wrote:
On Wed, 2013-12-18 at 14:12 +0100, Stefan Bader wrote:
On 18.12.2013 13:27, Ian Campbell wrote:
On Tue, 2013-12-17 at 18:32 +0100, Stefan Bader wrote:
> Might this libxl fix be relevant:
> commit 5420f26507fc5c9853eb1076401a8658d72669da
> Author: Jim Fehlig <jfehlig@suse.com>
> Date: Fri Jan 11 12:22:26 2013 +0000
>
>     libxl: Set vfb and vkb devid if not done so by the caller
>
>     Other devices set a sensible devid if the caller has not done so.
>     Do the same for vfb and vkb. While at it, factor out the common code
>     used to determine a sensible devid, so it can be used by other
>     libxl__device_*_add functions.
>
>     Signed-off-by: Jim Fehlig <jfehlig@suse.com>
>     Acked-by: Ian Campbell <ian.campbell@citrix.com>
>     Committed-by: Ian Campbell <ian.campbell@citrix.com>
>
> and a follow up in dfeccbeaa. Although the comment implies that nic's
> were already correctly assigning a devid if the caller specified -1, so
> I don't know why it doesn't work for you :-(
>

Ok, yes, the commit above indeed changes libxl__device_nic_add to call libxl__device_nextid for the devid... Just how is this actually called. Maybe not sufficient but "git grep libxl__device_nic_add" in the xen code only shows the definition and a declaration in libxl_internal.h to me...
I have a feeling a macro might be involved...
Here we go, look for DEFINE_DEVICE_REMOVE in libxl.c. We should really add the eventual function names in comments to provide grep fodder....
Oh duh, yeah. So in DEFINE_DEVICE_ADD a libxl_device_nic_add is created which calls libxl__device_nic_add. When I look for the single _ version I find a call from xl_cmdimpl.c and its public declaration in libxl.h. So I guess the bug is that libvirt, in the libxl driver, never seems to call it.
The macro creates libxl__add_nics which adds the nics from the libxl_domain_config->nics array. I don't think libvirt needs to call libxl_device_nic_add manually unless it is hotplugging a new nic at runtime.
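For anyone following along who wonders why the grep comes up short: the names are assembled by preprocessor token pasting, so neither the generated function nor the call into the double-underscore worker appears literally in the source. Below is a minimal sketch of that pattern. It is not the real libxl macro; every demo_ and DEMO_ name is made up for illustration, and the real DEFINE_DEVICE_ADD takes more arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a libxl device type; not the real struct. */
typedef struct { int devid; } demo_device_nic;

/* Internal worker, written out by hand, so grep finds this definition. */
static int demo__device_nic_add(demo_device_nic *nic)
{
    assert(nic != NULL);
    if (nic->devid == -1)
        nic->devid = 0;   /* placeholder for "pick a sensible devid" */
    return 0;
}

/*
 * Simplified imitation of the DEFINE_DEVICE_ADD pattern. Both the public
 * name demo_device_nic_add and the call to demo__device_nic_add only come
 * into existence after the preprocessor pastes the tokens together, so a
 * plain-text search for either name finds no match inside this macro.
 */
#define DEMO_DEFINE_DEVICE_ADD(type)                          \
    int demo_device_##type##_add(demo_device_##type *dev)     \
    {                                                         \
        return demo__device_##type##_add(dev);                \
    }

/* Expands to: int demo_device_nic_add(demo_device_nic *dev) { ... } */
DEMO_DEFINE_DEVICE_ADD(nic)
```

A comment like the last one, naming the expanded function next to each macro invocation, is the kind of grep fodder suggested above.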
Hm, so I think this is the path:
libxl_domain_create_new
  -> do_domain_create
  -> initiate_domain_create
  -> libxl__bootloader_run (HVM domain, skipping bootloader)
  <- domcreate_bootloader_done
  -> domcreate_rebuild_done
  <- domcreate_launch_dm
  -> libxl__spawn_local_dm
  <- domcreate_devmodel_started
In libxl__spawn_local_dm, there is the following loop:
    for (i = 0; i < d_config->num_nics; i++) {
        /* We have to init the nic here, because we still haven't
         * called libxl_device_nic_add at this point, but qemu needs
         * the nic information to be complete.
         */
        ret = libxl__device_nic_setdefault(gc, &d_config->nics[i], domid);
        if (ret)
            goto error_out;
    }
So I think when starting the dm, the devid is simply not set yet, as setdefault does not seem to set it. It would be done in the later domcreate_devmodel_started callback, but that is too late for the generated qemu arguments.
Sorry for jumping in late...
I stumbled across this problem just before openSUSE 13.1 was released and did a quick fix in libvirt
https://build.opensuse.org/package/view_file/Virtualization:openSUSE13.1/lib...
I removed setting the NIC devid in the libxl driver a while back to be consistent with other devices
http://libvirt.org/git/?p=libvirt.git;a=commit;h=ba64b97134a6129a48684f22f31...
The quick fix was to essentially revert the above commit until I could investigate further. Thank you for now having done that investigation :). Can the devid assignment logic be moved from libxl__device_nic_add() to libxl__device_nic_setdefault()?
It certainly seems like it would be more natural to do it there. I suspect it might be done this way because at setdefault time you might be walking a list of nics none of which have been created yet -- so looking in xenstore would return "devid zero is free" for every one of them?

How about we:
      * move the init to setdefault to catch the single NIC added via
        hotplug case
      * we add somewhere early in the domain create path a call to a
        function which assigns devids to an entire array of devices (and
        do it for all the different device types). Perhaps in
        initiate_domain_create() after the calls to
        libxl__domain_create_info_setdefault and
        libxl__domain_build_info_setdefault but before the loop calling
        libxl__device_disk_setdefault for the disks.
      * perhaps that same function should call setdefault too, after
        having assigned the device, rather than it being done later in
        an adhoc way?

Does that sound at all plausible?

Ian.
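
To make the second bullet concrete, here is a rough sketch of assigning devids across a whole array in one pass. It is only an illustration under assumptions, not libxl code: demo_nic, demo_devid_in_use and demo_assign_nic_devids are hypothetical names, and the "existing" array stands in for whatever devids a xenstore lookup would report as already taken. The point is that the allocator has to look at the array itself as well as at the existing devices, otherwise every not-yet-created nic gets the same "first free" devid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a nic entry; devid -1 means "not set by caller". */
typedef struct { int devid; } demo_nic;

/* True if devid is taken by an existing device or by an entry in the array. */
static bool demo_devid_in_use(int devid,
                              const int *existing, size_t num_existing,
                              const demo_nic *nics, size_t num_nics)
{
    for (size_t i = 0; i < num_existing; i++)
        if (existing[i] == devid)
            return true;
    for (size_t i = 0; i < num_nics; i++)
        if (nics[i].devid == devid)
            return true;
    return false;
}

/*
 * Give every nic with devid == -1 the lowest devid that is used neither by
 * an existing device nor by another entry in the same array. Devids set
 * explicitly by the caller are left untouched.
 */
void demo_assign_nic_devids(demo_nic *nics, size_t num_nics,
                            const int *existing, size_t num_existing)
{
    assert(nics != NULL || num_nics == 0);
    int next = 0;
    for (size_t i = 0; i < num_nics; i++) {
        if (nics[i].devid != -1)
            continue;
        while (demo_devid_in_use(next, existing, num_existing, nics, num_nics))
            next++;
        nics[i].devid = next++;
    }
}
```

With this shape the single-nic hotplug case is just an array of length one, which would fit with doing the per-device part in setdefault.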