On 05/02/2013 02:06 AM, Alex Jia wrote:
On 04/29/2013 02:48 AM, Laine Stump wrote:
> On 04/28/2013 06:12 AM, Alex Jia wrote:
>> GDB backtrace:
>>
>> Breakpoint 1, virPCIGetVirtualFunctionIndex
>> (pf_sysfs_device_link=0x7fc04400f470
>> "/sys/bus/pci/devices/0000:03:00.1",
vf_sysfs_device_link=<optimized
>> out>, vf_index=vf_index@entry=0x7fc06897b8f4)
>> at util/virpci.c:2107
>> 2107 if (virPCIDeviceAddressIsEqual(vf_bdf, virt_fns[i])) {
>> (gdb) p *vf_bdf
>> $1 = {domain = 0, bus = 3, slot = 16, function = 1}
>> (gdb) l
>> 2102 "virtual_functions"),
>> pf_sysfs_device_link);
>> 2103 goto out;
>> 2104 }
>> 2105
>> 2106 for (i = 0; i< num_virt_fns; i++) {
>> 2107 if (virPCIDeviceAddressIsEqual(vf_bdf, virt_fns[i])) {
>> 2108 *vf_index = i;
>> 2109 ret = 0;
>> 2110 break;
>> 2111 }
>> (gdb) p num_virt_fns
>> $46 = 2
>> (gdb) p virt_fns[0]
>> $48 = (virPCIDeviceAddressPtr) 0x0
>> (gdb) s
>> virPCIDeviceAddressIsEqual (bdf2=0x0, bdf1=0x7fc04400f330) at
>> util/virpci.c:1844
>> 1844 (bdf1->slot == bdf2->slot)&&
>> (gdb) s
>>
>> Program received signal SIGSEGV, Segmentation fault.
>>
>> RHBZ:
https://bugzilla.redhat.com/show_bug.cgi?id=957416
>>
>> Signed-off-by: Alex Jia<ajia(a)redhat.com>
>> ---
>> src/util/virpci.c | 3 ++-
>> 1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/src/util/virpci.c b/src/util/virpci.c
>> index 97bba74..dda044c 100644
>> --- a/src/util/virpci.c
>> +++ b/src/util/virpci.c
>> @@ -1897,7 +1897,8 @@ static bool
>> virPCIDeviceAddressIsEqual(virPCIDeviceAddressPtr bdf1,
>> virPCIDeviceAddressPtr bdf2)
>> {
>> - return ((bdf1->domain == bdf2->domain)&&
>> + return (bdf1&& bdf2&&
>> + (bdf1->domain == bdf2->domain)&&
>> (bdf1->bus == bdf2->bus)&&
>> (bdf1->slot == bdf2->slot)&&
>> (bdf1->function == bdf2->function));
> NACK.
>
> This patch only fixes the symptom (not the root cause), and only in the
> case of starting a domain with an<interface type='hostdev'. It
doesn't
> fix the second crash described in the BZ when running virsh
> nodedev-dumpxml - the code path of that doesn't ever get to
Yes, I just noticed this, it should be different code path.
> virPCIDeviceAddressIsEqual() (but *does* call the function that actually
> has the bug).
Yes, another bug.
IMO, the only bug here.
> The root cause of these crashes was a typo introduced just before the
> release of 1.0.4. I found that problem and pushed the correct patch on
> April 9:
>
>
http://libvirt.org/git/?p=libvirt.git;a=commit;h=9579b6bc209b46a0f079b214...
>
It should be okay if the patch is backported into rhel.
No backport is needed. This bug was only in one release of libvirt, and
that release is not (and almost surely will not be) in any public RHEL
or Fedora release.
> (Beyond that, I don't like the idea of ignoring a NULL
pointer -
> virPCIDeviceAddressIsEqual should always be passed non-NULL pointers,
> and its only current caller does guarantee that (except for when it has
> a bug). If we want virPCIDeviceAddressIsEqual to do something with NULL
> pointers, it should be logging an error and failing, but that would
> complicate the interface to the function beyond just returning a
> true/false (it would have to be tri-state, and the caller would need to
Yes, the function interface will be more complex and return value should
be tri-state not simple true/false.
Which will make the code in the caller unnecessarily more complex, since
we've already checked for a valid pointer and handled it earlier in that
same function (or to be more exact, a function called by that one -
virPCIGetVirtualFunctions(), which does guarantee that only non-NULL
pointers are placed into the array (when written correctly, as it is now
after the bugfix patch above).
> check all three possibilities). I think in this case it's better for the
> caller to make sure the pointers it sends are valid.)
Yes, a simple way is the caller makes sure pointers is Non-NULL, maybe,
we need also to add comment "The caller should make sure the pointers
it sends are valid" into the virPCIDeviceAddressIsEqual().
That's kind of a given for *any* function :-)
However,
in order to avoid the caller missing argument judgement, I think we
should also fix().
There are many private functions in libvirt that assume the caller has
sent in sane arguments. If every function checked every argument for
non-NULL, the code would be full of that. I think that's only necessary
for API-level functions, but this function is only a static, used just
within the same file (and only once).
I think that the proper course of action here is to leave it as it is;
note that if it had been "fixed" at the time the bug was introduced to
virPCIGetVirtualFunctions(), it would have been *much* more difficult to
detect, identify, and fix that bug.