On Sat, May 22, 2010 at 12:17:05PM -0700, Scott Feldman wrote:
> On 5/22/10 11:34 AM, "Dave Allan" <dallan@redhat.com> wrote:
>> On Sat, May 22, 2010 at 11:14:20AM -0400, Stefan Berger wrote:
>>> On Fri, 2010-05-21 at 23:35 -0700, Scott Feldman wrote:
>>>> On 5/21/10 6:50 AM, "Stefan Berger" <stefanb@linux.vnet.ibm.com> wrote:
>>>>
>>>>> This patch may get 802.1Qbh devices working. I am adding some code to
>>>>> poll for the status of an 802.1Qbh device and loop for a while until
>>>>> the status indicates success. This part for sure needs more work and
>>>>> testing...
>>>>
>>>> I think we can drop this patch 3/3. For bh, we don't want to poll for
>>>> status because it may take a while before a status other than
>>>> in-progress is indicated. Link UP on the eth is the async notification
>>>> of status=success.
>>>
>>> The idea was to find out whether the association actually worked and,
>>> if not, either fail the start of the VM or not hotplug the interface.
>>> If we don't do that, the user may end up having a VM that has no
>>> connectivity (depending on how the switch handles an un-associated VM)
>>> and start debugging all kinds of things... Really, I would like to know
>>> if something went wrong. How long would we have to wait for the status
>>> to change? How does a switch handle traffic from a VM if the
>>> association failed? At least for 802.1Qbg we were going to get failure
>>> notification.
>>
>> I tend to agree that we should try to get some indication of whether
>> the associate succeeded or failed. Is the time that we would have to
>> poll bounded by anything, or is it reasonably short?
> It's difficult to put an upper bound on how long to poll. In most cases,
> status would be available in a reasonably short period of time, but the
> upper bound depends on activity external to the host.

That makes sense. The timeout should be a configurable value. What
do you think is a reasonable default?
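
To make the discussion concrete, here's a rough sketch of the kind of
bounded poll I have in mind. The names, the status enum, and the 10
second default are made up for illustration; the real status query
would go through whatever netlink interface the driver exposes:

#include <unistd.h>

#define ASSOC_POLL_INTERVAL_USEC (250 * 1000)   /* 250 ms between queries */
#define ASSOC_TIMEOUT_SEC_DEFAULT 10            /* user-overridable default */

/* Hypothetical status values; the real ones come from the driver. */
enum assoc_status { ASSOC_IN_PROGRESS, ASSOC_SUCCESS, ASSOC_FAILURE };

/* Placeholder for the actual netlink query of port-profile status. */
extern enum assoc_status query_assoc_status(const char *ifname);

/* Poll until the association leaves in-progress or the timeout expires.
 * Returns 0 on success, -1 on failure or timeout, so the caller can
 * refuse to start the VM (or back out the hotplug) and report why. */
static int
wait_for_association(const char *ifname, unsigned int timeout_sec)
{
    unsigned long elapsed_usec = 0;

    while (elapsed_usec < timeout_sec * 1000000UL) {
        switch (query_assoc_status(ifname)) {
        case ASSOC_SUCCESS:
            return 0;
        case ASSOC_FAILURE:
            return -1;      /* switch rejected the association */
        case ASSOC_IN_PROGRESS:
            break;          /* keep waiting */
        }
        usleep(ASSOC_POLL_INTERVAL_USEC);
        elapsed_usec += ASSOC_POLL_INTERVAL_USEC;
    }
    return -1;              /* timed out, status still in-progress */
}
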
>> Mostly I'm concerned about the failure case: how would the user know
>> that something has gone wrong, and where would information to debug
>> the problem appear?
> Think of it as equivalent to waiting to get link UP after plugging a
> physical cable into a physical switch port. In some cases negotiation of
> the link may take on the order of seconds, depending on the physical
> media, of course. A user can check for link UP using ethtool or the ip
> cmd. Similarly, a user can check for association status using the ip
> cmd, once we extend ip cmd to know about virtual ports (patch for ip cmd
> coming soon).

That's the way I was thinking about it as well. The difference I see
between an actual physical cable and what we're doing here is that if
you're in the data center and you plug in a cable, you're focused on
whether the link comes up. Here, the actor is likely to be an
automated process, and users will simply be presented with a VM with
no or incorrect connectivity, and they will have no idea what
happened. It's just not supportable to provide them with no
indication of what failed or why.
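
For what it's worth, the link UP check is easy enough for a management
tool to do programmatically as well; here's a minimal sketch using the
ETHTOOL_GLINK ioctl (the same query that ethtool's "Link detected"
output is based on):

#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

/* Returns 1 if the link is up, 0 if it is down, -1 on error. */
static int
link_is_up(const char *ifname)
{
    struct ethtool_value edata = { .cmd = ETHTOOL_GLINK };
    struct ifreq ifr;
    int fd, ret;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    ifr.ifr_data = (char *)&edata;

    if ((fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
        return -1;
    ret = ioctl(fd, SIOCETHTOOL, &ifr);
    close(fd);

    return ret < 0 ? -1 : (edata.data ? 1 : 0);
}

Something equivalent for association status is exactly what I'd like
the ip cmd extension (and libvirt) to be able to report.
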
Dave