On Fri, May 10, 2019 at 05:48:38PM +0800, Cornelia Huck wrote:
On Fri, 10 May 2019 10:36:09 +0100
"Dr. David Alan Gilbert" <dgilbert(a)redhat.com> wrote:
> * Cornelia Huck (cohuck(a)redhat.com) wrote:
> > On Thu, 9 May 2019 17:48:26 +0100
> > "Dr. David Alan Gilbert" <dgilbert(a)redhat.com> wrote:
> >
> > > * Cornelia Huck (cohuck(a)redhat.com) wrote:
> > > > On Thu, 9 May 2019 16:48:57 +0100
> > > > "Dr. David Alan Gilbert" <dgilbert(a)redhat.com> wrote:
> > > >
> > > > > * Cornelia Huck (cohuck(a)redhat.com) wrote:
> > > > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > > > Alex Williamson <alex.williamson(a)redhat.com> wrote:
> > > > > >
> > > > > > > On Sun, 5 May 2019 21:49:04 -0400
> > > > > > > Yan Zhao <yan.y.zhao(a)intel.com> wrote:
> > > > > >
> > > > > > > > + Errno:
> > > > > > > > + If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > > > > + devices, it should not register version attribute for this mdev device. But if
> > > > > > > > + a vendor driver has already registered version attribute and it wants to claim
> > > > > > > > + a mdev device incompatible to all other mdev devices, it needs to return
> > > > > > > > + -ENODEV on access to this mdev device's version attribute.
> > > > > > > > + If a mdev device is only incompatible to certain mdev devices, write of
> > > > > > > > + incompatible mdev devices's version strings to its version attribute should
> > > > > > > > + return -EINVAL;
> > > > > > >
> > > > > > > I think it's best not to define the specific errno returned for a
> > > > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > > > needs to know that an errno on read indicates the device does not
> > > > > > > support migration version comparison and that an errno on write
> > > > > > > indicates the devices are incompatible or the target doesn't support
> > > > > > > migration versions.
> > > > > >
> > > > > > I think I have to disagree here: It's probably valuable to have an
> > > > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > > > those two particular devices'. Userspace might want to do different
> > > > > > things (e.g. trying with different device pairs).
> > > > >
> > > > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > > > get much information that way.
> > > >
> > > > So, what would be a reasonable approach? Userspace should first read
> > > > the version attributes on both devices (to find out whether migration
> > > > is supported at all), and only then figure out via writing whether they
> > > > are compatible?
> > > >
> > > > (Or just go ahead and try, if it does not care about the reason.)
> > >
> > > Well, I'm OK with something like writing to test whether it's
> > > compatible, it's just we need a better way of saying 'no'.
> > > I'm not sure if that involves reading back from somewhere after
> > > the write or what.
> >
> > Hm, so I basically see two ways of doing that:
> > - standardize on some error codes... problem: error codes can be hard
> > to fit to reasons
> > - make the error available in some attribute that can be read
> >
> > I'm not sure how we can serialize the readback with the last write,
> > though (this looks inherently racy).
> >
> > How important is detailed error reporting here?
>
> I think we need something, otherwise we're just going to get vague
> user reports of 'but my VM doesn't migrate'; I'd like the error to be
> good enough to point most users to something they can understand
> (e.g. wrong card family/too old a driver etc).

Ok, that sounds like a reasonable point. Not that I have a better idea
how to achieve that, though... we could also log a more verbose error
message to the kernel log, but that's not necessarily where a user will
look first.

Ideally, we'd want to have the user space program setting up things
querying the general compatibility for migration (so that it becomes
their problem on how to alert the user to problems :), but I'm not sure
how to eliminate the race between asking the vendor driver for
compatibility and getting the result of that operation.

Unless we introduce an interface that can retrieve _all_ results
together with the written value? Or is that not going to be much of a
problem in practice?

What about defining a migration_errors attribute that stores the 10 most
recent error records, in a format like:
    input string: error
As identical input strings always produce the same error string, the 10
error records may be able to serve more than 10 reason queries. And in
practice, I think there wouldn't be 10 simultaneous migration requests?
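
To make the idea a bit more concrete, here is a rough userspace model of
such a log. It is only a sketch under my own assumptions: the names
mig_err_log, mig_err_entry, mig_err_add, mig_err_lookup and mig_err_show
are made up for illustration, the 64-byte string bound is arbitrary, and a
real vendor driver would additionally need locking and a sysfs show
callback. Identical input strings share one slot, and once all 10 slots
are in use the oldest record is overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MIG_ERR_SLOTS 10  /* "10 most recent error records" from the proposal */
#define MIG_STR_MAX   64  /* arbitrary bound picked for this sketch */

/* One "input string: error" record (hypothetical layout). */
struct mig_err_entry {
    char input[MIG_STR_MAX];  /* version string that was written */
    char error[MIG_STR_MAX];  /* reason the write was rejected */
};

struct mig_err_log {
    struct mig_err_entry rec[MIG_ERR_SLOTS];
    size_t count;  /* number of valid records, at most MIG_ERR_SLOTS */
    size_t next;   /* slot to (over)write next, i.e. the oldest when full */
};

/* Return the stored error for @input, or NULL if there is no record. */
static const char *mig_err_lookup(const struct mig_err_log *log,
                                  const char *input)
{
    size_t i;

    for (i = 0; i < log->count; i++)
        if (strcmp(log->rec[i].input, input) == 0)
            return log->rec[i].error;
    return NULL;
}

/* Record a rejected write. Strings longer than the bound are truncated. */
static void mig_err_add(struct mig_err_log *log, const char *input,
                        const char *error)
{
    struct mig_err_entry *e = NULL;
    size_t i;

    assert(log && input && error);

    /* Identical inputs always get the same error, so reuse their slot. */
    for (i = 0; i < log->count; i++)
        if (strcmp(log->rec[i].input, input) == 0)
            e = &log->rec[i];

    if (!e) {
        e = &log->rec[log->next];
        log->next = (log->next + 1) % MIG_ERR_SLOTS;
        if (log->count < MIG_ERR_SLOTS)
            log->count++;
    }
    snprintf(e->input, sizeof(e->input), "%s", input);
    snprintf(e->error, sizeof(e->error), "%s", error);
}

/* Format all records as "input string: error" lines; returns bytes used. */
static size_t mig_err_show(const struct mig_err_log *log, char *buf,
                           size_t len)
{
    size_t off = 0, i;

    if (len == 0)
        return 0;
    buf[0] = '\0';
    for (i = 0; i < log->count; i++) {
        int n = snprintf(buf + off, len - off, "%s: %s\n",
                         log->rec[i].input, log->rec[i].error);
        if (n < 0 || (size_t)n >= len - off) {
            buf[off] = '\0';  /* drop the record that did not fit */
            break;
        }
        off += (size_t)n;
    }
    return off;
}
```

Userspace would then write a version string, and on failure read the
attribute back and look for the line starting with the string it wrote.
This does not remove the race in general; it only makes it unlikely as
long as fewer than 10 distinct failing strings are written in between.
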
Or could we just define some common errnos? Like:
#define ENOMIGRATION 140 /* device does not support migration */
#define EUNATCH       49 /* software version does not match */
#define EHWNM        142 /* hardware does not match */
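
With codes like these, userspace could tell 'cannot migrate at all' apart
from 'cannot migrate between those two particular devices'. A minimal
sketch of that, assuming the three values proposed above (they are not
agreed definitions), with mig_verdict, mig_classify and mig_explain being
hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Values as proposed in this mail; guarded in case a header defines them. */
#ifndef ENOMIGRATION
#define ENOMIGRATION 140 /* device does not support migration */
#endif
#ifndef EUNATCH
#define EUNATCH 49       /* software version does not match */
#endif
#ifndef EHWNM
#define EHWNM 142        /* hardware does not match */
#endif

enum mig_verdict {
    MIG_OK,               /* write succeeded, devices are compatible */
    MIG_UNSUPPORTED,      /* this device cannot migrate at all */
    MIG_TRY_OTHER_PAIR,   /* only this particular pair is incompatible */
    MIG_UNKNOWN           /* some other, vendor specific failure */
};

/* Map a positive errno from the version attribute write to a verdict. */
static enum mig_verdict mig_classify(int err)
{
    assert(err >= 0);  /* callers pass errno, not a negated return value */

    switch (err) {
    case 0:
        return MIG_OK;
    case ENOMIGRATION:
        return MIG_UNSUPPORTED;
    case EUNATCH:
    case EHWNM:
        return MIG_TRY_OTHER_PAIR;
    default:
        return MIG_UNKNOWN;
    }
}

/* Short text a management tool could show to the user. */
static const char *mig_explain(int err)
{
    switch (err) {
    case 0:
        return "devices are compatible";
    case ENOMIGRATION:
        return "device does not support migration";
    case EUNATCH:
        return "software version does not match";
    case EHWNM:
        return "hardware does not match";
    default:
        return "incompatible for an unspecified reason";
    }
}
```

This only gives coarse reasons, of course; anything more detailed than
these few categories would still need something like the attribute above.
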