On Thu, Jan 05, 2017 at 08:48:32AM -0200, Marcelo Tosatti wrote:
On Wed, Jan 04, 2017 at 11:36:31PM -0200, Eduardo Habkost wrote:
> On Wed, Jan 04, 2017 at 08:26:27PM -0200, Marcelo Tosatti wrote:
> > On Wed, Jan 04, 2017 at 05:59:17PM -0200, Eduardo Habkost wrote:
> > > On Wed, Jan 04, 2017 at 11:39:16AM -0200, Eduardo Habkost wrote:
> > > > On Wed, Jan 04, 2017 at 09:56:56AM -0200, Marcelo Tosatti wrote:
> > > > > On Tue, Dec 27, 2016 at 05:21:20PM -0200, Eduardo Habkost
wrote:
> > > > > > Instead of blocking migration on the source when invtsc is
> > > > > > enabled, rely on the migration destination to ensure
there's no
> > > > > > TSC frequency mismatch.
> > > > > >
> > > > > > We can't allow migration unconditionally because we
don't know if
> > > > > > the destination is a QEMU version that is really going to
ensure
> > > > > > there's no TSC frequency mismatch. To ensure we are
migrating to
> > > > > > a destination that won't ignore SET_TSC_KHZ errors,
allow invtsc
> > > > > > migration only on pc-*-2.9 and newer.
> > > > > >
> > > > > > Signed-off-by: Eduardo Habkost <ehabkost(a)redhat.com>
> > > [...]
> > > > > > @@ -2655,12 +2656,14 @@ int kvm_arch_put_registers(CPUState
*cpu, int level)
> > > > > > }
> > > > > >
> > > > > > if (level == KVM_PUT_FULL_STATE) {
> > > > > > - /* We don't check for kvm_arch_set_tsc_khz()
errors here,
> > > > > > - * because TSC frequency mismatch shouldn't
abort migration,
> > > > > > - * unless the user explicitly asked for a more
strict TSC
> > > > > > - * setting (e.g. using an explicit
"tsc-freq" option).
> > > > > > + /* Migration TSC frequency mismatch is fatal only
if we are
> > > > > > + * actually reporting Invariant TSC to the guest.
> > > > > > */
> > > > > > - kvm_arch_set_tsc_khz(cpu);
> > > > > > + ret = kvm_arch_set_tsc_khz(cpu);
> > > > > > + if ((x86_cpu->env.features[FEAT_8000_0007_EDX]
& CPUID_APM_INVTSC) &&
> > > > > > + ret < 0) {
> > > > > > + return ret;
> > > > > > + }
> > > > > > }
> > > > >
> > > > > Will the guest continue in the source in this case?
> > > > >
> > > > > I think this is past the point where migration has been
declared
> > > > > successful.
> > > > >
> > > > > Otherwise looks good.
> > > >
> > > > Good point. I will make additional tests and see if there's some
> > > > other place where the kvm_arch_set_tsc_khz() call can be moved
> > > > to.
> > >
> > > So, if we solve this and do something on (for example) post_load,
> > > we still have a problem: device state is migrated after RAM. This
> > > means QEMU will check for TSC scaling and abort migration very
> > > late.
> > >
> > > We could solve that by manually registering a SaveVMHandler that
> > > will send the TSC frequency on save_live_setup, so migration is
> > > aborted earlier.
> > >
> > > But: this sounds like just a complex hack to work around the real
> > > problems:
> > >
> > > 1) TSC frequency is guest-visible, and anything that affects
> > > guest ABI should depend on the VM configuration, not on host
> > > capabilities;
> >
> > Well not really: the TSC frequency where the guest starts
> > is the frequency the guest software expects.
> > So it does depend on host capabilities.
>
> Could you explain where this expectation comes from? I can see
> multiple cases where choosing the TSC frequency where the VM
> starts is not the best option.
1. Boot guest.
2. Calibrate TSC.
3. Use delay() with TSC calibration above, or
use TSC to measure the passage of time (TSC clock
in userspace).
If TSC scaling is available, using a different frequency should
be safe, shouldn't it? Otherwise, migrating with TSC scaling
wouldn't be safe either.
Anyway: I don't disagree the starting host frequency is a good
default. It is. But I don't think it's the best option on all
cases.
> I am considering two possible scenarios below. You probably have
> another scenario in mind, so it would be useful if you could
> describe it so we can consider possible solutions.
>
>
> Scenario 1:
>
> You have two hosts: A and B, both with TSC scaling. They have
> different frequencies. The VM can be started on either one of
> them, and can be migrated to either one depending on the policy
> of management software.
>
> Maybe B is a backup host, the VM is expected to run most times on
> host A, and we want to use the TSC frequency from host A every
> time. Maybe it's the opposite and we want to use the frequency of
> B. Maybe we want to use the lowest frequency of both, maybe the
> highest one. We (QEMU developers) can recommend policy to libvirt
> developers, but a given QEMU process doesn't have information to
> decide what's best.
I can't see any practical scenario here, you will always want
to start with TSC frequency of the host where the VM was started.
If i am mistaken, please describe a practical case.
(If a practical scenario comes up, and there is a use-case
for setting the TSC frequency on startup, lets say
a Windows VM which fails to boot if the TSC frequency
is too high, then it should be supported... But the
only known scenario to me, applying to 99.999% of cases,
is to expose the TSC frequency where the guest booted at).
I don't have any specific case: my point is that I can't tell
what's the best frequency if I don't know where the hosts are
expected to be migrated to.
You claim that using the starting host frequency is the best
option on the vast majority of cases. Maybe it's true, and it
would be a good default. The only problem is that this default
affects migratability:
> Scenario 2:
>
> Host A has TSC scaling, host B doesn't have TSC scaling. We want
> to be able to start the VM on host A, and migrate to B. In this
> case, the only possible solution is to use B's frequency when
> starting the VM. The QEMU process doesn't have enough information
> to make that decision.
That is a good point. But again, its a special case and
should be supported by -cpu xxx,tsc-frequency=zzzz.
However, for the vast majority of 99.999% cases, the issue
can be handled entirely in QEMU, without libvirt involvement,
and without adding extra steps to the management software.
I agree it should cover most cases. The only problem here is that
it can break migration in unexpected ways.
Then my point is: assuming that libvirt will prefer to require
explicit TSC frequency configuration to enable invtsc migration
(instead of getting unpredictable migration compatibility), is
the added complexity to migration code worth the effort, if
choosing an explicit frequency is safer and more predictable? I
believe this is where we disagree.
> > > 2) Setting TSC frequency depending on the host will make
> > > migratability unpredictable for management software: the same
> > > VM config could be migratable to host A when started on host
> > > B, but not migratable to host A when started on host C.
> >
> > Well, just check the frequency.
>
> What do you mean by "just check the frequency"? What exactly
> should management software do?
VM booted on host-A can be migrated to host-B if TSC
frequency matches.
Except when TSC scaling is available. Then you may or may not
migrate from A to B, and you don't know that unless you actually
try to migrate.
> Whatever management software do, if you just use the source host
> frequency you can get into a situation where you can start a VM
> on host A and migrate it to B, but can't start the VM on host B
> and migrate it to A. This would be confusing for users, and
> probably break assumptions of existing management software.
Well this is a side effect of invariant TSC and migration.
Look, i agree with all your points, but my point is this: i personally
prefer to handle the 99.999% case, which is the TSC frequency exposed is the one
from the host where the guest booted, in QEMU entirely (for example, to
make life easier for people who run qemu manually such as myself).
Right. I don't mind not covering 100% of cases. But I worry if
the cases not covered by us behave unexpectedly and
unpredictably. If we caused a failure every time an
unsafe/unsupported config was used, it would be OK. But making
the ability to migrate unpredictable is a more serious issue IMO.
I believe we both agree about how the final version should
behave, but disagree about what is more important in the first
version:
* My proposal for the first version means:
* People would have to configure TSC frequency manually if
they want invtsc + migration (until we also provide a
mechanism to query the host TSC frequency so
management/scripts could choose it as default);
* Adding more code to libvirt.
* Your proposal for the first version means:
* The ability to migrate won't be predictable by libvirt;
* Extra complexity on the migration code to ensure
we abort migration on mismatch.
We seem to be weighting those issues differently. To me, having
predictability on migration ability since the first version is
more important to me than making configuration easier (on the
first version).
> > > I suggest we allow migration with invtsc if and only if
> > > tsc-frequency is set explicitly by management software. In other
> > > words, apply only patches 1/4 and 2/4 from this series. After
> > > that, we will need libvirt support for configuring tsc-frequency.
> >
> > I don't like that for the following reasons:
> >
> > * It moves low level complexity from QEMU/KVM to libvirt
> > (libvirt has to call KVM_GET_TSC_KHZ from a vcpu thread).
>
> It doesn't. It could ask QEMU for that information (the new
> query-cpu-model-expansion QMP command would allow that).
>
> Or, alternatively, it could just let the user choose the
> frequency.
Again, you want to expose the host where the VM booted in most
cases (except the ones you list above).
> It's probably acceptable for many use cases where
> invtsc+migration is important.
Ok, thats better. Can you add a patch to your series with the steps
as how mgmt software should proceed ?
> > * It makes it difficult to run QEMU manually (i use QEMU
> > manually all the time).
>
> I believe the set of people that: 1) need invtsc; 2) need
> migration to work; and 3) run QEMU manually will be able to add
> "tsc-frequency=XXX" to the command-line easily. :)
Ok, so migration is only allowed if tsc-frequency= is specified.
> > * It requires patching libvirt.
> >
> > In my experience things work better when the functionality is
> > not split across libvirt/qemu.
>
> In my experience things break when management software is the
> only component able to make a decision but we don't provide
> mechanisms to let management software make that decision.
>
> The TSC frequency affects migratability to hosts. Choose the
> wrong frequency, and you might not be able to migrate. Only
> management software knows to which hosts the VM could be migrated
> in the future, and which TSC frequency would allow that.
True.
> > Can't this be fixed in QEMU? Just check that destination host supports
> > TSC scaling before migration (or that KVM_GET_TSC_KHZ return value
> > matches on source and destination).
> >
>
> This solves only one use case: where you want to expose the
> starting host TSC frequency to the guest. It doesn't cover any
> scenario where the TSC frequency of the starting host isn't the
> best one. (See the two scenarios I described above)
True.
> You seem to have a specific use case in mind. If you describe it,
> we could decide if it's worth the extra migration code complexity
> to deal with invtsc migration without explicit tsc-frequency. By
> now, I am not convinced yet.
I don't have any specific use case in mind. I'm just trying to
avoid moving complexity to libvirt which in my experience is a
the best thing to do.
I think both our proposals make things harder for libvirt in
different ways. I just want to make the complexity explicit for
libvirt, instead of giving them the illusion of safety (making
migration break in ways libvirt can't detect).
Anyway, your points are still valid and it would be still useful
to do what you propose. I will give it a try on a new version of
patch 4/4 that can abort migration earlier. But note that I
expect some pushback from other maintainers because of the
complexity of the code. If that happens, be aware that I won't be
able to justify the added complexity because I would still think
it's not worth it.
> Maybe your use case just needs a explicit "invtsc-migration=on"
> command-line flag without a mechanism to abort migration on
> mismatch? I can't tell.
Again, there is no special case.
> Note that even if we follow your suggestion and implement an
> alternative version of patch 4/4 to cover your use case, I will
> strongly recommend libvirt developers to support configuring TSC
> frequency if they want to support invtsc + migration without
> surprising/unpredictable restrictions on migratability.
Well, alright. If you make the TSC frequency of the host
available to mgmt software as you describe, and write the steps mgmt
software should take, i'm good.
I plan to. The problem is that the mechanism to query the host
frequency may be unavailable in the first version.
Thanks for the help Eduardo.
--
Eduardo