On Wed, Jan 04, 2017 at 08:26:27PM -0200, Marcelo Tosatti wrote:
On Wed, Jan 04, 2017 at 05:59:17PM -0200, Eduardo Habkost wrote:
> On Wed, Jan 04, 2017 at 11:39:16AM -0200, Eduardo Habkost wrote:
> > On Wed, Jan 04, 2017 at 09:56:56AM -0200, Marcelo Tosatti wrote:
> > > On Tue, Dec 27, 2016 at 05:21:20PM -0200, Eduardo Habkost wrote:
> > > > Instead of blocking migration on the source when invtsc is
> > > > enabled, rely on the migration destination to ensure there's no
> > > > TSC frequency mismatch.
> > > >
> > > > We can't allow migration unconditionally because we don't
know if
> > > > the destination is a QEMU version that is really going to ensure
> > > > there's no TSC frequency mismatch. To ensure we are migrating to
> > > > a destination that won't ignore SET_TSC_KHZ errors, allow invtsc
> > > > migration only on pc-*-2.9 and newer.
> > > >
> > > > Signed-off-by: Eduardo Habkost <ehabkost(a)redhat.com>
> [...]
> > > > @@ -2655,12 +2656,14 @@ int kvm_arch_put_registers(CPUState *cpu, int
level)
> > > > }
> > > >
> > > > if (level == KVM_PUT_FULL_STATE) {
> > > > - /* We don't check for kvm_arch_set_tsc_khz() errors
here,
> > > > - * because TSC frequency mismatch shouldn't abort
migration,
> > > > - * unless the user explicitly asked for a more strict TSC
> > > > - * setting (e.g. using an explicit "tsc-freq"
option).
> > > > + /* Migration TSC frequency mismatch is fatal only if we are
> > > > + * actually reporting Invariant TSC to the guest.
> > > > */
> > > > - kvm_arch_set_tsc_khz(cpu);
> > > > + ret = kvm_arch_set_tsc_khz(cpu);
> > > > + if ((x86_cpu->env.features[FEAT_8000_0007_EDX] &
CPUID_APM_INVTSC) &&
> > > > + ret < 0) {
> > > > + return ret;
> > > > + }
> > > > }
> > >
> > > Will the guest continue in the source in this case?
> > >
> > > I think this is past the point where migration has been declared
> > > successful.
> > >
> > > Otherwise looks good.
> >
> > Good point. I will make additional tests and see if there's some
> > other place where the kvm_arch_set_tsc_khz() call can be moved
> > to.
>
> So, if we solve this and do something on (for example) post_load,
> we still have a problem: device state is migrated after RAM. This
> means QEMU will check for TSC scaling and abort migration very
> late.
>
> We could solve that by manually registering a SaveVMHandler that
> will send the TSC frequency on save_live_setup, so migration is
> aborted earlier.
>
> But: this sounds like just a complex hack to work around the real
> problems:
>
> 1) TSC frequency is guest-visible, and anything that affects
> guest ABI should depend on the VM configuration, not on host
> capabilities;
Well not really: the TSC frequency where the guest starts
is the frequency the guest software expects.
So it does depend on host capabilities.
Could you explain where this expectation comes from? I can see
multiple cases where choosing the TSC frequency where the VM
starts is not the best option.
I am considering two possible scenarios below. You probably have
another scenario in mind, so it would be useful if you could
describe it so we can consider possible solutions.
Scenario 1:
You have two hosts: A and B, both with TSC scaling. They have
different frequencies. The VM can be started on either one of
them, and can be migrated to either one depending on the policy
of management software.
Maybe B is a backup host, the VM is expected to run most times on
host A, and we want to use the TSC frequency from host A every
time. Maybe it's the opposite and we want to use the frequency of
B. Maybe we want to use the lowest frequency of both, maybe the
highest one. We (QEMU developers) can recommend policy to libvirt
developers, but a given QEMU process doesn't have information to
decide what's best.
Scenario 2:
Host A has TSC scaling, host B doesn't have TSC scaling. We want
to be able to start the VM on host A, and migrate to B. In this
case, the only possible solution is to use B's frequency when
starting the VM. The QEMU process doesn't have enough information
to make that decision.
> 2) Setting TSC frequency depending on the host will make
> migratability unpredictable for management software: the same
> VM config could be migratable to host A when started on host
> B, but not migratable to host A when started on host C.
Well, just check the frequency.
What do you mean by "just check the frequency"? What exactly
should management software do?
Whatever management software do, if you just use the source host
frequency you can get into a situation where you can start a VM
on host A and migrate it to B, but can't start the VM on host B
and migrate it to A. This would be confusing for users, and
probably break assumptions of existing management software.
> I suggest we allow migration with invtsc if and only if
> tsc-frequency is set explicitly by management software. In other
> words, apply only patches 1/4 and 2/4 from this series. After
> that, we will need libvirt support for configuring tsc-frequency.
I don't like that for the following reasons:
* It moves low level complexity from QEMU/KVM to libvirt
(libvirt has to call KVM_GET_TSC_KHZ from a vcpu thread).
It doesn't. It could ask QEMU for that information (the new
query-cpu-model-expansion QMP command would allow that).
Or, alternatively, it could just let the user choose the
frequency. It's probably acceptable for many use cases where
invtsc+migration is important.
* It makes it difficult to run QEMU manually (i use QEMU
manually all the time).
I believe the set of people that: 1) need invtsc; 2) need
migration to work; and 3) run QEMU manually will be able to add
"tsc-frequency=XXX" to the command-line easily. :)
* It requires patching libvirt.
In my experience things work better when the functionality is
not split across libvirt/qemu.
In my experience things break when management software is the
only component able to make a decision but we don't provide
mechanisms to let management software make that decision.
The TSC frequency affects migratability to hosts. Choose the
wrong frequency, and you might not be able to migrate. Only
management software knows to which hosts the VM could be migrated
in the future, and which TSC frequency would allow that.
Can't this be fixed in QEMU? Just check that destination host supports
TSC scaling before migration (or that KVM_GET_TSC_KHZ return value
matches on source and destination).
This solves only one use case: where you want to expose the
starting host TSC frequency to the guest. It doesn't cover any
scenario where the TSC frequency of the starting host isn't the
best one. (See the two scenarios I described above)
You seem to have a specific use case in mind. If you describe it,
we could decide if it's worth the extra migration code complexity
to deal with invtsc migration without explicit tsc-frequency. By
now, I am not convinced yet.
Maybe your use case just needs a explicit "invtsc-migration=on"
command-line flag without a mechanism to abort migration on
mismatch? I can't tell.
Note that even if we follow your suggestion and implement an
alternative version of patch 4/4 to cover your use case, I will
strongly recommend libvirt developers to support configuring TSC
frequency if they want to support invtsc + migration without
surprising/unpredictable restrictions on migratability.
--
Eduardo