On 10/14/24 21:41, Jim Fehlig via Devel wrote:
On 10/14/24 11:42, Daniel P. Berrangé wrote:
> On Mon, Oct 14, 2024 at 06:00:53PM +0200, Claudio Fontana wrote:
>> On 10/10/24 15:29, Daniel P. Berrangé wrote:
>>> On Thu, Aug 08, 2024 at 05:38:10PM -0600, Jim Fehlig via Devel wrote:
>>>> From: Claudio Fontana <cfontana(a)suse.de>
>>>>
>>>> Add a new VIR_DOMAIN_SAVE_PARALLEL flag to the save and restore
>>>> APIs, which can be used to specify the use of multiple, parallel
>>>> channels for saving a domain. The number of parallel channels
>>>> can be set using the VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS
>>>> typed parameter.
>>>>
>>>> Signed-off-by: Claudio Fontana <cfontana(a)suse.de>
>>>> Signed-off-by: Jim Fehlig <jfehlig(a)suse.com>
>>>> ---
>>>> include/libvirt/libvirt-domain.h | 13 +++++++++++++
>>>> src/libvirt-domain.c | 6 ++++++
>>>> 2 files changed, 19 insertions(+)
>>>>
>>>> diff --git a/include/libvirt/libvirt-domain.h
b/include/libvirt/libvirt-domain.h
>>>> index 4266237abe..f4bf9c3cdb 100644
>>>> --- a/include/libvirt/libvirt-domain.h
>>>> +++ b/include/libvirt/libvirt-domain.h
>>>> @@ -1596,6 +1596,7 @@ typedef enum {
>>>> VIR_DOMAIN_SAVE_RUNNING = 1 << 1, /* Favor running over
paused (Since: 0.9.5) */
>>>> VIR_DOMAIN_SAVE_PAUSED = 1 << 2, /* Favor paused over
running (Since: 0.9.5) */
>>>> VIR_DOMAIN_SAVE_RESET_NVRAM = 1 << 3, /* Re-initialize NVRAM
from template (Since: 8.1.0) */
>>>> + VIR_DOMAIN_SAVE_PARALLEL = 1 << 4, /* Save and restore
using parallel channels (Since: 10.6.0) */
>>>> } virDomainSaveRestoreFlags;
>>>>
>>>> int virDomainSave (virDomainPtr domain,
>>>> @@ -1641,6 +1642,18 @@ int virDomainRestoreParams
(virConnectPtr conn,
>>>> */
>>>> # define VIR_DOMAIN_SAVE_PARAM_DXML "dxml"
>>>>
>>>> +/**
>>>> + * VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS:
>>>> + *
>>>> + * this optional parameter mirrors the migration parameter
>>>> + * VIR_MIGRATE_PARAM_PARALLEL_CONNECTIONS.
>>>> + *
>>>> + * It specifies the number of extra connections to use when transfering
state.
>>>> + *
>>>> + * Since: 10.6.0
>>>> + */
>>>> +# define VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS
"parallel.connections"
>>>
>>> "CONNECTIONS" is perhaps the wrong name, as that's kinda an
impl
>>
>> I would suggest CONNECTIONS it is the correct name.
>>
>>> detail. How about VIR_MIGRATE_PARAM_PARALLEL_CPUS to reflect
>>> the number of CPUs we're going to burn time on.
>>
>> This is another aspect of it, but I would suggest that _this_ is much more an
implementation detail (inside qemu) on how the parallel connection is actually managed.
>> Maybe in the future there will be a pool of cpus serving this, or another
optional way to serve the connections.
>>
>> Considering the fact that libvirt _already has_ for regular migrations
>>
>> {.typedParam = VIR_MIGRATE_PARAM_PARALLEL_CONNECTIONS,
>> .param = QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS,
>> .party = QEMU_MIGRATION_SOURCE | QEMU_MIGRATION_DESTINATION},
>>
>> which controls exactly the same QEMU parameter (ie the multifd-channels),
>> using anything else than --parallel-connections is in my opinion very surprising
for the users (which would know what to expect from the same command line option),
>> and honestly simply absurd.
>
> This is comparing two different scenarios which work in varying ways.
>
> In the case of live migration there are network connections that
this is a fair point
> we're controlling. So calling the parameter "parallel
connections"
> is contextually relevant, though we could equally have called
> it "parallel vcpus" - depends whether you're thinking about multifd
> as being there to max out TCP connections, or to max our local CPU
> usage.
>
> In the case of migrating to a file though, there is no concept
> of network connections at all. QEMU is opening a file and directly
> writing to that file from multiple threads. Calling the parameter
> 'parallel connections' makes no sense in this context, as there
> are no connections in use. "parallel CPUs" reflects that we're
> consuming multiple CPUs to write out data, though we could also
> call it "parallel threads" for similar reasons.
here I disagree, because we don't know what QEMU will do to serve those multifd
channels (threads, coroutines, or whatever construct might be implemented in the future to
improve efficiency).
I can change this to PARALLEL_CPUS if really desired, but there's something
about it that doesn't feel right. It's hard to put to words, probably because
the reasons are mostly subjective. Claudio already raised the objective ones.
I do prefer PARALLEL_THREADS over PARALLEL_CPUS. How about reusing the name
already selected by QEMU and going with PARALLEL_CHANNELS?
Good idea, PARALLEL_CHANNELS seems to be the best to me.
Regards,
Jim
Claudio