Re: [libvirt] RFC: virInterface change transaction API

Monday, 18 April 2011

On Fri, Apr 08, 2011 at 03:31:05PM -0400, Laine Stump wrote:
...
 I see 3 layers to this:

 1) libvirt

    At the libvirt layer, this feature just requires 3 new APIs, which
    are directly passed through to netcf:

        virInterfaceChangeStart(virConnectPtr conn, unsigned int flags);
        virInterfaceChangeCommit(virConnectPtr conn, unsigned int flags);
        virInterfaceChangeRollback(virConnectPtr conn, unsigned int flags);

    For the initial implementation, these will be simple passthroughs
    to similarly named netcf functions. (in the future, it would be
    useful for the server side of libvirt to determine if client<->server
    connectivity was lost due to the network changes, and automatically
    tell netcf to do a rollback).

It helps to outline the usage scenarios for these APIs. The first
scenario is where the app is running locally, and thus able to
use the rollback API:

 a) Local app

   virInterfaceChangeStart

   n * virInterface{Define,Undefine,Start,Destroy}

   if (...app determined check to see if everything works..)
      virInterfaceChangeCommit
   else
      virInterfaceChangeRollback

 b) Remote app with rollback-on-reboot

   virInterfaceChangeStart

   n * virInterface{Define,Undefine,Start,Destroy}

   if (...app determined check to see if everything works..)
      virInterfaceChangeCommit
   else
      invoke remote power switch/fence agent to reboot

   ...init process does equivalent of virInterfaceChangeRollback

 c) Remote app with libvirt based auto-rollback

   virInterfaceChangeStart
   virInterfaceAutoRollbackTest("somehostname", someportnumber, somemillisecs)

   n * virInterface{Define,Undefine,Start,Destroy}

   virInterfaceChangeCommit

   (if they messed up, virInterfaceChangeCommit would have
    failed due to network failure, and after somemillisecs the
    auto rollback test will trigger virInterfaceChangeRollback)

The virInterfaceAutoRollbackTest API is something I assume we'd build
into libvirtd in the next stage of work after the basic APIs are done.
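
As a rough illustration of scenario (a), here is a minimal sketch in C.
Everything in it is a hypothetical stand-in: struct fake_conn,
change_start(), change_commit() and change_rollback() only record
transaction state and are not the libvirt or netcf calls themselves. The
point is just the control flow: start, apply n changes, then commit or
roll back based on an app-supplied check.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the proposed calls; each returns 0 on
 * success. They only track transaction state, nothing touches the
 * network. */
enum txn_state { TXN_IDLE, TXN_OPEN, TXN_COMMITTED, TXN_ROLLED_BACK };

struct fake_conn {
    enum txn_state state;
    int pending_changes;   /* count of Define/Undefine/Start/Destroy calls */
};

static int change_start(struct fake_conn *c)
{
    if (c->state == TXN_OPEN)
        return -1;                 /* a transaction is already open */
    c->state = TXN_OPEN;
    c->pending_changes = 0;
    return 0;
}

static int change_commit(struct fake_conn *c)
{
    if (c->state != TXN_OPEN)
        return -1;                 /* nothing to commit */
    c->state = TXN_COMMITTED;
    return 0;
}

static int change_rollback(struct fake_conn *c)
{
    if (c->state != TXN_OPEN)
        return -1;                 /* nothing to roll back */
    c->pending_changes = 0;        /* discard everything since start */
    c->state = TXN_ROLLED_BACK;
    return 0;
}

/* Scenario (a): apply n changes, then commit or roll back depending on
 * an application-supplied connectivity check. */
static int apply_changes(struct fake_conn *c, int n, bool (*check)(void))
{
    if (change_start(c) < 0)
        return -1;
    for (int i = 0; i < n; i++)
        c->pending_changes++;      /* stands in for virInterface{Define,...} */
    return check() ? change_commit(c) : change_rollback(c);
}

/* Two trivial checks, used only to exercise both branches. */
static bool network_ok(void)     { return true; }
static bool network_broken(void) { return false; }
```

Calling apply_changes(&conn, 3, network_ok) leaves the connection in
TXN_COMMITTED; with network_broken it ends in TXN_ROLLED_BACK with no
pending changes. Scenarios (b) and (c) differ only in who performs the
rollback when the check cannot be run from the app.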

...
 2) netcf

    The netcf api will have these same three APIs, just named slightly
    differently:

         ncf_change_start(struct netcf *ncf, unsigned int flags);

            There are two possibilities for this. Either:

             A) call the initscript described below to save all config
                files that might possibly be changed (snapshot_config)

               or

             B) set a flag in *ncf indicating that all future calls
                to netcf that would end up modifying a particular
                config file should save off that file *if it hasn't
                already been saved*.

             (A) is simpler, but relies on the initscript having
             exact/complete matching knowledge of what files netcf may
             change. Should we worry about that and deal with the
             complexities of (B), or is (A) good enough for now?

I don't think it matters a huge amount.
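
For reference, the bookkeeping that option (B) implies is small. The
sketch below is only an assumption about how it could look: struct
save_tracker and mark_for_save() are made-up names, not part of netcf,
and the fixed-size table is chosen purely to keep the example short.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_SAVED    64
#define MAX_PATH_LEN 256

/* Hypothetical bookkeeping for option (B): remembers which config files
 * have already been backed up during the current transaction. */
struct save_tracker {
    char paths[MAX_SAVED][MAX_PATH_LEN];
    size_t count;
};

static bool already_saved(const struct save_tracker *t, const char *path)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->paths[i], path) == 0)
            return true;
    return false;
}

/* Returns 1 if the caller must back up `path` before modifying it,
 * 0 if a backup was already recorded, -1 if the path cannot be tracked
 * (table full or path too long). */
static int mark_for_save(struct save_tracker *t, const char *path)
{
    if (already_saved(t, path))
        return 0;
    if (t->count == MAX_SAVED || strlen(path) >= MAX_PATH_LEN)
        return -1;
    strcpy(t->paths[t->count++], path);
    return 1;
}
```

Each netcf call that is about to modify a file would ask mark_for_save()
first and copy the file aside only when it returns 1, so the first
pre-change version is the one that survives.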

...
         ncf_change_rollback(struct netcf *ncf, unsigned int flags);

            Again, two possibilities:

            A)
               a) save the config of all current interfaces (in memory)
               b) call the initscript below to restore the config to its
                  original state.
               c) compare the new config to the old, and:
                  * bring down any interfaces that no longer exist
                    (PROBLEM: once an interface has no config files, you can
                     no longer operate on it with "ifdown")
                  * bounce any interfaces that have changed
                  * bring up any interfaces that have been re-added

Yes, this doesn't really work, because you need to determine what
has changed *before* restoring the config, so that you can have a
chance of doing 'ifdown' while the config still actually exists.
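
To make the ordering concrete, here is a minimal sketch of the comparison
step, assuming the saved and current configs can each be reduced to a
list of (name, config text) pairs. struct iface_cfg, enum action and
plan_for() are hypothetical names for illustration only. The plan has to
be computed while the current config files still exist; the ifdown
actions run before the restore and the ifup actions after it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one interface's configuration. */
struct iface_cfg {
    const char *name;
    const char *config;   /* opaque config text, compared with strcmp */
};

enum action { ACT_NONE, ACT_DOWN, ACT_BOUNCE, ACT_UP };

/* Look up an interface by name; returns NULL if it is not in the set. */
static const struct iface_cfg *find(const struct iface_cfg *set, size_t n,
                                    const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i].name, name) == 0)
            return &set[i];
    return NULL;
}

/* Decide what a rollback must do to `name`, given the snapshot (saved)
 * and the current config. */
static enum action plan_for(const char *name,
                            const struct iface_cfg *saved, size_t nsaved,
                            const struct iface_cfg *cur, size_t ncur)
{
    const struct iface_cfg *old = find(saved, nsaved, name);
    const struct iface_cfg *now = find(cur, ncur, name);

    if (now && !old)
        return ACT_DOWN;    /* added since snapshot: ifdown before restore */
    if (!now && old)
        return ACT_UP;      /* removed since snapshot: ifup after restore */
    if (now && old && strcmp(now->config, old->config) != 0)
        return ACT_BOUNCE;  /* changed: ifdown before, ifup after */
    return ACT_NONE;        /* untouched: leave it alone */
}
```

An interface whose config is identical in both sets gets ACT_NONE, which
is what avoids bouncing interfaces that were never changed.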

...
            B)
                a) ifdown all interfaces
                b) call initscript to restore previous config
                   (rollback_config)
                c) ifup all interfaces.

            (A) is much simpler, but may lead to unnecessary
            difficulties when we bounce interfaces that didn't really
            need it. So, the same question as for ncf_change_start() -
            is the more exact operation worth the extra complexity?

To me A) seems more complex than B). The problem with B) is that
if you have a multi-homed network, you may be taking down interfaces
that were never changed.

...
         ncf_change_commit(struct netcf *ncf, unsigned int flags);

             The simplest function - this will just call the initscript
             to erase the backup (commit_config).

 3) initscript

    This initscript will at first live in (be installed by) netcf
    (called /etc/init.d/networking-config?), but hopefully it will
    eventually be accepted by the initscripts package (which includes
    the networking-related initscripts), as it is of general use. (Dan
    Kenigsberg already took a stab at this script last year,
    but received no reply from the initscripts maintainers, implying
    they may not be too keen on the idea right now - it might take some
    convincing ;-)

 https://fedorahosted.org/pipermail/initscripts-devel/2010-February/000025...

    It will have three commands, one of which will be called
    automatically by "start" (the command called automatically at boot
    time):

    snapshot_config

      This will save a copy of (what the script believes are - is this
      problematic?) all network-config related files. It may or may not
      be called by netcf (see the notes in ncf_change_start() above).

      If this function finds that a snapshot has already been taken,
      it should fail.

    rollback_config (automatically called from "start" at boottime)

      This will move back (from the saved copies) all files that were
      changed/removed since snapshot, *and delete any files that have
      been added*.

      Note that this command doesn't need to worry about ifup/ifdown,
      because it will be called prior to any other networking startup
      (part of the reason that netcf will need to deal with that).

      I notice that Dan K's version saves the modified files to a
      "rollback-${date}" directory. Does this seem like a good idea?
      It's nice to not lose anything, but there is no provision for
      eliminating old versions, so it could grow without bound.

    commit_config

      This will just remove all the files in the save directory.
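
The semantics of the three commands can be modelled in memory as below.
This is only a sketch under stated assumptions: the real thing would be a
shell initscript operating on files, and struct store, set_file() and
the fixed slot table are invented for the example. Two behaviours are
assumptions, not something the description above settles: rollback
consumes the snapshot, and rollback with no snapshot present is a no-op
(the normal case at boot).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_FILES 8

/* One config file slot; `present` is false for an empty slot. */
struct file {
    char name[32];
    char data[32];
    bool present;
};

/* Hypothetical in-memory model of the live config and its backup. */
struct store {
    struct file live[MAX_FILES];
    struct file saved[MAX_FILES];
    bool have_snapshot;
};

/* Create or overwrite a file in the given slot. */
static void set_file(struct file *f, const char *name, const char *data)
{
    snprintf(f->name, sizeof(f->name), "%s", name);
    snprintf(f->data, sizeof(f->data), "%s", data);
    f->present = true;
}

/* Fails if a snapshot has already been taken. */
static int snapshot_config(struct store *s)
{
    if (s->have_snapshot)
        return -1;
    memcpy(s->saved, s->live, sizeof(s->live));
    s->have_snapshot = true;
    return 0;
}

/* Restores changed/removed files and drops files added since snapshot.
 * Assumption: no snapshot means nothing to do, and the snapshot is
 * consumed by the rollback. */
static int rollback_config(struct store *s)
{
    if (!s->have_snapshot)
        return 0;
    memcpy(s->live, s->saved, sizeof(s->live));
    memset(s->saved, 0, sizeof(s->saved));
    s->have_snapshot = false;
    return 0;
}

/* Erases the backup, making the current config permanent. */
static int commit_config(struct store *s)
{
    memset(s->saved, 0, sizeof(s->saved));
    s->have_snapshot = false;
    return 0;
}
```

Because rollback copies the whole saved table back, a slot that was empty
at snapshot time is empty again afterwards, which is the "delete any
files that have been added" behaviour.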

 So, the two problems I have right now:

 1) Do we accept the inexact method of just saving all files that match
    a list of patterns during *start(), then in *rollback() erasing all
    files matching that pattern and copying the old file back? Or do we
    need to keep track of what files have been changed/removed and added,
    and copy back / delete only those files during rollback?

    (A version control system would keep track of this rather nicely,
    but that's too complex for something that's intended to be a
    failsafe (and that we would also like to eventually be in the base
    OS install). Dan B. at one point suggested using patchfiles if I
    wanted the save info to keep exact track of which files would need
    to be replaced/deleted on rollback, but on further thought this
    turns out to not be workable, since we would need to run diff (to
    create the patchfile) after all changes had been made, and any
    outside changes to any of the files would leave the patchfile
    un-appliable, thus causing our "failsafe" to fail :-( ). Therefore,
    we will need to rely on the list of globs to tell us what files
    need to be deleted, or keep our own list in a separate file.)

 2) Is it going to be okay to ifdown all interfaces prior to the
    rollback, and ifup all interfaces afterwards? Or must we compare
    the new config to the original, and ifdown only those interfaces
    that had been previously added/changed, then ifup only those
    interfaces that had been previously removed/changed?

As long as that is hidden as a private impl detail, it is not critical.
Long term I think you really only want to touch interfaces you actually
changed. eg, if you messed up eth0 config (used for libvirtd access) but
did not mess up eth1 (used for guest traffic), we don't really want to
screw up all guest networking to repair eth0.

(NB assuming eth1 and eth0 are on separate LANs, so you couldn't just
 connect to libvirtd on eth1 and issue a virInterfaceChangeRollback
 call)

...
 3) If anyone has ideas on making the initscript more palatable to the
    initscripts people, please speak up! :-) (one comment from an
    initscripts person was that 1) for the general case it would be
    difficult to draw the line on what parts of network connectivity
    should be included in this
    rollback functionality, and 2) at some point this becomes a general
    system config problem, and would really be better addressed by a
    general system wide config management system. These are both
    concerns that need well qualified answers. (I tend to think that this
    is intended as a failsafe to prevent unreachable systems, so it should
    be as simple as possible, and thus shouldn't be burdened with the
    complexity of a full system config management system (which could
    also co-exist at a higher level), but better answers are welcome.)

We have a well defined set of files which we need to cope with so I think
the scope is pretty clear. Trying to solve the entire OS system config
problem is just insanity. We want to be able to ensure connectivity in as
simple a manner as possible.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
