On Mon, Feb 05, 2018 at 01:39:56PM +0100, Michal Privoznik wrote:
> On 02/01/2018 08:51 AM, Nikolay Shirokovskiy wrote:
> >
> >
> > On 29.01.2018 09:09, Michal Privoznik wrote:
> >> On 12/20/2017 07:35 AM, Nikolay Shirokovskiy wrote:
> >>> Host tcp4/tcp6 ports is a global resource thus we need to make
> >>> port accounting also global or we have issues described in [1] when
> >>> port allocator ranges of different instances are overlapped (which
> >>> is by default for qemu for example).
> >>>
> >>> Let's have only one global port allocator object that take care
> >>> of the entire ports range (0 - 65535) and introduce port range object
> >>> for clients to specify desired auto allocation band.
> >>>
> >>> [1]
https://www.redhat.com/archives/libvir-list/2017-December/msg00600.html
> >>> ---
> >>> src/bhyve/bhyve_driver.c | 4 +-
> >>> src/bhyve/bhyve_utils.h | 2 +-
> >>> src/libvirt_private.syms | 3 +-
> >>> src/libxl/libxl_conf.c | 8 +--
> >>> src/libxl/libxl_conf.h | 8 +--
> >>> src/libxl/libxl_driver.c | 18 +++---
> >>> src/qemu/qemu_conf.h | 6 +-
> >>> src/qemu/qemu_driver.c | 30 +++++-----
> >>> src/util/virportallocator.c | 125
++++++++++++++++++++++++++++-------------
> >>> src/util/virportallocator.h | 20 ++++---
> >>> tests/bhyvexml2argvtest.c | 6 +-
> >>> tests/libxlxml2domconfigtest.c | 8 +--
> >>> tests/virportallocatortest.c | 48 ++++++++++------
> >>> 13 files changed, 175 insertions(+), 111 deletions(-)
> >>>
> >>
> >>> diff --git a/src/util/virportallocator.c b/src/util/virportallocator.c
> >>> index fcd4f74..cd64356 100644
> >>> --- a/src/util/virportallocator.c
> >>> +++ b/src/util/virportallocator.c
> >>> @@ -35,10 +35,14 @@
> >>>
> >>> #define VIR_FROM_THIS VIR_FROM_NONE
> >>>
> >>> +typedef struct _virPortAllocator virPortAllocator;
> >>> +typedef virPortAllocator *virPortAllocatorPtr;
> >>> struct _virPortAllocator {
> >>> virObjectLockable parent;
> >>> virBitmapPtr bitmap;
> >>> +};
> >>>
> >>> +struct _virPortRange {
> >>> char *name;
> >>>
> >>> unsigned short start;
> >>> @@ -48,6 +52,7 @@ struct _virPortAllocator {
> >>> };
> >>>
> >>> static virClassPtr virPortAllocatorClass;
> >>> +static virPortAllocatorPtr virPortAllocatorInstance;
> >>
> >> I wonder if this is the way to go. I mean, this virPortAllocatorInstance
> >> is going to be a global variable that will never be freed (even if we
> >> wanted to). I mean, if virPortRange had a pointer to virPortAllocator
> >
> > Not sure why we need to free it. It is like global variables for classes,
> > we don't need to free them yet. As to libxlxml2domconfigtest it can be
> > fixed just like virportallocatortest by releasing all acquired ports.
>
> Well, okay. Disregard my suggestion. However, what we still need to
> discuss is the driver separation work of Daniel's. Dan, how badly will
> this hit you if I merged these? In another thread I suggested to turn
> this into a separate deaemon (which might be overkill).
The caching of the used ports in the bitmap is just an optimization, to
avoid us having to retry the bind()+listen() on every port we've previously
got in use. If we split the daemon, if multiple daemons all need port
allocation tracking, they'll get separate virPortAllocator bitmap instances.
Since one daemon won't see what other daemon has in use, it will mean that
we must try to bind()+listen() on ports that the other daemon has in use.
Thereafter we'll have cached that usage the bitmap.
The main downside is that if one daemon releases a port, the other daemon
won't see that release. This is only a significant problem if the 2 (or
more) daemons are using the same port range. This would, however, be
exactly the same when we have a per-QEMU instance daemon. The proposed
change, however, does not make life worse than it already is in this
respect.
IOW, we'll probably have some trouble, but that's not a reason to reject
this proposal. It is just one of many things we'll need to figure out
wrt unique assignment.
Well, you get slightly worse odds of having the same kind of race if you
have multiple instances of the port allocation approach in multiple
processes.
Our problem is that when we bind()+listen() we still need to close that
port and have qemu open it again. This race window is still present but
will be worsened by multiple of these doing the same thing.
When qemu will be able to accept the socket via FD passing then this
would be strictly an optimization, but until then it worsens the odds of
failure.