On Tue, Nov 15, 2016 at 3:07 PM, Peter Krempa <pkrempa(a)redhat.com> wrote:
On Mon, Nov 14, 2016 at 20:04:16 +0530, Prasanna Kumar Kalever
wrote:
> This patch offers:
> 1. Optimized calls to glfs_init() and friends
> 2. A temporary reduction of the memory leak visible in the libvirt process,
> even though the root cause of the leaks is glfs_fini() (7 - 10MB per object)
That is not a valid use case as I've said a few times.
> [Hopefully gluster should address this in its future releases, not very near]
>
> Currently, starting a VM calls glfs_new/glfs_init twice (creating a glfs
> object once for stat and reading headers, and again for chown) and then forks
> the qemu process, which calls them once more (for the actual read/write IO).
>
> Not just that: if we have 4 extra attached disks, the total calls to
> glfs_init() and friends will be (4+1)*2 in libvirt and (4+1)*1 in qemu,
> i.e. 15 calls. Since we have no control over the qemu process, which
> executes in a different process environment, let's not bother about it here.
>
> This patch shrinks these 10 calls (i.e. the objects from the example above)
> down to just one, by maintaining a cache of glfs objects.
>
> The glfs object is shared with others only if the volume name and all the
> volfile servers match. On a cache hit the glfs object takes a ref, and the
> unref happens only on close.
>
> Thanks to 'Peter Krempa' for the discussion.
>
> Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever(a)redhat.com>
> ---
>
> WORK IN PROGRESS: (WIP)
> ----------------
> While initially caching the glfs object, i.e. in
> virStorageBackendGlusterSetPreopened(), I take ref=2, so the following
> virStorageBackendGlusterClosePreopened() will decrement it to ref=1, which
> keeps the object from being cleaned up, so it can be shared with
> subsequently added disks (if the conditions match).
>
> For some context, the idea is that on time-out (or after all disks are
> initialized) someone should call virStorageBackendGlusterClosePreopened(),
> which will ideally make ref=0, so the cached object is cleaned up/deleted
> by calling glfs_fini().
Apologies for the delay with this patch; I was on PTO last week.
A second option is to add storage driver APIs that allow registering
and unregistering the storage driver volumes when the VM driver needs
to access them.
This is basically equivalent to calling virStorageFileInit(As) right when
starting the VM and not closing it until the cleanup at the end of the
VM's lifetime.
This will immediately remove the two openings of the gluster volume when
chmoding and checking that the volume exists.
Additionally it gives you a convenient point to do reference counting
on similar connections to the gluster node.
Wonderful. I was not able to figure out a good point for ref counting.
This approach unblocks me :-)
Thankfully gluster's security handling is so terrible that we don't
really need to pass auth objects to it, since it would complicate things
further.
>
> I had a thought of doing the time-out cleanup call in
> virSecurityManagerSetAllLabel() or similar, but that looks too odd to me?
No, your endeavour needs to be split into two separate things:
1) Unifying the calls to virStorageFileInit and virStorageFileDeinit so
that they are called just once during VM start
2) Adding connection caching to the gluster backend driver so that
multiple similar consumers don't each need to open a new connection.
Yeah, makes perfect sense.
Adding timed closing of the cached connections would be complicated and
prone to breaking.
>
> Some help ?
> Thanks in advance.
>
> ---
> src/storage/storage_backend_gluster.c | 136 ++++++++++++++++++++++++++++++++--
> src/storage/storage_backend_gluster.h | 33 ++++++++-
> 2 files changed, 160 insertions(+), 9 deletions(-)
>
> diff --git a/src/storage/storage_backend_gluster.c b/src/storage/storage_backend_gluster.c
> index 8e86704..4f53ebc 100644
> --- a/src/storage/storage_backend_gluster.c
> +++ b/src/storage/storage_backend_gluster.c
> @@ -47,19 +47,132 @@ struct _virStorageBackendGlusterState {
> char *dir; /* dir from URI, or "/"; always starts and ends in '/' */
> };
>
> +virGlusterDefPtr ConnCache = {0,};
variables in libvirt don't start with a capital letter. Also the type
name is not accurately chosen.
Agree.
> +
> typedef struct _virStorageBackendGlusterState virStorageBackendGlusterState;
> typedef virStorageBackendGlusterState *virStorageBackendGlusterStatePtr;
>
> +void
> +virStorageBackendGlusterSetPreopened(virStorageSourcePtr src, glfs_t *fs)
> +{
> + size_t i;
> + virStorageBackendGlusterStatePtrPreopened entry = NULL;
> +
> + if (ConnCache == NULL && (VIR_ALLOC(ConnCache) < 0))
This needs to be serialized, otherwise it can create races between
multiple threads.
Please use the virOnce infrastructure for this.
Okay!
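For illustration, pthread_once provides the one-time, thread-safe initialization being asked for; libvirt's virOnce() is a thin wrapper with the same semantics. A minimal standalone sketch, with hypothetical names:

```c
/* Standalone sketch of serialized one-time initialization using
 * pthread_once (which libvirt's virOnce wraps).  connCacheInit,
 * connCacheEnsure and connCacheInitialized are hypothetical names. */
#include <assert.h>
#include <pthread.h>

static pthread_once_t connCacheOnce = PTHREAD_ONCE_INIT;
static int connCacheInitialized;

/* Runs exactly once, no matter how many threads race into it;
 * this is where the cache and its mutex would be allocated. */
static void
connCacheInit(void)
{
    connCacheInitialized++;
}

/* Safe to call from every thread that touches the cache. */
static int
connCacheEnsure(void)
{
    return pthread_once(&connCacheOnce, connCacheInit);
}
```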
> + return;
> +
> + for (i = 0; i < ConnCache->nConn; i++) {
> + if (STREQ(ConnCache->Conn[i]->volname, src->volume))
As I've pointed out, just matching the volume name is not enough. All
hosts and ports need to be checked as well, otherwise you can't be sure
that the volume is the same.
I've reiterated this point multiple times during our chat.
I remember this; sorry for not mentioning it in the WIP list, I will
address it in the next spin.
Are there any util functions to compare an IP vs an FQDN hostname?
Otherwise I need to write one.
> + return;
> + }
> +
> + if (VIR_ALLOC(entry) < 0)
> + goto L1;
> +
> + if (VIR_STRDUP(entry->volname, src->volume) < 0)
> + goto L1;
> +
> + entry->nhosts = src->nhosts;
> + for (i = 0; i < src->nhosts; i++) {
> + if (VIR_ALLOC_N(entry->hosts[i], strlen(src->hosts[i].name) + 1) < 0)
> + goto L2;
> + strcpy(entry->hosts[i], src->hosts[i].name);
VIR_STRDUP instead of all the above.
Uhhhh...
My bad.
> + }
> +
> + entry->fs = fs;
> + entry->ref = 2; /* persist glfs obj per volume until a final timeout
> + virStorageBackendGlusterClosePreopened() is called */
> +
> + if (VIR_INSERT_ELEMENT(ConnCache->Conn, -1, ConnCache->nConn, entry) < 0)
> + goto L2;
> +
> + return;
> +
> +L2:
> + for (i = 0; i < entry->nhosts; i++)
> + VIR_FREE(entry->hosts[i]);
> +L1:
These won't pass syntax check. Please follow the libvirt coding rules
and choose sensible names for the labels.
Sure. Looks like I edited some portions after running make syntax-check.
Sorry, I will try not to repeat this.
> + if(ConnCache->nConn == 0)
> + VIR_FREE(ConnCache);
> + VIR_FREE(entry->volname);
> + VIR_FREE(entry);
> +}
> +
> +glfs_t *
> +virStorageBackendGlusterFindPreopened(virStorageSourcePtr src)
> +{
> + size_t i, j, k, ret = 0;
> + size_t min, max;
> +
> + if (ConnCache == NULL)
> + return NULL;
> +
> + virStorageBackendGlusterStatePtrPreopened entry;
> +
> + for (i = 0; i < ConnCache->nConn; i++) {
> + entry = ConnCache->Conn[i];
> + if (STREQ(entry->volname, src->volume)) {
> + min = entry->nhosts < src->nhosts ? entry->nhosts : src->nhosts;
> + max = entry->nhosts >= src->nhosts ? entry->nhosts : src->nhosts;
If the number of hosts does not match, there's no point in checking that
they are equal ... since they certainly are not.
Yeah, I was also wondering whether I could convince you to accept a subset match of servers?
Example:
Disk1:
IPa, IPc, IPe, IPg
Disk2:
IPa, IPe
Since all addresses in Disk2 match those in Disk1, therefore ... ?
> + for (j = 0; j< min; j++) {
> + if (entry->nhosts == min) {
> + for (k = 0; k < max; k++) {
> + if (STREQ(entry->hosts[j], src->hosts[k].name)) {
You are not checking port numbers.
You should also take the transport type into account.
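A standalone sketch of the full comparison being asked for: two host lists match only when every (name, port, transport) tuple has an exact counterpart, not just the host name. The struct and function names are illustrative, not libvirt API.

```c
/* Standalone sketch of exact host-list matching on the full
 * (name, port, transport) tuple, order-insensitive, assuming
 * no duplicate entries within a list. */
#include <assert.h>
#include <string.h>

typedef struct {
    const char *name;
    const char *port;
    int transport;
} HostDef;

static int
hostListsEqual(const HostDef *a, size_t na, const HostDef *b, size_t nb)
{
    size_t i, j;
    if (na != nb)
        return 0;           /* different counts can never be equal */
    for (i = 0; i < na; i++) {
        for (j = 0; j < nb; j++) {
            if (strcmp(a[i].name, b[j].name) == 0 &&
                strcmp(a[i].port, b[j].port) == 0 &&
                a[i].transport == b[j].transport)
                break;      /* found a matching tuple */
        }
        if (j == nb)
            return 0;       /* a[i] has no counterpart in b */
    }
    return 1;
}
```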
> + ret = 1;
> + break;
> + }
> + }
> + if (!ret)
> + return NULL;
> + } else {
> + for (k = 0; k < max; k++) {
> + if (STREQ(src->hosts[j].name, entry->hosts[k])) {
> + ret = 1;
> + break;
> + }
> + }
> + if (!ret)
> + return NULL;
> + }
> + }
> + entry->ref++;
> + return entry->fs;
> + }
> + }
> + return NULL;
> +}
> +
> +int
> +virStorageBackendGlusterClosePreopened(glfs_t *fs)
> +{
> + size_t i;
> + int ret = 0;
> +
> + if (fs == NULL)
> + return ret;
> +
> + for (i = 0; i < ConnCache->nConn; i++) {
> + if (ConnCache->Conn[i]->fs == fs) {
> + if (--ConnCache->Conn[i]->ref)
> + return ret;
Please use libvirt's reference counted objects instead of hand-rolled
reference counting stuff.
> +
> + ret = glfs_fini(ConnCache->Conn[i]->fs);
> + VIR_FREE(ConnCache->Conn[i]->volname);
> + VIR_FREE(ConnCache->Conn[i]);
> +
> + VIR_DELETE_ELEMENT(ConnCache->Conn, i, ConnCache->nConn);
This needs locked access to the cache; as written, it introduces a rather
nasty race possibility.
Yeah, something like virMutexLock(connCache->lock);
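Exactly; a minimal standalone sketch of that locking discipline, using a pthread mutex in place of libvirt's virMutex. The names and the simulated refcount are illustrative.

```c
/* Standalone sketch of the required locking: every reader or writer
 * of the shared cache takes the same mutex, so a concurrent close
 * cannot race with a lookup or another close. */
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t cacheLock = PTHREAD_MUTEX_INITIALIZER;
static int cacheRef = 2;  /* stands in for a cached entry's refcount */
static int finiCalls;     /* counts simulated glfs_fini() teardowns */

static void
closePreopened(void)
{
    pthread_mutex_lock(&cacheLock);
    if (--cacheRef == 0)
        finiCalls++;      /* last user: really tear down */
    pthread_mutex_unlock(&cacheLock);
}
```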
> + }
> + }
> + return ret;
> +}
> +
> static void
> virStorageBackendGlusterClose(virStorageBackendGlusterStatePtr state)
> {
> if (!state)
> return;
>
> - /* Yuck - glusterfs-api-3.4.1 appears to always return -1 for
> - * glfs_fini, with errno containing random data, so there's no way
> - * to tell if it succeeded. 3.4.2 is supposed to fix this.*/
> - if (state->vol && glfs_fini(state->vol) < 0)
> + if (state->vol && virStorageBackendGlusterClosePreopened(state->vol) < 0)
> VIR_DEBUG("shutdown of gluster volume %s failed with errno %d",
> state->volname, errno);
>
> @@ -556,8 +669,7 @@ virStorageFileBackendGlusterDeinit(virStorageSourcePtr src)
> src, src->hosts->name, src->hosts->port ? src->hosts->port : "0",
> src->volume, src->path);
>
> - if (priv->vol)
> - glfs_fini(priv->vol);
> + virStorageBackendGlusterClosePreopened(priv->vol);
> VIR_FREE(priv->canonpath);
>
> VIR_FREE(priv);
> @@ -630,11 +742,20 @@ virStorageFileBackendGlusterInit(virStorageSourcePtr src)
> src, priv, src->volume, src->path,
> (unsigned int)src->drv->uid, (unsigned int)src->drv->gid);
>
> +
> + priv->vol = virStorageBackendGlusterFindPreopened(src);
> + if (priv->vol) {
> + src->drv->priv = priv;
> + return 0;
> + }
> +
> if (!(priv->vol = glfs_new(src->volume))) {
> virReportOOMError();
> goto error;
> }
>
> + virStorageBackendGlusterSetPreopened(src, priv->vol);
> +
> for (i = 0; i < src->nhosts; i++) {
> if (virStorageFileBackendGlusterInitServer(priv, src->hosts + i) < 0)
> goto error;
> @@ -652,8 +773,7 @@ virStorageFileBackendGlusterInit(virStorageSourcePtr src)
> return 0;
>
> error:
> - if (priv->vol)
> - glfs_fini(priv->vol);
> + virStorageBackendGlusterClosePreopened(priv->vol);
> VIR_FREE(priv);
>
> return -1;
> diff --git a/src/storage/storage_backend_gluster.h b/src/storage/storage_backend_gluster.h
> index 6796016..a0326aa 100644
> --- a/src/storage/storage_backend_gluster.h
> +++ b/src/storage/storage_backend_gluster.h
> @@ -22,9 +22,40 @@
> #ifndef __VIR_STORAGE_BACKEND_GLUSTER_H__
> # define __VIR_STORAGE_BACKEND_GLUSTER_H__
>
> -# include "storage_backend.h"
> +#include "storage_backend.h"
> +#include <glusterfs/api/glfs.h>
NACK to this hunk. The gluster stuff needs to stay contained in the
backend driver and should not leak to other parts of libvirt.
Will remove.
>
> extern virStorageBackend virStorageBackendGluster;
> extern virStorageFileBackend virStorageFileBackendGluster;
>
> +
> +struct _virStorageBackendGlusterStatePreopened {
> + char *volname;
> + size_t nhosts;
> + char *hosts[1024]; /* FIXME: 1024 ? */
You need to allocate this dynamically.
That's the idea; since this was the initial patch, it just got neglected.
> + glfs_t *fs;
This needs to be internal.
> + int ref;
> +};
> +
> +typedef struct _virStorageBackendGlusterStatePreopened virStorageBackendGlusterStatePreopened;
> +typedef virStorageBackendGlusterStatePreopened *virStorageBackendGlusterStatePtrPreopened;
> +
> +struct _virGlusterDef {
> + size_t nConn;
> + virStorageBackendGlusterStatePtrPreopened *Conn;
> +};
> +
> +typedef struct _virGlusterDef virGlusterDef;
> +typedef virGlusterDef *virGlusterDefPtr;
> +
> +extern virGlusterDefPtr ConnCache;
> +
> +void
> +virStorageBackendGlusterSetPreopened(virStorageSourcePtr src, glfs_t *fs);
> +
> +glfs_t*
> +virStorageBackendGlusterFindPreopened(virStorageSourcePtr src);
> +
> +int
> +virStorageBackendGlusterClosePreopened(glfs_t *fs);
> #endif /* __VIR_STORAGE_BACKEND_GLUSTER_H__ */
Again, these can't be exported since they take gluster data types. Also,
it looks like they are internal to the gluster impl, so there's no need
to export them at all.
Makes sense.
Thanks a lot Peter!
Will post a modified version sometime tomorrow.
--
Prasanna
Peter