-- 
Eli Qiao

On Tuesday, 7 February 2017 at 6:15 PM, Daniel P. Berrange wrote:

On Tue, Feb 07, 2017 at 02:43:04PM +0800, Eli Qiao wrote:
On Tuesday, 7 February 2017 at 12:17 AM, Daniel P. Berrange wrote:

On Mon, Feb 06, 2017 at 10:23:39AM +0800, Eli Qiao wrote:
virResCtrlSetCacheBanks: Set the cache banks of a libvirt domain. It will
create a new resource domain under `/sys/fs/resctrl` and fill in the
schemata according to the cache banks configuration.

virResCtrlUpdate: Update the schemata after a libvirt domain is destroyed.
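
For reference, the schemata file of a resource domain holds one line per
resource with a capacity bitmask per cache id; the values here are only
illustrative:

# cat /sys/fs/resctrl/<domain>/schemata
L3:0=fffff;1=fffff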

Signed-off-by: Eli Qiao <liyong.qiao@intel.com>
---
src/libvirt_private.syms | 2 +
src/util/virresctrl.c | 644 ++++++++++++++++++++++++++++++++++++++++++++++-
src/util/virresctrl.h | 47 +++-
3 files changed, 691 insertions(+), 2 deletions(-)


diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h
index 3cc41da..11f43d8 100644
--- a/src/util/virresctrl.h
+++ b/src/util/virresctrl.h
@@ -26,6 +26,7 @@

# include "virutil.h"
# include "virbitmap.h"
+# include "domain_conf.h"

#define RESCTRL_DIR "/sys/fs/resctrl"
#define RESCTRL_INFO_DIR "/sys/fs/resctrl/info"
@@ -82,9 +83,53 @@ struct _virResCtrl {
virResCacheBankPtr cache_banks;
};

+/**
+ * a virResSchemata represents a schemata object under a resource control
+ * domain.
+ */
+typedef struct _virResSchemataItem virResSchemataItem;
+typedef virResSchemataItem *virResSchemataItemPtr;
+struct _virResSchemataItem {
+ unsigned int socket_no;
+ unsigned schemata;
+};
+
+typedef struct _virResSchemata virResSchemata;
+typedef virResSchemata *virResSchemataPtr;
+struct _virResSchemata {
+ unsigned int n_schemata_items;
+ virResSchemataItemPtr schemata_items;
+};
+
+/**
+ * a virResDomain represents a resource control domain. It's a double linked
+ * list.
+ */
+
+typedef struct _virResDomain virResDomain;
+typedef virResDomain *virResDomainPtr;
+
+struct _virResDomain {
+ char* name;
+ virResSchemataPtr schematas[RDT_NUM_RESOURCES];
+ char* tasks;
+ int n_sockets;
+ virResDomainPtr pre;
+ virResDomainPtr next;
+};
+
+/* All resource control domains on this host*/
+
+typedef struct _virResCtrlDomain virResCtrlDomain;
+typedef virResCtrlDomain *virResCtrlDomainPtr;
+struct _virResCtrlDomain {
+ unsigned int num_domains;
+ virResDomainPtr domains;
+};


I don't think any of these need to be in the header file - they're
all private structs used only by the .c file.

Yep.

The biggest issue I see here is a complete lack of any kind of
mutex locking. Libvirt is multi-threaded, so you must expect to
have virResCtrlSetCacheBanks() and virResCtrlUpdate() called
concurrently and thus use mutexes to ensure safety.

Okay.
With the data structure design of using linked list of virResDomain
though, doing good locking is going to be hard. It'll force you to
have a single global mutex across the whole set of data structures
which will really harm concurrency for VM startup.

Really each virResDomain needs to be completely standalone, with
its own dedicated mutex. A global mutex for virResCtrlDomain can
be acquired while you look up the virResDomain object to use. Thereafter
the global mutex should be released and only the mutex for the specific
domain used.

Oh yes, but locking is really hard; I need to be careful to avoid deadlock.
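
A minimal sketch of that scheme, using plain pthreads for clarity (the
real code would use libvirt's virMutex wrappers; all names here are
hypothetical). Deadlock can be avoided with a fixed lock order: always
take the global list lock before a per-domain lock, never the reverse.

#include <pthread.h>
#include <string.h>

typedef struct _virResDomain virResDomain;
typedef virResDomain *virResDomainPtr;

struct _virResDomain {
    pthread_mutex_t lock;    /* protects this domain's schemata only;
                                initialised when the domain is created */
    char *name;
    virResDomainPtr next;
    /* ... schemata items, tasks ... */
};

/* the global lock guards only the list itself, not the domains' contents */
static pthread_mutex_t domain_list_lock = PTHREAD_MUTEX_INITIALIZER;
static virResDomainPtr domain_list;

/* Look up a domain and return it with its own lock held; the global
 * lock is dropped before returning, so other threads can do lookups
 * and work on other domains concurrently. */
static virResDomainPtr
virResDomainLookupLocked(const char *name)
{
    virResDomainPtr dom;

    pthread_mutex_lock(&domain_list_lock);
    for (dom = domain_list; dom; dom = dom->next) {
        if (strcmp(dom->name, name) == 0) {
            pthread_mutex_lock(&dom->lock);
            break;
        }
    }
    pthread_mutex_unlock(&domain_list_lock);
    return dom;   /* caller unlocks dom->lock when done, if non-NULL */
}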

bool virResCtrlAvailable(void);
int virResCtrlInit(void);
virResCtrlPtr virResCtrlGet(int);
-
+int virResCtrlSetCacheBanks(virDomainCachetunePtr, unsigned char*, pid_t);
+int virResCtrlUpdate(void);


This API design doesn't really make locking very easy - since you are not
passing any info into the virResCtrlUpdate() method you've created the
need to do global rescans when updating.


Yes, what if I use a global mutex variable in virresctrl.c?

I'd like to see finer grained locking if possible. We try really hard to
make guest startup highly parallelizable, so any global lock that is
required by all starting VMs hurts that.

Hi Daniel, thanks for your reply.

Hmm, I know what you mean, but I have no idea how to add finer-grained mutex/locking.

When you boot a VM that requires cache allocation, we need to read the cache information on the host, which is global state; every VM shares it and modifies it after its allocation, then flushes the change to /sys/fs/resctrl. That is to say, there can be only one cache allocation operation at a time.

The process may look like this:

VM1:
1) start VM1
2) acquire the lock; read the remaining cache information on the host
3) calculate this VM's schemata and flush it to /sys/fs/resctrl
4) update the global remaining-cache information
5) release the lock
6) VM1 started

VM2 (started concurrently):
1) start VM2
2) wait for the lock while VM1 holds it
3) acquire the lock; read the remaining cache information on the host
4) calculate this VM's schemata and flush it to /sys/fs/resctrl
5) update the global remaining-cache information
6) release the lock

VM2 has to wait until VM1 has flushed its schemata to /sys/fs/resctrl; otherwise they would race.
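
Roughly what that serialisation could look like, as a sketch with
hypothetical names, a single socket, and an assumed 20-bit capacity
bitmask; steps 2-4 form one critical section, so a second VM blocks
until the first has flushed:

#include <pthread.h>

#define CBM_LEN 20   /* assumed capacity bitmask width on this host */

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int cache_left_bits = (1u << CBM_LEN) - 1;  /* free bits */

static int
virResCtrlAllocate(unsigned int want_bits, unsigned int *mask)
{
    unsigned int shift;
    int ret = -1;

    pthread_mutex_lock(&cache_lock);                   /* step 2 */
    for (shift = 0; shift + want_bits <= CBM_LEN; shift++) {
        unsigned int m = ((1u << want_bits) - 1) << shift;
        if ((cache_left_bits & m) == m) {              /* contiguous & free */
            /* step 3: write this VM's "L3:0=..." line under /sys/fs/resctrl */
            cache_left_bits &= ~m;                     /* step 4 */
            *mask = m;
            ret = 0;
            break;
        }
    }
    pthread_mutex_unlock(&cache_lock);                 /* step 5 */
    return ret;
}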


IMHO virResCtrlSetCacheBanks needs to return an object that represents the
config for that particular VM. This can be passed back in when the guest
is shut down, to release resources once again, avoiding any global scans.

Hmm, what object should virResCtrlSetCacheBanks return? The schemata setting?

Well, it might not need to return an object necessarily. Perhaps you can
just pass the VM PID into the method when you shut down instead, and have
a hash table keyed off the PID to look up the data structure needed for cleanup.

It's not only cleaning up the struct; we still need to recalculate the default schemata of the host (to release the resources back to the host).
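A sketch of the PID-keyed table Daniel suggests, using libvirt's virhash
helpers (they take string keys, so the PID is formatted into one); the
helper names and the release logic are hypothetical:

#include <stdio.h>
#include <sys/types.h>
#include "virhash.h"

/* created once, e.g. in virResCtrlInit():
 *   pid_to_res = virHashCreate(32, entryFreeCallback); */
static virHashTablePtr pid_to_res;

static int
virResCtrlRemember(pid_t pid, void *entry)   /* entry = this VM's record */
{
    char key[32];
    snprintf(key, sizeof(key), "%lld", (long long)pid);
    return virHashAddEntry(pid_to_res, key, entry);
}

static int
virResCtrlRelease(pid_t pid)
{
    char key[32];

    snprintf(key, sizeof(key), "%lld", (long long)pid);
    if (!virHashLookup(pid_to_res, key))
        return -1;
    /* recompute the host's default schemata from this entry's mask
     * (give the bits back), then drop the record */
    return virHashRemoveEntry(pid_to_res, key);
}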

I expect that when the VM reboots, we recalculate from cachetune (in the
XML setting) again, because we are not sure the host can offer the VM
enough cache when it restarts.

You shouldn't need to care about reboot as a concept in these APIs. From
the QEMU driver POV, a cold reboot will just be done as a stop followed
by start. So these low level cache APIs just need to cope with start+stop.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|