[libvirt] [ocaml] reset and resync the libvirt-ocaml repository
by Pino Toscano
Hi,
for reasons mostly lost to history, after the libvirt-ocaml
repository was converted to git it was not used by its main author
(Rich Jones); development continued in Rich's own git repository, at
http://git.annexia.org/?p=ocaml-libvirt.git;a=summary
After a talk with Rich, we agreed that it was better to move the
development back to libvirt.org, just like all the other bindings.
There are two problems however:
1) the first 38 commits have bad author/committer dates, and this is
also the reason why the existing libvirt-ocaml is not mirrored on
github
2) the top 3 commits on libvirt-ocaml were not integrated back into
Rich's ocaml-libvirt, and their content might not be entirely
OK (I will let Rich comment more on this)
While rewriting history is bad,
- most probably there are not many users of libvirt-ocaml around,
- the repository itself is very small (< 500k),
- in general it is better to have a working repository
So what I'm proposing is to replace the libvirt-ocaml repository with
a fixed version of Rich's ocaml-libvirt, done directly on the git hosting
side (i.e. not using a force-push on the current one). Rich already has
commit access for libvirt, so there is no problem keeping his
maintainer role on it. Once done, we can notify users on this list
about it.
What do you think? Is it an acceptable path forward?
--
Pino Toscano
6 years, 1 month
[libvirt] [RFC] virfile: fix cast-align error
by Marc Hartmayer
Use the correct type in order to fix the following error on s390x:
In function 'virFileIsSharedFSType':
../../src/util/virfile.c:3578:38: error: cast increases required alignment of target type [-Werror=cast-align]
virFileIsSharedFixFUSE(path, (long *) &sb.f_type);
Signed-off-by: Marc Hartmayer <mhartmay(a)linux.ibm.com>
---
src/util/virfile.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/util/virfile.c b/src/util/virfile.c
index 2a7e87102a25..832d832696d5 100644
--- a/src/util/virfile.c
+++ b/src/util/virfile.c
@@ -3466,7 +3466,7 @@ int virFilePrintf(FILE *fp, const char *msg, ...)
static int
virFileIsSharedFixFUSE(const char *path,
- long *f_type)
+ unsigned int *f_type)
{
char *dirpath = NULL;
const char **mounts = NULL;
@@ -3575,7 +3575,7 @@ virFileIsSharedFSType(const char *path,
if (sb.f_type == FUSE_SUPER_MAGIC) {
VIR_DEBUG("Found FUSE mount for path=%s. Trying to fix it", path);
- virFileIsSharedFixFUSE(path, (long *) &sb.f_type);
+ virFileIsSharedFixFUSE(path, &sb.f_type);
}
VIR_DEBUG("Check if path %s with FS magic %lld is shared",
--
2.17.0
6 years, 1 month
[libvirt] The results of lspci are inconsistent between vfio-reset PCI devices and devices reset via the sysfs interface
by Wuzongyong (Euler Dept)
Hi,
I start a virtual machine with the command line:
/usr/libexec/qemu-kvm --enable-kvm -smp 8 -m 8192 -device vfio-pci,host=0000:81:00.0
Then, using gdb, I pause the qemu process before it executes the main_loop function.
At this moment, lspci shows the regions are disabled, as below:
81:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
Subsystem: NVIDIA Corporation Device 118f
Physical Slot: 0-6
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 35
NUMA node: 1
Region 0: Memory at c8000000 (32-bit, non-prefetchable) [disabled] [size=16M]
Region 1: Memory at 27800000000 (64-bit, prefetchable) [disabled] [size=16G]
Region 3: Memory at 27c00000000 (64-bit, prefetchable) [disabled] [size=32M]
But after the command:
echo 1 > /sys/bus/pci/devices/0000:81:00.0/reset
lspci shows the regions are *not* disabled:
81:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
Subsystem: Huawei Technologies Co., Ltd. Device 2061
Physical Slot: 0-6
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 7
NUMA node: 1
Region 0: Memory at c8000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 27800000000 (64-bit, prefetchable) [size=16G]
Region 3: Memory at 27c00000000 (64-bit, prefetchable) [size=32M]
AFAIK, qemu performs vfio_pci_reset via the following call stack:
Qemu:
vfio_pci_reset
ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)
Kernel:
vfio_pci_ioctl
pci_try_reset_function
__pci_reset_function_locked
pci_parent_bus_reset
pci_reset_bridge_secondary_bus
and writing 1 to the sysfs reset interface goes through the path:
Kernel:
reset_store
pci_reset_function
__pci_reset_function_locked
pci_parent_bus_reset
pci_reset_bridge_secondary_bus
So it seems that these two methods are actually the same; I am confused about why the results are inconsistent.
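(Aside: to see just the bits that differ between the two cases, it may help to read the PCI command register directly rather than the full lspci dump; a sketch assuming setpci from pciutils is available:
$ setpci -s 0000:81:00.0 COMMAND
The 16-bit value has bit 1 = memory decoding and bit 2 = bus mastering, which is what lspci renders as Mem-/Mem+ and BusMaster-/BusMaster+ and what makes the regions show up as [disabled].)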
Thanks,
Zongyong Wu
6 years, 1 month
[libvirt] VM XML format for XEN
by debaprasad.guchait@wipro.com
Hi Team,
Please share the XML format for a virtual HBA, or explain how I can assign a virtual HBA port to a Xen VM.
Regards
Debaprasad
6 years, 1 month
[libvirt] [PATCH 0/2] Allow coldplug and coldunplug of hub device
by Han Han
Han Han (2):
qemu: Allow coldplugging of hub device
qemu: Allow coldunplugging of hub device
src/conf/domain_conf.c | 30 ++++++++++++++++++++++++++++++
src/conf/domain_conf.h | 3 +++
src/libvirt_private.syms | 1 +
src/qemu/qemu_driver.c | 16 ++++++++++++++--
4 files changed, 48 insertions(+), 2 deletions(-)
--
2.19.1
6 years, 1 month
[libvirt] Overview of libvirt incremental backup API, part 1 (full pull mode)
by Eric Blake
The following (long) email describes a portion of the work-flow of how
my proposed incremental backup APIs will work, along with the backend
QMP commands that each one executes. I will reply to this thread with
further examples (the first example is long enough to be its own email).
This is an update to a thread last posted here:
https://www.redhat.com/archives/libvir-list/2018-June/msg01066.html
I'm still pulling together pieces in the src/qemu directory of libvirt
while implementing my proposed API, but you can track the status of my
code (currently around 6000 lines of code and 1500 lines of
documentation added) at:
https://repo.or.cz/libvirt/ericb.git
The documentation below describes the end goal of my demo (which I will
be presenting at KVM Forum), even if the current git checkout of my work
in progress doesn't quite behave that way.
My hope is that the API itself is in a stable enough state to include in
the libvirt 4.9 release (end of this month - which really means upstream
commit prior to KVM Forum!) by demo-ing how it is used with qemu
experimental commands, even if the qemu driver portions of my series are
not yet ready to be committed because they are waiting for the qemu side
of incremental backups to stabilize. If we like the API and are willing
to commit to it, then downstream vendors can backport whatever fixes in
the qemu driver on top of the existing API without having to suffer from
rebase barriers preventing the addition of new API.
Performing a full backup can work on any disk format, but incremental
(all changes since the most recent checkpoint) and differential (all
changes since an arbitrary earlier checkpoint) backups require the use
of a persistent bitmap for tracking the changes between checkpoints, and
that in turn requires a disk with qcow2 format. The API can handle
multiple disks at the same point in time (so I'll demonstrate two at
once), and is designed to handle both push model (qemu writes to a
specific destination, and the format has to be one that qemu knows) and
pull model (qemu opens up an NBD server for all disks, then you connect
one or more read-only clients per export on that server to read the
information of interest into a destination of your choosing).
This demo also shows how I consume the data over a pull model backup.
Remember, in the pull model, you don't have to use a qemu binary as the
NBD client (you merely need a client that can request base:allocation
and qemu:dirty-bitmap:name contexts) - it's just that it is easier to
demonstrate everything with the tools already at hand. Thus, I use
existing qemu-img 3.0 functionality to extract the dirty bitmap (the
qemu:dirty-bitmap:name context) in one process, and a second qemu-io
process (using base:allocation to optimize reads of holes) for
extracting the actual data; the demo shows both processes accessing the
read-only NBD server in parallel. While I use two processes, it is also
feasible to write a single client that can get at both contexts through
a single NBD connection (the qemu 3.0 server supports that, even if none
of the qemu 3.0 clients can request multiple contexts). Down the road,
we may further enhance tools shipped with qemu to be easier to use as
such a client, but that does not affect the actual backup API (which is
merely what it takes to get the NBD server up and running).
- Preliminary setup:
I'm using bash as my shell, and set
$ orig1=/path/to/disk1.img orig2=/path/to/disk2.img
$ dom=my_domain qemu_img=/path/to/qemu-img
$ virsh="/path/to/virsh -k 0"
to make later steps easier to type. While the steps below should work
with qemu 3.0, I found it easier to test with both self-built qemu
(modify the <emulator> line in my domain) and self-built libvirtd
(systemctl stop libvirtd, then run src/libvirtd, also my use of $virsh
with heartbeat disabled, so that I was able to attach gdb during
development without having to worry about the connection dying). Also,
you may need 'setenforce 0' when using self-built binaries, since
otherwise SELinux labeling gets weird (obviously, when the actual code
is ready to check into libvirt, it will work with SELinux enforcing and
with system-installed rather than self-installed binaries). I also used:
$ $virsh domblklist $dom
to verify that I have plugged in $orig1 and $orig2 as two of the disks
to $dom (I used:
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none' error_policy='stop'
io='native'/>
<source file='/path/to/disk1.img'/>
<backingStore/>
<target dev='sdc' bus='scsi'/>
</disk>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none' error_policy='stop'
io='native'/>
<source file='/path/to/disk2.img'/>
<backingStore/>
<target dev='sdd' bus='scsi'/>
</disk>
in my domain XML)
- First example: creating a full backup via pull model, initially with
no checkpoint created
$ cat > backup.xml <<EOF
<domainbackup mode='pull'>
<server transport='tcp' name='localhost' port='10809'/>
<disks>
<disk name='$orig1' type='file'>
<scratch file='$PWD/scratch1.img'/>
</disk>
<disk name='sdd' type='file'>
<scratch file='$PWD/scratch2.img'/>
</disk>
</disks>
</domainbackup>
EOF
$ $qemu_img create -f qcow2 -b $orig1 -F qcow2 scratch1.img
$ $qemu_img create -f qcow2 -b $orig2 -F qcow2 scratch2.img
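(As a sanity check before starting the job, qemu-img info on each scratch
file should show it backed by the corresponding original image, e.g.:
$ $qemu_img info scratch1.img
should report $orig1 as its backing file.)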
Here, I'm explicitly requesting a pull backup (the API defaults to push
otherwise), as well as explicitly requesting the NBD server to be set up
(the XML should support both transport='tcp' and transport='unix'). Note
that the <server> is global; but the server will support multiple export
names at once, so that you can connect multiple clients to process those
exports in parallel. Ideally, if <server> is omitted, libvirt should
auto-generate an appropriate server name and have a way for you to query
what it generated (right now, I don't have that working in libvirt, so
being explicit is necessary - but again, the goal now is to prove that
the API is reasonable for including it in libvirt 4.9; enhancements like
making <server> optional can come later even if they miss libvirt 4.9).
I'm also requesting that the backup operate on only two disks of the
domain, and pointing libvirt to the scratch storage it needs to use for
the duration of the backup (ideally, libvirt will generate an
appropriate scratch file name itself if omitted from the XML, and create
scratch files itself instead of me having to pre-create them). Note that
I can give either the path to my original disk ($orig1, $orig2) or the
target name in the domain XML (in my case sdc, sdd); libvirt will
normalize my input and always uses the target name when reposting the
XML in output.
$ $virsh backup-begin $dom backup.xml
Backup id 1 started
backup used description from 'backup.xml'
Kicks off the backup job. virsh called
virDomainBackupBegin(dom, "<domainbackup ...>", NULL, 0)
and in turn libvirt makes all of the following QMP calls (if any QMP
call fails, libvirt attempts to unroll things so that there is no
lasting change to the guest before actually reporting failure):
{"execute":"nbd-server-start",
"arguments":{"addr":{"type":"inet",
"data":{"host":"localhost", "port":"10809"}}}}
{"execute":"blockdev-add",
"arguments":{"driver":"qcow2", "node-name":"backup-sdc",
"file":{"driver":"file",
"filename":"$PWD/scratch1.img"},
"backing":"'$node1'"}}
{"execute":"blockdev-add",
"arguments":{"driver":"qcow2", "node-name":"backup-sdd",
"file":{"driver":"file",
"filename":"$PWD/scratch2.img"},
"backing":"'$node2'"}}
{"execute":"transaction",
"arguments":{"actions":[
{"type":"blockdev-backup", "data":{
"device":"$node1", "target":"backup-sdc", "sync":"none",
"job-id":"backup-sdc" }},
{"type":"blockdev-backup", "data":{
"device":"$node2", "target":"backup-sdd", "sync":"none",
"job-id":"backup-sdd" }}
]}}
{"execute":"nbd-server-add",
"arguments":{"device":"backup-sdc", "name":"sdc"}}
{"execute":"nbd-server-add",
"arguments":{"device":"backup-sdd", "name":"sdd"}}
libvirt populated $node1 and $node2 to be the node names actually
assigned by qemu; until Peter's work on libvirt using node names
everywhere actually lands, libvirt is scraping the auto-generated
#blockNNN name from query-block and friends (the same as it already does
in other situations like write threshold).
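(If you want to see those node names yourself while experimenting, one
way - purely an aside, nothing the API requires - is the QMP passthrough
command:
$ $virsh qemu-monitor-command $dom '{"execute":"query-block"}' --pretty
and then look for the "node-name" of the entry whose inserted image
filename matches $orig1 or $orig2.)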
With this command complete, libvirt has now kicked off a pull backup
job, which includes single qemu NBD server, with two separate exports
named 'sdc' and 'sdd' that expose the state of the disks at the time of
the API call (any guest writes to $orig1 or $orig2 trigger copy-on-write
actions into scratch1.img and scratch2.img to preserve the fact that
reading from NBD sees unchanging contents).
We can double-check what libvirt is tracking for the running backup job,
including the fact that libvirt normalized the <disk> names to match the
domain XML target listings, and matching the names of the exports being
served over the NBD server:
$ $virsh backup-dumpxml $dom 1
<domainbackup type='pull' id='1'>
<server transport='tcp' name='localhost' port='10809'/>
<disks>
<disk name='sdc' type='file'>
<scratch file='/home/eblake/scratch1.img'/>
</disk>
<disk name='sdd' type='file'>
<scratch file='/home/eblake/scratch2.img'/>
</disk>
</disks>
</domainbackup>
where 1 on the command line would be replaced by whatever id was printed
by the earlier backup-begin command (yes, my demo can hard-code things
to 1, because the current qemu and initial libvirt implementations only
support one backup job at a time, although we have plans to allow
parallel jobs in the future).
This translated to the libvirt API call
virDomainBackupGetXMLDesc(dom, 1, 0)
and did not have to make any QMP calls into qemu.
Now that the backup job is running, we want to scrape the data off the
NBD server. The most naive way is:
$ $qemu_img convert -f raw -O $fmt nbd://localhost:10809/sdc full1.img
$ $qemu_img convert -f raw -O $fmt nbd://localhost:10809/sdd full2.img
where we hope that qemu-img convert is able to recognize the holes in
the source and only write into the backup copy where actual data lives.
You don't have to use qemu-img; it's possible to use any NBD client,
such as the kernel NBD module:
$ modprobe nbd
$ qemu-nbd -c /dev/nbd0 -f raw nbd://localhost:10809/sdc
$ cp /dev/nbd0 full1.img
$ qemu-nbd -d /dev/nbd0
The above demonstrates the flexibility of the pull model (your backup
file can be ANY format you choose; here I did 'cp' to copy it to a raw
destination), but it was also a less efficient NBD client, since the
kernel module doesn't yet know about NBD_CMD_BLOCK_STATUS for learning
where the holes are, nor about NBD_OPT_STRUCTURED_REPLY for faster reads
of those holes.
Of course, we don't have to blindly read the entire image, but can
instead use two clients in parallel (per exported disk), one which is
using 'qemu-img map' to learn which parts of the export contain data,
then feeds it through a bash 'while read' loop to parse out which
offsets contain interesting data, and spawning a second client per
region to copy just that subset of the file. Here, I'll use 'qemu-io
-C' to perform copy-on-read - that requires that my output file be qcow2
rather than any other particular format, but I'm guaranteed that my
output backup file is only populated in the same places that $orig1 was
populated at the time the backup started.
$ $qemu_img create -f qcow2 full1.img $size_of_orig1
$ $qemu_img rebase -u -f qcow2 -F raw -b nbd://localhost:10809/sdc \
full1.img
$ while read line; do
[[ $line =~ .*start.:.([0-9]*).*length.:.([0-9]*).*data.:.true.* ]] ||
continue
start=${BASH_REMATCH[1]} len=${BASH_REMATCH[2]}
qemu-io -C -c "r $start $len" -f qcow2 full1.img
done < <($qemu_img map --output=json -f raw nbd://localhost:10809/sdc)
$ $qemu_img rebase -u -f qcow2 -b '' full1.img
and the nice thing about this loop is that once you've figured out how
to parse qemu-img map output as one client process, you can use any
other process (such as qemu-nbd -c, then dd if=/dev/nbd0 of=$dest bs=64k
skip=$((start/64/1024)) seek=$((start/64/1024)) count=$((len/64/1024))
conv=fdatasync) as the NBD client that reads the subset of data of
interest (and thus, while qemu-io had to write to full1.img as qcow2,
you can use an alternative client to write to raw or any other format of
your choosing).
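For completeness, here is that qemu-nbd + dd alternative written out as a
full loop; a sketch under the same assumptions as above (export 'sdc' on
localhost:10809, a free /dev/nbd0, and map offsets/lengths that are
multiples of 64k), with dest1.img as a hypothetical raw destination file:
$ modprobe nbd
$ qemu-nbd -c /dev/nbd0 -f raw nbd://localhost:10809/sdc
$ truncate -s $size_of_orig1 dest1.img
$ while read line; do
[[ $line =~ .*start.:.([0-9]*).*length.:.([0-9]*).*data.:.true.* ]] ||
continue
start=${BASH_REMATCH[1]} len=${BASH_REMATCH[2]}
dd if=/dev/nbd0 of=dest1.img bs=64k skip=$((start/64/1024)) \
seek=$((start/64/1024)) count=$((len/64/1024)) conv=notrunc,fdatasync
done < <($qemu_img map --output=json -f raw nbd://localhost:10809/sdc)
$ qemu-nbd -d /dev/nbd0
The result is a raw file populated only where the export reported data,
without requiring the qcow2-only copy-on-read trick (conv=notrunc keeps
dd from truncating the destination after each region).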
Now that we've copied off the full backup image (or just a portion of it
- after all, this is a pull model where we are in charge of how much
data we want to read), it's time to tell libvirt that it can conclude
the backup job:
$ $virsh backup-end $dom 1
Backup id 1 completed
again, where the command line '1' came from the output to backup-begin
and could change to something else rather than being hard-coded in the
demo. This maps to the libvirt API call
virDomainBackupEnd(dom, 1, 0)
which in turn maps to the QMP commands:
{"execute":"nbd-server-remove",
"arguments":{"name":"sdc"}}
{"execute":"nbd-server-remove",
"arguments":{"name":"sdd"}}
{"execute":"nbd-server-stop"}
{"execute":"block-job-cancel",
"arguments":{"device":"sdc"}}
{"execute":"block-job-cancel",
"arguments":{"device":"sdd"}}
{"execute":"blockdev-del",
"arguments":{"node-name":"backup-sdc"}}
{"execute":"blockdev-del",
"arguments":{"node-name":"backup-sdd"}}
to clean up all the things added during backup-begin.
More to come in part 2.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
6 years, 1 month
[libvirt] [PATCH v2] virFileIsSharedFixFUSE: Copy mnt_dir when browsing mount table
by Han Han
Fix typos in the function name in the commit message.
v1 version: https://www.redhat.com/archives/libvir-list/2018-October/msg00511.html
virFileIsSharedFixFUSE doesn't fix f_type when "fuse.glusterfs"
is not the last row of the mount table. For example, it doesn't work on
a mount table like the following:
10.XX.XX.XX:/gv0 /mnt fuse.glusterfs rw 0 0
root@XX.XX.XX:/tmp/mkdir /tmp/br0 fuse.sshfs rw 0 0
Copy mnt_dir out of struct mntent, because it may be changed by
getmntent_r later in the loop.
Signed-off-by: Han Han <hhan(a)redhat.com>
---
src/util/virfile.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/src/util/virfile.c b/src/util/virfile.c
index 2a7e87102a..c503462633 100644
--- a/src/util/virfile.c
+++ b/src/util/virfile.c
@@ -3469,7 +3469,7 @@ virFileIsSharedFixFUSE(const char *path,
long *f_type)
{
char *dirpath = NULL;
- const char **mounts = NULL;
+ char **mounts = NULL;
size_t nmounts = 0;
char *p;
FILE *f = NULL;
@@ -3491,8 +3491,12 @@ virFileIsSharedFixFUSE(const char *path,
if (STRNEQ("fuse.glusterfs", mb.mnt_type))
continue;
- if (VIR_APPEND_ELEMENT_COPY(mounts, nmounts, mb.mnt_dir) < 0)
+ char *mnt_dir;
+ if (VIR_STRDUP(mnt_dir, mb.mnt_dir) < 0 ||
+ VIR_APPEND_ELEMENT_COPY(mounts, nmounts, mnt_dir) < 0) {
+ VIR_FREE(mnt_dir);
goto cleanup;
+ }
}
/* Add NULL sentinel so that this is a virStringList */
@@ -3512,7 +3516,7 @@ virFileIsSharedFixFUSE(const char *path,
else
*p = '\0';
- if (virStringListHasString(mounts, dirpath)) {
+ if (virStringListHasString((const char **)mounts, dirpath)) {
VIR_DEBUG("Found gluster FUSE mountpoint=%s for path=%s. "
"Fixing shared FS type", dirpath, path);
*f_type = GFS2_MAGIC;
@@ -3523,7 +3527,7 @@ virFileIsSharedFixFUSE(const char *path,
ret = 0;
cleanup:
endmntent(f);
- VIR_FREE(mounts);
+ virStringListFree(mounts);
VIR_FREE(dirpath);
return ret;
}
--
2.19.1
6 years, 1 month
[libvirt] Regarding the installtion of libvirt on ubuntu
by Mousumi Paul
I have installed libvirt-bin on ubuntu using the following command:
sudo apt-get install libvirt-bin
I checked the virsh version; it says 4.6.0. Now I want to recompile libvirt with the esx, xen and hyperv drivers. Therefore I went to /usr/src/libvirt-4.6.0 and tried to run the following command:
./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc --with-esx=yes --with-xen=yes
but I am getting the following error:
error: you must install the gnutls>=3.2.0 pkg-config module to compile libvirt.
So I have installed gnutls-bin and it is already the latest version.
Why is libvirt not able to identify gnutls? What is the solution for this?
Thanks & Regards,
Mousumi Paul
---
Senior R & D Engineer
Vigyanlabs Innovations Private Limited
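(For what it's worth: the configure check looks for the GnuTLS pkg-config module, which on Debian/Ubuntu normally ships in the GnuTLS development package rather than in gnutls-bin; the exact package name depends on the release, but something along the lines of:
$ sudo apt-get install libgnutls28-dev
$ pkg-config --modversion gnutls
If pkg-config reports a version >= 3.2.0, configure should be able to find it.)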
6 years, 1 month
[libvirt] Regarding openwsman requirement in libvirt configuration
by Mousumi Paul
Trying to install libvirt 1.2.14 with the following command:
./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc --with-esx=yes --with-xen=yes --with-hyperv=yes
The above generates the following error:
configure: error: openwsman is required for the Hyper-V driver
I have installed openwsman on ubuntu using the following command:
sudo apt-get install openwsman
But I am still getting the same error. Please help me to compile libvirt with the hyperv driver.
Thanks & Regards,
Mousumi Paul
---
Senior R & D Engineer
Vigyanlabs Innovations Private Limited
6 years, 1 month
[libvirt] [PATCHv2 0/4] Introduce x86 RDT (CMT&MBM) host capability
by Wang Huaqiang
This series of patches introduced the x86 Cache Monitoring Technology
(CMT) to libvirt by interacting with the kernel resource control (resctrl)
interface. CMT is one of the Intel(R) x86 CPU features belonging to
Resource Director Technology (RDT). CMT reports the occupancy of the
last level cache, which is shared by all CPU cores.
In the v1 series we are introducing CMT for libvirt, covering both reporting
the host capability and creating CMT groups. Introducing the host capability
is a fairly self-contained step, so we only cover that step in
this series. As an extension of v1, the MBM capability is also introduced.
These patches do not cover the part about creating CMT groups, which
will come in subsequent patches.
We have had several discussions about enabling CMT; please refer to the
following links for the RFCs.
RFCv3
https://www.redhat.com/archives/libvir-list/2018-August/msg01213.html
RFCv2
https://www.redhat.com/archives/libvir-list/2018-July/msg00409.html
https://www.redhat.com/archives/libvir-list/2018-July/msg01241.html
RFCv1
https://www.redhat.com/archives/libvir-list/2018-June/msg00674.html
1. Why is CMT necessary for libvirt?
The perf events for 'CMT, MBML, MBMT' have been phased out since Linux
kernel commit c39a0e2c8850f08249383f2425dbd8dbe4baad69, so libvirt's
perf-based cmt/mbm will not work with the latest Linux kernel. These
patches add the CMT feature to libvirt through the kernel resctrl
filesystem interface.
2. Interfaces for CMT at a high level.
CMT, CAT, MBM and MBA are orthogonal features; each can work
independently.
If 'CMT' is enabled on the host, then a 'cache monitor' is introduced for
the cache, whose role is monitoring the last level cache utilization
of the target system process. Cache monitor capabilities are shown under
the <cache> element.
For 'MBM', a monitor called the memory bandwidth monitor is introduced,
whose role is monitoring memory bandwidth utilization. The capability
information block is located under the <memory_bandwidth> element.
2.1 Query the host capability of CMT.
The element 'monitor' represents the host capabilities of CMT.
The explanations of involved attributes:
- 'maxMonitors': denotes the maximum number of monitoring groups that
can be created, which is limited by the number of hardware 'RMID's.
- 'reuseThreshold': an adjustable value that affects when the
resources used by a monitor can be reused. After a monitor is
removed, the kernel may not immediately release all the hardware
resources that the monitor used if the cache occupancy value
associated with the 'removed' monitor is above this threshold. Once
the cache occupancy drops below this threshold, the underlying
hardware resource is reclaimed and put back into the resource pool
for the next reuse.
- 'llc_occupancy': a feature of CMT, reporting the last level cache
occupancy information.
- 'mbm_total_bytes': a feature of MBM, reporting total memory
bandwidth utilization, in bytes, including local memory and
remote memory on multi-node systems.
- 'mbm_local_bytes': a feature of MBM, reporting only local memory
bandwidth utilization.
# virsh capabilities
...
<cache>
<bank id='0' level='3' type='both' size='15' unit='MiB' cpus='0-5'>
<control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/>
</bank>
<bank id='1' level='3' type='both' size='15' unit='MiB' cpus='6-11'>
<control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/>
</bank>
+ <monitor level='3' reuseThreshold='270336' maxMonitors='176'>
+ <feature name='llc_occupancy'/>
+ </monitor>
</cache>
<memory_bandwidth>
<node id='0' cpus='0-5'>
<control granularity='10' min ='10' maxAllocs='4'/>
</node>
<node id='1' cpus='6-11'>
<control granularity='10' min ='10' maxAllocs='4'/>
</node>
+ <monitor maxMonitors='176'>
+ <feature name='mbm_total_bytes'/>
+ <feature name='mbm_local_bytes'/>
+ </monitor>
</memory_bandwidth>
...
</host>
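For reference, the monitor attributes in the XML above correspond directly
to the standard kernel resctrl files that these patches read; a sketch of
where the example values would come from, assuming resctrl is mounted at
the usual /sys/fs/resctrl:
$ cat /sys/fs/resctrl/info/L3_MON/num_rmids
176
$ cat /sys/fs/resctrl/info/L3_MON/max_threshold_occupancy
270336
$ cat /sys/fs/resctrl/info/L3_MON/mon_features
llc_occupancy
mbm_total_bytes
mbm_local_bytes
num_rmids maps to maxMonitors, max_threshold_occupancy to reuseThreshold,
and each line of mon_features to a <feature name='...'/> element.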
Changes since v1:
- Introduced MBM capability.
- Capability layout changed
* Moved <monitor> from cache <bank> to <cache>
* Renamed <Threshold> to <reuseThreshold>
- Document for 'reuseThreshold' changed.
- Introduced API virResctrlInfoGetMonitorPrefix
- Added more tests, covering standalone CMT, fake new
feature.
- Creating CMT resource control groups will be a
subsequent job.
Wang Huaqiang (4):
util: Introduce monitor capability interface
conf: Refactor cache bank capability structure
conf: Refactor memory bandwidth capability structure
conf: Introduce RDT monitor host capability
docs/schemas/capability.rng | 37 +++-
src/conf/capabilities.c | 126 ++++++++---
src/conf/capabilities.h | 24 ++-
src/libvirt_private.syms | 2 +
src/util/virresctrl.c | 240 +++++++++++++++++++++
src/util/virresctrl.h | 62 ++++++
.../resctrl/info/L3_MON/max_threshold_occupancy | 1 +
.../resctrl/info/L3_MON/mon_features | 1 +
.../resctrl/info/L3_MON/num_rmids | 1 +
.../linux-resctrl-cmt/resctrl/manualres/cpus | 1 +
.../linux-resctrl-cmt/resctrl/manualres/schemata | 1 +
.../linux-resctrl-cmt/resctrl/manualres/tasks | 0
.../linux-resctrl-cmt/resctrl/schemata | 1 +
tests/vircaps2xmldata/linux-resctrl-cmt/system | 1 +
.../resctrl/info/L3/cbm_mask | 1 +
.../resctrl/info/L3/min_cbm_bits | 1 +
.../resctrl/info/L3/num_closids | 1 +
.../resctrl/info/L3_MON/max_threshold_occupancy | 1 +
.../resctrl/info/L3_MON/mon_features | 10 +
.../resctrl/info/L3_MON/num_rmids | 1 +
.../resctrl/info/MB/bandwidth_gran | 1 +
.../resctrl/info/MB/min_bandwidth | 1 +
.../resctrl/info/MB/num_closids | 1 +
.../resctrl/manualres/cpus | 1 +
.../resctrl/manualres/schemata | 1 +
.../resctrl/manualres/tasks | 0
.../linux-resctrl-fake-feature/resctrl/schemata | 1 +
.../linux-resctrl-fake-feature/system | 1 +
.../resctrl/info/L3_MON/max_threshold_occupancy | 1 +
.../linux-resctrl/resctrl/info/L3_MON/mon_features | 3 +
.../linux-resctrl/resctrl/info/L3_MON/num_rmids | 1 +
.../vircaps2xmldata/vircaps-x86_64-resctrl-cmt.xml | 53 +++++
.../vircaps-x86_64-resctrl-fake-feature.xml | 73 +++++++
tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml | 7 +
tests/vircaps2xmltest.c | 2 +
35 files changed, 624 insertions(+), 36 deletions(-)
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/info/L3_MON/max_threshold_occupancy
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/info/L3_MON/mon_features
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/info/L3_MON/num_rmids
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/manualres/cpus
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/manualres/schemata
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/manualres/tasks
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/schemata
create mode 120000 tests/vircaps2xmldata/linux-resctrl-cmt/system
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/L3/cbm_mask
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/L3/min_cbm_bits
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/L3/num_closids
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/L3_MON/max_threshold_occupancy
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/L3_MON/mon_features
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/L3_MON/num_rmids
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/MB/bandwidth_gran
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/MB/min_bandwidth
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/MB/num_closids
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/manualres/cpus
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/manualres/schemata
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/manualres/tasks
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/schemata
create mode 120000 tests/vircaps2xmldata/linux-resctrl-fake-feature/system
create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy
create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features
create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids
create mode 100644 tests/vircaps2xmldata/vircaps-x86_64-resctrl-cmt.xml
create mode 100644 tests/vircaps2xmldata/vircaps-x86_64-resctrl-fake-feature.xml
--
2.7.4
6 years, 1 month