Re: [libvirt] [Qemu-devel] [PATCH v4] XBZRLE delta for live migration of large memory apps
by Anthony Liguori
On 08/08/2011 03:42 AM, Shribman, Aidan wrote:
> Subject: [PATCH v4] XBZRLE delta for live migration of large memory apps
> From: Aidan Shribman<aidan.shribman(a)sap.com>
>
> By using XBZRLE (Xor Binary Zero Run-Length-Encoding) we can reduce VM downtime
> and total live-migration time of VMs running memory write intensive workloads
> typical of large enterprise applications such as SAP ERP Systems, and generally
> speaking for any application with a sparse memory update pattern.
>
> On the sender side XBZRLE is used as a compact delta encoding of page updates,
> retrieving the old page content from an LRU cache (default size of 64 MB). The
> receiving side uses the existing page content and XBZRLE to decode the new page
> content.
>
> Work was originally based on research results published VEE 2011: Evaluation of
> Delta Compression Techniques for Efficient Live Migration of Large Virtual
> Machines by Benoit, Svard, Tordsson and Elmroth. Additionally the delta encoder
> XBRLE was improved further using XBZRLE instead.
>
> XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads making it
> ideal for in-line, real-time encoding such as is needed for live-migration.
>
> A typical usage scenario:
> {qemu} migrate_set_cachesize 256m
> {qemu} migrate -x -d tcp:destination.host:4444
> {qemu} info migrate
> ...
> transferred ram-duplicate: A kbytes
> transferred ram-duplicate: B pages
> transferred ram-normal: C kbytes
> transferred ram-normal: D pages
> transferred ram-xbrle: E kbytes
> transferred ram-xbrle: F pages
> overflow ram-xbrle: G pages
> cache-hit ram-xbrle: H pages
> cache-lookup ram-xbrle: J pages
>
> Testing: live migration with XBZRLE completed in 110 seconds, without live
> migration was not able to complete.
>
> A simple synthetic memory r/w load generator:
> .. include<stdlib.h>
> .. include<stdio.h>
> .. int main()
> .. {
> .. char *buf = (char *) calloc(4096, 4096);
> .. while (1) {
> .. int i;
> .. for (i = 0; i< 4096 * 4; i++) {
> .. buf[i * 4096 / 4]++;
> .. }
> .. printf(".");
> .. }
> .. }
>
> Signed-off-by: Benoit Hudzia<benoit.hudzia(a)sap.com>
> Signed-off-by: Petter Svard<petters(a)cs.umu.se>
> Signed-off-by: Aidan Shribman<aidan.shribman(a)sap.com>
One thing that strikes me about this algorithm is that it's very good
for a particular type of workload--shockingly good really.
I think workload aware migration compression is possible for a lot of
different types of workloads. That makes me a bit wary of QEMU growing
quite a lot of compression mechanisms.
It makes me think that this logic may really belong at a higher level
where more information is known about the workload. For instance, I can
imagine XBZRLE living in something like libvirt.
Today, parsing migration traffic is pretty horrible but I think we're
pretty strongly committed to fixing that in 1.0. That makes me wonder
if it would be nicer architecturally for a higher level tool to own
something like this.
Originally, when I added migration, I had the view that we would have
transport plugins based on the exec: protocol. That hasn't really
happened since libvirt really owns migration but I think having XBZRLE
as a transport plugin for libvirt is something worth considering.
I'm curious what people think about this type of approach. CC'ing
libvirt to get their input.
Regards,
Anthony Liguori
>
> --
>
> Makefile.target | 1 +
> arch_init.c | 351 ++++++++++++++++++++++++++++++++++++++++++++++------
> block-migration.c | 3 +-
> hash.h | 72 +++++++++++
> hmp-commands.hx | 36 ++++--
> hw/hw.h | 3 +-
> lru.c | 142 +++++++++++++++++++++
> lru.h | 13 ++
> migration-exec.c | 6 +-
> migration-fd.c | 6 +-
> migration-tcp.c | 6 +-
> migration-unix.c | 6 +-
> migration.c | 119 +++++++++++++++++-
> migration.h | 25 +++-
> qmp-commands.hx | 43 ++++++-
> savevm.c | 13 ++-
> sysemu.h | 13 ++-
> xbzrle.c | 126 +++++++++++++++++++
> xbzrle.h | 12 ++
> 19 files changed, 917 insertions(+), 79 deletions(-)
>
> diff --git a/Makefile.target b/Makefile.target
> index 2800f47..b3215de 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -186,6 +186,7 @@ endif #CONFIG_BSD_USER
> ifdef CONFIG_SOFTMMU
>
> obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o
> +obj-y += lru.o xbzrle.o
> # virtio has to be here due to weird dependency between PCI and virtio-net.
> # need to fix this properly
> obj-y += virtio-blk.o virtio-balloon.o virtio-net.o virtio-serial-bus.o
> diff --git a/arch_init.c b/arch_init.c
> old mode 100644
> new mode 100755
> index 4486925..d67dc82
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -40,6 +40,17 @@
> #include "net.h"
> #include "gdbstub.h"
> #include "hw/smbios.h"
> +#include "lru.h"
> +#include "xbzrle.h"
> +
> +//#define DEBUG_ARCH_INIT
> +#ifdef DEBUG_ARCH_INIT
> +#define DPRINTF(fmt, ...) \
> + do { fprintf(stdout, "arch_init: " fmt, ## __VA_ARGS__); } while (0)
> +#else
> +#define DPRINTF(fmt, ...) \
> + do { } while (0)
> +#endif
>
> #ifdef TARGET_SPARC
> int graphic_width = 1024;
> @@ -88,6 +99,161 @@ const uint32_t arch_type = QEMU_ARCH;
> #define RAM_SAVE_FLAG_PAGE 0x08
> #define RAM_SAVE_FLAG_EOS 0x10
> #define RAM_SAVE_FLAG_CONTINUE 0x20
> +#define RAM_SAVE_FLAG_XBZRLE 0x40
> +
> +/***********************************************************/
> +/* RAM Migration State */
> +typedef struct ArchMigrationState {
> + int use_xbrle;
> + int64_t xbrle_cache_size;
> +} ArchMigrationState;
> +
> +static ArchMigrationState arch_mig_state;
> +
> +void arch_set_params(int blk_enable, int shared_base, int use_xbrle,
> + int64_t xbrle_cache_size, void *opaque)
> +{
> + arch_mig_state.use_xbrle = use_xbrle;
> + arch_mig_state.xbrle_cache_size = xbrle_cache_size;
> +}
> +
> +#define BE16_MAGIC 0x0123
> +
> +/***********************************************************/
> +/* XBZRLE (Xor Binary Zero Run-Length Encoding) */
> +typedef struct XBZRLEHeader {
> + uint32_t xh_cksum; /* not used */
> + uint16_t xh_magic;
> + uint16_t xh_len;
> + uint8_t xh_flags;
> +} XBZRLEHeader;
> +
> +static uint8_t dup_buf[TARGET_PAGE_SIZE];
> +
> +/***********************************************************/
> +/* accounting */
> +typedef struct AccountingInfo{
> + uint64_t dup_pages;
> + uint64_t norm_pages;
> + uint64_t xbrle_bytes;
> + uint64_t xbrle_pages;
> + uint64_t xbrle_overflow;
> + uint64_t xbrle_cache_lookup;
> + uint64_t xbrle_cache_hit;
> + uint64_t iterations;
> +} AccountingInfo;
> +
> +static AccountingInfo acct_info;
> +
> +static void acct_clear(void)
> +{
> + memset(&acct_info, 0, sizeof(acct_info));
> +}
> +
> +uint64_t dup_mig_bytes_transferred(void)
> +{
> + return acct_info.dup_pages;
> +}
> +
> +uint64_t dup_mig_pages_transferred(void)
> +{
> + return acct_info.dup_pages;
> +}
> +
> +uint64_t norm_mig_bytes_transferred(void)
> +{
> + return acct_info.norm_pages * TARGET_PAGE_SIZE;
> +}
> +
> +uint64_t norm_mig_pages_transferred(void)
> +{
> + return acct_info.norm_pages;
> +}
> +
> +uint64_t xbrle_mig_bytes_transferred(void)
> +{
> + return acct_info.xbrle_bytes;
> +}
> +
> +uint64_t xbrle_mig_pages_transferred(void)
> +{
> + return acct_info.xbrle_pages;
> +}
> +
> +uint64_t xbrle_mig_pages_overflow(void)
> +{
> + return acct_info.xbrle_overflow;
> +}
> +
> +uint64_t xbrle_mig_pages_cache_hit(void)
> +{
> + return acct_info.xbrle_cache_hit;
> +}
> +
> +uint64_t xbrle_mig_pages_cache_lookup(void)
> +{
> + return acct_info.xbrle_cache_lookup;
> +}
> +
> +static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
> + int cont, int flag)
> +{
> + qemu_put_be64(f, offset | cont | flag);
> + if (!cont) {
> + qemu_put_byte(f, strlen(block->idstr));
> + qemu_put_buffer(f, (uint8_t *)block->idstr,
> + strlen(block->idstr));
> + }
> +}
> +
> +#define ENCODING_FLAG_XBZRLE 0x1
> +
> +static int save_xbrle_page(QEMUFile *f, uint8_t *current_page,
> + ram_addr_t current_addr, RAMBlock *block, ram_addr_t offset, int cont)
> +{
> + int encoded_len = 0, bytes_sent = 0;
> + XBZRLEHeader hdr = {0, BE16_MAGIC};
> + uint8_t *encoded, *old_page;
> +
> + /* abort if page not cached */
> + acct_info.xbrle_cache_lookup++;
> + old_page = lru_lookup(current_addr);
> + if (!old_page) {
> + goto done;
> + }
> + acct_info.xbrle_cache_hit++;
> +
> + /* XBZRLE (XOR+ZRLE) encoding */
> + encoded = (uint8_t *) qemu_malloc(TARGET_PAGE_SIZE);
> + encoded_len = xbzrle_encode(encoded, old_page, current_page,
> + TARGET_PAGE_SIZE);
> +
> + if (encoded_len< 0) {
> + DPRINTF("XBZRLE encoding overflow - sending uncompressed\n");
> + acct_info.xbrle_overflow++;
> + goto done;
> + }
> +
> + hdr.xh_len = encoded_len;
> + hdr.xh_flags |= ENCODING_FLAG_XBZRLE;
> +
> + /* Send XBZRLE compressed page */
> + save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_XBZRLE);
> +
> + qemu_put_be32(f, hdr.xh_cksum);
> + qemu_put_buffer(f, (uint8_t *)&hdr.xh_magic, sizeof (hdr.xh_magic));
> + qemu_put_be16(f, hdr.xh_len);
> + qemu_put_byte(f, hdr.xh_flags);
> +
> + qemu_put_buffer(f, encoded, encoded_len);
> + acct_info.xbrle_pages++;
> + bytes_sent = encoded_len + sizeof(hdr);
> + acct_info.xbrle_bytes += bytes_sent;
> +
> +done:
> + qemu_free(encoded);
> + return bytes_sent;
> +}
>
> static int is_dup_page(uint8_t *page, uint8_t ch)
> {
> @@ -107,7 +273,7 @@ static int is_dup_page(uint8_t *page, uint8_t ch)
> static RAMBlock *last_block;
> static ram_addr_t last_offset;
>
> -static int ram_save_block(QEMUFile *f)
> +static int ram_save_block(QEMUFile *f, int stage)
> {
> RAMBlock *block = last_block;
> ram_addr_t offset = last_offset;
> @@ -120,6 +286,7 @@ static int ram_save_block(QEMUFile *f)
> current_addr = block->offset + offset;
>
> do {
> + lru_free_cb_t free_cb = qemu_free;
> if (cpu_physical_memory_get_dirty(current_addr, MIGRATION_DIRTY_FLAG)) {
> uint8_t *p;
> int cont = (block == last_block) ? RAM_SAVE_FLAG_CONTINUE : 0;
> @@ -128,28 +295,35 @@ static int ram_save_block(QEMUFile *f)
> current_addr + TARGET_PAGE_SIZE,
> MIGRATION_DIRTY_FLAG);
>
> - p = block->host + offset;
> + if (arch_mig_state.use_xbrle) {
> + p = qemu_malloc(TARGET_PAGE_SIZE);
> + memcpy(p, block->host + offset, TARGET_PAGE_SIZE);
> + } else {
> + p = block->host + offset;
> + }
>
> if (is_dup_page(p, *p)) {
> - qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_COMPRESS);
> - if (!cont) {
> - qemu_put_byte(f, strlen(block->idstr));
> - qemu_put_buffer(f, (uint8_t *)block->idstr,
> - strlen(block->idstr));
> - }
> + save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
> qemu_put_byte(f, *p);
> bytes_sent = 1;
> - } else {
> - qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_PAGE);
> - if (!cont) {
> - qemu_put_byte(f, strlen(block->idstr));
> - qemu_put_buffer(f, (uint8_t *)block->idstr,
> - strlen(block->idstr));
> + acct_info.dup_pages++;
> + if (arch_mig_state.use_xbrle&& !*p) {
> + p = dup_buf;
> + free_cb = NULL;
> }
> + } else if (stage == 2&& arch_mig_state.use_xbrle) {
> + bytes_sent = save_xbrle_page(f, p, current_addr, block,
> + offset, cont);
> + }
> + if (!bytes_sent) {
> + save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
> qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
> bytes_sent = TARGET_PAGE_SIZE;
> + acct_info.norm_pages++;
> + }
> + if (arch_mig_state.use_xbrle) {
> + lru_insert(current_addr, p, free_cb);
> }
> -
> break;
> }
>
> @@ -221,6 +395,9 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>
> if (stage< 0) {
> cpu_physical_memory_set_dirty_tracking(0);
> + if (arch_mig_state.use_xbrle) {
> + lru_fini();
> + }
> return 0;
> }
>
> @@ -235,6 +412,11 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
> last_block = NULL;
> last_offset = 0;
>
> + if (arch_mig_state.use_xbrle) {
> + lru_init(arch_mig_state.xbrle_cache_size/TARGET_PAGE_SIZE, 0);
> + acct_clear();
> + }
> +
> /* Make sure all dirty bits are set */
> QLIST_FOREACH(block,&ram_list.blocks, next) {
> for (addr = block->offset; addr< block->offset + block->length;
> @@ -264,8 +446,9 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
> while (!qemu_file_rate_limit(f)) {
> int bytes_sent;
>
> - bytes_sent = ram_save_block(f);
> + bytes_sent = ram_save_block(f, stage);
> bytes_transferred += bytes_sent;
> + acct_info.iterations++;
> if (bytes_sent == 0) { /* no more blocks */
> break;
> }
> @@ -285,19 +468,79 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
> int bytes_sent;
>
> /* flush all remaining blocks regardless of rate limiting */
> - while ((bytes_sent = ram_save_block(f)) != 0) {
> + while ((bytes_sent = ram_save_block(f, stage))) {
> bytes_transferred += bytes_sent;
> }
> cpu_physical_memory_set_dirty_tracking(0);
> + if (arch_mig_state.use_xbrle) {
> + lru_fini();
> + }
> }
>
> qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
>
> expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
>
> + DPRINTF("ram_save_live: expected(%ld)<= max(%ld)?\n", expected_time,
> + migrate_max_downtime());
> +
> return (stage == 2)&& (expected_time<= migrate_max_downtime());
> }
>
> +static int load_xbrle(QEMUFile *f, ram_addr_t addr, void *host)
> +{
> + int len, rc = -1;
> + uint8_t *encoded;
> + XBZRLEHeader hdr = {0};
> +
> + /* extract ZRLE header */
> + hdr.xh_cksum = qemu_get_be32(f);
> + qemu_get_buffer(f, (uint8_t *)&hdr.xh_magic, sizeof (hdr.xh_magic));
> + hdr.xh_len = qemu_get_be16(f);
> + hdr.xh_flags = qemu_get_byte(f);
> +
> + if (!(hdr.xh_flags& ENCODING_FLAG_XBZRLE)) {
> + fprintf(stderr, "Failed to load XZBRLE page - wrong compression!\n");
> + goto done;
> + }
> +
> + if (hdr.xh_len> TARGET_PAGE_SIZE) {
> + fprintf(stderr, "Failed to load XZBRLE page - len overflow!\n");
> + goto done;
> + }
> +
> + /* load data and decode */
> + encoded = (uint8_t *) qemu_malloc(hdr.xh_len);
> + qemu_get_buffer(f, encoded, hdr.xh_len);
> + /* covert endianess if magic indicated destination differs from source */
> + if (hdr.xh_magic != BE16_MAGIC) {
> + const uint64_t *end = (uint64_t *) encoded +
> + hdr.xh_len / sizeof (uint64_t);
> + uint64_t *p;
> + for (p = (uint64_t *) encoded; p< end; p++) {
> + bswap64s(p);
> + }
> + }
> +
> + /* decode ZRLE */
> + len = xbzrle_decode(host, host, encoded, hdr.xh_len);
> + if (len == -1) {
> + fprintf(stderr, "Failed to load XBZRLE page - decode error!\n");
> + goto done;
> + }
> +
> + if (len != TARGET_PAGE_SIZE) {
> + fprintf(stderr, "Failed to load XBZRLE page - size %d expected %d!\n",
> + len, TARGET_PAGE_SIZE);
> + goto done;
> + }
> +
> + rc = 0;
> +done:
> + qemu_free(encoded);
> + return rc;
> +}
> +
> static inline void *host_from_stream_offset(QEMUFile *f,
> ram_addr_t offset,
> int flags)
> @@ -328,16 +571,38 @@ static inline void *host_from_stream_offset(QEMUFile *f,
> return NULL;
> }
>
> +static inline void *host_from_stream_offset_versioned(int version_id,
> + QEMUFile *f, ram_addr_t offset, int flags)
> +{
> + void *host;
> + if (version_id == 3) {
> + host = qemu_get_ram_ptr(offset);
> + } else {
> + host = host_from_stream_offset(f, offset, flags);
> + }
> + if (!host) {
> + fprintf(stderr, "Failed to convert RAM address to host"
> + " for offset 0x%lX!\n", offset);
> + abort();
> + }
> + return host;
> +}
> +
> int ram_load(QEMUFile *f, void *opaque, int version_id)
> {
> ram_addr_t addr;
> - int flags;
> + int flags, ret = 0;
> + static uint64_t seq_iter;
> +
> + seq_iter++;
>
> if (version_id< 3 || version_id> 4) {
> - return -EINVAL;
> + ret = -EINVAL;
> + goto done;
> }
>
> do {
> + void *host;
> addr = qemu_get_be64(f);
>
> flags = addr& ~TARGET_PAGE_MASK;
> @@ -346,7 +611,8 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
> if (flags& RAM_SAVE_FLAG_MEM_SIZE) {
> if (version_id == 3) {
> if (addr != ram_bytes_total()) {
> - return -EINVAL;
> + ret = -EINVAL;
> + goto done;
> }
> } else {
> /* Synchronize RAM block list */
> @@ -365,8 +631,10 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
>
> QLIST_FOREACH(block,&ram_list.blocks, next) {
> if (!strncmp(id, block->idstr, sizeof(id))) {
> - if (block->length != length)
> - return -EINVAL;
> + if (block->length != length) {
> + ret = -EINVAL;
> + goto done;
> + }
> break;
> }
> }
> @@ -374,7 +642,8 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
> if (!block) {
> fprintf(stderr, "Unknown ramblock \"%s\", cannot "
> "accept migration\n", id);
> - return -EINVAL;
> + ret = -EINVAL;
> + goto done;
> }
>
> total_ram_bytes -= length;
> @@ -383,17 +652,10 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
> }
>
> if (flags& RAM_SAVE_FLAG_COMPRESS) {
> - void *host;
> uint8_t ch;
>
> - if (version_id == 3)
> - host = qemu_get_ram_ptr(addr);
> - else
> - host = host_from_stream_offset(f, addr, flags);
> - if (!host) {
> - return -EINVAL;
> - }
> -
> + host = host_from_stream_offset_versioned(version_id,
> + f, addr, flags);
> ch = qemu_get_byte(f);
> memset(host, ch, TARGET_PAGE_SIZE);
> #ifndef _WIN32
> @@ -403,21 +665,28 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
> }
> #endif
> } else if (flags& RAM_SAVE_FLAG_PAGE) {
> - void *host;
> -
> - if (version_id == 3)
> - host = qemu_get_ram_ptr(addr);
> - else
> - host = host_from_stream_offset(f, addr, flags);
> -
> + host = host_from_stream_offset_versioned(version_id,
> + f, addr, flags);
> qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
> + } else if (flags& RAM_SAVE_FLAG_XBZRLE) {
> + host = host_from_stream_offset_versioned(version_id,
> + f, addr, flags);
> + if (load_xbrle(f, addr, host)< 0) {
> + ret = -EINVAL;
> + goto done;
> + }
> }
> +
> if (qemu_file_has_error(f)) {
> - return -EIO;
> + ret = -EIO;
> + goto done;
> }
> } while (!(flags& RAM_SAVE_FLAG_EOS));
>
> - return 0;
> +done:
> + DPRINTF("Completed load of VM with exit code %d seq iteration %ld\n",
> + ret, seq_iter);
> + return ret;
> }
>
> void qemu_service_io(void)
> diff --git a/block-migration.c b/block-migration.c
> index 3e66f49..504df70 100644
> --- a/block-migration.c
> +++ b/block-migration.c
> @@ -689,7 +689,8 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
> return 0;
> }
>
> -static void block_set_params(int blk_enable, int shared_base, void *opaque)
> +static void block_set_params(int blk_enable, int shared_base,
> + int use_xbrle, int64_t xbrle_cache_size, void *opaque)
> {
> block_mig_state.blk_enable = blk_enable;
> block_mig_state.shared_base = shared_base;
> diff --git a/hash.h b/hash.h
> new file mode 100644
> index 0000000..7109905
> --- /dev/null
> +++ b/hash.h
> @@ -0,0 +1,72 @@
> +#ifndef _LINUX_HASH_H
> +#define _LINUX_HASH_H
> +/* Fast hashing routine for ints, longs and pointers.
> + (C) 2002 William Lee Irwin III, IBM */
> +
> +/*
> + * Knuth recommends primes in approximately golden ratio to the maximum
> + * integer representable by a machine word for multiplicative hashing.
> + * Chuck Lever verified the effectiveness of this technique:
> + * http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf
> + *
> + * These primes are chosen to be bit-sparse, that is operations on
> + * them can use shifts and additions instead of multiplications for
> + * machines where multiplications are slow.
> + */
> +
> +typedef uint64_t u64;
> +typedef uint32_t u32;
> +#define BITS_PER_LONG TARGET_LONG_BITS
> +
> +/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
> +#define GOLDEN_RATIO_PRIME_32 0x9e370001UL
> +/* 2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1 */
> +#define GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001UL
> +
> +#if BITS_PER_LONG == 32
> +#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_PRIME_32
> +#define hash_long(val, bits) hash_32(val, bits)
> +#elif BITS_PER_LONG == 64
> +#define hash_long(val, bits) hash_64(val, bits)
> +#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_PRIME_64
> +#else
> +#error Wordsize not 32 or 64
> +#endif
> +
> +static inline u64 hash_64(u64 val, unsigned int bits)
> +{
> + u64 hash = val;
> +
> + /* Sigh, gcc can't optimise this alone like it does for 32 bits. */
> + u64 n = hash;
> + n<<= 18;
> + hash -= n;
> + n<<= 33;
> + hash -= n;
> + n<<= 3;
> + hash += n;
> + n<<= 3;
> + hash -= n;
> + n<<= 4;
> + hash += n;
> + n<<= 2;
> + hash += n;
> +
> + /* High bits are more random, so use them. */
> + return hash>> (64 - bits);
> +}
> +
> +static inline u32 hash_32(u32 val, unsigned int bits)
> +{
> + /* On some cpus multiply is faster, on others gcc will do shifts */
> + u32 hash = val * GOLDEN_RATIO_PRIME_32;
> +
> + /* High bits are more random, so use them. */
> + return hash>> (32 - bits);
> +}
> +
> +static inline unsigned long hash_ptr(void *ptr, unsigned int bits)
> +{
> + return hash_long((unsigned long)ptr, bits);
> +}
> +#endif /* _LINUX_HASH_H */
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> old mode 100644
> new mode 100755
> index e5585ba..e49d5be
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -717,24 +717,27 @@ ETEXI
>
> {
> .name = "migrate",
> - .args_type = "detach:-d,blk:-b,inc:-i,uri:s",
> - .params = "[-d] [-b] [-i] uri",
> - .help = "migrate to URI (using -d to not wait for completion)"
> - "\n\t\t\t -b for migration without shared storage with"
> - " full copy of disk\n\t\t\t -i for migration without "
> - "shared storage with incremental copy of disk "
> - "(base image shared between src and destination)",
> + .args_type = "detach:-d,blk:-b,inc:-i,xbrle:-x,uri:s",
> + .params = "[-d] [-b] [-i] [-x] uri",
> + .help = "migrate to URI"
> + "\n\t -d to not wait for completion"
> + "\n\t -b for migration without shared storage with"
> + " full copy of disk"
> + "\n\t -i for migration without"
> + " shared storage with incremental copy of disk"
> + " (base image shared between source and destination)"
> + "\n\t -x to use XBRLE page delta compression",
> .user_print = monitor_user_noop,
> .mhandler.cmd_new = do_migrate,
> },
>
> -
> STEXI
> -@item migrate [-d] [-b] [-i] @var{uri}
> +@item migrate [-d] [-b] [-i] [-x] @var{uri}
> @findex migrate
> Migrate to @var{uri} (using -d to not wait for completion).
> -b for migration with full copy of disk
> -i for migration with incremental copy of disk (base image is shared)
> + -x to use XBRLE page delta compression
> ETEXI
>
> {
> @@ -753,10 +756,23 @@ Cancel the current VM migration.
> ETEXI
>
> {
> + .name = "migrate_set_cachesize",
> + .args_type = "value:s",
> + .params = "value",
> + .help = "set cache size (in MB) for XBRLE migrations",
> + .mhandler.cmd = do_migrate_set_cachesize,
> + },
> +
> +STEXI
> +@item migrate_set_cachesize @var{value}
> +Set cache size (in MB) for xbrle migrations.
> +ETEXI
> +
> + {
> .name = "migrate_set_speed",
> .args_type = "value:o",
> .params = "value",
> - .help = "set maximum speed (in bytes) for migrations. "
> + .help = "set maximum XBRLE cache size (in bytes) for migrations. "
> "Defaults to MB if no size suffix is specified, ie. B/K/M/G/T",
> .user_print = monitor_user_noop,
> .mhandler.cmd_new = do_migrate_set_speed,
> diff --git a/hw/hw.h b/hw/hw.h
> index 9d2cfc2..aa336ec 100644
> --- a/hw/hw.h
> +++ b/hw/hw.h
> @@ -239,7 +239,8 @@ static inline void qemu_get_sbe64s(QEMUFile *f, int64_t *pv)
> int64_t qemu_ftell(QEMUFile *f);
> int64_t qemu_fseek(QEMUFile *f, int64_t pos, int whence);
>
> -typedef void SaveSetParamsHandler(int blk_enable, int shared, void * opaque);
> +typedef void SaveSetParamsHandler(int blk_enable, int shared,
> + int use_xbrle, int64_t xbrle_cache_size, void *opaque);
> typedef void SaveStateHandler(QEMUFile *f, void *opaque);
> typedef int SaveLiveStateHandler(Monitor *mon, QEMUFile *f, int stage,
> void *opaque);
> diff --git a/lru.c b/lru.c
> new file mode 100644
> index 0000000..e7230d0
> --- /dev/null
> +++ b/lru.c
> @@ -0,0 +1,142 @@
> +#include<assert.h>
> +#include<math.h>
> +#include "qemu-common.h"
> +#include "qemu-queue.h"
> +#include "host-utils.h"
> +#include "lru.h"
> +#include "hash.h"
> +
> +typedef struct CacheItem {
> + ram_addr_t it_addr;
> + uint8_t *it_data;
> + lru_free_cb_t it_free;
> + QCIRCLEQ_ENTRY(CacheItem) it_lru_next;
> + QCIRCLEQ_ENTRY(CacheItem) it_bucket_next;
> +} CacheItem;
> +
> +typedef QCIRCLEQ_HEAD(, CacheItem) CacheBucket;
> +static CacheBucket *page_hash;
> +static int64_t cache_table_size;
> +static uint64_t cache_max_items;
> +static int64_t cache_num_items;
> +static uint8_t cache_hash_bits;
> +
> +static QCIRCLEQ_HEAD(page_lru, CacheItem) page_lru;
> +
> +static uint64_t next_pow_of_2(uint64_t v)
> +{
> + v--;
> + v |= v>> 1;
> + v |= v>> 2;
> + v |= v>> 4;
> + v |= v>> 8;
> + v |= v>> 16;
> + v |= v>> 32;
> + v++;
> + return v;
> +}
> +
> +void lru_init(int64_t max_items, void *param)
> +{
> + int i;
> +
> + cache_num_items = 0;
> + cache_max_items = max_items;
> + /* add 20% to table size to reduce collisions */
> + cache_table_size = next_pow_of_2(1.2 * max_items);
> + cache_hash_bits = ctz64(cache_table_size) - 1;
> +
> + QCIRCLEQ_INIT(&page_lru);
> +
> + page_hash = qemu_mallocz(sizeof(CacheBucket) * cache_table_size);
> + assert(page_hash);
> + for (i = 0; i< cache_table_size; i++) {
> + QCIRCLEQ_INIT(&page_hash[i]);
> + }
> +}
> +
> +static CacheBucket *page_bucket_list(ram_addr_t addr)
> +{
> + return&page_hash[hash_long(addr, cache_hash_bits)];
> +}
> +
> +static void do_lru_remove(CacheItem *it)
> +{
> + assert(it);
> +
> + QCIRCLEQ_REMOVE(&page_lru, it, it_lru_next);
> + QCIRCLEQ_REMOVE(page_bucket_list(it->it_addr), it, it_bucket_next);
> + if (it->it_free) {
> + (*it->it_free)(it->it_data);
> + }
> + qemu_free(it);
> + cache_num_items--;
> +}
> +
> +static int do_lru_remove_first(void)
> +{
> + CacheItem *first;
> +
> + if (QCIRCLEQ_EMPTY(&page_lru)) {
> + return -1;
> + }
> + first = QCIRCLEQ_FIRST(&page_lru);
> + do_lru_remove(first);
> + return 0;
> +}
> +
> +
> +void lru_fini(void)
> +{
> + while (!do_lru_remove_first()) {
> + }
> + qemu_free(page_hash);
> +}
> +
> +static CacheItem *do_lru_lookup(ram_addr_t addr)
> +{
> + CacheBucket *head = page_bucket_list(addr);
> + CacheItem *it;
> +
> + if (QCIRCLEQ_EMPTY(head)) {
> + return NULL;
> + }
> + QCIRCLEQ_FOREACH(it, head, it_bucket_next) {
> + if (addr == it->it_addr) {
> + return it;
> + }
> + }
> + return NULL;
> +}
> +
> +uint8_t *lru_lookup(ram_addr_t addr)
> +{
> + CacheItem *it = do_lru_lookup(addr);
> + return it ? it->it_data : NULL;
> +}
> +
> +void lru_insert(ram_addr_t addr, uint8_t *data, lru_free_cb_t free_cb)
> +{
> + CacheItem *it;
> +
> + /* remove old if item exists */
> + it = do_lru_lookup(addr);
> + if (it) {
> + do_lru_remove(it);
> + }
> +
> + /* evict LRU if require free space */
> + if (cache_num_items == cache_max_items) {
> + do_lru_remove_first();
> + }
> +
> + /* add new entry */
> + it = qemu_mallocz(sizeof(*it));
> + it->it_addr = addr;
> + it->it_data = data;
> + it->it_free = free_cb;
> + QCIRCLEQ_INSERT_HEAD(page_bucket_list(addr), it, it_bucket_next);
> + QCIRCLEQ_INSERT_TAIL(&page_lru, it, it_lru_next);
> + cache_num_items++;
> +}
> +
> diff --git a/lru.h b/lru.h
> new file mode 100644
> index 0000000..6c70095
> --- /dev/null
> +++ b/lru.h
> @@ -0,0 +1,13 @@
> +#ifndef _LRU_H_
> +#define _LRU_H_
> +
> +#include<unistd.h>
> +#include<stdint.h>
> +#include "cpu-all.h"
> +typedef void (*lru_free_cb_t)(void *);
> +void lru_init(ssize_t num_items, void *param);
> +void lru_fini(void);
> +void lru_insert(ram_addr_t id, uint8_t *pdata, lru_free_cb_t free_cb);
> +uint8_t *lru_lookup(ram_addr_t addr);
> +#endif
> +
> diff --git a/migration-exec.c b/migration-exec.c
> index 14718dd..fe8254a 100644
> --- a/migration-exec.c
> +++ b/migration-exec.c
> @@ -67,7 +67,9 @@ MigrationState *exec_start_outgoing_migration(Monitor *mon,
> int64_t bandwidth_limit,
> int detach,
> int blk,
> - int inc)
> + int inc,
> + int use_xbrle,
> + int64_t xbrle_cache_size)
> {
> FdMigrationState *s;
> FILE *f;
> @@ -99,6 +101,8 @@ MigrationState *exec_start_outgoing_migration(Monitor *mon,
>
> s->mig_state.blk = blk;
> s->mig_state.shared = inc;
> + s->mig_state.use_xbrle = use_xbrle;
> + s->mig_state.xbrle_cache_size = xbrle_cache_size;
>
> s->state = MIG_STATE_ACTIVE;
> s->mon = NULL;
> diff --git a/migration-fd.c b/migration-fd.c
> index 6d14505..4a1ddbd 100644
> --- a/migration-fd.c
> +++ b/migration-fd.c
> @@ -56,7 +56,9 @@ MigrationState *fd_start_outgoing_migration(Monitor *mon,
> int64_t bandwidth_limit,
> int detach,
> int blk,
> - int inc)
> + int inc,
> + int use_xbrle,
> + int64_t xbrle_cache_size)
> {
> FdMigrationState *s;
>
> @@ -82,6 +84,8 @@ MigrationState *fd_start_outgoing_migration(Monitor *mon,
>
> s->mig_state.blk = blk;
> s->mig_state.shared = inc;
> + s->mig_state.use_xbrle = use_xbrle;
> + s->mig_state.xbrle_cache_size = xbrle_cache_size;
>
> s->state = MIG_STATE_ACTIVE;
> s->mon = NULL;
> diff --git a/migration-tcp.c b/migration-tcp.c
> index b55f419..4ca5bf6 100644
> --- a/migration-tcp.c
> +++ b/migration-tcp.c
> @@ -81,7 +81,9 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon,
> int64_t bandwidth_limit,
> int detach,
> int blk,
> - int inc)
> + int inc,
> + int use_xbrle,
> + int64_t xbrle_cache_size)
> {
> struct sockaddr_in addr;
> FdMigrationState *s;
> @@ -101,6 +103,8 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon,
>
> s->mig_state.blk = blk;
> s->mig_state.shared = inc;
> + s->mig_state.use_xbrle = use_xbrle;
> + s->mig_state.xbrle_cache_size = xbrle_cache_size;
>
> s->state = MIG_STATE_ACTIVE;
> s->mon = NULL;
> diff --git a/migration-unix.c b/migration-unix.c
> index 57232c0..0813902 100644
> --- a/migration-unix.c
> +++ b/migration-unix.c
> @@ -80,7 +80,9 @@ MigrationState *unix_start_outgoing_migration(Monitor *mon,
> int64_t bandwidth_limit,
> int detach,
> int blk,
> - int inc)
> + int inc,
> + int use_xbrle,
> + int64_t xbrle_cache_size)
> {
> FdMigrationState *s;
> struct sockaddr_un addr;
> @@ -100,6 +102,8 @@ MigrationState *unix_start_outgoing_migration(Monitor *mon,
>
> s->mig_state.blk = blk;
> s->mig_state.shared = inc;
> + s->mig_state.use_xbrle = use_xbrle;
> + s->mig_state.xbrle_cache_size = xbrle_cache_size;
>
> s->state = MIG_STATE_ACTIVE;
> s->mon = NULL;
> diff --git a/migration.c b/migration.c
> old mode 100644
> new mode 100755
> index 9ee8b17..ccacf81
> --- a/migration.c
> +++ b/migration.c
> @@ -34,6 +34,11 @@
> /* Migration speed throttling */
> static uint32_t max_throttle = (32<< 20);
>
> +/* Migration XBRLE cache size */
> +#define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
> +
> +static int64_t migrate_cache_size = DEFAULT_MIGRATE_CACHE_SIZE;
> +
> static MigrationState *current_migration;
>
> int qemu_start_incoming_migration(const char *uri)
> @@ -80,6 +85,7 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data)
> int detach = qdict_get_try_bool(qdict, "detach", 0);
> int blk = qdict_get_try_bool(qdict, "blk", 0);
> int inc = qdict_get_try_bool(qdict, "inc", 0);
> + int use_xbrle = qdict_get_try_bool(qdict, "xbrle", 0);
> const char *uri = qdict_get_str(qdict, "uri");
>
> if (current_migration&&
> @@ -90,17 +96,21 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data)
>
> if (strstart(uri, "tcp:",&p)) {
> s = tcp_start_outgoing_migration(mon, p, max_throttle, detach,
> - blk, inc);
> + blk, inc, use_xbrle,
> + migrate_cache_size);
> #if !defined(WIN32)
> } else if (strstart(uri, "exec:",&p)) {
> s = exec_start_outgoing_migration(mon, p, max_throttle, detach,
> - blk, inc);
> + blk, inc, use_xbrle,
> + migrate_cache_size);
> } else if (strstart(uri, "unix:",&p)) {
> s = unix_start_outgoing_migration(mon, p, max_throttle, detach,
> - blk, inc);
> + blk, inc, use_xbrle,
> + migrate_cache_size);
> } else if (strstart(uri, "fd:",&p)) {
> s = fd_start_outgoing_migration(mon, p, max_throttle, detach,
> - blk, inc);
> + blk, inc, use_xbrle,
> + migrate_cache_size);
> #endif
> } else {
> monitor_printf(mon, "unknown migration protocol: %s\n", uri);
> @@ -185,6 +195,36 @@ static void migrate_print_status(Monitor *mon, const char *name,
> qdict_get_int(qdict, "total")>> 10);
> }
>
> +static void migrate_print_ram_status(Monitor *mon, const char *name,
> + const QDict *status_dict)
> +{
> + QDict *qdict;
> + uint64_t overflow, cache_hit, cache_lookup;
> +
> + qdict = qobject_to_qdict(qdict_get(status_dict, name));
> +
> + monitor_printf(mon, "transferred %s: %" PRIu64 " kbytes\n", name,
> + qdict_get_int(qdict, "bytes")>> 10);
> + monitor_printf(mon, "transferred %s: %" PRIu64 " pages\n", name,
> + qdict_get_int(qdict, "pages"));
> + overflow = qdict_get_int(qdict, "overflow");
> + if (overflow> 0) {
> + monitor_printf(mon, "overflow %s: %" PRIu64 " pages\n", name,
> + overflow);
> + }
> + cache_hit = qdict_get_int(qdict, "cache-hit");
> + if (cache_hit> 0) {
> + monitor_printf(mon, "cache-hit %s: %" PRIu64 " pages\n", name,
> + cache_hit);
> + }
> + cache_lookup = qdict_get_int(qdict, "cache-lookup");
> + if (cache_lookup> 0) {
> + monitor_printf(mon, "cache-lookup %s: %" PRIu64 " pages\n", name,
> + cache_lookup);
> + }
> +
> +}
> +
> void do_info_migrate_print(Monitor *mon, const QObject *data)
> {
> QDict *qdict;
> @@ -198,6 +238,18 @@ void do_info_migrate_print(Monitor *mon, const QObject *data)
> migrate_print_status(mon, "ram", qdict);
> }
>
> + if (qdict_haskey(qdict, "ram-duplicate")) {
> + migrate_print_ram_status(mon, "ram-duplicate", qdict);
> + }
> +
> + if (qdict_haskey(qdict, "ram-normal")) {
> + migrate_print_ram_status(mon, "ram-normal", qdict);
> + }
> +
> + if (qdict_haskey(qdict, "ram-xbrle")) {
> + migrate_print_ram_status(mon, "ram-xbrle", qdict);
> + }
> +
> if (qdict_haskey(qdict, "disk")) {
> migrate_print_status(mon, "disk", qdict);
> }
> @@ -214,6 +266,23 @@ static void migrate_put_status(QDict *qdict, const char *name,
> qdict_put_obj(qdict, name, obj);
> }
>
> +static void migrate_put_ram_status(QDict *qdict, const char *name,
> + uint64_t bytes, uint64_t pages,
> + uint64_t overflow, uint64_t cache_hit,
> + uint64_t cache_lookup)
> +{
> + QObject *obj;
> +
> + obj = qobject_from_jsonf("{ 'bytes': %" PRId64 ", "
> + "'pages': %" PRId64 ", "
> + "'overflow': %" PRId64 ", "
> + "'cache-hit': %" PRId64 ", "
> + "'cache-lookup': %" PRId64 " }",
> + bytes, pages, overflow, cache_hit,
> + cache_lookup);
> + qdict_put_obj(qdict, name, obj);
> +}
> +
> void do_info_migrate(Monitor *mon, QObject **ret_data)
> {
> QDict *qdict;
> @@ -228,6 +297,21 @@ void do_info_migrate(Monitor *mon, QObject **ret_data)
> migrate_put_status(qdict, "ram", ram_bytes_transferred(),
> ram_bytes_remaining(), ram_bytes_total());
>
> + if (s->use_xbrle) {
> + migrate_put_ram_status(qdict, "ram-duplicate",
> + dup_mig_bytes_transferred(),
> + dup_mig_pages_transferred(), 0, 0, 0);
> + migrate_put_ram_status(qdict, "ram-normal",
> + norm_mig_bytes_transferred(),
> + norm_mig_pages_transferred(), 0, 0, 0);
> + migrate_put_ram_status(qdict, "ram-xbrle",
> + xbrle_mig_bytes_transferred(),
> + xbrle_mig_pages_transferred(),
> + xbrle_mig_pages_overflow(),
> + xbrle_mig_pages_cache_hit(),
> + xbrle_mig_pages_cache_lookup());
> + }
> +
> if (blk_mig_active()) {
> migrate_put_status(qdict, "disk", blk_mig_bytes_transferred(),
> blk_mig_bytes_remaining(),
> @@ -341,7 +425,8 @@ void migrate_fd_connect(FdMigrationState *s)
>
> DPRINTF("beginning savevm\n");
> ret = qemu_savevm_state_begin(s->mon, s->file, s->mig_state.blk,
> - s->mig_state.shared);
> + s->mig_state.shared, s->mig_state.use_xbrle,
> + s->mig_state.xbrle_cache_size);
> if (ret< 0) {
> DPRINTF("failed, %d\n", ret);
> migrate_fd_error(s);
> @@ -448,3 +533,27 @@ int migrate_fd_close(void *opaque)
> qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
> return s->close(s);
> }
> +
> +void do_migrate_set_cachesize(Monitor *mon, const QDict *qdict)
> +{
> + ssize_t bytes;
> + const char *value = qdict_get_str(qdict, "value");
> +
> + bytes = strtosz(value, NULL);
> + if (bytes< 0) {
> + monitor_printf(mon, "invalid cache size: %s\n", value);
> + return;
> + }
> +
> + /* On 32-bit hosts, QEMU is limited by virtual address space */
> + if (bytes> (2047<< 20)&& HOST_LONG_BITS == 32) {
> + monitor_printf(mon, "cache can't exceed 2047 MB RAM limit on host\n");
> + return;
> + }
> + if (bytes != (uint64_t) bytes) {
> + monitor_printf(mon, "cache size too large\n");
> + return;
> + }
> + migrate_cache_size = bytes;
> +}
> +
> diff --git a/migration.h b/migration.h
> index d13ed4f..6dc0543 100644
> --- a/migration.h
> +++ b/migration.h
> @@ -32,6 +32,8 @@ struct MigrationState
> void (*release)(MigrationState *s);
> int blk;
> int shared;
> + int use_xbrle;
> + int64_t xbrle_cache_size;
> };
>
> typedef struct FdMigrationState FdMigrationState;
> @@ -76,7 +78,9 @@ MigrationState *exec_start_outgoing_migration(Monitor *mon,
> int64_t bandwidth_limit,
> int detach,
> int blk,
> - int inc);
> + int inc,
> + int use_xbrle,
> + int64_t xbrle_cache_size);
>
> int tcp_start_incoming_migration(const char *host_port);
>
> @@ -85,7 +89,9 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon,
> int64_t bandwidth_limit,
> int detach,
> int blk,
> - int inc);
> + int inc,
> + int use_xbrle,
> + int64_t xbrle_cache_size);
>
> int unix_start_incoming_migration(const char *path);
>
> @@ -94,7 +100,9 @@ MigrationState *unix_start_outgoing_migration(Monitor *mon,
> int64_t bandwidth_limit,
> int detach,
> int blk,
> - int inc);
> + int inc,
> + int use_xbrle,
> + int64_t xbrle_cache_size);
>
> int fd_start_incoming_migration(const char *path);
>
> @@ -103,7 +111,9 @@ MigrationState *fd_start_outgoing_migration(Monitor *mon,
> int64_t bandwidth_limit,
> int detach,
> int blk,
> - int inc);
> + int inc,
> + int use_xbrle,
> + int64_t xbrle_cache_size);
>
> void migrate_fd_monitor_suspend(FdMigrationState *s, Monitor *mon);
>
> @@ -134,4 +144,11 @@ static inline FdMigrationState *migrate_to_fms(MigrationState *mig_state)
> return container_of(mig_state, FdMigrationState, mig_state);
> }
>
> +void do_migrate_set_cachesize(Monitor *mon, const QDict *qdict);
> +
> +void arch_set_params(int blk_enable, int shared_base,
> + int use_xbrle, int64_t xbrle_cache_size, void *opaque);
> +
> +int xbrle_mig_active(void);
> +
> #endif
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 793cf1c..8fbe64b 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -431,13 +431,16 @@ EQMP
>
> {
> .name = "migrate",
> - .args_type = "detach:-d,blk:-b,inc:-i,uri:s",
> - .params = "[-d] [-b] [-i] uri",
> - .help = "migrate to URI (using -d to not wait for completion)"
> - "\n\t\t\t -b for migration without shared storage with"
> - " full copy of disk\n\t\t\t -i for migration without "
> - "shared storage with incremental copy of disk "
> - "(base image shared between src and destination)",
> + .args_type = "detach:-d,blk:-b,inc:-i,xbrle:-x,uri:s",
> + .params = "[-d] [-b] [-i] [-x] uri",
> + .help = "migrate to URI"
> + "\n\t -d to not wait for completion"
> + "\n\t -b for migration without shared storage with"
> + " full copy of disk"
> + "\n\t -i for migration without"
> + " shared storage with incremental copy of disk"
> + " (base image shared between source and destination)"
> + "\n\t -x to use XBRLE page delta compression",
> .user_print = monitor_user_noop,
> .mhandler.cmd_new = do_migrate,
> },
> @@ -453,6 +456,7 @@ Arguments:
> - "blk": block migration, full disk copy (json-bool, optional)
> - "inc": incremental disk copy (json-bool, optional)
> - "uri": Destination URI (json-string)
> +- "xbrle": to use XBRLE page delta compression
>
> Example:
>
> @@ -494,6 +498,31 @@ Example:
> EQMP
>
> {
> + .name = "migrate_set_cachesize",
> + .args_type = "value:s",
> + .params = "value",
> + .help = "set cache size (in MB) for xbrle migrations",
> + .mhandler.cmd = do_migrate_set_cachesize,
> + },
> +
> +SQMP
> +migrate_set_cachesize
> +---------------------
> +
> +Set cache size to be used by XBRLE migration
> +
> +Arguments:
> +
> +- "value": cache size in bytes (json-number)
> +
> +Example:
> +
> +-> { "execute": "migrate_set_cachesize", "arguments": { "value": 500M } }
> +<- { "return": {} }
> +
> +EQMP
> +
> + {
> .name = "migrate_set_speed",
> .args_type = "value:f",
> .params = "value",
> diff --git a/savevm.c b/savevm.c
> index 4e49765..93b512b 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -1141,7 +1141,8 @@ int register_savevm(DeviceState *dev,
> void *opaque)
> {
> return register_savevm_live(dev, idstr, instance_id, version_id,
> - NULL, NULL, save_state, load_state, opaque);
> + arch_set_params, NULL, save_state,
> + load_state, opaque);
> }
>
> void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque)
> @@ -1428,15 +1429,17 @@ static int vmstate_save(QEMUFile *f, SaveStateEntry *se)
> #define QEMU_VM_SUBSECTION 0x05
>
> int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
> - int shared)
> + int shared, int use_xbrle,
> + int64_t xbrle_cache_size)
> {
> SaveStateEntry *se;
>
> QTAILQ_FOREACH(se,&savevm_handlers, entry) {
> if(se->set_params == NULL) {
> continue;
> - }
> - se->set_params(blk_enable, shared, se->opaque);
> + }
> + se->set_params(blk_enable, shared, use_xbrle, xbrle_cache_size,
> + se->opaque);
> }
>
> qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
> @@ -1577,7 +1580,7 @@ static int qemu_savevm_state(Monitor *mon, QEMUFile *f)
>
> bdrv_flush_all();
>
> - ret = qemu_savevm_state_begin(mon, f, 0, 0);
> + ret = qemu_savevm_state_begin(mon, f, 0, 0, 0, 0);
> if (ret< 0)
> goto out;
>
> diff --git a/sysemu.h b/sysemu.h
> index b81a70e..eb53bf7 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -44,6 +44,16 @@ uint64_t ram_bytes_remaining(void);
> uint64_t ram_bytes_transferred(void);
> uint64_t ram_bytes_total(void);
>
> +uint64_t dup_mig_bytes_transferred(void);
> +uint64_t dup_mig_pages_transferred(void);
> +uint64_t norm_mig_bytes_transferred(void);
> +uint64_t norm_mig_pages_transferred(void);
> +uint64_t xbrle_mig_bytes_transferred(void);
> +uint64_t xbrle_mig_pages_transferred(void);
> +uint64_t xbrle_mig_pages_overflow(void);
> +uint64_t xbrle_mig_pages_cache_lookup(void);
> +uint64_t xbrle_mig_pages_cache_hit(void);
> +
> int64_t cpu_get_ticks(void);
> void cpu_enable_ticks(void);
> void cpu_disable_ticks(void);
> @@ -74,7 +84,8 @@ void qemu_announce_self(void);
> void main_loop_wait(int nonblocking);
>
> int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
> - int shared);
> + int shared, int use_xbrle,
> + int64_t xbrle_cache_size);
> int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f);
> int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f);
> void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f);
> diff --git a/xbzrle.c b/xbzrle.c
> new file mode 100644
> index 0000000..e9285e0
> --- /dev/null
> +++ b/xbzrle.c
> @@ -0,0 +1,126 @@
> +#include<stdint.h>
> +#include<string.h>
> +#include<assert.h>
> +#include "cpu-all.h"
> +#include "xbzrle.h"
> +
> +typedef struct {
> + uint64_t c;
> + uint64_t num;
> +} zero_encoding_t;
> +
> +typedef struct {
> + uint64_t c;
> +} char_encoding_t;
> +
> +static int rle_encode(uint64_t *in, int slen, uint8_t *out, const int dlen)
> +{
> + int dl = 0;
> + uint64_t cp = 0, c, run_len = 0;
> +
> + if (slen<= 0)
> + return -1;
> +
> + while (1) {
> + if (!slen)
> + break;
> + c = *in++;
> + slen--;
> + if (!(cp || c)) {
> + run_len++;
> + } else if (!cp) {
> + ((zero_encoding_t *)out)->c = cp;
> + ((zero_encoding_t *)out)->num = run_len;
> + dl += sizeof(zero_encoding_t);
> + out += sizeof(zero_encoding_t);
> + run_len = 1;
> + } else {
> + ((char_encoding_t *)out)->c = cp;
> + dl += sizeof(char_encoding_t);
> + out += sizeof(char_encoding_t);
> + }
> + cp = c;
> + }
> +
> + if (!cp) {
> + ((zero_encoding_t *)out)->c = cp;
> + ((zero_encoding_t *)out)->num = run_len;
> + dl += sizeof(zero_encoding_t);
> + out += sizeof(zero_encoding_t);
> + } else {
> + ((char_encoding_t *)out)->c = cp;
> + dl += sizeof(char_encoding_t);
> + out += sizeof(char_encoding_t);
> + }
> + return dl;
> +}
> +
> +static int rle_decode(const uint8_t *in, int slen, uint64_t *out, int dlen)
> +{
> + int tb = 0;
> + uint64_t run_len, c;
> +
> + while (slen> 0) {
> + c = ((char_encoding_t *) in)->c;
> + if (c) {
> + slen -= sizeof(char_encoding_t);
> + in += sizeof(char_encoding_t);
> + *out++ = c;
> + tb++;
> + continue;
> + }
> + run_len = ((zero_encoding_t *) in)->num;
> + slen -= sizeof(zero_encoding_t);
> + in += sizeof(zero_encoding_t);
> + while (run_len--> 0) {
> + *out++ = c;
> + tb++;
> + }
> + }
> + return tb;
> +}
> +
> +static void xor_encode_word(uint8_t *dst, const uint8_t *src1,
> + const uint8_t *src2)
> +{
> + int len = TARGET_PAGE_SIZE / sizeof (uint64_t);
> + uint64_t *dstw = (uint64_t *) dst;
> + const uint64_t *srcw1 = (const uint64_t *) src1;
> + const uint64_t *srcw2 = (const uint64_t *) src2;
> +
> + while (len--) {
> + *dstw++ = *srcw1++ ^ *srcw2++;
> + }
> +}
> +
> +int xbzrle_encode(uint8_t *xbzrle, const uint8_t *old, const uint8_t *curr,
> + const size_t max_compressed_len)
> +{
> + int compressed_len;
> + uint8_t xor_buf[TARGET_PAGE_SIZE];
> + uint8_t work_buf[TARGET_PAGE_SIZE * 2]; /* worst case xbzrle is 150% */
> +
> + xor_encode_word(xor_buf, old, curr);
> + compressed_len = rle_encode((uint64_t *)xor_buf,
> + sizeof(xor_buf)/sizeof(uint64_t), work_buf,
> + sizeof(work_buf));
> + if (compressed_len> max_compressed_len) {
> + return -1;
> + }
> + memcpy(xbzrle, work_buf, compressed_len);
> + return compressed_len;
> +}
> +
> +int xbzrle_decode(uint8_t *curr, const uint8_t *old, const uint8_t *xbrle,
> + const size_t compressed_len)
> +{
> + uint8_t xor_buf[TARGET_PAGE_SIZE];
> +
> + int len = rle_decode(xbrle, compressed_len,
> + (uint64_t *)xor_buf, sizeof(xor_buf)/sizeof(uint64_t));
> + if (len< 0) {
> + return len;
> + }
> + xor_encode_word(curr, old, xor_buf);
> + return len * sizeof(uint64_t);
> +}
> diff --git a/xbzrle.h b/xbzrle.h
> new file mode 100644
> index 0000000..5d625a0
> --- /dev/null
> +++ b/xbzrle.h
> @@ -0,0 +1,12 @@
> +#ifndef _XBZRLE_H_
> +#define _XBZRLE_H_
> +
> +#include<stdio.h>
> +
> +int xbzrle_encode(uint8_t *xbrle, const uint8_t *old, const uint8_t *curr,
> + const size_t len);
> +int xbzrle_decode(uint8_t *curr, const uint8_t *old, const uint8_t *xbrle,
> + const size_t len);
> +
> +#endif
> +
>
13 years, 8 months
[libvirt] [PATCH] storage: avoid missing break in default branch of switch statement
by ajia@redhat.com
Detected by Coverity.
* src/storage/storage_backend.c: avoid missing break in default branch.
https://bugzilla.redhat.com/show_bug.cgi?id=721335
Signed-off-by: Alex Jia <ajia(a)redhat.com>
---
src/storage/storage_backend.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/src/storage/storage_backend.c b/src/storage/storage_backend.c
index 6243d1e..004bc66 100644
--- a/src/storage/storage_backend.c
+++ b/src/storage/storage_backend.c
@@ -819,6 +819,7 @@ virStorageBackendCreateQemuImg(virConnectPtr conn,
virCommandAddArgFormat(cmd, "%lluK", size_arg);
if (do_encryption)
virCommandAddArg(cmd, "-e");
+ break;
}
} else {
virCommandAddArgList(cmd, "create", "-f", type,
--
1.7.1
13 years, 8 months
[libvirt] libvirt-0.9.1 to 0.9.3-r1: managedsave/save won't start/restore at saved state
by Nicolas Sebrecht
I'm seeing strange behaviour, here. Any guests saved using both
managedsave and save commands from virsh won't restore at saved state. A
new full boot sequence happen.
- Tested against libvirt v0.9.1, v0.9.2, v0.9.3-r1 (Gentoo)
- Confirmed on three different hosts Gentoo amd64 systems.
- Tested with gentoo and ubuntu guests.
- Nothing relevant in /var/log/libvirt/libvirt.log or
/var/log/libvirt/qemu/<dom>.log
The "state file" /var/lib/libvirt/qemu/save/<dom>.save exists and is
deleted when 'virsh start' is called.
The new boot sequence is confirmed by :
- VNC console checks
- previous screen sessions lost
- uptime
I've open a bug at https://bugs.gentoo.org/show_bug.cgi?id=376333 but
had no answer.
Any idea on what could happen or how to inspect it?
--
Nicolas Sebrecht
13 years, 8 months
[libvirt] [PATCH 1/3] qemu: avoid dead store in qemuProcessStart
by ajia@redhat.com
Value stored to 'ret' is never read, in fact, 'cleanup' section will
directly return -1 when function is fail, so remove this dead assignment.
* src/qemu/qemu_process.c: kill dead assignment.
Signed-off-by: Alex Jia <ajia(a)redhat.com>
---
src/qemu/qemu_process.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 88cefd5..b0d2149 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -2908,13 +2908,11 @@ int qemuProcessStart(virConnectPtr conn,
VIR_DEBUG("Writing early domain status to disk");
if (virDomainSaveStatus(driver->caps, driver->stateDir, vm) < 0) {
- ret = -1;
goto cleanup;
}
VIR_DEBUG("Waiting for handshake from child");
if (virCommandHandshakeWait(cmd) < 0) {
- ret = -1;
goto cleanup;
}
@@ -2943,7 +2941,6 @@ int qemuProcessStart(virConnectPtr conn,
VIR_DEBUG("Labelling done, completing handshake to child");
if (virCommandHandshakeNotify(cmd) < 0) {
- ret = -1;
goto cleanup;
}
VIR_DEBUG("Handshake complete, child running");
--
1.7.1
13 years, 8 months
[libvirt] [PATCH] qemu: Avoid overwriting errors from virGetHostname
by Jiri Denemark
---
src/qemu/qemu_migration.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 7aeea69..4d0e062 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -257,7 +257,7 @@ qemuMigrationCookieNew(virDomainObjPtr dom)
memcpy(mig->uuid, dom->def->uuid, VIR_UUID_BUFLEN);
if (!(mig->localHostname = virGetHostname(NULL)))
- goto no_memory;
+ goto error;
if (virGetHostUUID(mig->localHostuuid) < 0) {
qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s",
_("Unable to obtain host UUID"));
--
1.7.6
13 years, 8 months
[libvirt] Disk snapshot mode proposal: patch for storing the snapshot mode from .vmx to .xml
by computernews@rambler.ru
Hello,
Few days ago I have proposed to implement the snapshot mode
functionality for esx/vpx:
|Mail subject: "[libvirt] VMWare "independent disk" processing needed"
Sent: "04.07.2011"
Sender: "computernews(a)rambler.ru"
|
What I meant is to allow "libvirt" to handle (see full .vmx attached) :
|vmx: scsi0:1.mode = "independent-persistent"
vmx: scsi0:2.mode = "independent-nonpersistent"|
As none has responded I decided to start implementing that myself.
By now I have the managed to implement the .vmx -> .xml part. According
to "General tips for contributing patches"
(http://libvirt.org/hacking.html#patches) I am sending the patch back to
the community as early as it has sense as I still hope to contribute
this feature to the community. It builds and has been functionally
tested. I also tried my best to stick to coding conventions applied on
the project. So I hope my patch will not take long time to analyze for
someone experienced enough.
I saw someone already has started (put few comments about snapshot modes
in src/vmx/vmx.c). So tried to undestand an original idea and keep up
with it. I hope I was not wrong there.
I am willing to keep working in this direction. So if someone would be
so kind to take a look at my efforts and provide a feedback - it would
be very nice and, I hope, useful for the rest of the project.
Attached are:
1. patch itself
2. ".vmx" file I am testing on
3. ".xml" file produced by my changes
Looking forward to hear any feedback/criticism/advices. Thanks in advance.
Best regards
Oleh Paliy
13 years, 8 months
[libvirt] [PATCH] qemu: avoid pass null pointer as an argument
by ajia@redhat.com
In fact, 'pos' is always -1, this reason is because qemuProcessStart function
assigns -1 to 'pos' variable then call qemuProcessWaitForMonitor, meanwhile,
qemuProcessAttach function also call qemuProcessWaitForMonitor and directly
pass -1 as an argument, so if (pos != -1) statement can't been run for ever,
it also means we can't allocate memory to 'buf' variable, that is, 'buf' is
a initial value NULL, however, the function
qemuProcessReadLogFD(logfd, buf, buf_size, strlen(buf)) will be called
on 'cleanup' section, null pointer passed as an argument.
* src/qemu/qemu_process.c: avoid null pointer passed as an argument to a
'nonnull' parameter.
Signed-off-by: Alex Jia <ajia(a)redhat.com>
---
src/qemu/qemu_process.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index b0d2149..570992d 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -1189,6 +1189,11 @@ qemuProcessWaitForMonitor(struct qemud_driver* driver,
goto closelog;
}
+ if (VIR_ALLOC_N(buf, buf_size) < 0) {
+ virReportOOMError();
+ return -1;
+ }
+
VIR_DEBUG("Connect monitor to %p '%s'", vm, vm->def->name);
if (qemuConnectMonitor(driver, vm) < 0) {
goto cleanup;
--
1.7.5.1
13 years, 8 months
[libvirt] PCI devices passthough to LXC containers using libvirt
by Devendra K. Modium
Hi All
Please let me know if anyone have given access to
PCI devices for a LXC container.
I have tried getting the xml from
"virsh nodedev-dumpxml pci_device" and
added to the libvirt xml file as shown below
<device>
<name>pci_0000_03_00_0</name>
<parent>pci_0000_00_03_0</parent>
<driver>
<name>nvidia</name>
</driver>
<capability type='pci'>
<domain>0</domain>
<bus>3</bus>
<slot>0</slot>
<function>0</function>
<product id='0x06fd' />
<vendor id='0x10de'>nVidia Corporation</vendor>
</capability>
</device>
But it didn't work. I see the logs and it says
couldn't get physical and virtual functions of these devices with error
get_physical_function_linux:323 : Attempting to get SR IOV physical function for device with sysfs path '/sys/devices/pci0000:00/0000:00:00.0'
16:48:34.033: 13802: debug : get_sriov_function:270 : Attempting to resolve device path from device link '/sys/devices/pci0000:00/0000:00:00.0/physfn'
16:48:34.033: 13802: debug : get_sriov_function:274 : SR IOV function link '/sys/devices/pci0000:00/0000:00:00.0/physfn' does not exist
16:48:34.033: 13802: debug : get_virtual_functions_linux:348 : Attempting to get SR IOV virtual functions for devicewith sysfs path '/sys/devices/pci0000:00/0000:00:00.0'
If anyone got some guidelines how to debug, please let me know.
Thanks in advance
Regards
Devendra
13 years, 8 months
[libvirt] [PATCH] qemu: avoid double free in qemuProcessWaitForMonitor
by ajia@redhat.com
Reported by Eric.
* src/qemu/qemu_process.c: avoid double free 'buf' on closelog lable.
Signed-off-by: Alex Jia <ajia(a)redhat.com>
---
src/qemu/qemu_process.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index b0d2149..5ef25cb 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -1225,8 +1225,6 @@ cleanup:
}
closelog:
- VIR_FREE(buf);
-
if (VIR_CLOSE(logfd) < 0) {
char ebuf[1024];
VIR_WARN("Unable to close logfile: %s",
--
1.7.1
13 years, 8 months
[libvirt] qemu-kvm>=0.14 is unable to boot
by Zdenek Styblik
Hello,
I'm wondering what might be behind qemu-kvm>=0.14 is unable to boot
except from PXE and perhaps virtio HDD.
I mean, is somebody running qemu-kvm>=0.14 and libvirt-0.9.3/0.9.4
around here, thus is my setup broken?
Thanks,
Zdenek
--
Zdenek Styblik
email: stybla(a)turnovfree.net
jabber: stybla(a)jabber.turnovfree.net
13 years, 8 months