Hi all,

[Issue observed]
After 'nova reboot <server>', the console output contains only the latest boot of the server. The console output of the previous boot of the same server is lost due to truncation[1]. The behavior is different if we reboot from within the VM instance [ #sudo reboot ] or reboot the instance with 'virsh reboot <instance>': in those cases console.log keeps growing, with the new output appended.
This loss of history makes some debugging scenarios difficult, because the output of earlier boots is no longer available.

Please point me to any existing solution/blueprint for this issue, if one is already planned. Otherwise, please comment on my analysis and proposed solutions below -

[Analysis]
Nova's libvirt driver on the compute node tries to restart the server instance gracefully by attempting a soft_reboot first. If soft_reboot fails, it attempts a hard_reboot. As part of soft_reboot, it brings down the instance by calling shutdown(), and then calls createWithFlags() to bring it back up. As a result, the qemu-kvm process for the instance is terminated and a new process is launched. In QEMU, the chardev file is opened with O_TRUNC, so the previous content of the console.log file is lost.
On the other hand, during 'virsh reboot <instance>', the same qemu-kvm process keeps running, and libvirt actually does a qemuDomainSetFakeReboot(). The same file therefore keeps capturing the new console output as a continuation.
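
To illustrate the difference between the two open modes, here is a minimal, standalone Python sketch. It is not QEMU or Nova code; write_boot() and demo() are hypothetical helpers that only mimic "a new process opens the log file and writes one boot's worth of output", once with O_TRUNC and once with O_APPEND.

```python
import os
import tempfile


def write_boot(path, text, extra_flag):
    """Open `path` as a freshly started process would, then write console text.

    `extra_flag` is os.O_TRUNC (discard old content) or os.O_APPEND (keep it).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | extra_flag, 0o644)
    try:
        os.write(fd, text.encode())
    finally:
        os.close(fd)


def read(path):
    with open(path) as f:
        return f.read()


def demo():
    """Simulate two boots against a truncating and an appending log file."""
    with tempfile.TemporaryDirectory() as d:
        truncated = os.path.join(d, "console-trunc.log")
        appended = os.path.join(d, "console-append.log")
        for boot in ("boot 1\n", "boot 2\n"):
            write_boot(truncated, boot, os.O_TRUNC)
            write_boot(appended, boot, os.O_APPEND)
        return read(truncated), read(appended)


if __name__ == "__main__":
    trunc_content, append_content = demo()
    print("O_TRUNC :", repr(trunc_content))
    print("O_APPEND:", repr(append_content))
```

With O_TRUNC only the second boot survives, which matches the "file truncated" message in [1]; with O_APPEND both boots remain in the file.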

[Proposals for solution]
1. Nova, driven by configuration, backs up the console file before creating the domain in the reboot scenario, e.g. saving console.log as console.log.0. How many such backups of the log file to keep, the maximum size of the file, and whether to use 'logrotate' can all be Nova configuration parameters.
    Pros - As Nova's libvirt driver deliberately does not use libvirt's reboot() functionality, this problem is best addressed in the same layer.
    Cons - Making Nova's libvirt layer aware of the console files is not clean from a modularity standpoint.

2. virDomainCreateWithFlags() gets a new flag value indicating that logs are to be appended instead of truncated when the FILE option is used. This setting is passed on to QEMU when the process is spawned.
    - The changes would not be in OpenStack code, but in libvirt and QEMU.
    Cons - We may have to do a similar implementation for all of libvirt's drivers.
    Pros - This feature also has a use case for 'virsh shutdown <instance>' followed by 'virsh start <instance>'.
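
For proposal 1, the backup step could look roughly like the sketch below. This is only an illustration under assumptions: rotate_console_log() and max_backups are hypothetical names, not existing Nova functions or config options, and the size limit and 'logrotate' integration are left out.

```python
import os
import shutil


def rotate_console_log(path, max_backups=3):
    """Keep up to `max_backups` old copies of `path` as path.0, path.1, ...

    Hypothetical helper: meant to be called just before the domain is
    re-created, i.e. before the new QEMU process truncates the file.
    """
    if max_backups < 1 or not os.path.exists(path):
        return

    # Drop the oldest backup so the shift below never exceeds the limit.
    oldest = f"{path}.{max_backups - 1}"
    if os.path.exists(oldest):
        os.remove(oldest)

    # Shift path.N -> path.N+1, starting from the highest index.
    for i in range(max_backups - 2, -1, -1):
        src = f"{path}.{i}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{i + 1}")

    # Copy rather than move, so the original file stays in place with its
    # ownership and permissions for the new process to reopen.
    shutil.copy2(path, f"{path}.0")
```

With max_backups=2, after four boots console.log.0 would hold the output of the fourth boot and console.log.1 that of the third; anything older is dropped.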

Regards,
Suro

Surojit Pathak


Refs -
[1] Snippet
# tail -f /opt/stack/data/nova/instances/cea9a3d9-f833-4ded-90b8-c85b7da3f758/console.log
...
[  OK  ] Started udev Coldplug all Devices.
[  OK  ] Started Create static device nodes in /dev.
         Starting udev Kernel Device Manager...
[   36.938075] EXT4-fs (vda1): re-mounted. Opts: (null)
[  OK  ] Started Remount Root and Kernel File Systems.
         Starting Load/Save Random Seed...
[  OK  ] Reached target Local File Systems (Pre).
         Starting Configure read-only root support...
[  OK  ] Started Load/Save Random Seed.
tail: /opt/stack/data/nova/instances/cea9a3d9-f833-4ded-90b8-c85b7da3f758/console.log: file truncated
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.11.10-301.fc20.x86_64 (mockbuild@bkernel01.phx2.fedoraproject.org) (gcc version 4.8.2 20131017 (Red Hat 4.8.2-1) (GCC) ) #1 SMP Thu Dec 5 14:01:17 UTC 2013
[    0.000000] Command line: ro root=UUID=314b4a27-3885-49e8-9415-af098db4fd2a no_timer_check console=tty1 console=ttyS0,115200n8 initrd=/boot/initramfs-3.11.10-301.fc20.x86_64.img BOOT_IMAGE=/boot/vmlinuz-3.11.10-301.fc20.x86_64
....