[libvirt-users] Zombie processes being created when console buffer is full
by Peter Steele
We have been researching stuck zombie processes in our libvirt lxc
containers. What we found was:
1) Each zombie’s parent was pid 1, i.e. init, which symlinks to systemd.
2) In some cases, the zombies were launched by systemd, in others the
zombie was inherited.
3) While a child is in the zombie state, the parent's (systemd's)
/proc/1/status shows no pending signals.
4) Attaching gdb to systemd showed a single thread, blocked in
write(), and the file being written was /dev/console.
This write() to the console never returns. We operated under the
assumption that systemd's SIGCHLD handler sets a bit and a foreground
thread (the only thread) would see that child processes needed reaping.
While the single thread is stuck in write(), the reaping never takes
place.
So why is write() blocking? The answer seems to be that nothing is
draining the console, and write() eventually blocks once the console's
buffer fills up. When we attached to the container's console, the
buffer was drained, allowing systemd's write() to return. The zombies
were then reaped and everything went back to normal.
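The stuck state is fairly easy to spot once you know what to look for;
a rough check along these lines should confirm it (a sketch, run as
root inside an affected container):
  # what is pid 1 sleeping in, and does it hold /dev/console open?
  cat /proc/1/wchan; echo
  ls -l /proc/1/fd | grep console
  # list unreaped (zombie) children whose parent is pid 1
  ps -eo pid,ppid,stat,comm | awk '$2 == 1 && $3 ~ /^Z/'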
Our “solution” was more of a workaround: systemd was altered to log
errors/warnings/etc. to /dev/null instead of /dev/console. This
prevents the problem only in that the console buffer is now unlikely to
fill up, since systemd is generally the only thing that writes to it.
This is definitely a hack, though.
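For anyone who wants to try the same workaround without patching
systemd, the stock settings below should have a similar effect (a
sketch; we have not verified that they are exactly equivalent to the
change we made):
  # /etc/systemd/system.conf
  ShowStatus=no
  # /etc/systemd/journald.conf
  ForwardToConsole=no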
This may be a bug in the libvirt container library (you can't expect
something to periodically connect to a container's console to empty it
out). We suspect there may also be a configuration issue in our
containers with regard to the console.
Has anyone else observed this problem?
Peter
[libvirt-users] Buggy 1.3.2? Disconnected from qemu:///session due to I/O error
by Lars Kellogg-Stedman
I think I've hit the same problem that Predrag reported in
http://comments.gmane.org/gmane.comp.emulators.libvirt.user/8825.
With libvirt-1.3.2-1.fc23.x86_64 on Fedora 23, when I try uploading an
image with vol-upload to a user libvirtd (qemu:///session):
virsh vol-upload --pool default volume.qcow /path/to/file.qcow2
I am getting:
error: Disconnected from qemu:///session due to I/O error
error: cannot send data to volume volume.qcow2
error: Cannot write data: Broken pipe
error: One or more references were leaked after disconnect from the hypervisor
If I downgrade to libvirt-1.2.18.2-2.fc23.x86_64 (which is what
actually ships in F23), the image upload completes without a problem.
--
Lars Kellogg-Stedman <lars(a)redhat.com> | larsks @ {freenode,twitter,github}
Cloud Engineering / OpenStack | http://blog.oddbit.com/
[libvirt-users] removing virbr0
by Andrei Perietanu
I am building a custom Linux image which includes KVM and will be installed
on multiple machines. By default, installing libvirt gives you a 'default
network' which adds a 'virbr0' bridge.
I found several tutorials online about removing this 'virbr0', but I would
prefer not to have it in the first place.
I am compiling libvirt from source, so I would expect there is some
compile-time option or configuration file I need to change so that the
default network is not included.
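For reference, the runtime removal those tutorials describe boils down to
something like this (a sketch; it assumes the stock network is still named
'default'):
# virsh net-destroy default
# virsh net-autostart default --disable
# virsh net-undefine default
On an installed system the definition lives in
/etc/libvirt/qemu/networks/default.xml (plus an autostart symlink next to
it), so simply omitting those files from the image should have the same
effect, but a build-time switch would be cleaner.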
Thanks for your help!
Andrei
[libvirt-users] different uuids, but still "Attempt to migrate guest to same host" error
by Devin Reade
Background:
----------
I'm trying to debug a two-node pacemaker/corosync cluster where I
want to be able to do live migration of KVM/qemu VMs. Storage is
backed via dual-primary DRBD (yes, fencing is in place).
When moving the VM between nodes via 'pcs resource move RES NODENAME',
the live migration fails although pacemaker will shut down the VM
and restart it on the other node.
For the purpose of diagnosing things, on both nodes I've put SELinux
into permissive mode and disabled firewalld.
Interesting Bit:
---------------
Debugging a bit further, I put the VM into an unmanaged state and
then tried with virsh, from the node currently running the VM:
[root@node1 ~]# virsh migrate --live --verbose testvm
qemu+ssh://node2/system
error: internal error: Attempt to migrate guest to the same host
node1.example.tld
A quick Google search points toward UUID problems; however, the two nodes
are, AFAICT, working with different UUIDs. (Substantiating info is
shown toward the end.)
I thought that since `hostname` returns only the short node name and not
the FQDN, there might be some internal qemu confusion between the short
node name and the FQDN. However, fully qualifying it made no difference:
[root@node1 ~]# virsh migrate --live --verbose testvm
qemu+ssh://node2.example.tld/system
error: internal error: Attempt to migrate guest to the same host
node1.example.tld
Running virsh with a debug level of 1 doesn't reveal anything interesting
that I can see. Running libvirtd at that level shows that node2 is seeing
node1.example.tld in the XML emitted in qemuMigrationPrepareDirect. I'm
assuming that means the wrong node name has been determined somewhere prior
to that point.
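One more data point that may be worth collecting: as far as I can tell, the
duplicate-host check compares what each daemon reports as its own canonical
hostname (along with the host UUID), so it may be revealing to ask both
daemons directly (a sketch):
[root@node1 ~]# virsh hostname
[root@node1 ~]# virsh -c qemu+ssh://node2.example.tld/system hostname
If both of those come back as node1.example.tld, the problem is on the
hostname/DNS side rather than in the UUIDs.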
At this point I'm grasping at straws and looking for ideas. Does anyone
have a clue-bat?
Devin
Config Info Follows:
-------------------
CentOS Linux release 7.2.1511 (Core)
libvirt on both nodes is 1.2.17-13
[root@node1 ~]# virsh sysinfo | grep uuid
<entry name='uuid'>03DE0294-0480-05A4-B906-8E0700080009</entry>
[root@node2 ~]# virsh sysinfo | grep uuid
<entry name='uuid'>03DE0294-0480-05A4-B206-320700080009</entry>
[root@node1 ~]# dmidecode -s system-uuid
03DE0294-0480-05A4-B906-8E0700080009
[root@node2 ~]# dmidecode -s system-uuid
03DE0294-0480-05A4-B206-320700080009
[root@node1 ~]# fgrep uuid /etc/libvirt/libvirtd.conf | grep -v '#'
host_uuid = "875cb1a3-437c-4cb5-a3de-9789d0233e4b"
[root@node2 ~]# fgrep uuid /etc/libvirt/libvirtd.conf | grep -v '#'
host_uuid = "643c0ef4-bb46-4dc9-9f91-13dda8d9aa33"
[root@node2 ~]# pcs config show
...
Resource: testvm (class=ocf provider=heartbeat type=VirtualDomain)
Attributes: hypervisor=qemu:///system
config=/cluster/config/libvirt/qemu/testvm.xml migration_transport=ssh
Meta Attrs: allow-migrate=true is-managed=false
Operations: start interval=0s timeout=120 (testvm-start-interval-0s)
stop interval=0s timeout=240 (testvm-stop-interval-0s)
monitor interval=10 timeout=30 (testvm-monitor-interval-10)
migrate_from interval=0 timeout=60s
(testvm-migrate_from-interval-0)
migrate_to interval=0 timeout=120s
(testvm-migrate_to-interval-0)
...
(The /cluster/config directory is a shared GlusterFS filesystem.)
[root@node1 ~]# cat /etc/hosts | grep -v localhost
192.168.10.8 node1.example.tld node2
192.168.10.9 node2.example.tld node2
192.168.11.8 node1hb.example.tld node1hb
192.168.11.9 node2hb.example.tld node2hb
(node1 and node2 are the "reachable" IPs and totem ring1. node1hb and
node2hb form a direct connection via crossover cable for DRBD and
totem ring0.)
[libvirt-users] How to test if a console session is active?
by Peter Steele
I use libvirt to manage a container-based environment. Is there an
efficient method for testing if someone has a console session open for a
given container? The approach I'm using at the moment is to use lsof and
check if there is an entry for the container's console pts device. For
example, I can determine a container's console device from its xml
definition:
# virsh dumpxml vm-00|grep -A 4 pty
<console type='pty' tty='/dev/pts/3'>
<source path='/dev/pts/3'/>
<target type='lxc' port='0'/>
<alias name='console0'/>
</console>
Once I have the pts device, I can do something like
# if lsof /dev/pts/3; then echo "Container vm-00 has an active console
session"; fi
This all works fine but we've found that lsof is very expensive. Is
there a more efficient means to make this active console check?
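One alternative I have been considering is to scan the /proc/*/fd symlinks
directly instead of invoking lsof (a sketch; untested, and it may or may
not actually be cheaper in practice):
pts=/dev/pts/3
for fd in /proc/[0-9]*/fd/*; do
    if [ "$(readlink "$fd" 2>/dev/null)" = "$pts" ]; then
        echo "Container vm-00 has an active console session"
        break
    fi
done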
Peter
[libvirt-users] Problems creating a LXC rootfs to be used with virt-manager
by Markus Rhonheimer
Hello everyone,
Laine Stump over at OFTC #virt was kind enough to point me to this email
address.
I am trying to run an LXC OS container with virt-manager and found out
that I need to create the rootfs with LXC/LXD and then point
virt-manager to the rootfs location. (I am running Ubuntu 16.04 inside
a virtual machine [KVM] for testing.)
Both "lxc-create" and "lxc launch" worked to create the rootfs and run
the VMs.
When I try to add them to virt-manager I always get error messages
like this:
https://postimg.org/image/lkqn1ery5/
the configuration looks like this:
http://pastebin.com/fg1JFCRX
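For reference, this is roughly what I understand the relevant part of a
libvirt LXC definition is supposed to look like, and what I tried to model
mine on (paths and the init binary here are illustrative, not copied from
my config):
<domain type='lxc'>
  ...
  <os>
    <type>exe</type>
    <init>/sbin/init</init>
  </os>
  <devices>
    <filesystem type='mount'>
      <source dir='/var/lib/lxc/mycontainer/rootfs'/>
      <target dir='/'/>
    </filesystem>
    ...
  </devices>
</domain>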
I would appreciate any help.
kind regards,
Markus
[libvirt-users] How to measure memory utilization of guest when dommemstat reports "RSS" is more than "ACTUAL"?
by suyog jadhav
How to measure memory utilization of guest when dommemstat reports "RSS" is more than "ACTUAL"?
Following is the output on RHEL 6.6
# dommemstat 1
actual 16777216
rss 16890372
# dommemstat 2
actual 16777216
rss 16460516
I found the following Red Hat article, which suggests that this is normal/expected:
https://access.redhat.com/solutions/2017943
We currently use the following formula to calculate memory utilization:
mem_util = (rss/actual)*100
With these numbers, this gives more than 100% as the result.
We use the libvirt library to call virDomainMemoryStats (which is the same API used by the "virsh dommemstat" command).
If we can't use RSS as the "used memory", what else should be used to calculate it?
What is the ideal approach for reporting a guest's memory utilization from the KVM host?
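For reference, the same virDomainMemoryStats call can also return 'available' and 'unused' when the guest's virtio-balloon driver supports the richer statistics and the stats collection period is enabled; a sketch of a utilization calculation based on those would be:
# virsh dommemstat 1 | awk '/^available/{a=$2} /^unused/{u=$2} END{if (a) printf "util = %.1f%%\n", (a-u)/a*100}'
i.e. treating (available - unused) as the memory actually in use inside the guest. Is that the recommended approach, or is there something better?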
[libvirt-users] Compiling qemu
by Dominique Ramaekers
Hi,
I've installed qemu 2.5 through the package manager. There is a bug which prevents taking an external snapshot a second time. A fix has been released:
https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg06163.html
I thought, why not download the source, apply the patch, build + make and then copy qemu-system-x86_64 to the /usr/bin/ folder.....
But if I do 'virsh start my_vm', I get these errors:
error: internal error: early end of file from monitor, possible problem: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
qemu: could not load PC BIOS 'bios-256k.bin'
First:
What can I do about the 'host doesn't support requested feature' warnings?
Second:
I think the compiled binary doesn't look for the bios-256k.bin in the right place?
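If my guess about the second point is right, a self-built qemu looks for firmware under the prefix it was configured with rather than under the distro's paths, so the usual fixes seem to be something like (a sketch; adjust paths to wherever your distro keeps bios-256k.bin, often /usr/share/qemu or /usr/share/seabios):
# either configure the build with the distro's prefix before rebuilding:
./configure --target-list=x86_64-softmmu --prefix=/usr
# or point the freshly built binary at the existing firmware directory at run time:
qemu-system-x86_64 -L /usr/share/qemu ...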
Help will be much appreciated...
Greetings,
Dominique.