(apologies for first attempt: I was using a strange mailer.)
As the principal maintainer of dnsmasq, I'm seeing increasing reports of
problems on systems which run both dnsmasq and libvirt. I'm fairly sure
I understand what's going on in these cases, and I have a few proposals
for changes in libvir and dnsmasq that should fix things.
The problem is that libvirt runs a private instance of dnsmasq: on
machines which are also running a "system" dnsmasq daemon, this can
cause problems.
Some background: dnsmasq can run in two modes.
Default mode: dnsmasq binds the wildcard address and does network magic
to determine which interface request packets actually come from, so that
the results can be sent back with the correct source address. This has
the advantage that network interfaces can come and go and change IP
address and dnsmasq will keep working. It's possible to restrict dnsmasq
to only reply to requests on some interfaces; requests from other
interfaces will be read by dnsmasq and then silently dropped. Telling
dnsmasq to use an interface which doesn't exist but might in the future
will result in a logged warning, but dnsmasq will still start and when
the interface comes up it will work.
Bind-interfaces mode: This is the traditional way to do UDP servers. At
startup dnsmasq enumerates all the extant interfaces and then opens a
socket for each one, listening on the interfaces's IP address.
Interfaces may be skipped if excluded by the --interface and
--except-interface flags, and any interface specified in --interface
which doesn't exist at start-up will generate a fatal error.
In almost all cases, default mode is better: --bind-interfaces is only
there to cope with old platforms which don't support enough socket
options to do default mode.
The only time when --bind-interfaces works better is when it's desirable
to run more than one instance of dnsmasq or have dnsmasq co-exist with
another DNS server. This is not possible in default mode, but it does
work in bind-interfaces mode, providing than _all_ instances of dnsmasq
are in bind-interfaces mode, and that they listen on a disjoint set of
interfaces.
Therefore, to allow multiple dnsmasq instances libvirt's private dnsmasq
instance is started in bind-interfaces mode: that forces one of the
dnsmasq instances to do bind-interfaces. Many of the Linux distibution
dnsmasq packages have now implemented an /etc/dnsmasq.d directory where
configuration fragments can be dropped. Their libvirt packages are
putting a file there which contains a bind-interfaces command, so that
the "system" dnsmasq is automatically forced into the same mode, and the
two can co-exist.
This works, sort-of, but there some disadvantages. Installing libvirt
drops the configuration change for the system dnsmasq, but the packages
frequently don't restart the system daemon, so that things transiently
fail until everything has rebooted. Much worse, the system dnsmasq is
forced into bind-interfaces mode and then service to transient
interfaces (usb, ad-hoc wifi) no longer works, or, because those
interfaces are mentioned in the dnsmasq configuration, dnsmasq now fails
at start-up when the interfaces don't exist.
I don't think there is a solution to this problem wholly within dnsmasq:
the nature of the BSD-sockets universe means that there never be two
instances of dnsmasq without bind-interfaces, and it's, at best,
difficult to get round the limitations of that WRT transient network
interfaces.
There is, an alternative solution, which would involve changes to both
dnsmasq, libvirt and their packaging. Hence this mail.
My proposal is to get rid of the necessity for two dnsmasq instances.
Libvirt should check for the existance of a "system" dnsmasq and, if the
system daemon exists, libvirt should drop the required configuration
into /etc/dnsmasq.d and then restart it. If the system daemon is not
installed or enabled, libvirt can start a private instance as now.
The difficulty with this scheme is that libvirt needs to create some
configuration which enables the services it needs on the virtual network
without disturbing, or being disturbed by, whatever configuration exists
for the system daemon. That's not currently possible, but it can be made
possible. I'm assuming that libvirt needs to provide a set of IP
address / MAC address mappings, and range of IP addresses on a virtual
network. It needs DHCP and DNS service on the virtual network.
The dhcp-host IP/MAC mappings are a non-problem: they will be ignored
for any other subnet where the IP addresses don't fit, and any other
dhcp-hosts in the system configuration will be similarly ignored for
DHCP on the virtual network subnet.
The dhcp-range is more of a problem. Service to particular networks in
dnsmasq is controlled by interface=<interface name"> lines in the
configuration. If there are none of these, service is provided to all
interfaces. If they exist, service is limited to the interfaces
specified. The existence of any dhcp-range line in dnsmasq's
configuration enables the DHCP server for any subnet unless explicitly
limited to particular interfaces. So a default dnsmasq installation,
(with no interface=<interface>) which provides DNS everywhere but DHCP
nowhere would be turned into one which provided DHCP on every interface
by libvirt adding a dhcp-range. Since there wouldn't be a suitable DHCP
range for most subnets, this would only result in logged errors, but it
is still not good.
Worse, there's no good answer to the question 'should libvirt include
interface=virt0"' in the configuration it supplies? If it does, then the
"enable DHCP on all interfaces" problem is solved, but a default system
configuration with no interface declaration is transformed from one
which provides DNS everywhere to one which provides DNS only to the
virtual interface. If libvirt doesn't provide "interface=virt0" and the
system configuration includes interface declarations, then there will be
no DNS or DHCP service to the virtual network.
To solve this, I propose to add an optional interface name to the
dhcp-range declaration. The semantics of this would be rather odd, but
solve the problem perfectly.
1) for DHCP, if any other dhcp-range exists _without_ an interface name,
them the interface name is ignored and and things behave as before,
otherwise DHCP is only provided to interfaces mentioned in dhcp-range
declarations.
2) for DNS, if there are no interface declarations, things work as
before. If there are interface declarations, the interfaces mentioned in
dhcp-ranges are added to the set which get DNS service.
With these rules, it should be possible for libvirt to drop eg
dhcp-range=interface:virt0,192.168.0.1,192.168.0.240
into the configuration of the system dnsmasq and get DHCP and DNS
service for virt0, irrespective of any other configuration in the system
dnsmasq, and doing so shouldn't affect the services supplied elsewhere.
The code in libvirt to make this work looks like this:
echo dhcp-range=interface:virt0,<ip range> >>/etc/dnsmasq.d/libvirt
if <system dnsmasq is not installed or not enabled>
dnsmasq --interface=virt0\
--bind-interfaces --conf-file=/etc/dnsmasq.d/libvirt
else
/etc/init.d/dnsmasq restart
(The --bind-interfaces in the private-dnsmasq instance keeps dnsmasq
from clashing with other nameservers eg BIND which may be running.)
The system dnsmasq package has to ensure that /etc/dnsmasq.d is read for
configuration fragments, and the dnsmasq package and the libvirt package
will have to co-operate to manage transitions between private and system
dnsmasq mode caused by package installation or removal.
Does that make sense? It's a long and involved explanation to come to
cold. I fear I may have over-simplified what libvirt is doing with
dnsmasq, in which case please enlighten me and I'll modify my scheme to
take that into account. If this looks good I can easily have the
necessary dnsmasq changes in the next release.
Cheers,
Simon.