[libvirt-users] Networking issues with lxc containers in AWS EC2

I've created an EC2 AMI for AWS that essentially represents a CentOS 7 "hypervisor" image. I deploy instances of these in AWS and create a number of libvirt-based LXC containers on each instance. The containers run fine within a single host and have no problem communicating among themselves as well as with their host, and vice versa. However, containers hosted in one EC2 instance cannot communicate with containers hosted in another EC2 instance.
We've tried various tweaks with our Amazon VPC but have been unable to find a way to solve this networking issue. If I use something like VMware or KVM and create VMs using this same hypervisor image, the containers running under these VMs can communicate with each other, even across different hosts.
My real question is: has anyone tried deploying EC2 images that host containers and figured out how to successfully communicate between containers on different hosts?
Peter

On 03/31/2016 06:43 PM, Peter Steele wrote:
I've created an EC2 AMI for AWS that essentially represents a CentOS 7 "hypervisor" image. I deploy instances of these in AWS and create a number of libvirt-based LXC containers on each instance. The containers run fine within a single host and have no problem communicating among themselves as well as with their host, and vice versa. However, containers hosted in one EC2 instance cannot communicate with containers hosted in another EC2 instance.
We've tried various tweaks with our Amazon VPC but have been unable to find a way to solve this networking issue. If I use something like VMware or KVM and create VMs using this same hypervisor image, the containers running under these VMs can communicate with each other, even across different hosts.
What is the <interface> config of your nested containers? Do they each get a public IP address?
My real question is: has anyone tried deploying EC2 images that host containers and figured out how to successfully communicate between containers on different hosts?
No experience with EC2, sorry.
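For context on the <interface> question above, a bridged definition for a libvirt LXC container typically looks something like this (a sketch; br0 matches the bridge name that comes up later in the thread, and the MAC address is a placeholder):
<interface type='bridge'>
  <mac address='52:54:00:aa:bb:cc'/>
  <source bridge='br0'/>
</interface>
With type='bridge', the container's eth0 is attached directly to the host bridge, so each container presents its own MAC address on the network.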

On 04/01/2016 02:07 PM, Laine Stump wrote:
On 03/31/2016 06:43 PM, Peter Steele wrote:
I've created an EC2 AMI for AWS that essentially represents a CentOS 7 "hypervisor" image. I deploy instances of these in AWS and create a number of libvirt-based LXC containers on each instance. The containers run fine within a single host and have no problem communicating among themselves as well as with their host, and vice versa. However, containers hosted in one EC2 instance cannot communicate with containers hosted in another EC2 instance.
We've tried various tweaks with our Amazon VPC but have been unable to find a way to solve this networking issue. If I use something like VMware or KVM and create VMs using this same hypervisor image, the containers running under these VMs can communicate with each other, even across different hosts.
What is the <interface> config of your nested containers? Do they each get a public IP address?
Yes, they all have public IPs on the same subnet. When deployed in a VM environment on premises, the containers have no problems. Amazon clearly does something with the packets, though, and the containers can't talk to each other.
My real question is: has anyone tried deploying EC2 images that host containers and figured out how to successfully communicate between containers on different hosts?
No experience with EC2, sorry.
I think we'll need to go to Amazon themselves to resolve this issue. There is very little information out there about how to get LXC containers to work properly in EC2.

On 04/01/2016 07:04 PM, Peter Steele wrote:
On 04/01/2016 02:07 PM, Laine Stump wrote:
On 03/31/2016 06:43 PM, Peter Steele wrote:
I've created an EC2 AMI for AWS that essentially represents a CentOS 7 "hypervisor" image. I deploy instances of these in AWS and create a number of libvirt-based LXC containers on each instance. The containers run fine within a single host and have no problem communicating among themselves as well as with their host, and vice versa. However, containers hosted in one EC2 instance cannot communicate with containers hosted in another EC2 instance.
We've tried various tweaks with our Amazon VPC but have been unable to find a way to solve this networking issue. If I use something like VMware or KVM and create VMs using this same hypervisor image, the containers running under these VMs can communicate with each other, even across different hosts.
What is the <interface> config of your nested containers? Do they each get a public IP address?
Yes, they all have public IPs on the same subnet. When deployed in a VM environment on premises, the containers have no problems. Amazon clearly does something with the packets, though, and the containers can't talk to each other.
You say they can talk among containers on the same host, and with their own host (I guess you mean the virtual machine that is hosting the containers), but not to containers on another host. Can the containers communicate outside of the host at all? If not, perhaps the problem is iptables rules for the bridge device the containers are using - try running this command:
sysctl net.bridge.bridge-nf-call-iptables
If that returns:
net.bridge.bridge-nf-call-iptables = 1
then run this command and see if the containers can now communicate with the outside:
sysctl -w net.bridge.bridge-nf-call-iptables=0
My real question is: has anyone tried deploying EC2 images that host containers and figured out how to successfully communicate between containers on different hosts?
No experience with EC2, sorry.
I think we'll need to go to Amazon themselves to resolve this issue. There is very little information out there about how to get LXC containers to work properly in EC2.
Well, if they've allowed your virtual machine to acquire multiple IP addresses, then it would make sense that they would allow them to actually use those IP addresses. I'm actually more inclined to think that the packets simply aren't getting out of the virtual machine (or the responses aren't getting back in).

On 04/02/2016 05:20 PM, Laine Stump wrote:
You say they can talk among containers on the same host, and with their own host (I guess you mean the virtual machine that is hosting the containers), but not to containers on another host. Can the containers communicate outside of the host at all? If not, perhaps the problem is iptables rules for the bridge device the containers are using - try running this command:
sysctl net.bridge.bridge-nf-call-iptables
If that returns:
net.bridge.bridge-nf-call-iptables = 1
then run this command and see if the containers can now communicate with the outside:
sysctl -w net.bridge.bridge-nf-call-iptables=0
This key doesn't exist in the CentOS 7 image I'm running. I do have a bridge interface defined, of course, although we do not run iptables. We don't need this service when running our software on premises. Actually, in CentOS 7 the iptables service doesn't exist; there's a new service called firewalld that serves the same purpose. We don't run this either at present.
Well, if they've allowed your virtual machine to acquire multiple IP addresses, then it would make sense that they would allow them to actually use those IP addresses. I'm actually more inclined to think that the packets simply aren't getting out of the virtual machine (or the responses aren't getting back in).
The difference is that the IPs aren't assigned to the virtual machine itself but rather to the containers under the AWS instance, and something in how Amazon manages their network stack prevents packets from getting from one container to the other. The very fact that the exact same software runs fine in VMs under, say, VMware or KVM, but not in VMs under AWS, clearly points to AWS as the ultimate source of the problem.

On 04/07/2016 09:50 AM, Peter Steele wrote:
On 04/02/2016 05:20 PM, Laine Stump wrote:
You say they can talk among containers on the same host, and with their own host (I guess you mean the virtual machine that is hosting the containers), but not to containers on another host. Can the containers communicate outside of the host at all? If not, perhaps the problem is iptables rules for the bridge device the containers are using - try running this command:
sysctl net.bridge.bridge-nf-call-iptables
If that returns:
net.bridge.bridge-nf-call-iptables = 1
then run this command and see if the containers can now communicate with the outside:
sysctl -w net.bridge.bridge-nf-call-iptables=0
This key doesn't exist in the CentOS 7 image I'm running.
Interesting. That functionality was moved out of the kernel's bridge module into br_netfilter some time back, but that was done later than the kernel 3.10 that is used by CentOS 7. Are you running some later kernel version?
If your kernel doesn't have a message in dmesg that looks like this:
bridge: automatic filtering via arp/ip/ip6tables has been deprecated. Update your scripts to load br_netfilter if you need this.
and the bridge driver is loaded, then that key should be available. Of course, if you don't have it, that's equivalent to having it set to 0, so you should be okay regardless of why it's missing.
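A quick way to check which situation applies (a sketch; br_netfilter only exists as a separate module on kernels newer than the stock CentOS 7 3.10):
dmesg | grep -i br_netfilter                # is the deprecation notice there?
lsmod | grep -E '^bridge|^br_netfilter'     # is the bridge driver loaded?
modprobe br_netfilter                       # on newer kernels, loading it restores the sysctl key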
I do have a bridge interface defined, of course, although we do not run iptables. We don't need this service when running our software on premises. Actually, in CentOS 7 the iptables service doesn't exist; there's a new service called firewalld that serves the same purpose. We don't run this either at present.
The iptables service is not the same thing as the iptables kernel module. Even firewalld uses the iptables kernel module (libvirt doesn't care about any service named iptables, but does use firewalld if it's running).
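A quick way to see the distinction (a sketch assuming a stock CentOS 7 host): even with no firewall service running, the kernel module can be loaded and filtering.
systemctl status firewalld     # the service may be stopped or absent
lsmod | grep ip_tables         # the kernel module can still be loaded
iptables -S                    # lists whatever rules are actually in effect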
Well, if they've allowed your virtual machine to acquire multiple IP addresses, then it would make sense that they would allow them to actually use those IP addresses. I'm actually more inclined to think that the packets simply aren't getting out of the virtual machine (or the responses aren't getting back in).
The difference is that the IPs aren't assigned to the virtual machine itself but rather to the containers under the AWS instance, and something in how Amazon manages their network stack prevents packets from getting from one container to the other. The very fact that the exact same software runs fine in VMs under, say, VMware or KVM, but not in VMs under AWS, clearly points to AWS as the ultimate source of the problem.
I wouldn't be too quick to judge. First take a look at tcpdump on the bridge interface that the containers are attached to, and on the ethernet device that connects the bridge to the rest of Amazon's infrastructure. If you see packets from the container's IP going out but not coming back in, check the iptables rules (again - firewalld uses iptables to set up its filtering) for a REJECT or DISCARD rule that has an incrementing count. I use something like this to narrow down the list I need to check:
while true; do iptables -v -S -Z | grep -v '^Zeroing' | grep -v "c 0 0" | grep -e '-c'; echo '**************'; sleep 1; done
If you don't see any REJECT or DISCARD rules being triggered, then maybe the problem is that AWS is providing an IP address to your container's MAC, but isn't actually allowing traffic from that MAC out onto the network.
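For the tcpdump part of that check, something like this on the container's host should show whether packets make it from the bridge to the uplink and back (a sketch; br0 and eth0 are the device names used elsewhere in this thread, and 10.0.0.5 stands in for a remote container's IP):
tcpdump -n -e -i br0 host 10.0.0.5     # container traffic reaching the bridge?
tcpdump -n -e -i eth0 host 10.0.0.5    # ...and going out the uplink?
The -e flag prints the link-layer headers, which also makes it easy to see which MAC addresses are actually on the wire.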

On 04/11/2016 11:33 AM, Laine Stump wrote:
Interesting. That functionality was moved out of the kernel's bridge module into br_netfilter some time back, but that was done later than the kernel 3.10 that is used by CentOS 7. Are you running some later kernel version?
If your kernel doesn't have a message in dmesg that looks like this:
bridge: automatic filtering via arp/ip/ip6tables has been deprecated. Update your scripts to load br_netfilter if you need this.
and the bridge driver is loaded, then that key should be available. Of course if you don't have it, that's equivalent to having it set to 0, so you should be okay regardless of why it's missing.
Ah, you were right. I'd forgotten that the AMI I've been using was one running the 4.0.5 mainline kernel. We discovered that bonded interfaces running in mode 5 or 6 do not work with LXC containers (the host's ARP table does not get updated). That issue was fixed in the 4.0.5 kernel, so we ran with that kernel for a short time, only to abandon it later due to a bug with software RAID. I've reverted the kernel back to 3.10 on the AWS instances I'm using, and the net.bridge.bridge-nf-call-iptables key is now present. It's already set to 0, though, so there is nothing that needs to be done here.
I wouldn't be too quick to judge. First take a look at tcpdump on the bridge interface that the containers are attached to, and on the ethernet device that connects the bridge to the rest of Amazon's infrastructure. If you see packets from the container's IP going out but not coming back in, check the iptables rules (again - firewalld uses iptables to set up its filtering) for a REJECT or DISCARD rule that has an incrementing count. I use something like this to narrow down the list I need to check:
while true; do iptables -v -S -Z | grep -v '^Zeroing' | grep -v "c 0 0" | grep -e '-c'; echo '**************'; sleep 1; done
If you don't see any REJECT or DISCARD rules being triggered, then maybe the problem is that AWS is providing an IP address to your container's MAC, but isn't actually allowing traffic from that MAC out onto the network.
I'll get this test set up. Unfortunately I'm not particularly knowledgeable about iptables; we don't use it in our product, so I've never had to deal with it. I think you are right, though, about what's happening--AWS doesn't recognize the MAC addresses of containers running under another instance.

On 04/12/2016 01:37 PM, Peter Steele wrote:
On 04/11/2016 11:33 AM, Laine Stump wrote:
I wouldn't be too quick to judge. First take a look at tcpdump on the bridge interface that the containers are attached to, and on the ethernet device that connects the bridge to the rest of Amazon's infrastructure. If you see packets from the container's IP going out but not coming back in, check the iptables rules (again - firewalld uses iptables to set up its filtering) for a REJECT or DISCARD rule that has an incrementing count. I use something like this to narrow down the list I need to check:
while true; do iptables -v -S -Z | grep -v '^Zeroing' | grep -v "c 0 0" | grep -e '-c'; echo '**************'; sleep 1; done
If you don't see any REJECT or DISCARD rules being triggered, then maybe the problem is that AWS is providing an IP address to your container's MAC, but isn't actually allowing traffic from that MAC out onto the network.
I'll get this test set up. Unfortunately I'm not particularly knowledgeable about iptables; we don't use it in our product, so I've never had to deal with it. I think you are right, though, about what's happening--AWS doesn't recognize the MAC addresses of containers running under another instance.
I did this test and there were no REJECT or DISCARD rules being triggered. I did discover something interesting, though. I had two AWS instances running with some libvirt containers on each. I did a ping from one AWS instance to an IP assigned to a container on another AWS instance. The ping failed, and when I checked the source host's ARP table, the MAC address recorded for the container being pinged was that of the br0 interface of the container's host instance, not the MAC address of the container's eth0 interface.
Doing the same test on premises using KVM-based instances, when a ping was run from one VM to a container hosted on another VM, the ARP table of the source VM contained the MAC address of the eth0 interface bound to the container, not the MAC address of its host VM.
This indicates to me that AWS assumes all of the IP addresses that have been allocated to an instance will be bound to that instance, and it doesn't try to go any further than that. I'm not exactly sure how to get AWS to route these addresses properly, but it doesn't seem to be an issue with libvirt per se.
Peter
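The checks described above amount to something like this, run from the first instance (a sketch; 10.0.0.5 stands in for the IP of a container on the other instance):
ping -c 3 10.0.0.5       # fails between EC2 instances
ip neigh show 10.0.0.5   # on EC2, lists the remote host's br0 MAC rather than the container's eth0 MAC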

On 04/14/2016 03:35 PM, Peter Steele wrote:
On 04/12/2016 01:37 PM, Peter Steele wrote:
On 04/11/2016 11:33 AM, Laine Stump wrote:
I wouldn't be too quick to judge. First take a look at tcpdump on the bridge interface that the containers are attached to, and on the ethernet device that connects the bridge to the rest of Amazon's infrastructure. If you see packets from the container's IP going out but not coming back in, check the iptables rules (again - firewalld uses iptables to set up its filtering) for a REJECT or DISCARD rule that has an incrementing count. I use something like this to narrow down the list I need to check:
while true; do iptables -v -S -Z | grep -v '^Zeroing' | grep -v "c 0 0" | grep -e '-c'; echo '**************'; sleep 1; done
If you don't see any REJECT or DISCARD rules being triggered, then maybe the problem is that AWS is providing an IP address to your container's MAC, but isn't actually allowing traffic from that MAC out onto the network.
I'll get this test set up. Unfortunately I'm not particularly knowledgeable about iptables; we don't use it in our product, so I've never had to deal with it. I think you are right, though, about what's happening--AWS doesn't recognize the MAC addresses of containers running under another instance.
I did this test and there were no REJECT or DISCARD rules being triggered. I did discover something interesting, though. I had two AWS instances running with some libvirt containers on each. I did a ping from one AWS instance to an IP assigned to a container on another AWS instance. The ping failed, and when I checked the source host's ARP table, the MAC address recorded for the container being pinged was that of the br0 interface of the container's host instance, not the MAC address of the container's eth0 interface.
Doing the same test on premises using KVM-based instances, when a ping was run from one VM to a container hosted on another VM, the ARP table of the source VM contained the MAC address of the eth0 interface bound to the container, not the MAC address of its host VM.
This indicates to me that AWS assumes all of the IP addresses that have been allocated to an instance will be bound to that instance, and it doesn't try to go any further than that. I'm not exactly sure how to get AWS to route these addresses properly, but it doesn't seem to be an issue with libvirt per se.
Peter
I finally got this to work, using proxy ARP. I just needed to apply the following settings on each EC2 instance:
echo 1 > /proc/sys/net/ipv4/conf/br0/forwarding
echo 1 > /proc/sys/net/ipv4/conf/br0/proxy_arp_pvlan
echo 1 > /proc/sys/net/ipv4/conf/br0/proxy_arp
echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects
echo 0 > /proc/sys/net/ipv4/conf/br0/send_redirects
With these settings my containers and hosts have full connectivity and behave just as they do on the same subnet on premises. This works for CentOS 7 at least, but I assume the same solution would work for Ubuntu.
Peter
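To make those settings persist across reboots on CentOS 7, they could go into a sysctl drop-in file (a sketch; the file name is arbitrary):
# /etc/sysctl.d/99-proxy-arp.conf
net.ipv4.conf.br0.forwarding = 1
net.ipv4.conf.br0.proxy_arp = 1
net.ipv4.conf.br0.proxy_arp_pvlan = 1
net.ipv4.conf.br0.send_redirects = 0
net.ipv4.conf.all.send_redirects = 0
Running sysctl --system afterwards applies them immediately.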