
Routing on CouchNet (Interim)

James F. Carter <jimc@jfcarter.net>, 2016-10-29

This is an interim report on changes to routing on CouchNet. Eventually it should be merged into a complete document.

Since early 2016 or maybe before, I have struggled with two behaviors seen only on Xena, the Wi-Fi laptop, and only on IPv6. First, the default route would expire at random intervals. Second, Xena would sometimes get into a mode where SSH timed out during banner exchange; in other words, the connect(2) system call returned OK but no packets came back from the peer.

Network Geometry

The basic geometry goes like this:

Default Route Timeouts

The first problem, default route timeouts, occurred because 802.11 is flaky about delivering multicast packets. In one 71 minute test, with Jacinth sending Router Advertisements at random intervals of about 60 secs, I caught 65 RAs going to ff02::1 (all-nodes multicast), but Xena received only 23 of them (65% lost). There were 6 incidents where no RA was received for over 300 secs, which at the time was the default route lifetime, so the route expired. Long runs of lost packets seem to occur more often than would be expected by chance.
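Whether six long gaps is "more than chance" can be checked with a quick calculation. This Python sketch assumes, which the measurements do not establish, that each RA is lost independently with the observed probability and that RAs are evenly spaced over the test; the function name is illustrative.

```python
import math

def expected_long_gaps(sent, received, test_secs, lifetime_secs):
    """Back-of-envelope estimate of route-expiring gaps under random loss.

    Assumptions (not measured): each RA is lost independently with the
    same probability, and RAs are evenly spaced over the test.
    Returns (loss probability, consecutive losses needed, expected gaps).
    """
    q = 1 - received / sent            # per-RA loss probability
    p = 1 - q                          # per-RA delivery probability
    interval = test_secs / sent        # mean spacing between RAs
    # Losing k RAs in a row leaves a silence of (k + 1) intervals; this is
    # the smallest k for which that silence exceeds the route lifetime.
    k = math.floor(lifetime_secs / interval)
    # Expected number of maximal runs of >= k losses: a run can start at the
    # first RA, or at any later RA whose predecessor was delivered.
    runs = q**k + (sent - k) * p * q**k
    return q, k, runs

loss, k, runs = expected_long_gaps(sent=65, received=23,
                                   test_secs=71 * 60, lifetime_secs=300)
print(f"loss {loss:.0%}; {k} consecutive losses expire the route; "
      f"about {runs:.1f} such gaps expected vs. 6 observed")
```

Under these assumptions the estimate comes out at roughly four route-expiring gaps during the test, against six observed, which is consistent with (though not proof of) losses arriving in bursts.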

It helped to max out the lifetime at 1800 secs, but this was not a complete cure. So I wrote a daemon that, on most hosts, checks whether there is a default route and, if not, sends a Router Solicitation, which reliably elicits a unicast Router Advertisement. On Wi-Fi hosts, however, it unconditionally sends the RS every 300 secs. I wanted to send the RS only when the default route was about to expire, but I could not find the expiration time in any of the /proc/net/ files. [Update: now, less efficiently, I run ip -6 route show, extract the expiration, and wake up the daemon just before it.]
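The update above (read the expiry from ip -6 route show, then sleep until just before it) can be sketched in Python. This is not the author's daemon: the function names and the 30 sec safety margin are hypothetical, it assumes iproute2's usual "expires NNNsec" field on RA-learned routes, and sending the RS itself is left out.

```python
import re
import subprocess
from typing import Optional

# iproute2 prints the remaining lifetime of an RA-learned route as "expires 1787sec".
EXPIRES_RE = re.compile(r"\bexpires (\d+)sec\b")

def default_route_expiry(route_text: str) -> Optional[int]:
    """Seconds until the earliest expiring default route, or None if none expires."""
    expiries = []
    for line in route_text.splitlines():
        if not line.startswith("default"):
            continue                    # only default routes matter here
        match = EXPIRES_RE.search(line)
        if match:
            expiries.append(int(match.group(1)))
    return min(expiries) if expiries else None

def sleep_before_solicit(route_text: str, margin: int = 30) -> int:
    """How long the (hypothetical) daemon may sleep before sending an RS.

    Intended for a host like Xena with no static default route: if no
    expiring default route is present, solicit immediately.
    """
    left = default_route_expiry(route_text)
    if left is None:
        return 0
    return max(0, left - margin)

def read_routes() -> str:
    """Run `ip -6 route show` and return its output (needs iproute2)."""
    return subprocess.run(["ip", "-6", "route", "show"],
                          capture_output=True, text=True, check=True).stdout
```

A daemon loop would call time.sleep(sleep_before_solicit(read_routes())), send the RS, and repeat. Parsing text output is less efficient than a netlink query, but it only runs once per route lifetime.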

In another incident, Jacinth's DNSMasq was configured to send Router Advertisements but didn't, causing the default route to expire on all hosts. When restarted, it sent RAs for about 5 minutes, then went back to not sending them. I reverted to using radvd to send RAs, and the default route was again reliably present. I never found out why DNSMasq stopped sending RAs; this is dnsmasq-2.71.

Black Hole Route

The SSH timeout turned out to be a routing issue. SSH was not the only affected service; HTTP also would not connect, but the clients successfully failed over to IPv4 after a while, which SSH did not do. Before any intervention, these were the routes on all hosts. The device is shown as br0; on virtual machines it was eth0, and on Xena it was wlan0 when that was its active interface on the local LAN. Jacinth's addresses end in c1 (fixed) and 3044 (EUI-64 from RFC 4862).


2001:470:1f05:844::/64 dev br0 (local LAN's prefix)
fe80::/64 dev br0 (link local +EUI-64)
default via fe80::201:c0ff:fe12:3044 dev br0 proto ra (all but Jacinth)
default via 2001:470:1f05:844::c1 dev br0 (static on all but Jacinth, Oso, Xena)
2001:470:1f05:844::/108 via fe80::201:c0ff:fe12:3044 dev wlan0 proto ra (Xena only)
2001:470:1f05:844::c3 dev wlan0 (Xena's own address)
2001:470:1f05:844:42b8:9aff:feb1:9c85 dev wlan0 (Xena's own address)
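The interface IDs in this listing (Jacinth's ...201:c0ff:fe12:3044 and Xena's ...42b8:9aff:feb1:9c85) are modified EUI-64 values built from MAC addresses. The Python sketch below shows the transform: flip the universal/local bit of the first octet and splice ff:fe into the middle. The two MAC addresses are inferred by reversing that transform; they are not stated in the text.

```python
import ipaddress

def eui64_interface_id(mac: str) -> str:
    """Modified EUI-64 interface ID (RFC 4291, Appendix A) for a 48-bit MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError(f"expected a 48-bit MAC, got {mac!r}")
    octets[0] ^= 0x02                                 # invert the universal/local bit
    eui = octets[:3] + [0xFF, 0xFE] + octets[3:]      # splice ff:fe into the middle
    return ":".join(f"{(eui[i] << 8) | eui[i + 1]:x}" for i in range(0, 8, 2))

def slaac_address(prefix: str, mac: str) -> str:
    """Combine a /64 prefix with the EUI-64 interface ID, as RFC 4862 SLAAC does."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("SLAAC needs a /64 prefix")
    iid = int(ipaddress.IPv6Address("::" + eui64_interface_id(mac)))
    return str(ipaddress.IPv6Address(int(net.network_address) | iid))

# MACs inferred from the addresses above (not given in the text).
print(slaac_address("fe80::/64", "00:01:c0:12:30:44"))               # -> fe80::201:c0ff:fe12:3044
print(slaac_address("2001:470:1f05:844::/64", "40:b8:9a:b1:9c:85"))  # -> 2001:470:1f05:844:42b8:9aff:feb1:9c85
```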

After fixing things, I'm still not sure where 2001:470:1f05:844::/108 came from or why it was only on Xena, but as a route to the VPNs that are on Jacinth it is wrong: a /108 also covers the on-link static and DNSMasq ranges. The actual address ranges are:

IPv6 range                    IPv4 range          Use
2001:470:1f05:844::0/64       192.9.200.128/25    Entire local LAN
2001:470:1f05:844::0/112      192.9.200.192/26    Static IPs on main net
2001:470:1f05:844::1:0/112    192.9.200.224/27    DNSMasq dynamic adr (on link)
2001:470:1f05:844::2:0/112    192.9.200.160/29    StrongSwan (IPSec) dynamic adr
2001:470:1f05:844::3:0/112    192.9.200.128/28    OpenVPN-443 dynamic adr
2001:470:1f05:844::4:0/112    192.9.200.144/28    OpenVPN-1194 dynamic adr
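A quick way to see why the stray /108 is harmful is to ask which of the /112 ranges above fall inside it. This Python sketch uses the standard ipaddress module and assumes the 2001:470:1f05:844::/64 LAN prefix from the route listing.

```python
import ipaddress

# The stray route seen only on Xena.
STRAY = ipaddress.IPv6Network("2001:470:1f05:844::/108")

# The /112 ranges listed above, assuming the 2001:470:1f05:844::/64 LAN prefix.
RANGES = {
    "Static IPs on main net":      "2001:470:1f05:844::/112",
    "DNSMasq dynamic (on link)":   "2001:470:1f05:844::1:0/112",
    "StrongSwan (IPSec) dynamic":  "2001:470:1f05:844::2:0/112",
    "OpenVPN-443 dynamic":         "2001:470:1f05:844::3:0/112",
    "OpenVPN-1194 dynamic":        "2001:470:1f05:844::4:0/112",
}

def swallowed(route, ranges):
    """Names of the ranges lying entirely inside `route`."""
    return [name for name, cidr in ranges.items()
            if ipaddress.IPv6Network(cidr).subnet_of(route)]

for name in swallowed(STRAY, RANGES):
    print(f"{STRAY} covers {name}")
```

All five ranges land inside the /108, including the two that are on link, because a /108 spans the last 20 bits (::0 through ::f:ffff).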

I fixed /etc/dnsmasq.d/dhcp.conf to send out

dhcp-option = option:classless-static-route, 0/0, 0.0.0.0, 192.9.200.160/29, 0.0.0.0, 192.9.200.128/28, 0.0.0.0, 192.9.200.144/28, 0.0.0.0

For IPv4 DHCP, option:classless-static-route must include the default route, which should come first, in addition to option:router; each 0.0.0.0 is replaced by the IPv4 address of the interface on which the DHCP response is sent. It would have been neater to use DHCP6 to send out IPv6 routes, but DHCP6 has no corresponding option. There is, however, an RFC draft for DHCP6 routing that is slowly moving through the standards process.
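The default route has to be repeated because RFC 3442 tells a client that receives option 121 (classless static routes) to ignore the Router option. The Python sketch below encodes the routes from the dnsmasq line above in RFC 3442 wire format: one width octet, only the significant octets of the destination, then a 4-octet router. It is an illustration, not dnsmasq's code; it writes the default route as 0.0.0.0/0 because Python's ipaddress needs a dotted quad, and 192.9.200.193 is a hypothetical server address.

```python
import ipaddress

def encode_option_121(routes, server_ip):
    """Encode DHCP option 121 (RFC 3442) classless static routes.

    routes: (destination CIDR, router) pairs. A router of 0.0.0.0 is replaced
    by server_ip, as the text above describes dnsmasq doing.
    """
    server = ipaddress.IPv4Address(server_ip)
    data = bytearray()
    for dest, router in routes:
        net = ipaddress.IPv4Network(dest)
        gateway = ipaddress.IPv4Address(router)
        if int(gateway) == 0:
            gateway = server
        significant = (net.prefixlen + 7) // 8       # only the needed octets go on the wire
        data.append(net.prefixlen)                   # 1 octet: prefix width
        data += net.network_address.packed[:significant]
        data += gateway.packed                       # 4 octets: router
    return bytes(data)

ROUTES = [("0.0.0.0/0", "0.0.0.0"),          # default route, first
          ("192.9.200.160/29", "0.0.0.0"),   # StrongSwan
          ("192.9.200.128/28", "0.0.0.0"),   # OpenVPN-443
          ("192.9.200.144/28", "0.0.0.0")]   # OpenVPN-1194

# 192.9.200.193 is a hypothetical address for the DHCP server's interface.
print(encode_option_121(ROUTES, "192.9.200.193").hex(" "))
```

The default route costs only 5 octets (width 0, no destination octets, router), so listing it first in option 121 is cheap.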

I fixed /etc/radvd.conf to send out a prefix of 2001:470:1f05:844::/112 (on link, not autonomous because it leaves too few host bits), plus routes for the three VPNs. I could combine two of them into a /111 route, but I had a lot of trouble getting the bits to come out right, so I kept all three routes separate.

Radvd has a nasty habit of truncating all routes to the length of its prefix. For example, if your prefix is 2001:470:1f05:844::/64 and you define a route to 2001:470:1f05:844::2:0/112, then dump the Router Advertisement with radvdump, you see that the route is allegedly to 2001:470:1f05:844::/112: the length is unchanged, but the ending :2:0 is lost. This is obviously a bug (or maybe a feature) of radvd. The workaround was to make the prefix length equal to the route length, here 112 bits. That's neither a bug nor a feature; it's a mess! But it works. And the clients ignore AdvAutonomous off and do RFC 4862 autoconfiguration anyway: they truncate the prefix to 64 bits, append the EUI-64, and assign that address to the NIC. That is a welcome development for me, but it is radically non-compliant with the RFC.
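The truncation seen in radvdump, and the /111 merge, can both be reproduced with a little bit arithmetic. This Python sketch models only the observed behavior (it is not radvd's code): the route keeps its own length while its address is masked to the prefix length. The merge uses the standard ipaddress.collapse_addresses, which shows which two VPN ranges combine.

```python
import ipaddress

def as_advertised(route: str, prefix_len: int) -> ipaddress.IPv6Network:
    """Model of the mangling seen in radvdump (not radvd's actual code):
    the route keeps its own length, but only the first prefix_len bits of
    its address survive."""
    net = ipaddress.IPv6Network(route)
    keep = ((1 << prefix_len) - 1) << (128 - prefix_len)   # mask of prefix_len one-bits
    return ipaddress.IPv6Network((int(net.network_address) & keep, net.prefixlen))

vpn = "2001:470:1f05:844::2:0/112"
print(as_advertised(vpn, 64))    # -> 2001:470:1f05:844::/112 (the :2:0 is gone)
print(as_advertised(vpn, 112))   # -> 2001:470:1f05:844::2:0/112 (intact)

# Which of the three VPN routes can merge? collapse_addresses does the bit math.
vpns = [ipaddress.IPv6Network(f"2001:470:1f05:844::{n}:0/112") for n in (2, 3, 4)]
for net in ipaddress.collapse_addresses(vpns):
    print(net)   # -> 2001:470:1f05:844::2:0/111, then 2001:470:1f05:844::4:0/112
```

Only ::2:0 and ::3:0 merge, since they differ in just the lowest bit of that 16-bit group; ::4:0 must stay a separate /112.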

Was the route truncation the cause of the SSH timeouts during banner exchange? As a test, I restored the failing route configuration on Xena but left the routes correct on the target, Iris. Again, SSH timed out during banner exchange. Xena's initial SYN went through Jacinth and reached Iris. Iris sent a SYN-ACK, repeated 5 times because no reply from Xena reached it. Jacinth forwarded the first SYN-ACK to Xena but not the duplicates. Xena sent a different packet to Iris, repeated several times until the timeout, and Iris never received any of them.

The route that should have attracted only VPN traffic to Jacinth instead sent all traffic through Jacinth, including traffic to on-link hosts. You might think this harmless, since Jacinth hosts the bridge between Wi-Fi and 802.3 wired Ethernet and Xena's traffic passes through it anyway. There is a difference, though, between a packet routed on-link, addressed to the MAC address of the target (Iris), and one sent via a router, addressed to Jacinth's MAC address. The former is handled by the bridge and is not destined for Jacinth at level 2, so everything works as if the source (Xena) were on wired Ethernet. But the latter lands on Jacinth, which has to make a routing decision. I never intended Jacinth to handle this particular route, and either there are options that need to be set to make it work, or (more likely) the packet wanders through code paths that should never be used and that never work together reliably.
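The reason the /108 captured on-link traffic is longest-prefix matching: the most specific matching route wins, and a /108 beats the on-link /64. This Python sketch applies that rule to Xena's routes from the listing above (its own /128 host routes omitted). It ignores metrics and everything else the kernel considers, and 2001:470:1f05:844::c9 and 2001:470:1f05:844:211:22ff:fe33:4455 are hypothetical addresses standing in for Iris and for a SLAAC-addressed host.

```python
import ipaddress

JACINTH_LL = "fe80::201:c0ff:fe12:3044"

# Xena's routes before the fix; a gateway of None means the destination is on link.
ROUTES = [
    ("2001:470:1f05:844::/64", None),
    ("fe80::/64", None),
    ("::/0", JACINTH_LL),
    ("2001:470:1f05:844::/108", JACINTH_LL),
]

def next_hop(dest: str):
    """Longest-prefix match: the most specific matching route wins."""
    addr = ipaddress.IPv6Address(dest)
    candidates = [(ipaddress.IPv6Network(net), gw) for net, gw in ROUTES
                  if addr in ipaddress.IPv6Network(net)]
    best_net, gateway = max(candidates, key=lambda c: c[0].prefixlen)
    return best_net, gateway

# Hypothetical destinations: a static address like Iris's, and a SLAAC address.
for dest in ("2001:470:1f05:844::c9", "2001:470:1f05:844:211:22ff:fe33:4455"):
    net, gw = next_hop(dest)
    print(dest, "->", net, "via " + gw if gw else "on link")
```

The static address is sent to Jacinth's MAC via the /108, while the SLAAC address, outside ::/108, stays on link, which matches only some traffic being affected.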

I conclude that the botched route probably did cause the timeouts, because Jacinth sometimes forwarded the traffic and sometimes didn't. I can probably add a firewall rule that detects a packet going out the same interface it came in on and rejects it with net unreachable, which would trigger a prompt failover to IPv4 and improve usability.

Although the correct Router Advertisements were going out, the client (Xena) failed to install the advertised routes. This was caused, and fixed, by more than one setting in /proc/sys/net/ipv6.

/proc/sys/net/ipv6/conf/all/accept_ra_rt_info_max_plen

This value is a maximum prefix length: routes received in Router Advertisements with a prefix longer than this value are discarded. (/usr/src/linux/Documentation/networking/ip-sysctl.txt lies, saying that routes equal to or longer than the value are discarded; see /usr/src/linux/net/ipv6/ndisc.c and look for in6_dev->cnf.accept_ra_rt_info_max_plen.) It was 0, the default, which means to accept the default route but no other. Changing it to 128 made that check accept all routes. The specific values of 0 for the individual interfaces did not seem to be honored. But the routes still did not appear.

/proc/sys/net/ipv6/conf/*/accept_ra_rtr_pref

This boolean value (0 or 1) must be true to accept any routes (other than the default route?) from Router Advertisements. It was 1 for all, default and lo; 0 for wlan0 and eth0 (which should have followed the default configuration). Setting it to 1 for wlan0 allowed routes to be accepted. Except…

/proc/sys/net/ipv6/conf/*/accept_ra_defrtr

This boolean value (0 or 1) must be true to accept the default route from a Router Advertisement. It is set like accept_ra_rtr_pref: 1 for all, default and lo; 0 for wlan0 and eth0.

/proc/sys/net/ipv6/conf/*/accept_ra

This value must be nonzero to accept Router Advertisements (besides 0 and 1 it may be 2, which accepts them even when forwarding is enabled). For me it is set like accept_ra_rtr_pref: 1 for all, default and lo; 0 for wlan0 and eth0. I have a line in /etc/sysctl.conf to set it to 1 for all at boot, but something else (the finger of blame points at NetworkManager) changes it back to 0.
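How the four settings combine for a route carried in a Router Advertisement can be summarized as a small Python model. It follows the behavior described above and the reading of ndisc.c given there; it is a simplified sketch, not kernel code, and it treats accept_ra as a plain on/off switch (ignoring the forwarding case).

```python
from dataclasses import dataclass

@dataclass
class RaConf:
    """The per-interface sysctls discussed above (defaults as found on wlan0)."""
    accept_ra: int = 0
    accept_ra_defrtr: int = 0
    accept_ra_rtr_pref: int = 0
    accept_ra_rt_info_max_plen: int = 0

def route_info_accepted(conf: RaConf, prefix_len: int) -> bool:
    """Would a route of this prefix length from an RA be installed?"""
    if not conf.accept_ra:
        return False        # the whole Router Advertisement is ignored
    if not conf.accept_ra_rtr_pref:
        return False        # route options are not processed at all
    if prefix_len == 0 and not conf.accept_ra_defrtr:
        return False        # a default route is refused
    return prefix_len <= conf.accept_ra_rt_info_max_plen   # only longer ones are tossed

broken = RaConf()                 # all 0, as found on wlan0
fixed = RaConf(1, 1, 1, 128)
for plen in (0, 112):
    print(plen, route_info_accepted(broken, plen), route_info_accepted(fixed, plen))
```

The model makes the "Except…" chain explicit: fixing max_plen alone changes nothing while any of the three switches is still 0.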

I put a test in my Router Solicitation daemon that turns on these settings, and now the routes are accepted reliably.
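A check of that kind might look like the Python sketch below. It is a guess at the shape, not the author's daemon: the function name and the WANTED table are hypothetical, only the sysctl names and the /proc/sys/net/ipv6/conf path come from the text, and it writes per-interface files (the text does not say which files the real daemon touches). Writing under /proc/sys requires root.

```python
from pathlib import Path
from typing import List

# Values the daemon (hypothetically) enforces; the names are the real sysctls.
WANTED = {
    "accept_ra": "1",
    "accept_ra_defrtr": "1",
    "accept_ra_rtr_pref": "1",
    "accept_ra_rt_info_max_plen": "128",
}

def enforce_ra_sysctls(iface: str,
                       root: Path = Path("/proc/sys/net/ipv6/conf")) -> List[str]:
    """Reset any drifted RA sysctl for iface; return the names that were changed.

    `root` is a parameter only so the function can be exercised against a
    scratch directory instead of the live /proc tree.
    """
    changed = []
    for name, value in WANTED.items():
        path = root / iface / name
        if path.read_text().strip() != value:
            path.write_text(value + "\n")
            changed.append(name)
    return changed
```

Calling enforce_ra_sysctls("wlan0") on each pass of the daemon and logging any non-empty result would also show when something (NetworkManager, for example) resets the values.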

Conclusion

Routing is kind of a black art on IPv6 because the services that send out the Router Advertisement packets, radvd and DNSMasq, appear to each have their own bugs that have to be worked around. However, it is possible to get a route design that radvd will send out without mangling it, and that the clients will accept and install.