Some info about WireGuard, a new VPN:
Project website: www.wireguard.com
Inception 2015.
Lead developer: Jason A. Donenfeld. WireGuard and service marks are
registered trademarks of Jason A. Donenfeld.
Website sponsors: ZX2C4 (Jason A. Donenfeld) and Edge Security.
I have just gone through yet another audit of my VPNs, making sure that
they work for all relevant clients and that the vpn-tester program can
competently report if they are or aren't working. Currently my two servers run
strongSwan IPSec (strongswan-ipsec-5.9.3 on SuSE Tumbleweed) and OpenVPN
(openvpn-2.5.3 on SuSE Tumbleweed). The clients have Linux (same versions)
and Android: strongSwan VPN Client (version 2.3.3, org.strongswan.android)
and OpenVPN for Android (version 0.7.25, de.blinkt.openvpn). Both VPNs work
well when properly configured, but they have a number of less than wonderful
features:
The learning curve is steep for routing packets properly through the tunnel (of course excluding bearer packets) and for providing credentials in the required form to authenticate the two ends.
The network paradigm for IPSec is, if an IPv6 packet has an IPSec header, the corresponding Security Association has the information (crypto algo, key, etc.) which the kernel can use to decrypt it. The headers and payload thus revealed are then processed normally. Outgoing packets are selected by a Traffic Selector (different from normal routing) and the inverse transformation is performed, after which the encrypted packet is sent out through normal routing. IPv4 uses ESP and AH protocol packets instead. I found that it was often a challenge to get the Traffic Selector right, and it was also a challenge to extract the cert's Distinguished Name in the format Charon wants. (Using a SAN turns out to be easier.)
OpenVPN uses a tun/tap device, and payload packets pop out of it or are stuffed into it by normal routes, as if it were a physical net interface. It's a lot easier to handle routing in this context, which WireGuard shares.
IPSec connects fairly promptly, initially or after a net disruption, but OpenVPN takes several seconds to do this.
Both IPSec and OpenVPN have a lot of code in the packages: around 400,000 and 600,000 lines of code respectively. This doesn't affect the end user directly, but I remember a quote from Wietse Venema, the author of the Postfix mail transfer agent: he says his code has about one bug per thousand lines, and if you introduce complexity (he was talking about TLS for mail transport) you should think about exploits against the bugs and accidental loss of service.
Responding to shortcomings in existing VPN software, Jason A. Donenfeld in 2015 began to develop WireGuard, a new VPN. The project website describes these features; whether they're scored as good or bad depends on the user's goals.
Drastically reduced complexity; features not absolutely essential were sacrificed for this goal. He claims only 4000 lines of code.
A lot fewer configurable aspects; for example there is [currently] only one crypto algo, ChaCha20Poly1305 by Daniel Bernstein, so no configuration and no negotiation with the peer. Negotiation is very complex for both IPSec and OpenVPN.
A Curve25519 (Elliptic Curve Diffie-Hellman) public key, locally generated, serves as both the authentication credential and the foundation of the tunnel's encryption, similar to the style of SSH.
WireGuard does UDP only, no TCP and no unusual protocols like ESP. Check out udptunnel and udp2raw for add-on layers if you need TCP (which I do).
Very fast connection setup: the initiator sends one handshake packet, the responder sends one back, whereupon they both can infer the symmetric keys to encrypt payloads. The CPU time needed to do this is obviously minimal.
The ChaCha20Poly1305 symmetric crypto algo (AEAD type) is faster than software implementations of competitors such as AES-256, though not faster than hardware-accelerated AES.
The protocol isn't chatty: the only packets sent are payloads and key establishment (and rekeying). You can configure keepalive packets (zero-length payloads) if your net needs them. The responder doesn't respond and doesn't expend resources on unauthorized initiators.
Some details of the Elliptic Curve Diffie-Hellman key establishment procedure are interesting. See this Wikipedia article about ECDH, which I've summarized. See also the EdDSA (Edwards Curve Digital Signature Algorithm) article, the section on ED25519.
Parameters for ECDH are agreed on in advance; WireGuard has only one set of parameters built in (Curve25519). It uses a modular field. NSA Suite B guidance rated a curve over a field of size in the range of 2^256 as sufficient for protecting data up to Secret (Top Secret called for 384 bits); that is, to crack the crypto the Black Hats would have to run billions of dollars worth of computers for a year or more to crack one key (and WireGuard re-keys about every 2 minutes). The actual modular field size is 2^255-19.
A private (secret) key for ECDH is a randomly chosen integer (a scalar), basically a 255 bit random number. Call it S. For the public key, a point G on the curve is agreed on, and it is added to itself S times using the curve's point addition rule; that is, G is multiplied by S. Call the product Q.
An attacker can recover the private key by "dividing" Q by G, i.e. by solving the elliptic curve discrete logarithm problem. This would be easy if the operands were integers, or even integers modulo a prime, but with the curve's point addition, doing the division will need similar effort as in cracking a 128 bit symmetric key, half the ECDH bit length. This is the effort level that NSA Suite B considered adequate for protecting data up to Secret.
For each connection (or re-key), an ephemeral Diffie-Hellman key
pair is created by each peer. Generating the private key would
normally include one or more "whitening" steps through a
pseudorandom generator, requiring a 256 bit multiplication, plus
another multiplication for the public key, but this is a lot less
effort than generating a prime factor key pair (RSA).
The initiator sends the ephemeral public key that it just generated; it is not encrypted, so the attackers know it. The initiator also sends its static (permanent) public key, but encrypted under a key derived from its ephemeral private key and the responder's static public key, so only the responder can read it. The responder sends back its ephemeral public key; the initiator is supposed to have the responder's static public key in a configuration file. The initiator also sends an encrypted timestamp which prevents replay attacks, and the responder sends an encrypted empty payload; each of these, if successfully decrypted, assures the receiver that the other peer holds the private keys that correspond to the public keys that it proffered or that were configured.
Each peer, for each of the static and ephemeral keys, multiplies the other end's public key by its own private key. Since scalar multiplication on the curve commutes (S1 times (S2 times G) equals S2 times (S1 times G)), both will get the same answer: the Diffie-Hellman shared secret. The peers hash up the shared secrets with an agreed-upon algo to produce the symmetric key which they will use to encrypt or decrypt payloads.
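The "both get the same answer" claim is easy to check numerically. Here is a minimal sketch in Python of the X25519 function, following the Montgomery ladder pseudocode in RFC 7748. It is for illustration only: it is not constant time, and it is not how WireGuard's own code is written. The names keypair, initiator_* and responder_* are mine. Real WireGuard hashes several such Diffie-Hellman results (static and ephemeral) together; this shows a single exchange.

```python
import os

P = 2**255 - 19          # the modular field size
A24 = 121665             # curve constant (486662 - 2) / 4 from RFC 7748
BASE_POINT = (9).to_bytes(32, "little")   # agreed-on point G (u-coordinate 9)


def clamp(private: bytes) -> int:
    """Turn 32 random octets into a valid X25519 scalar (RFC 7748 clamping)."""
    b = bytearray(private)
    b[0] &= 248
    b[31] &= 127
    b[31] |= 64
    return int.from_bytes(b, "little")


def x25519(private: bytes, u: bytes) -> bytes:
    """Multiply the point with u-coordinate `u` by the clamped scalar."""
    scalar = clamp(private)
    x1 = int.from_bytes(u, "little") & ((1 << 255) - 1)
    x2, z2, x3, z3 = 1, 0, x1, 1
    swap = 0
    for t in reversed(range(255)):
        bit = (scalar >> t) & 1
        swap ^= bit
        if swap:                      # conditional swap (NOT constant time)
            x2, x3 = x3, x2
            z2, z3 = z3, z2
        swap = bit
        a = (x2 + z2) % P
        aa = a * a % P
        b = (x2 - z2) % P
        bb = b * b % P
        e = (aa - bb) % P
        c = (x3 + z3) % P
        d = (x3 - z3) % P
        da = d * a % P
        cb = c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3 = x3, x2
        z2, z3 = z3, z2
    # Division in the field is just multiplication by the modular inverse.
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def keypair():
    """Hypothetical helper: random private key plus matching public key."""
    private = os.urandom(32)
    return private, x25519(private, BASE_POINT)


initiator_priv, initiator_pub = keypair()
responder_priv, responder_pub = keypair()

# Each end multiplies the OTHER end's public key by its OWN private key.
shared_i = x25519(initiator_priv, responder_pub)
shared_r = x25519(responder_priv, initiator_pub)
print(shared_i == shared_r)
```

The two results should compare equal even though neither private key ever left its owner. Note that the final step divides in the field with one modular exponentiation, which is cheap; the hard problem for the attacker is undoing the curve multiplication, not the field division.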
For authentication, the dummy payload includes a message authentication code (the symmetric encryption algo is AEAD type, which appends an authentication tag), so each end can tell authoritatively whether it decrypted the payload successfully, and if so, they know that the other end used the private key corresponding to the public key that was proffered, in other words, that security judgments about the public key do apply equally to the peer that proffered that key.
For authorization, the initiator has the responder's static public
key in a configuration file. The responder has a list of public keys
of every initiator authorized to connect. It knows on the first packet
who the initiator claims to be, and will only respond if that public
key is on its list. The list can be added to or pruned on the fly by
the provided wg utility, which would be called by out-of-band
facilities that aren't part of WireGuard, analogous to the charon
key management daemon of strongSwan (IPSec) and related
functions in OpenVPN.
What are my goals for the VPNs, and how much hassle will it be to make WireGuard deliver what I need, so I can add it to my collection?
Resistance to bit-rot: changes in the system configuration tend to have a bad effect on operation of the VPNs, and I hope WireGuard will be affected less than IPSec and OpenVPN.
While I always use UDP if feasible, I've found that hotel Wi-Fi often blocks UDP in general and VPN ports in particular. Thus I support and use TCP on port 443, which the Wi-Fi access points have got to pass. (Note: some authoritarian nations block VPN ports nationally. 443/TCP can bypass this, but the Secret Police could recognize that it was fake, with baleful consequences for the perpetrator.)
There are four VPN routes that need to work:
The segment tunnel, from the main router Jacinth to a host in the cloud. See A Server in the Clouds for what is being accomplished here. Basically the local net's default route for IPv6 goes out via the cloud host.
Xena and Petra to Jacinth: Xena is my laptop, which roams, and Petra is a virtual machine on it for development. See Network for Xena's Virtual Machine. But how do other hosts know where to send packets destined for Xena and Petra? Solution: they always send via Jacinth, and Xena always initiates a VPN to Jacinth even when it's not roaming.
Selen to Jacinth. Selen is a cellphone with Android, and it also roams. To get to the webmail and PIM server from off-site it needs a VPN. See Jimc's Awesome PIM Server for the packages I'm using. Unlike Xena, Selen would prefer to not use the VPN unless it is both roaming and using PIM, or is working with local LAN hosts, or is doing something for which privacy or integrity is critical.
For testing the VPNs, two local LAN hosts are chosen (not necessarily the same every time, depending on which are up or down). Host A connects to Jacinth's VPN server and sends packets to B; the tester checks if the packets go direct (the NoVPN case) or via Jacinth, and whether their content is really encrypted. For this to work, every LAN host has to be able to connect to Jacinth's WireGuard. The test is done daily; most of the time the LAN host is not using WireGuard.
For authorization, VPNs can be set up two ways. In the historic design each connection has individual credentials installed, typically in the form of pre-shared symmetric keys. Modern versions, such as IPSec and OpenVPN (starting in version 2.x), install a credential (normally an X.509 certificate and private key) on the server and on each client; the client certs are all signed by one Certificate Authority (or intermediate cert) which the server requires in the client's trust chain. The server doesn't need the clients' certs individually. I don't really have a big herd of users and I can handle either arrangement, but X.509 certs are what I'm using now. Preview: for WireGuard, each connected pair of peers needs to be configured at each end, but the credentials are Elliptic Curve Diffie-Hellman public keys of 255 bits (32 octets) that are also used for the crypto, not pre-shared symmetric keys.
There's an issue that makes a lot of trouble for designing a net with VPNs:
some clients always use the VPN and some don't. I'm implicitly assuming a
central server that all the clients work through. For the "always VPN"
case the "right" routing setup is to assign the client's hostname
to a fixed address on the VPN tunnel device. The server always has, and
advertises, a route to this client through its own VPN tunnel endpoint, and the
rest of the LAN sends to the client via the server. My laptop and my cloud
server operate this way.
The harder case is when the client sometimes uses the VPN, and sometimes doesn't, like my VPN tester and my cellphone. It's a total can of worms to set up a route via the server when the client connects, and to make this route go away when it disconnects, particularly when other LAN members need to originate connections to the VPN client, directly or via the server, depending. The way I'm handling this on the other VPNs is, the client's name is assigned to a fixed IP on its egress interface: Wi-Fi or Ethernet. Peers on the LAN connect to this non-VPN address, except that isn't possible if the cellphone is roaming (because peers don't know the cellular assigned address and I'm not going to mess with dynamic DNS on my cellphone). When the client turns on the VPN, it puts a separate IP address on the VPN endpoint, and the server has a permanent route to this address (or pool) via its own VPN endpoint, which it advertises all the time. Other LAN hosts can originate connections to the client's VPN address, but only when the client has the VPN turned on.
Let's make this design into something a little more concrete that I can turn into a WireGuard conf file.
The clients' WireGuard interfaces have fixed IPs in a range which is disjoint from the local LAN. That is, my assigned address ranges (IPv4+6) have one subnet to be the local LAN, and other subnets for the various VPNs to be on. The clients also have fixed IPs in a different subnet for non-VPN traffic like bearer packets. If the client is roaming it will send bearer packets from a wild-side address assigned by the carrier, which changes when the phone changes protocol (LTE, UMTS, Edge).
If the client uses WireGuard all the time, LAN peers send to its fixed address on WireGuard, because that's what its hostname resolves to. If the client uses WireGuard some of the time, peers send to its non-VPN (ethernet or Wi-Fi) fixed address, which the hostname resolves to.
The main router, called Jacinth, has the IPv4 default route to the wild side and the tunnel for IPv6 (no native IPv6 yet on Verizon or Frontier FIOS, hiss, boo), and the server instances of all the VPNs.
The rest of the local LAN hosts use Jacinth as their default route;
thus if they need to send a packet to a VPN client's endpoint address, they
automatically send it via Jacinth. While some LAN hosts have static
routes, Jacinth's DHCP and radvd announce a default route (IPv4+6) through
Jacinth. This is the actual implementation of the "route advertisements"
mentioned earlier.
Jacinth has AllowedIPs and matching routes (courtesy of wg-quick) that send and accept traffic to/from each client's WireGuard fixed IP. This is set up at boot time when WireGuard is started, and is supposed to continue forever. If the client has a subnet inside, e.g. my laptop and its VM, the subnet is configured as a special case.
The client at the very least needs an AllowedIP and route to the server's WireGuard endpoint address. Some clients will want to send their default routes through the tunnel, but in my use case it turns out that the cloud server, the laptop and the cellphone want traffic to the local LAN (plus other VPN clients and the server) to go through the tunnel, but wild-side traffic should go via their default route on the wild side.
Bearer packets, i.e. the encrypted VPN payloads, need their own route, because they can't go through the tunnel that they are bearing. OpenVPN creates a special route sending traffic destined for the server (i.e. bearer packets) directly there. However, other traffic like SSH and HTTP(S) also goes direct, causing endless customer support issues and information leakage (for the insecure protocols). For WireGuard, if the policy routing Table is configured, and if the default route is sent through the tunnel, wg-quick can create a policy route that diverts only the outgoing bearer packets to that table, into which the original default route is transplanted.
However, that's not quite what I'm doing; my clients don't send the default route through the tunnel except for special hacks. First I save the route that the bearer packets would take before WireGuard messes with routes. After wg-quick sets up the routes, I re-determine the route of the bearer packets, and if they are being sent down the tunnel I divert them back to the original route, copying the method that wg-quick would have used if the default route had been sent down the tunnel.
The server can statically set up the policy routes to send via normal
routing any bearer packets addressed to the client's local IP. The
"right" way to do this is inside-out, i.e. everything that's addressed
to the client's IP and isn't a bearer packet is flipped into the policy
routing alternate table, which routes such packets down the tunnel. I
believe that one WireGuard interface can distinguish packets addressed to
members of a set of clients and can send them to the correct one. Bearer
packets are not flipped in and are routed as if WireGuard were not
operating; they would be sent out to the wild side if the client has
connected from the wild side, i.e. is roaming. WireGuard on the server
will accept a connection from any IP as long as a known public key is
presented, and as long as the packets get through the firewall
(normally promiscuous on a server).
There are several detail issues that bit me:
If the client is configured with the alphabetic hostname of the server, wg-quick will resolve that name and will prefer the IPv6 address. But the client, if roaming, probably can't connect on IPv6. Cure #1: use the server's fixed IPv4 address. But my server is residential and its wild-side address is aleatory, though it lasts several weeks before changing. Cure #2: a wrapper script will resolve the endpoint name the way I want and insert the IPv4 address in the configuration file, much like wg-quick removes its commands when doing wg setconf.
When the client is actually not roaming, but even so sends bearer packets to the server's wild side address, WireGuard replies to them with the wild side address as the source (correct), but uses that source to determine the interface to send from, which is on the wild side, where the client isn't. Other daemons use normal routing to send from the interface that can reach the client. Actually when the client's IPv6 address is replied to, the packets are not lost, but are sent to the cloud server, which sends them back on the segment tunnel (taking 30 msec for the round trip), whereupon normal routing on the server gets them to the client. But this kind of routing is not possible with IPv4.
Cure #3: the wrapper will have to switch to the server's LAN address when the client is not roaming, and WireGuard will have to be restarted when it changes between home and roaming.
When the wrapper or wg-quick does DNS domain name resolution, the client needs a non-VPN address and interface that the server (or other DNS source) can send to without involving the WireGuard tunnel that hasn't been established yet.
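Cure #2 above can be sketched in a few lines of Python. This is only a sketch of the idea, not my actual wrapper: the function names pin_endpoint_ipv4 and resolve_ipv4 are made up for illustration, and the hostname in the usage example is a placeholder. It rewrites any Endpoint line that carries a hostname so that it carries an IPv4 address instead, and leaves bracketed IPv6 literals alone. The resolver is passed in as a parameter so the rewriting can be tried without touching DNS.

```python
import re
import socket


def resolve_ipv4(host: str) -> str:
    """Resolve a hostname to its first IPv4 address (A record only)."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET,
                               type=socket.SOCK_DGRAM)
    return infos[0][4][0]


# "Endpoint = host:port" where host is a name or IPv4 literal.
# A bracketed IPv6 literal starts with "[" and therefore does not match.
ENDPOINT = re.compile(r"^(\s*Endpoint\s*=\s*)([^\s:\[\]#]+):(\d+)",
                      re.MULTILINE)


def pin_endpoint_ipv4(conf_text: str, resolve=resolve_ipv4) -> str:
    """Return conf_text with Endpoint hostnames replaced by IPv4 addresses."""
    def substitute(match):
        prefix, host, port = match.groups()
        return f"{prefix}{resolve(host)}:{port}"
    return ENDPOINT.sub(substitute, conf_text)


if __name__ == "__main__":
    sample = "[Peer]\nEndpoint = vpn.example.net:4296\n"
    # A fixed fake resolver stands in for DNS in this demonstration.
    print(pin_endpoint_ipv4(sample, resolve=lambda host: "203.0.113.7"))
```

The wrapper would feed the rewritten text to wg setconf (or write a temporary conf file for wg-quick). As noted above, the resolution itself has to happen over a non-VPN interface, since the tunnel isn't up yet.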
Conclusion after a fair amount of testing: If WireGuard is up on both Jacinth and the client (Oso, on the LAN), and if the client configures Jacinth's wild side address (IPv4) as the endpoint, then communication with osowg (the WireGuard addresses, IPv4+6) from LAN clients including Jacinth is perfect with no dropped packets. Jacinth is sending bearer packets from its LAN (not wild) address, and the client updates its peer endpoint to this address, as the docs describe. I previously tested setting Oso's peer endpoint to Jacinth's IPv6 wild address and got some weird behavior involving sending bearer packets to the wild side; I need to check if this is still happening. Communication tests included ping, ssh and w3m (web), IPv4+6. I have SSHFP set up for Oso, but osowg is not considered to be the same host, and needs either a known_hosts entry or its own SSHFP records.
For the symmetric cipher on the main channel, WireGuard uses only ChaCha20Poly1305, for which hardware acceleration is very rare. On the Intel Core® i5-10210U, jimc's tests score it as half as fast as hardware accelerated AES-256 (Rijndael), and twice as fast as software AES-256. This difference would only be significant for a server with thousands of clients.
https://www.wireguard.com/quickstart/
ip link add dev wg0 type wireguard    # Pick a name for the tunnel device
ip address add dev wg0 192.168.2.1/24 [ peer 192.168.2.2 ]    # peer clause if only 1 peer
wg setconf wg0 myconfig.conf    # (wg utility is provided)
--or--
wg set wg0 listen-port 51820 private-key /path/to/private-key peer $itsname \
    allowed-ips 192.168.88.0/24 endpoint 209.202.254.14:8172
ip link set up dev wg0
wg (with no args) is equiv to wg show (for all interfaces e.g. wg0)
wg-quick [up|down|etc] ctlfile
WireGuard wants ECDH (Elliptic Curve Diffie-Hellman) private and public
keys; each is 255 bits (32 bytes) long, or 44 characters base64 encoded (43
plus a trailing =). The configuration file contains the base64 key itself,
whereas the wg set subcommand takes the name of a file containing it. The
provided wg utility can generate them for you, like this:
wg genkey | tee privatekey | wg pubkey > publickey
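To make the key format concrete, here is a small Python sketch of what a private key looks like on paper: 32 random octets, clamped the way Curve25519 scalars normally are (I believe wg genkey does the same clamping, but treat that as my assumption), then base64 encoded. The function names are mine, not part of any WireGuard tool. Deriving the public key needs the curve multiplication, which wg pubkey does for you and which is not shown here.

```python
import base64
import os


def generate_private_key() -> str:
    """Return a base64 string in the same format that WireGuard keys use."""
    raw = bytearray(os.urandom(32))
    raw[0] &= 248                      # clear the 3 low bits
    raw[31] = (raw[31] & 127) | 64     # clear the top bit, set the next one
    return base64.b64encode(bytes(raw)).decode("ascii")


def looks_like_wg_key(text: str) -> bool:
    """Check the shape only: valid base64 that decodes to exactly 32 octets."""
    try:
        raw = base64.b64decode(text, validate=True)
    except ValueError:                 # binascii.Error is a ValueError
        return False
    return len(raw) == 32 and len(text) == 44


if __name__ == "__main__":
    key = generate_private_key()
    print(len(key), key.endswith("="), looks_like_wg_key(key))
```

A shape check like looks_like_wg_key can catch a key that lost its trailing = in a copy and paste, which is an easy mistake to make.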
WireGuard does not use X.509 certificates to authenticate/authorize the
peers; authorized keys are preinstalled for each client-server pair. But they
can be installed on the fly by wg.
You may test with their demo server.
So let's try to set something up. For testing, I'm starting this at 2021-10-07 18:00. I'm going to use these basic steps:
Make sure there's a client for Android. Install it first but don't try to
use it yet. Yes there is one, called WireGuard, with the serpent logo
(®). Inception 2019-10-13, most recent update 11 days ago, 5e5 downloads,
offered by "WireGuard Development Team". You could import a configuration
from a file, or a QR code (!), or create it by hand. I looked at the required
info but didn't create my connection. 7 mins including reading the product
info.
https://wiki.archlinux.org/title/WireGuard
How to get the QR code that the Android client can import. This is from
the Arch Linux wiki article about WireGuard.
On the Linux desktop host that has the conf file:
qrencode -o outfile -t ansiutf8 -r client.conf
If you omit -o outfile or specify -o - the result is on
standard output, and if this is a terminal that can display ANSI UTF-8
characters (see the -t option), the QR code itself becomes visible. You may
need to make the window wider and/or higher to avoid wrapping lines. Suppress
long comments; "the maximum size is 4000 characters". qrencode is from
package qrencode on OpenSuSE Tumbleweed.
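Suppressing the comments by hand gets old fast, so here is a minimal Python sketch of a filter that does it. The function name strip_comments is mine, and the 4000 figure is just the limit quoted above, not something I have verified against the QR specification. It drops everything from # to end of line, plus blank lines; base64 keys can't contain #, so key values are safe.

```python
QR_LIMIT = 4000   # the maximum size quoted above (assumption, not verified)


def strip_comments(conf_text: str) -> str:
    """Remove # comments and blank lines from a WireGuard conf file."""
    kept = []
    for line in conf_text.splitlines():
        body = line.split("#", 1)[0].rstrip()
        if body:
            kept.append(body)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = (
        "[Interface]\n"
        "PrivateKey = abc=   # keep this secret\n"
        "# A long explanatory comment that the phone does not need.\n"
        "\n"
        "[Peer]\n"
        "PublicKey = def=\n"
    )
    slim = strip_comments(sample)
    print(slim, end="")
    print("fits in QR code:", len(slim) <= QR_LIMIT)
```

The slimmed text can then be saved to a temporary file and handed to qrencode with -r as shown above.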
The required kernel module is called wireguard.ko and it is in the standard
kernel, version 5.14.11 and likely quite a bit earlier. To pass configuration
information to it (plus displaying connection info and generating keys) you
need wireguard-tools (current version as this is written is 1.0.20210914) from
the OpenSuSE Tumbleweed main distro. Older versions are available for Leap
15.3 and 15.2. 72Kb to download, 145Kb installed. No dependent packages; it
only requires systemd and libc. The package only contains the wg and
wg-quick commands, and documentation.
wg-quick is a wrapper around wg for simple configurations. When either
command is given just an interface name such as wg0, the corresponding
configuration file is sought in /etc/wireguard/wg0.conf, whereas if an absolute
pathname is given the interface is inferred from the basename of the conf file.
The interface name may be up to 15 bytes of [a-zA-Z0-9_=+.-] . (You don't
specify the interface name inside the conf file.)
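The name-versus-pathname rule is easier to see as code. This Python sketch mirrors my reading of what the wg-quick script does with its argument; the function name resolve_config is made up, and the details should be treated as an approximation of the real script rather than a copy of it.

```python
import re

# Up to 15 bytes drawn from the character set quoted above.
NAME = r"[a-zA-Z0-9_=+.-]{1,15}"


def resolve_config(arg: str):
    """Map a wg-quick style argument to (interface name, conf file path)."""
    if re.fullmatch(NAME, arg):
        # A bare interface name: look in the standard directory.
        arg = f"/etc/wireguard/{arg}.conf"
    match = re.search(rf"(?:^|/)({NAME})\.conf$", arg)
    if not match:
        raise ValueError(f"not a valid interface name or conf path: {arg!r}")
    # The interface name is the basename minus the .conf suffix.
    return match.group(1), arg


if __name__ == "__main__":
    print(resolve_config("wg0"))
    print(resolve_config("/root/test/wg-oso.conf"))
```

One consequence worth noticing: because the interface name comes from the file name, renaming the conf file renames the interface, and a basename longer than 15 bytes is rejected outright.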
On Xena I also installed NetworkManager-wireguard plus
NetworkManager-wireguard-gnome (you need both for the GUI). These are
"experimental" packages, not in the main distro. Find them with the
SuSE package searcher.
Depends on wireguard-tools. Most likely you don't have the developer's
package signing public key; either get it, or ignore Zypper's security warning.
About 20min to install the packages and read the man pages.
A prerequisite is, what port am I going to use? WireGuard doesn't have an IANA port assignment, but documentation often shows 51820 and forum posts and howto's usually show this one. But this port range (all above 32768) is for aleatory ports, and a collision could occur. The BSD Daemon whispered in my ear that since OpenVPN has 1197 assigned, WireGuard should use xx96. Unassigned and stealable port numbers are 2196 4196 4296 4496 4696 4796 4896 4996 5096 and most candidates above this. 42xx is completely vacant and appears to be intended for private use, and I have a local policy to put nonstandard ports in this range, so 4296 is what I will use. I will need to set my firewall to pass 4296/udp in the same cases as it passes 1197/udp.
On the other hand, for the initial tests (that might fail) I don't want to mess with the firewall, so I'll use 4886, the unofficial wakeup port for Android, which my firewall passes from+to the local LAN so the Android hosts can wake each other up.
Here is the client's configuration file for testing. See the genkey
subcommand of wg for producing your keys. The conf file contains your
private key (not encrypted), so it should have appropriately restrictive
permissions, mode 600. /etc/wireguard is installed with mode 700, but I set
the individual conf files to 600 anyway. See the man page for wg for
a small number of additional configurable parameters such as the keepalive
interval, if your net needs it.
[Interface]
PrivateKey = qwerty...=     # 43 base64 bytes, about 256 bits. Keep the =.
ListenPort = 4886           # Android wakeup port, which my firewall
                            # allows, but I'll have to change this later.
[Peer]
PublicKey = asdfgh...=      # 43 base64 bytes, about 256 bits.
Endpoint = [2600:3c01:e000:306::8:1]:4886    # IPv6 in [], port after colon
AllowedIPs = 147.75.79.213/32,2604:1380:1:4d00::5/128    # www.zx2c4.com.
# There can be multiple peers.
About 25min + to figure out the conf file.
Starting about 16:10
The SuSE package wireguard-tools does not include the scripts mentioned in the quick start guide for contacting the demo server.
When wg is used to bring up the connection, it loads the wireguard kernel module, nine crypto modules (that the documentation says it actually uses), udp_tunnel and ip6_udp_tunnel.
Debugging Petra's networking took extra time, but once I switched to test on Xena it took about 10 minutes to turn on WireGuard and do the tests.
I repeated these steps on Surya. The two test activities succeeded.
Given how my VPN tester is designed, it's a whole lot easier if every host has WireGuard installed, specifically wireguard-tools. Doing that now.
OpenVPN and strongSwan assign the client an IP address from a pool, similar to DHCP. But my tunnels are very predictable, so I pre-assigned IPs to potential WireGuard participants, all on the same subnet. Instead, I'm making new address ranges for WireGuard tunnel endpoints: 192.9.200.112/28 (16 addresses) and 2600:3c01:e000:306::9:0/112. The addresses are assigned according to a pattern, but most likely I will get them into /etc/hosts soon.
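A quick sanity check of these ranges with Python's ipaddress module; nothing here is WireGuard specific, and the variable names are mine. It confirms the size of each range and that a tunnel address used later on this page (192.9.200.118) falls inside the IPv4 range, while Xena's inner subnet (192.9.200.176/29) does not overlap it.

```python
import ipaddress

WG_V4 = ipaddress.ip_network("192.9.200.112/28")
WG_V6 = ipaddress.ip_network("2600:3c01:e000:306::9:0/112")
XENA_SUBNET = ipaddress.ip_network("192.9.200.176/29")

# Size and extent of each range.
print(WG_V4.num_addresses, WG_V4[0], WG_V4[-1])
print(WG_V6.num_addresses)

# Is a given tunnel address inside the WireGuard range?
print(ipaddress.ip_address("192.9.200.118") in WG_V4)

# The tunnel range must stay disjoint from other subnets.
print(WG_V4.overlaps(XENA_SUBNET))
```

ip_network() refuses a range whose base address isn't aligned to the prefix length, so just constructing the objects already catches one common typo.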
Each host gets a key pair and a generic conf file with Jacinth as its peer (server) (except Jacinth itself).
This turned into a long and time-consuming learning experience. I'm condensing a lot of failures and listing the high points:
The Quick Start guide is written for a client using wg-quick to control the interface. To the sample conf file shown under Configuration Files I added an Address line (just on Surya, for now); the value is a comma separated list of the IPv4 and IPv6 addresses to be assigned to Surya's wg0.
Starting up first on Surya: wg-quick up wg0
It prints the commands it is executing.
[#] ip link add wg0 type wireguard
[#] wg setconf wg0 /dev/fd/63    # It's feeding wg0.conf minus Address etc.
[#] ip -4 address add 192.9.200.118 dev wg0
[#] ip -6 address add 2600:3c01:e000:306::9:8 dev wg0
[#] ip link set mtu 1420 up dev wg0
[#] ip -6 route add 2600:3c01:e000:306::7:0/112 dev wg0
[#] ip -4 route add 192.9.200.176/29 dev wg0
I captured them into a script wg0.up. The routes to Xena
(the last 2 lines) needed a lower metric than the existing ones via
Jacinth, reached through the OpenVPN segment tunnel.
I did analogous setup at Xena's end.
Now here's a nasty issue which I didn't solve in this step: this all looks fine assuming Xena and Surya have their WireGuard connections running. But suppose Xena is connected somewhere else, like Jacinth where it's supposed to be? How does Surya know to not route via wg0?
Starting the tests: no communication. For several days. Payload packets departed from Xena; bearer packets left Xena and arrived on Surya; payload packets were decrypted on Surya and were emitted from wg0; and they weren't answered. The reason was that the firewall needed to be told that wg0 was a tunnel with security implications similar to being on the local LAN, not a minion of the global hacking community. The payloads were reported by tcpdump, and then hit the iptables rules in the firewall, and were tossed. With that fixed, I was able to ping in both directions between Xena and Surya.
This confirms my reading of the man page for wg: for each
peer (here Xena), AllowedIPs is a list of subnets. Packets from that
peer whose source address is in that range are allowed by Surya's
WireGuard to emerge from wg0, and packets on Surya routed to wg0
because their destination address is in that range will
be internally routed to that peer and not to some other peer connected
at the same time. This is similar to an Iroute in OpenVPN.
This interpretation implies that the IP that the peer is using on its wg0 has to be inside the AllowedIPs address range, and the IPs of the other peers have to be outside. If there's a subnet that the peer expects to use the tunnel, as on Xena, it has to be in AllowedIPs. If other hosts on the local LAN expect to connect to this peer, they need to use an address in AllowedIPs and they need to route the traffic via the server (Surya).
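The double role of AllowedIPs can be modeled in a few lines of Python. This is a toy model of my understanding, not WireGuard's actual data structure; the peer table and function names are hypothetical, with addresses borrowed from elsewhere on this page for flavor. Outbound, the destination address picks the peer by longest prefix match; inbound, a decrypted packet is kept only if its source address is inside the sending peer's AllowedIPs.

```python
import ipaddress

# Hypothetical peer table on a server: peer name -> its AllowedIPs.
PEERS = {
    "xena": ["192.9.200.176/29", "2600:3c01:e000:306::7:0/112"],
    "oso": ["192.9.200.122/32", "2600:3c01:e000:306::9:10/128"],
}


def _contains(cidr: str, addr) -> bool:
    """True if addr is inside cidr and both are the same IP version."""
    net = ipaddress.ip_network(cidr)
    return net.version == addr.version and addr in net


def peer_for_destination(peers, dst: str):
    """Outbound: choose the peer whose AllowedIPs best matches dst."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(cidr).prefixlen, name)
        for name, cidrs in peers.items()
        for cidr in cidrs
        if _contains(cidr, addr)
    ]
    # Longest prefix wins; None means the packet cannot be sent to anyone.
    return max(matches)[1] if matches else None


def source_allowed(peers, peer: str, src: str) -> bool:
    """Inbound: accept a decrypted packet only if src is in AllowedIPs."""
    addr = ipaddress.ip_address(src)
    return any(_contains(cidr, addr) for cidr in peers[peer])


if __name__ == "__main__":
    print(peer_for_destination(PEERS, "192.9.200.177"))
    print(peer_for_destination(PEERS, "192.9.200.1"))
    print(source_allowed(PEERS, "oso", "2600:3c01:e000:306::9:10"))
    print(source_allowed(PEERS, "oso", "192.9.200.177"))
```

In this model, a peer whose own tunnel address is missing from its AllowedIPs entry fails the source_allowed check on every packet it sends, which matches the silent drops I saw.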
It remains to be seen whether two of Surya's peers connected at
the same time can talk to each other, and where the "hairpin routing"
occurs: before or after emission from wg0. Both OpenVPN
and IPSec can do this.
Documentation for a robotics class at the high school level. The
organization is the FRC 3512 Software Team, based in Orcutt, California,
USA, a little north of Vandenberg Air Force Base and the Diablo Canyon
nuclear power plant. (Author and date are not obvious.) They show the
WireGuard configuration file that the students are supposed to use on their
at-home clients. Under [Interface] the Address (for wg-quick) is
assigned uniquely per student. Under [Peer] the endpoint has an
alphabetic hostname (and numeric port). The AllowedIPs are the address
range of the student clients (I think the key occupant is the gateway
that lets them off that subnet), and the subnet that contains the
servers, VMs, etc. that they're supposed to learn to use. According to
them, when the client reports "required key not available", it
means that you sent down the tunnel a packet to an address that the
peer's AllowedIPs did not include, which the peer reported by an ICMP
packet coded for "Destination Host Unreachable" (which is not a
lie).
The key lesson for jimc is, at each end, AllowedIPs (describing the peer, the other end) has to include the address(es) on the peer's tunnel device, from which outgoing traffic is sent; otherwise this end will reject traffic from the peer.
I wrote a script to generate conf files and "up" scripts on each host.
It follows the design plans for the special features on particular hosts.
This way, issues are not forgotten and chewed-up configurations can be
regenerated at will. All hosts now have their proper keys, configurations
and "up" scripts.
Petra to Jacinth: no response.
Claude to Jacinth: Routes: 192/26 dev en0; 128/25 dev wg0;
to Surya, pings to $pfx::8:2 are answered but not to $pfx::8:1
Xena to Claude: IPv6 only. Ditto Surya
Jacinth + Iris to Claude: pings IPv4+6
Can't tell if offsite connections are dnatted to Claude via WG or vnet0.
Holly to Jacinth: pinging claude diamond iris jacinth via main LAN: works
pinging petra xena surya via WireGuard: no answer.
xena->holly trcr -6: ov_u_j.cft.ca.us (1:1), holly (i.e. via WG)
xena->holly trcr -4: ov_u_j.cft.ca.us (129), nothing thereafter.
IPv4 on Jacinth sends this via br0.
Got to implement "if client is using WG, route to it; if not, route via br0".
Method 1: every bearer packet on the WG port of type 1 (content inspection)
is cloned with mirred to some netlink socket.
I have two types of clients: those that always use WireGuard, and those
that sometimes use WireGuard. To deal with routing issues, Xena <->
Jacinth and Jacinth <-> Surya always need the VPN, whereas Selen
(Android) uses it only when roaming (and when access to the local LAN is
wanted). The latter scenario is the natural one for OpenVPN and IPSec, so I've
been focused on that so far, but making it work is going to be hard with
WireGuard, so I've decided to switch over to the "always on" paradigm, at
least at first. Xena, Jacinth and Surya are the most important hosts on my
net, and it's not acceptable to knock them out with VPN experiments. Among my
other VMs, Claude (the webserver) is also mission-critical, and Petra is
hosted on Xena and is affected by its networking. So to get this project
moving, I revived a disused VM called Oso, hosted on Iris (a leaf node) with
bridge networking, so it is effectively an independent leaf node.
For the first try I'm going to have, for each client, an individual interface (wg-$PEER) with individual addresses from 192.9.200.96/28 and 2600:3c01:e000:306::10:0/112. Later I'll try doing the tunnels on a shared interface like I originally planned.
For the first try on Oso I set up Oso with AllowedIPs = 192.9.200.106, 2600:3c01:e000:306::10:10 (just Jacinth's WireGuard interface addresses for Oso), and Jacinth had AllowedIPs = 192.9.200.122, 2600:3c01:e000:306::9:10 (Oso's WireGuard interface addresses). Oso's firewall was rejecting bearer packets on 4296/udp. This fixed, I could ping the peer's interface addresses, both families, both directions.
Next try is to add Oso's own addresses to AllowedIPs on Jacinth, and just Xena's subnet on Oso. For reconfiguring I'm going to take down WireGuard on both ends first, rather than trying to run wg-quick with a running configuration, since I'm expecting trouble on this one. Yes, Jacinth and Oso can't ping each other, because Jacinth tries to send the bearer packets to Oso via the tunnel that they're bearing. wg-quick has a limited ability to activate policy routing for the bearer packets, but this configuration is not recognized as needing it.
Next try: Jacinth AllowedIPs = Oso WG addresses + 2600:3c01:e000:306::d4/128 (Oso's own IP); Oso is unchanged with Jacinth's WG addresses + Xena subnet. Jacinth can ping all the Oso AllowedIPs mentioned. So can Oso. Xena and Petra can ping Oso's IPv4+6 WG address, but Xena needs to specify its public IP in the -I option of ping (source address) because that's what's in the AllowedIPs on Oso, vs. the endpoint of Xena's tunnel to Jacinth. For traceroute this would be the -s option.
Next try: a script that implements the WireGuard Evolution item for bearer packets down the tunnel. Trying it first on Oso. It works, but didn't solve my problems.
Here are the key principles that I finally worked out, for making a configuration file that gets the packets through.
All participating hosts need a fixed IP address (IPv4+6) that will go on the WireGuard interface, which the other end has to designate as AllowedIPs, by number. This is the Address parameter in the Interface section. On hosts that always use WireGuard, the host's own name will normally resolve to this number.
All participating hosts need another fixed IP which does not go through the VPN, to which bearer packets will be addressed, or which will be the source address of outgoing bearer packets. A host that often omits the VPN should use this number as the referent of its own hostname. The other end will configure this number as the peer's Endpoint. If a host, e.g. the server, never initiates a connection to this peer, specifying the Endpoint is optional and does not have to be accurate, e.g. if the client is roaming, it's impossible for the server to know in advance the client's IP.
I use port 4296 as the ListenPort and Endpoint port on all hosts, to simplify maintenance of the configuration files, and debugging. This should be changed to the official IANA assignment, if one ever materializes.
In addition, the Interface section needs the PrivateKey of this host as a string (44 bytes including ending padding with one '='). The Peer section needs the peer's PublicKey. The configuration file needs to not be publicly readable, to protect the private key. (I wish the private and public keys could be read out of files with mode 600, obviating the restrictive permissions on the conf file, but though this may have been supported in the past, it's not supported now (wireguard-tools-1.0.20210914).)
The server's Peer section needs AllowedIPs for the peer's WireGuard address(es), by number. Omit the CIDR bits; this is a host route. If the client is routing for a subnet (like Xena which has a VM, Petra), the server needs to allow the subnet also (with CIDR bits). The non-VPN address on the client must not be an AllowedIP because otherwise bearer packets would be sent down the tunnel that they are bearing; this caused me endless grief.
In the client's Peer section I put AllowedIPs for my LAN address range, including other VPNs' endpoints but excluding WireGuard addresses. This actually worked and didn't need policy routing; bearer packets did not go down the tunnel. Many people's use case involves AllowedIPs for the default route 0.0.0.0/0 and ::/0 but that's not what I'm doing.
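Two of these principles are easy to check mechanically before a configuration is ever brought up. The Python sketch below is one way to do it, under stated assumptions: the function names are invented for the example, and the IPv4 address 192.9.200.212 is a hypothetical non-VPN address for a peer, not one taken from my net. The first check confirms that a key has the expected shape (44 base64 characters with one trailing '=', decoding to 32 bytes). The second flags the situation that caused the grief described above, where the peer's non-VPN address is covered by its own AllowedIPs.

```python
import base64
import ipaddress

def key_looks_valid(key):
    """True if key is 44 base64 characters, one '=' at the end, 32 bytes decoded."""
    if len(key) != 44 or not key.endswith("=") or key.endswith("=="):
        return False
    try:
        return len(base64.b64decode(key, validate=True)) == 32
    except ValueError:  # raised for characters outside the base64 alphabet
        return False

def endpoint_inside_allowed_ips(endpoint_ip, allowed_ips):
    """True if bearer packets to endpoint_ip would match the peer's AllowedIPs."""
    addr = ipaddress.ip_address(endpoint_ip)
    return any(
        addr in ipaddress.ip_network(entry, strict=False)
        for entry in allowed_ips
    )

# A syntactically valid (all-zero, useless) key, just to exercise the check.
dummy_key = base64.b64encode(bytes(32)).decode()
print(key_looks_valid(dummy_key))

# Hypothetical peer whose non-VPN address is 192.9.200.212.
safe = ["192.9.200.122", "2600:3c01:e000:306::9:10"]
risky = safe + ["192.9.200.212"]
print(endpoint_inside_allowed_ips("192.9.200.212", safe))
print(endpoint_inside_allowed_ips("192.9.200.212", risky))
```

The key check only looks at the form of the string; it says nothing about whether the key belongs to the right host.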
Using the newly installed NetworkManager plugin for WireGuard. Get Xena back on the net.
WireGuard has a big problem if the client sometimes has WireGuard running, and sometimes expects to be contacted on the local LAN.
Tunnel from Jacinth to Surya, change from OpenVPN to WireGuard. Fix bugs; get our IPv6 back on the net.
Jacinth's role on OpenVPN and IPSec is as a generic server: potentially a variety of clients could connect at the same time, authenticating with an X.509 certificate with an acceptable trust chain. This isn't going to fly with WireGuard, since the server has to know the client's public key before it can connect.
Brainwave:
mirred egress mirror dev $IFB, the latter being an Intermediate Functional Block (synthetic interface) which the daemon can listen to. See man tc-mirred for documentation.
WireGuard needs the equivalent of OpenVPN's explicit-exit-notify. When the
kernel module detects that a connection is going down (e.g. ip link del dev
wg0) it should notify the peer. The rekey timeout seems to be short, under
1 minute, but the rekey attempt only occurs if the non-dead peer sends a
packet, and it's not clear how much state it's keeping for the dead peer and
how significant that is. It just seems neater to notify the surviving peer if
you're closing the connection.
Cryptographic algorithms can't be relied on to last forever, although Rijndael (AES) has lasted with only minimally effective attacks up to 2021 since 1997 (inception, or 2001, anointment in FIPS pub. 197), and ChaCha20 has been widely deployed from 2008 to 2021. It would be a very smart move to add algo negotiation, with the needed info in the dummy payload in the initial handshake packet.
In this scenario you have a chicken and egg situation that results in an omelet. wg-quick already recognizes when the default route is sent through the tunnel and puts in a policy route to divert bearer packets to their original (presumably default) route. But a more limited omelet route is not recognized, nor is the case where such a policy route has already been set up.
The very first step for wg-quick should be to do ip route get $EndpointIP,
with the IP it's actually going to use (IPv4 or 6). This
route should lead to the peer's non-tunnel address. When wg-quick finishes
setting up routes, including running PreUp and PostUp scripts that might set
routes, it should again do ip route get $EndpointIP, and if the route
goes through the WireGuard interface, it should do the policy routing thing
that diverts bearer packets via the route that it initially discovered.
As much as possible of this route should be preserved, specifically the
metric and the source address, if available.
On a server with multiple peers you may need an individual diversion route for some or all of the peers.
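The before-and-after comparison proposed above can be modelled in Python. This is a sketch of the decision only, under simplifying assumptions: the routing table is a plain list of (prefix, device) pairs, lookup is by longest prefix alone (no metrics, no policy rules), and the function names, the 192.9.200.208/28 tunnel route and the endpoint 192.9.200.212 are all made up for the example. It is not how wg-quick currently behaves.

```python
import ipaddress

def route_get(routes, dst):
    """Longest-prefix match over (prefix, device) pairs; None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, dev in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dev)
    return best[1] if best else None

def needs_diversion(routes_before, routes_after, endpoint_ip, wg_dev):
    """Return the original device if the endpoint now routes into the tunnel."""
    original = route_get(routes_before, endpoint_ip)
    current = route_get(routes_after, endpoint_ip)
    if current == wg_dev and original != wg_dev:
        return original  # bearer packets should be diverted back to this device
    return None

before = [("0.0.0.0/0", "en1"), ("192.9.200.192/26", "br0")]
# After bringing up the tunnel, an AllowedIPs route happens to cover the endpoint.
after = before + [("192.9.200.208/28", "wg0")]

print(needs_diversion(before, after, "192.9.200.212", "wg0"))  # br0
print(needs_diversion(before, before, "192.9.200.212", "wg0"))  # None
```

On a server with several peers the same comparison would simply be run once per peer Endpoint.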
I'm looking carefully again at the
network design on my net. I think I need to refactor routes to/via the VPNs
(with WireGuard added). In the table below, "leaves" means all the
hosts not explicitly mentioned. $pfx represents the first three octets
of the IPv4 address range. See below for Xena's default route, indicated by *.
There are analogous addresses and routes for IPv6.
Host | VPN or Route | Presently | Change To |
---|---|---|---|
— Address Ranges — | |||
Vacant | $pfx.0/25 | $pfx.0/26+64/27 | |
Jacinth | OpenVPN 1194/udp | $pfx.128/29 | $pfx.96/29 |
Jacinth | OpenVPN 443/tcp | $pfx.144/29 | $pfx.104/29 |
Jacinth | IPSec | $pfx.160/29 | $pfx.112/29 |
Jacinth | WireGuard | (none) | $pfx.120/29 |
Surya | OpenVPN 1194/udp | $pfx.136/29 | $pfx.128/29 |
Surya | OpenVPN 443/tcp | $pfx.152/29 | $pfx.136/29 |
Surya | IPSec | $pfx.168/29 | $pfx.144/29 |
Surya | WireGuard | (none) | $pfx.152/29 |
Surya | Segment tunnel | $pfx.184/29 | $pfx.160/29 |
Xena | Xena+Petra subnet | $pfx.176/29 | $pfx.168/29 |
Vacant | (none) | $pfx.176/28 | |
Leaves | Main LAN | $pfx.192/26 | $pfx.192/26 (same) |
DHCP | In main LAN | $pfx.240..254 | No change |
— Routes — | |||
Leaves | Default route | Jacinth $pfx.193 | (Same) |
Jacinth | Default route IPv4 | Its wild side (en1) | (Same) |
Jacinth | Default route IPv6 | Surya $pfx.185 | Surya $pfx.161 |
Surya | Default route both | Its wild side (en0) | (Same) |
Xena | Default route | Jacinth $pfx.193* | (Same) |
Petra | Default route | Xena $pfx.177 | Xena $pfx.169 |
Jacinth | Main LAN | dev br0 | (Same) |
Jacinth | Jacinth OV 1194/udp | dev tun0 | (Same) |
Jacinth | Jacinth OV 443/tcp | dev tun1 | (Same) |
Jacinth | Jacinth IPSec | Already on Jacinth | (Same) |
Jacinth | Jacinth WireGuard | (none) | dev wg0 |
Jacinth | Surya VPNs+subnets | (Combined) | dev tun9/wg9 to Surya |
Jacinth | Surya OV 1194/udp | Surya $pfx.185 | (Combined) |
Jacinth | Surya OV 443/tcp | Surya $pfx.185 | (Combined) |
Jacinth | Surya IPSec | Surya $pfx.185 | (Combined) |
Jacinth | Surya (segment tnl) | dev tun9 (to surya) | (Combined) |
Jacinth | Xena + Petra | VPN(Xena) $pfx.130 | VPN(Xena) $pfx.106 |
Surya | Jacinth VPNs+subnets | (Combined) | dev tun9/wg9 to Jacinth |
Surya | Jacinth OV 1194/udp | Jacinth $pfx.186 | (Combined) |
Surya | Jacinth OV 443/tcp | Jacinth $pfx.186 | (Combined) |
Surya | Jacinth (segment tnl) | dev tun9 to Jacinth | (Combined) |
Surya | Jacinth IPSec | Jacinth $pfx.186 | (Combined) |
Surya | Surya OV 1194/udp | dev tun0 | (Same) |
Surya | Surya OV 443/tcp | dev tun1 | (Same) |
Surya | Surya IPSec | Already on Surya | (Same) |
Surya | Xena + Petra | Jacinth $pfx.186 | (Combined) |
Surya | Main LAN | Jacinth $pfx.186 | (Combined) |
Xena | (finish this) |
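A renumbering like this is easy to get wrong by one block, so a quick mechanical check of the Address Ranges part of the table is worthwhile. The Python sketch below verifies that a list of blocks tiles the whole /24 with no gaps and no overlaps. The function name is invented for the example, and $pfx is taken to be 192.9.200 as in the addresses quoted earlier, which is an assumption; any prefix gives the same answer. The DHCP row is left out because it lies inside the main LAN block.

```python
import ipaddress

def plan_covers_exactly(pfx, blocks):
    """True if the blocks tile pfx.0/24 with no gaps and no overlaps."""
    nets = [ipaddress.ip_network(f"{pfx}.{block}") for block in blocks]
    whole = ipaddress.ip_network(f"{pfx}.0/24")
    total = sum(net.num_addresses for net in nets)
    merged = list(ipaddress.collapse_addresses(nets))
    # Merging to exactly the /24 rules out gaps; the address count rules out overlaps.
    return merged == [whole] and total == whole.num_addresses

PRESENTLY = ["0/25", "128/29", "136/29", "144/29", "152/29", "160/29",
             "168/29", "176/29", "184/29", "192/26"]
CHANGE_TO = ["0/26", "64/27", "96/29", "104/29", "112/29", "120/29",
             "128/29", "136/29", "144/29", "152/29", "160/29", "168/29",
             "176/28", "192/26"]

print(plan_covers_exactly("192.9.200", PRESENTLY))  # True
print(plan_covers_exactly("192.9.200", CHANGE_TO))  # True
```

Both columns of the table pass, so the new layout moves the VPN blocks down and frees $pfx.176/28 without losing or double-booking any address.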