Unbound, DNSSEC and SSHFP

James F. Carter <jimc@jfcarter.net>, 2019-12-28

The goals of this project are:

To provide SSHFP DNS records (RFC 4255, 6594, 7479). This record type gives a hash (fingerprint) of a host's SSH public key, so that the client can know authoritatively that the host key proffered by the server actually belongs to that host. For the client to believe in it, the record must be signed with DNSSEC. SSHFP supplements the known_hosts file, which is hard to keep in sync.
To sign CouchNet's DNS records with DNS Security Extensions (DNSSEC). By using real trust versus wishful thinking, we avoid a lot of possible exploits that direct us to a fraud server.
To switch from the venerable Berkeley Bind (named) DNS server, to the new one, Unbound. Bind is still alive and functional, but it's kind of heavyweight for my needs.
To de-kludge my current multi-layer DNS implementation for client apps. Making four DNS sources work together is kind of getting out of hand, particularly for this foundation service.

Junking the Current DNS Kludge
Installing Unbound
Configuration Files for Unbound
unbound.conf Line by Line
Plan A: Local Zone Files
More Stuff for the Chroot Jail
Procedure to Install Unbound
Testing Unbound
Signing the Zone
Testing Unbound's DNSSEC, and Why It Failed
Plan B: A Separate Authoritative DNS Server
Setting Up the Authoritative Server
Setting Up SSHFP

Junking the Current DNS Kludge

I have been using Berkeley Bind (named, name daemon) as a DNS (Domain Name Service) server for 32 years (since 1987). It has performed reliably (except for occasional exploits like the great Internet meltdown of 1988) and has grown to meet present day challenges. However, hostname issues have grown up in parallel with the global DNS system, and integrating them has turned into a multi-layer kludge. Specifically on CouchNet, a DNS query goes through these stages (summarized):

Apps call getaddrinfo or equivalent, which consults host to IP translation sources in the order listed in /etc/nsswitch.conf.
/etc/hosts is first: if the host is found, the IP is returned. (Cached.)
mDNS (multicast DNS) is next, producing an IP for $HOST.local.
An unanswered query is then passed to systemd-resolved. It also listens on 127.0.0.53 port 53 for apps that don't use getaddrinfo; interloping on this port causes complications later which are hard to get around.
systemd-resolved also looks up in /etc/hosts (cached) and does mDNS.
systemd-resolved ought to forward to a real DNS server next, but it is incapable of forwarding to a nonstandard port, so instead I have a proxy to handle forwarding from systemd-resolved. There is also a fallback to a generic public recursive DNS server (Google's 8.8.8.8) for my laptop when roaming.
dns-forward.J.service listens on 127.0.0.253 port 53 (UDP and TCP) (more integration conflicts) and forwards to Jacinth port 53 (the master site), with fallback to backup directory servers Diamond and Xena port 4253.
Jacinth port 53 belongs to dnsmasq, which creates DNS records for hosts to which it has assigned IP addresses. Queries for other than aleatory hosts are forwarded to named on Jacinth port 4253.
On all three directory servers, Berkeley Bind (named) runs on port 4253. It is authoritative for the CouchNet zones. It handles other queries recursively, forwarding them to off-site nameservers.
This nine-step process usually works, but has a lot of moving parts, potential race conditions, and conflicts over port 53, particularly if one of the components has to be restarted.
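To see which sources a given host consults, and in what order, the hosts line of /etc/nsswitch.conf can be listed one source per line. This is only a minimal sketch: the function name hosts_sources is made up for illustration, and it assumes the usual "hosts: source source ..." layout.

```shell
# List the name-resolution sources, in order, from an nsswitch.conf file.
# hosts_sources is a hypothetical helper name, not a standard tool.
hosts_sources() {
    awk '$1 == "hosts:" {
             for (i = 2; i <= NF; i++)
                 if ($i !~ /=/) print $i    # skip action items such as [NOTFOUND=return]
         }' "${1:-/etc/nsswitch.conf}"
}
```

Called with no argument it reads the live /etc/nsswitch.conf; pass a filename to inspect a copy.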

So I decided to scrap the whole pile and redesign from the beginning. Basically this means that I'm giving up DNS from dnsmasq, losing the _gateway DNS name from systemd-resolved, and relying on avahi-daemon rather than systemd-resolved for multicast DNS (names like $HOST.local). I rarely if ever use these features.

The decommissioned services were systemd-resolved, dns-forward.J, and dnsmasq (DNS only).

Installing Unbound

My distro, OpenSuSE Tumbleweed, has recently introduced unbound, a DNS server written from scratch by NLNetLabs rather than derived from Bind. According to the product hype it has these good features:

Project web site
Unbound implements DNSSEC (RFC 4033, 4034 and 4035 as amended), DNS Security Extensions, both as a client and as a server, assuming the relevant zone data is signed.
Recursive, can find the off-site server to send a query to.
Caching, saves query results and returns them for a repeat query, until the TTL expires.
Lightweight yet potent, suitable to deploy on leaf nodes as well as high volume public servers.
RFC compliance: Almost 100 RFCs are complied with; see NLNetLabs' list of RFCs with links.
Configuration defaults are such that simple deployments require only simple configuration.
Unbound source code has recently (2019) been formally audited and vulnerabilities fixed.
Unbound is written in compiled code ('C'). It has a Python execution module for hook scripts.
Unbound is open source, under the BSD license.
NLNetLabs, the developer of Unbound, is funded by support contracts and similar paid arrangements for its software.

Configuration Files for Unbound

These configuration issues were dealt with. I have a locally written configuration management system and when its control files are mentioned they are tagged with (LCM). The Unbound configuration directory is referred to as /etc/unbound, which is actually a symbolic link into the chroot jail, /var/lib/unbound/etc/unbound .

/etc/resolv.conf: Formerly this was a symbolic link to the one provided dynamically by systemd-resolved, attracting queries to that service. Resolv.conf was changed to refer directly to localhost port 53 where Unbound is listening.
/etc/dnsmasq.d/dns.conf: dnsmasq runs on Jacinth (master site) and Xena (laptop). Its DNS configuration was changed to suppress DNS entirely. port=0 was all that was needed.
/etc/unbound/unbound_{control,server}.{key,pem}: These are self-signed certificates and private keys, to be used by unbound-control to authenticate to the server. Create them with unbound-control-setup.
/etc/unbound/root.key: This is the trust anchor for the whole global Domain Name Service. Every DNS resolver that verifies DNSSEC needs to learn this trust anchor by some means other than the DNS that it is going to verify. The program unbound-anchor is this "other means"; it uses the procedures from RFC 5011 and RFC 7958 to get the trust anchor. When a trust anchor is revoked or replaced (e.g. on expiry), RFC 5011 gives a procedure by which the client can obtain the new trust anchor and can propagate trust from the old anchor to the new one. RFC 7958 covers the case where there is no existing trust anchor that is actually trusted: it gives the URL at which a XML file (and other formats) can be downloaded, the procedure to verify its authenticity, and the format of the contained data, from which a DS record can be constructed to verify the DNSKEY that signed the root zone.
/etc/unbound/icannbundle.pem: This is a concatenated set of three certificates including the ICANN Root CA, used to sign the XML file that contains the DS record that can verify the root DNSKEY record saved in /etc/unbound/root.key . It can be downloaded from data.iana.org and the certs can be verified online by normal means from the IANA Certificate Authority. It is included with the unbound-anchor package.
/etc/unbound/root.hints: This is a set of NS (nameserver) records giving the names and IP addresses of the root nameservers. When Unbound starts up, with CouchNet configuration it would have recursive forwarders to which it could send queries for arbitrary DNS objects like the list of root server NS's, but if all the forwarder(s) were inoperative, Unbound would need to know where to send that query by other means, like the root.hints file. It can be downloaded from ftp.internic.net. (In plan B, root.hints is moved into the config directory.)
/etc/unbound/dlv.isc.org.key: This is the trust anchor for DNSSEC Lookaside Validation (DLV) (RFC 5074). It comes with the unbound-anchor package. But now that the root zones are all signed, DLV is no longer necessary, and it has been deprecated since 2017. Just ignore this file and leave it alone, until it is removed from the unbound-anchor package.
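The first two items in this list come down to a one-line file each. The sketch below writes both into a scratch directory so they can be inspected before being copied into place. The function name write_dns_stubs and the scratch layout are invented for illustration, and 127.0.0.1 is an assumption, since the text only says "localhost port 53".

```shell
# Write the two one-line files into a scratch directory for inspection.
# write_dns_stubs and the directory layout are illustrative only.
write_dns_stubs() {
    dir=${1:?usage: write_dns_stubs DIR}
    mkdir -p "$dir"
    # resolv.conf: send all stub-resolver queries to the local Unbound.
    # 127.0.0.1 is an assumption; ::1 would serve equally if Unbound listens there.
    printf 'nameserver 127.0.0.1\n' > "$dir/resolv.conf"
    # dnsmasq drop-in: port=0 turns off its DNS function and leaves DHCP running.
    printf 'port=0\n' > "$dir/dns.conf"
}
```

The key and trust anchor files are not written by hand; as noted above, unbound-control-setup and unbound-anchor create them.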

unbound.conf Line by Line

These were specific configuration issues in /etc/unbound/unbound.conf:

directory: "/var/lib/unbound/etc/unbound": Where the configuration files are. Relative pathnames are relative to this directory. If you are doing chroot (recommended), the pathname should have the chroot jail path prepended, and since other programs look for configuration in /etc/unbound, that should be a symbolic link to the directory in the jail.
chroot: "/var/lib/unbound": Path of the chroot jail directory. Mode 755 owned by unbound:unbound.
username: "unbound": The user to drop privileges into, "" to not drop privileges. This is orthogonal to chroot; both are recommended.
verbosity: 2: 1 produces startup messages (and errors) only. 0 = errors only, 2 = more operational details, 3 = per query reports, and on upward.
interface: 0.0.0.0
interface: ::0
Listen for queries on the interfaces having these addresses. You need separate lines for IPv4 and IPv6. The values shown cause listening on all interfaces: if a query can get through the firewall, Unbound will hear it, which I need for monitoring if both Unbound and the firewall are working properly. The default is to listen only on localhost.
access-control: 0.0.0.0/0 allow
access-control: ::0/0 allow
access-control: 127.0.0.1 allow_snoop
Anything that can get through the firewall can query recursively or can get local (authoritative) data; only localhost can query the cache.
interface-automatic: yes: UDP replies are made from the address to which the query was sent. Usually this is good practice, and is important with interface: 0.0.0.0 .
# port: 53: Listen port; this is the default. You can only have one.
outgoing-port-permit: 32768-65535
outgoing-port-avoid: 0-32767
Origin ports for outgoing recursive queries. These are what's in the distro's provided unbound.conf; the default is 1024-infinity.
root-hints: "root.hints": See the discussion above of the root.hints file.
harden-glue: yes
harden-dnssec-stripped: yes
harden-below-nxdomain: yes
harden-referral-path: no
harden-algo-downgrade: yes
use-caps-for-id: no
max-udp-size: 3072 (resists DDoS exploit)
unwanted-reply-threshold: 10000000 (resists cache poisoning)
See the man page for unbound.conf, for what these defensive measures do and for the tradeoffs if you enable them. Mostly these are kept at either the compile-time default or the distro-provided setting.
private-address: 192.168.0.0/16 (and 6 others): Enforce special use rules for these private address ranges per RFC 1918 and numerous supplements. The major rule is that such addresses must never be seen by off-site queriers, since the remote client would route the addresses to its own LAN, not our internal host. And if an off-site recursive query returns such an address, we will toss it to avoid sending possibly dangerous connections to our internal hosts. See the distro's unbound.conf for the recommended list.
private-domain: "cft.ca.us"
private-domain: "d.f.ip6.arpa" (ULA addresses, RFC 4193)
Allows private addresses in these zones.
local-zone: "254.168.192.in-addr.arpa" transparent
local-zone: "1.168.192.in-addr.arpa" transparent
local-zone: "1.0.0.0.c.6.4.6.3.b.8.4.8.1.d.f.ip6.arpa" transparent
domain-insecure: "1.0.0.0.c.6.4.6.3.b.8.4.8.1.d.f.ip6.arpa"
This one is very touchy. CouchNet uses private (RFC 1918) and ULA-type addresses in these ranges and has corresponding zone files. local-zone allows them to be sent out despite the special use rules for private address ranges; transparent means that the zone content should be looked up and sent out in the normal way. domain-insecure turned out to be essential also, because the d.f.ip6.arpa zone cannot be signed, and Unbound needs permission to serve sub-zones without DNSSEC verification.
minimal-responses: no: Unbound will fill the Additional section of its response with everything it knows about the query, some of which the client may not ever use. This obviates re-queries (for the RR's that are used), but it takes CPU time and net bandwidth to send the stuff: a tradeoff. The default is yes (make them re-query).
trust-anchor-file: "root.key"
auto-trust-anchor-file: "root.key"
# trust-anchor-file: "/var/lib/unbound/root.key"
# trusted-keys-file: "keys.d/*.key"
include: "local-anchors.incl"
The trust anchors required for DNSSEC verification. Pick one or the other root key; the unbound-anchor program looks for auto-trust-anchor-file so use that. trusted-keys-file(s) is/are the trust anchors for your own zone's signatures (if I used them). Instead I include a concatenated list of DS records (as trust-anchor "text" commands) generated with the DNSKEYs.
val-log-level: 2: Makes it log an error message if it receives a response which fails DNSSEC verification. 0 = don't log; 1 = one line report; 2 = includes the reason and the bad IP.
remote-control:
control-enable: yes
Makes Unbound listen to the unbound-control program, for reloading Unbound, or for realtime jiggering of configuration parameters like the verbosity. See above for the required keys and certificates to authenticate.
include: "/etc/unbound/conf.d/*.conf": In Plan A, CouchNet has local zone definitions in conf files in this directory, formerly different for the master site, slave DNS servers, and leaf nodes. In plan B I've simplified the design so the leaf servers have the same forwarders to the authoritative server instances, written in the main unbound.conf, and the authoritative servers are all slaves and have the local zone stanzas also in the main unbound.conf.
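Put together, a subset of the options discussed above might look like the file written by the sketch below. It is not the author's actual unbound.conf: the function name write_unbound_conf is hypothetical, the hardening and port options are left out for brevity, and only auto-trust-anchor-file is used, following the advice to pick one root key setting.

```shell
# Write a cut-down unbound.conf to the named file (a scratch path, for review).
# write_unbound_conf is an illustrative helper, not part of the unbound package.
write_unbound_conf() {
    out=${1:?usage: write_unbound_conf FILE}
    cat > "$out" <<'EOF'
server:
    directory: "/var/lib/unbound/etc/unbound"
    chroot: "/var/lib/unbound"
    username: "unbound"
    verbosity: 2
    interface: 0.0.0.0
    interface: ::0
    interface-automatic: yes
    access-control: 0.0.0.0/0 allow
    access-control: ::0/0 allow
    access-control: 127.0.0.1 allow_snoop
    root-hints: "root.hints"
    private-address: 192.168.0.0/16
    private-domain: "cft.ca.us"
    auto-trust-anchor-file: "root.key"
    val-log-level: 2
remote-control:
    control-enable: yes
include: "/etc/unbound/conf.d/*.conf"
EOF
}
```

On a host with Unbound installed, running unbound-checkconf on the resulting file is a cheap way to catch typos before a restart.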

Plan A: Local Zone Files

(Plan A is doomed to failure. Mitigations in plan B are noted briefly.)

A local zone definition includes a zone type keyword, the zone's name in DNS, and information telling where Unbound should get data (Resource Records) for that zone. The type keyword is auth-zone:, forward-zone: or stub-zone: (a section title, with no value). The name parameter always has the form name: its.name.tld (ending dot not required). The data source varies with the type:

Master site: keyword is auth-zone
zonefile: "/var/lib/unbound/master/cft.zone"
This is an absolute path to the original zone file. A relative path might be feasible also with a '..'. But it has to be inside the chroot jail so Unbound can read it.
Since Unbound is not capable of sending AXFR/IXFR zone updates, it's kind of useless to have a master site, and in plan B all the dirsvrs are slaves.
Slave servers: keyword is also auth-zone
master: 192.9.200.193
url: http://192.9.200.193/unbound-master/cft.zone
zonefile: "/var/lib/unbound/slave/cft.zone"
Unbound listens for notify messages from the master, and sucks new versions of the zonefile from it. The result becomes or replaces the listed zonefile. There can be multiple masters (not on CouchNet).
Gotcha: if the master site runs Unbound (v1.9.6 on my system), it is not capable of emitting AXFR or IXFR responses. You need to set up a webserver (simplest if on the master site) that can serve the zone files. Zone file content (RRsets) are public record and are served to anyone who asks (who satisfies access control restrictions), but it is probably not best practice to offer the complete zone file to the global hacking community; i.e. mind the access control on the webserver too.
Unbound checks on the master for an updated zone file upon receiving a NOTIFY request or periodically per the times in the SOA. If given master:IP, Unbound retrieves the master's SOA and compares serial numbers, and exits the procedure if the master is not newer. If newer or if no master:IP, it then attempts each URL (if any) and then each master:IP (AXFR/IXFR) until one of them delivers the zone. If none of the transfers succeed, Unbound tries again according to the times in the SOA. Failures in this process do not produce error messages in the logs.
In the typical case, if you put the master's hostname in the URL, Unbound will need to resolve it to an IP using RRs in the zonefile that it is going to download from that server. So use an IP in the URL. It's not clear how you make this work with TLS (HTTPS), but the data transport path on my net is either local 802.3, or from my laptop on a VPN, so I don't need TLS.
Leaf node: keyword is stub-zone
stub-addr: 192.9.200.193
stub-prime: yes
Queries in this zone are forwarded to the DNS server on the given address; there can be several. stub-prime means that at startup Unbound will ask at the given address for the list of NS (nameservers) for the zone, and subsequently will use that instead of the configured address.
It turns out that a stub zone is intended to copy (and cache) data from an authoritative server. This means that the leaf Unbound will not validate the data, which is fatal when you try to use a SSHFP to validate a server. In plan B, the leaf nodes forward to the authoritative dirsvr slaves and then validate the response.
Forwarders: keyword is forward-zone
name: "."
forward-addr: 192.9.200.193
forward-first: yes
Although Unbound is capable of looking up any domain name by itself, and of verifying DNSSEC for the answer if it's signed, it's a best practice in an enterprise deployment for the leaf and slave nodes to forward queries for off-site data to a trusted master site, because it can cache the results including the DNSSEC records and outcome, speeding up service for leaf nodes that repeat someone else's query, and reducing the load on external servers.
There can be several forward-addr's, though this dilutes the effectiveness of forwarding. forward-first means that if a SERVFAIL response comes back, Unbound should attempt to do the query by itself in the normal way. Thus DNSSEC verification failures can be logged locally, or a defective forwarder doesn't kill your DNS. Empirically, a timeout also triggers self-verification.
Should the master site have a forwarder? There are tradeoffs here:
- Ability to reach authoritative servers is the same for the master site and for its hypothetical forwarder, because (at least in my case) the master site has a generic Internet connection. So the issue of reachability is not relevant for deciding if a forwarder should be used.
- The mantra is, see to your own security. You trust the stub resolver on localhost to do DNSSEC verification honestly, and you don't trust outside servers. But the forwarder is not verifying (and you would ignore the AD bit if set, which it isn't), it is sending you the RRSIGs by which the resolver on localhost can verify and can be aware if the payload (or the RRSIGs) are corrupt due to fraud or accident.
- The forwarder has the answers ready to go out promptly and with no effort required by the target's DNS server (except for the first query for that site after the TTL expires). If a zillion individuals and enterprises use the forwarder, the effort saving could be significant.
- If the master site (or any node) has no forwarder, when it starts up or reloads or the TTL expires, it needs to retrieve the NS, DS, DNSKEY and RRSIG records for the root, the TLDs, etc, directly from the root servers. Every possibility of not talking to the root servers should be implemented, to avoid loading them up.
- Some forwarders have special features like parental controls and blacklists of fraud sites. A truly paranoid security professional would not tolerate having the outside forwarder make these kinds of judgments, but for the rest of the population, this kind of forwarder could be attractive.
- You are feeding to the forwarder a continuous stream of domain names that you are using, and it is not feasible for you to ascertain what (beyond DNS replies) the forwarder is doing with this data. Google's public forwarders are often mentioned on this point. A truly paranoid sysadmin would be very nervous about such an exposure.
- Jimc's conclusion: I'm not paranoid enough to refuse to use a forwarder. I'm already trusting Hurricane Electric to host my public DNS data, and advertising is not the foundation of their business model. So I'm going to use their public forwarder.
- All Unbound instances (except the master site itself) will forward to my master site. The three directory servers will also forward to Hurricane Electric. These are the master site, the laptop (which needs it when roaming), and the other dirsvr, which will quickly learn that the master site gives the best service. I'm trying to minimize the number of configurations that I need to maintain.
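The four kinds of stanza described in this section can be laid side by side as follows. This is a sketch of the Plan A layout only, using the values quoted in the text; the helper name zone_stanza is invented, and the zone name "cft.ca.us" in each name: line is assumed from the zone file names.

```shell
# Print a Plan A zone stanza for one role: master, slave, leaf or forward.
# zone_stanza is a hypothetical helper; addresses and paths are those in the text.
zone_stanza() {
    case "$1" in
    master)
        cat <<'EOF'
auth-zone:
    name: "cft.ca.us"
    zonefile: "/var/lib/unbound/master/cft.zone"
EOF
        ;;
    slave)
        cat <<'EOF'
auth-zone:
    name: "cft.ca.us"
    master: 192.9.200.193
    url: "http://192.9.200.193/unbound-master/cft.zone"
    zonefile: "/var/lib/unbound/slave/cft.zone"
EOF
        ;;
    leaf)
        cat <<'EOF'
stub-zone:
    name: "cft.ca.us"
    stub-addr: 192.9.200.193
    stub-prime: yes
EOF
        ;;
    forward)
        cat <<'EOF'
forward-zone:
    name: "."
    forward-addr: 192.9.200.193
    forward-first: yes
EOF
        ;;
    *)
        echo "usage: zone_stanza master|slave|leaf|forward" >&2
        return 1
        ;;
    esac
}
```

For example, zone_stanza leaf prints the stub-zone stanza, which could be redirected into a file under conf.d.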

More Stuff for the Chroot Jail

Some of this stuff is modified or deleted in plan B.

/var/lib/unbound/master

Contains the master site's zone files. Don't make a symlink elsewhere; this has to be in the chroot jail so Unbound can read it.

In plan B, all the dirsvrs are slaves, so this directory is gone.

/var/lib/unbound/slave

Unbound on the slave servers writes copies of the zone files here, so they can persist across restarts.

/var/lib/unbound/run

See /var/lib/unbound/dev/log below, which is a symlink to a socket in /run. Also, when exiting, the chrooted Unbound will not be able to remove the PID file unless /run is accessible, which makes trouble when you start Unbound again. So I bind-mount the real /run into the jail.

/var/lib/unbound/backup.pln

Back up everything except ./run

/var/lib/unbound/bin/unbound-start.J

Startup script; pretty simple but still too much stuff for systemd to deal with in unbound.service. It does these steps:

Loads configuration variables from /etc/sysconfig/unbound .
Only for the recursive (leaf) server instances, makes a symlink from perhost.incl to perhost.generic, except that perhost.$HOSTNAME is used if existing. This allows special cases according to the host's role:
- Jacinth (master) forwards nonlocal queries offsite; all others forward to Jacinth.
- Xena (laptop) also forwards offsite (and to Jacinth) because it roams and Jacinth is only available if the VPN is running.
Bind-mounts /run onto /var/lib/unbound/run (if not already done).
Checks the configuration file, bails if there's an error.
Runs unbound-anchor to update the root trust anchor.
Execs unbound -d, which means to not daemonize.
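The steps above could be arranged roughly as in the sketch below. It is not the author's unbound-start.J: the variable names, the run helper and the DRY_RUN switch are all invented for illustration, and uname -n stands in for $HOSTNAME on the assumption that it yields the short host name used in the perhost file names.

```shell
#!/bin/sh
# Sketch of a chroot-aware startup wrapper. Set DRY_RUN=1 to print each
# command instead of running it. All names here are illustrative.
JAIL=${JAIL:-/var/lib/unbound}
CONFDIR=${CONFDIR:-$JAIL/etc/unbound}

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

start_unbound() {
    # 1. Load site settings if the file exists.
    if [ -r /etc/sysconfig/unbound ]; then . /etc/sysconfig/unbound; fi

    # 2. Pick the per-host include, falling back to the generic one.
    perhost=perhost.generic
    if [ -e "$CONFDIR/perhost.$(uname -n)" ]; then perhost=perhost.$(uname -n); fi
    run ln -sfn "$perhost" "$CONFDIR/perhost.incl"

    # 3. Bind-mount /run into the jail unless it is already mounted there.
    mountpoint -q "$JAIL/run" 2>/dev/null || run mount --bind /run "$JAIL/run"

    # 4. Refuse to start on a broken configuration.
    run unbound-checkconf "$CONFDIR/unbound.conf" || return 1

    # 5. Refresh the root trust anchor. unbound-anchor exits 1 when it had to
    #    fall back to the certificate-based update, so its status is ignored.
    run unbound-anchor -a "$CONFDIR/root.key" || true

    # 6. Replace this shell with Unbound in the foreground (-d: do not daemonize).
    run exec unbound -d -c "$CONFDIR/unbound.conf"
}
```

A real wrapper would end with a call to start_unbound; with DRY_RUN=1 the function only prints the commands it would have run.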

/var/lib/unbound/dev/random /var/lib/unbound/dev/urandom

Random number source (blocking). There's a lot of discussion about whether /dev/urandom is just as good as /dev/random, on modern hardware. Unbound will use whichever is available. These are independent instances of character device 1,8 and 1,9, not bind-mounted from /dev.

/var/lib/unbound/dev/log

A symlink to the socket /run/systemd/journal/dev-log, which is another reason why /run is bind-mounted in the jail.
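The device nodes and the log symlink can be created with a handful of commands. The sketch below only prints them, so they can be reviewed before being run as root. The helper name jail_dev_cmds is made up, and mode 666 is an assumption (it matches the usual permissions of /dev/random and /dev/urandom).

```shell
# Print the commands that populate the jail's dev directory.
# jail_dev_cmds is a hypothetical helper; mode 666 is an assumption.
jail_dev_cmds() {
    jail=${1:-/var/lib/unbound}
    cat <<EOF
mkdir -p $jail/dev $jail/run
mknod -m 666 $jail/dev/random c 1 8
mknod -m 666 $jail/dev/urandom c 1 9
ln -s /run/systemd/journal/dev-log $jail/dev/log
EOF
}
```

The symlink target is written as an absolute path because, once Unbound has chrooted, /run resolves to the bind-mounted copy inside the jail.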

Procedure to Install Unbound

Once the LCM files and master configuration storage were gotten correct, the procedure to convert a host to Unbound went like this:

zypper install unbound — 7 packages to install.
rm -r /etc/unbound — so it can be replaced by a symlink into the chroot jail. Some programs like unbound-control look for configuration information and/or keys in this directory.
On Diamond (LCM master): /home/post_jump/sync_jump -p -C -a $HOST — Check results, then re-run with -c (install) instead of -C (compare). This LCM script installs the standard configuration (not just Unbound) onto $HOST with rsync doing most of the work.
audit-scripts -v -k -n — Check results, then re-run with -c (install) instead of -n. This LCM script enables and disables (un)wanted services. -k lets it kill services (like bind) that are no longer in the list of wanted services.
systemctl stop unbound-anchor.timer dns-forward.J.service systemd-resolved.service
systemctl start unbound #Also unbound2 on dirsvrs in Plan B
systemctl status unbound |& less
/usr/diklo/lib/functest/unbound -v -t -e 0 — I have a collection of about 90 functional test scripts for most services that I use. This one checks if Unbound is actually listening on port 53; maps 6 random hostnames to their IPv4+6 addresses and back to the FQDN; and checks the SOA record of each locally configured zone. Test passed.
checkout.sh > /tmp/check.out 2>&1 ; less /tmp/check.out — Runs all the functional tests to detect baleful effects of the switchover. Tests all passed.

Testing Unbound

Unbound will do DNSSEC out of the box, unless you sabotage it in the configuration. For testing DNSSEC: With dig, +dnssec is needed to display the RRSIG. internetsociety.org (and lots of others) have validly signed data. dnssec-failed.org has an invalid signature. When you dig its 'A' record through a DNSSEC verifying server you get SERVFAIL (with or without +dnssec). If you specify +cdflag (checking disabled) then the server will deliver this site's 'A' record. Tidbit: if you specify +multi, dig will wrap long lines more readably and will show an interpretation of some of the arcane fields in the DNSSEC records.
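The three dig queries just described can be wrapped in a small helper that prints only the response code of each. This is a sketch: the function names dig_status and check_dnssec are invented, and it relies on dig's usual ";; ->>HEADER<<-" comment line being present in the output.

```shell
# Extract the response code (the "status:" field) from dig output on stdin.
dig_status() {
    sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z]*\),.*/\1/p'
}

# Query a good site, a deliberately broken site, and the broken site with
# checking disabled, printing the status of each. Needs dig and a network.
check_dnssec() {
    server=${1:-localhost}
    for q in "internetsociety.org" "dnssec-failed.org" "+cdflag dnssec-failed.org"; do
        # $q is deliberately unquoted so that "+cdflag name" splits into two arguments.
        printf '%-28s %s\n' "$q" "$(dig @"$server" +dnssec $q A | dig_status)"
    done
}
```

Going by the description above, a validating server should answer the second query with SERVFAIL and deliver an answer for the other two.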

Try at internet.nl to test IPv6 support. Conclusion: Hurricane Electric's forwarder does validate the test site's domain name signatures using IPv6 transport.

rootcanary.org tests most or all algorithm variants that are representable in DS and DNSKEY records. Our outcome: pairing each DS algorithm from SHA-1, 256 and 384, but not GOST, with each signing algorithm from DSA, RSA variants, ED25519, ED448, ECDSA variants, but not ECC-GOST or RSA-MD5, Hurricane Electric's forwarder can verify a RRSIG with each combination of algos. GOST is the Russian suite of crypto algorithms. The MD5 algo is deprecated due to known weaknesses.

To test the local Unbound itself, I temporarily turned off all forwarding (by changing the forward-zone's name to "su.") and reloaded. Outcome: local Unbound can verify internet.nl's signatures by itself, gives SERVFAIL as it should on dnssec-failed.org, and can verify using all the algorithms for which tests are provided on rootcanary.org, except not GOST or RSA-MD5.

Signing the Zone

Tutorials on zone signing:

DNSSEC Key Management and Zone Signing by Olaf Kolkman (RIPE, about 2010-12-xx, historical and unmaintained).
DNSSEC: Signing, Validating, and Troubleshooting by Michael Sinatra (ESCC, 2012-summer). Tidbit from this tutorial: LDNS is a companion to Unbound that signs zones. Jimc research: in OpenSuSE it is from the ldns package which is a dependency of unbound, i.e. ldns is installed when you install Unbound. There are corresponding tools provided with Berkeley Bind (named) in the bind-utils package, a dependency of bind.
ICANN's executive summary of DNSSEC, titled DNSSEC — What Is It and Why Is It Important?.

Zone signing steps:

You will generate two keys for each zone. The Zone Signing Key (ZSK) is used to sign each of the sets of records (RRsets) in your zonefile, while the Key Signing Key (KSK) signs the ZSK. A hash of the KSK's DNSKEY is put in a DS record, which you can send to your parent zone for inclusion, thus establishing a link in your chain of trust from your parent to your own zone.
It's recommended, and not too burdensome, to use a separate set of keys for each zone.
You need to start by choosing the algorithm and key length of your two zone keys. It's apparently not too easy to roll over to a new key with a different algorithm.
From the Sinatra tutorial (2012), one example shows RSASHA256 as the algo, with a key length of 1024 for the ZSK and 2048 for the KSK. Currently (2020), NSA recommends, since quantum computing may become practical during the lifetime of keys created presently, that 2048 bits be the minimum for a RSA key. Also, an elliptic curve in a prime field of modulus 2²⁵⁵-19, here labelled ED25519, seems to be winning the confidence of the cryptographic community, and this algo is getting widespread deployment. As elliptic curve algos are substantially more efficient and more compact than RSA, I'm inclined to use ED25519 for my keys. Note, however, that elliptic curve keys are not immune to quantum attack: Shor's algorithm applies to the elliptic curve discrete logarithm problem as well as to RSA factoring, so the case for ED25519 rests on efficiency and compactness rather than on quantum resistance.
An authoritative introduction to ED25519 may be found at Ed25519: high-speed high-security signatures by Daniel J. Bernstein (2017-01-22). A brute force attack takes about 2¹²⁸ trials, so ED25519 has equivalent strength to a 3000 bit RSA key, or a 128 bit symmetric block cipher such as Rijndael (AES). Signatures are 64 bytes (512 bits) long, and public keys are half that size.
See also this Wikipedia article about Curve25519. See also RFC 7748. Daniel J. Bernstein first released this elliptic curve in 2005, but later paranoia about the NSA's recommended elliptic curves led to a lot of interest in this curve and widespread deployment and adoption in standards.
ldns-keygen generates a key pair. There are three output files: the one with extension .key contains the (public) DNSKEY record; the one ending in .private contains the private key; and the one ending in .ds contains a DS record which could be sent to and inserted in your parent zone; this is a hash of the DNSKEY. (DS is only put out when you generate a KSK.) The basename of the files is K${name}+${algo}+${keyID} .
The private keys should go in a directory outside Unbound's chroot jail. The directory has to be backed up and you need to take the usual precautions to protect the secret key in the backup files. It would normally be on the master site, i.e. the one with the master nameserver, and the master site would be on a secure subnet of the local LAN that is not accessible to the global hacking community. For me it's file://jacinth/home/hostdata/dnssec.
Key generation is instantaneous, unlike for a RSA key. The files are written in the current directory. The records are printable; the payload of the DS record is a hex string, while the DNSKEY and the private key are base64 encoded. The DNSKEY has mode 644 (everyone can read) and the private key is 600 (only owner can read-write). The KSK and ZSK have different key IDs (avoiding overwriting). The KSK has a distinguishing flag: the 1 bit (the Secure Entry Point bit of RFC 4034) in the first integer (flags) in the text representation. (The ZSK has a flag field of 256 while the KSK has 257.) Also, only the KSK has an accompanying DS file. The program can be run as a non-root user; make sure that the user unbound can read the files afterward.
Jimc's command line (done manually at setup or rollover):
ldns-keygen -a ED25519 [-k] cft.ca.us
Option interpretation:
- -a ED25519 — The algorithm; '-a list' makes it print a list of recognized algo names.
- (-b bits) — The strength of an elliptic curve algo is not adjustable; this one is equivalent to 3000 bits in a RSA key.
- -k — Specify this to put out the flag for the KSK and the DS record; omit it for the ZSK.
- The (only) non-option argument is the domain for which the key is being generated. This becomes the name substring in the output filename. I successfully gave my zone with an ending dot, but I get the impression from documentation that the ending dot is usually omitted.
To sign a zone (recommended to put these in a script):
ldns-signzone -b -e YYYYMMDD -i YYYYMMDD -n -f ../master/cft.ca.us.zone /home/hostdata/cft.ca.us.zone /home/hostdata/Kcft.ca.us+015+12345 …
Option interpretation:
- -b — Add helpful comments and interpretations of the various records.
- -e date — When the signatures expire. The DNSKEY is not a certificate; the date is part of each signature.
- -i date — When the signature's validity begins (inception).
- -n — Use NSEC3 instead of NSEC (more efficient way to prove that domain names do not exist). See RFC 5155.
- -f outfile — Write the result here. The default is the input filename with '.signed' appended. In plan B there is no master site and they are written in a web directory.
- -o domain — Specify the origin of the zone. If the input file has an ORIGIN statement and -o is absent, ldns-signzone does use the correct origin.
- The first non-option argument is the input filename, i.e. a zonefile with no signatures or keys.
- Subsequent arguments are filenames of keys; omit the .private extension. Include all your KSKs and ZSKs for this zone; the program knows which records should be signed with which key. In the case of key rollover, there may be more than one KSK or ZSK, and a signature with each one will be generated.
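Two pieces of this procedure lend themselves to scripting: telling a KSK from a ZSK by its flags field, and filling in the inception and expiry dates. The sketch below does both. The function names key_role and signzone_cmd are invented, the validity period is whatever number of days the caller passes (the text does not state one), and the date arithmetic assumes GNU date. signzone_cmd only prints the ldns-signzone command so it can be checked before use.

```shell
# Report whether a .key file holds a KSK (flags 257) or a ZSK (flags 256).
key_role() {
    awk '{ for (i = 1; i < NF; i++)
               if ($i == "DNSKEY") {
                   if ($(i+1) == 257) print "KSK"
                   else if ($(i+1) == 256) print "ZSK"
                   else print "unknown"
                   exit
               }
         }' "$1"
}

# Print an ldns-signzone command valid from today for DAYS days.
# usage: signzone_cmd DAYS OUTFILE ZONEFILE KEYBASENAME...
signzone_cmd() {
    days=$1 out=$2 src=$3
    shift 3
    inception=$(date -u +%Y%m%d)
    expiry=$(date -u -d "+$days days" +%Y%m%d)    # GNU date syntax (assumption)
    echo ldns-signzone -b -n -i "$inception" -e "$expiry" -f "$out" "$src" "$@"
}
```

For example, signzone_cmd 30 followed by the output file, the unsigned zone file and the key basenames prints a command line in the same shape as the one shown above, with real dates in place of YYYYMMDD.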

Testing Unbound's DNSSEC, and Why It Failed

This command line, executed on my host Petra, makes a simple test query for Petra's IPv4 address on Petra's leaf server, on which cft.ca.us. (plan A) is a stub zone forwarding to the master site Jacinth. (+multi folds long lines for easier reading.)

dig @localhost +dnssec +multi petra.cft.ca.us. A |& less

An 'A' record and an RRSIG (Resource Record set Signature) are returned. You will see in the header flags the AD bit, which means that Petra's leaf server certifies to the client (dig) that the RRSIG was made with the ZSK for that zone (DNSKEY record, not included), the ZSK was signed with the KSK, and there is a chain of trust from one of the various trust anchors which the leaf server has, to the KSK. In this case the trust anchor is the DS record for the KSK which I installed with the leaf server, but normally the chain of trust would start with the root key in /etc/unbound/root.key.

This outcome is a success. Now let's change localhost to Jacinth. The flags now include AA because Jacinth truly is authoritative for this zone, but AD is gone. The authoritative data will not be accepted as authentic, and in particular, non-authentic SSHFP records are not accepted for authenticating the server being connected to. This is a showstopper.
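The comparison between the two servers can be scripted by looking only at the flags in dig's response header. The helper below is a hypothetical sketch (has_flag is not a standard tool); it assumes dig's usual header line of the form ";; flags: qr rd ra ad; QUERY: ...".

```shell
# has_flag FLAG: read dig output on stdin and succeed (exit 0)
# if FLAG appears among the response header flags.
has_flag() {
    sed -n 's/^;; flags: *\([^;]*\);.*/\1/p' | head -n 1 | grep -qw "$1"
}

# Example use, with the host names from this article:
#   dig @localhost +dnssec petra.cft.ca.us. A | has_flag ad && echo validated
#   dig @jacinth   +dnssec petra.cft.ca.us. A | has_flag ad || echo "no AD bit"
```

The EDNS line that shows "flags: do" starts with a single semicolon, so it is not mistaken for the header flags.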

So why is the authoritative data not authentic? I don't have a good reference to a discussion of this point, but I can provide some of my own hot air. The difference comes from the trust relation between the client (the one making the DNS query) and whichever DNS server turned on the AD bit. The main use case for DNSSEC is for the client to obtain DNS data that it can actually trust, such as the IP address of a bank, a brokerage, a mail server, or a VPN endpoint. The client has no trust relation with the foreign DNS server that provides this data, nor with the various forwarders that may provide the data out of their caches, so if any of those servers alleges that the data is authentic by turning on the AD bit, the client's software should not believe it and should turn AD off again. The client needs software that it trusts, in my case an instance of Unbound running on my own machine and set up and supervised by me or by my I.T. staff, whom I trust to not have gone over to the Dark Side. So only my own recursive and validating DNS resolver should do the work to determine the validity of DNS data.

It is not normal for a local resolver to have authoritative data for anything. It is also not too easy for the resolver to know for sure that a particular query is coming from a local user who is going to trust an AD bit, or from elsewhere where the AD bit will be considered an attempt at fraud. I'm guessing here, but I would say that the Unbound developers (and similarly for other software) decided that these issues are a can of policy worms that they didn't want to deal with, given the rarity of the situation. Therefore authoritative (AA) data is always sent with the AD bit turned off.

Plan B: A Separate Authoritative DNS Server

How can I recover from this design problem? By doing what the offsite servers do: I'll provide the authoritative data from a server separate from the recursive and validating resolver. Leaf nodes will continue to run only the resolver (on port 53), and directory servers will run a resolver with the same configuration, again on port 53. The dirsvrs will have a second DNS server on port 4253, configured as authoritative but never used recursively. Instead of stub zones, the leaf servers will forward queries for local data to all three dirsvrs on port 4253. Offsite queries will be forwarded to the master site's leaf server, so we get a local cache of all such queries.
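In unbound.conf terms, the leaf resolver's side of this plan might look roughly like the fragment below. It is a sketch, not my actual configuration: every address is a placeholder from the documentation range 192.0.2.0/24, and only the zone name and the port numbers come from the description above.

```
# Leaf resolver (sketch). All addresses are placeholders.
server:
    port: 53

# Local data: ask the authoritative instances on the three dirsvrs.
forward-zone:
    name: "cft.ca.us."
    forward-addr: 192.0.2.1@4253
    forward-addr: 192.0.2.2@4253
    forward-addr: 192.0.2.3@4253

# Everything else: ask the master site's leaf server, which caches it.
forward-zone:
    name: "."
    forward-addr: 192.0.2.1
```

The addr@port form is how Unbound expresses a non-default port for a forwarder.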

The leaf server has trust anchor(s) configured by me by which it can validate data it receives from otherwise untrusted foreign (or local) servers. One of these trust anchors is the public KSK of the root zone; all validating resolvers need this, and there is a procedure (RFC 5011 and RFC 7958) by which they can obtain and validate the root key, implemented in Unbound's unbound-anchor helper program. In addition, the local server gets public keys (actually DS records) for zones in my island of trust, served by my organization's authoritative servers, which cannot be validated by reference to the global root. (The DNSSEC Lookaside Validation (DLV) service of RFC 5074 is an alternative, but it has been deprecated since 2017.)

Clients send their queries with the DO bit on, signifying that they want DNSSEC processing. The local recursive server validates the requested data working from the provided trust anchors, and the client hopes that the AD bit will be set in the response, meaning that all the signatures matched the payload data. The leaf server also sets the DO bit when it requests the data from untrusted sources, because it needs the RRSIG records to do the validation, but it does not expect the AD bit in their responses and would not trust it if it appeared.

Setting Up the Authoritative Server

What software should I use for the authoritative servers? I'm relying on this Wikipedia article on Comparison of DNS Server Software. I limited the candidates to packages that are authoritative, slave-capable, with DNSSEC, with IPv6, and free software. Then I read the detailed descriptions to determine whether they could emit AXFR and IXFR, and other noteworthy aspects. All packages that are slave-capable can read both AXFR and IXFR. Several of these packages can rely on a backend database, e.g. MySQL, with its own replication, so AXFR/IXFR are not used (though available).

Big-IP DNS — Part of a commercial net appliance product line.
Bind — The big old dragon, can emit AXFR and IXFR.
PowerDNS — Formerly commercial, re-released under GPLv2. Can sign dynamic update RR's. Probably can emit AXFR, but definitely not IXFR.
Unbound — Our current favorite. Cannot emit AXFR or IXFR.
NSD — Can emit AXFR but not IXFR. Written by the same people who did Unbound. Authoritative only, and has been adopted by several TLD servers.
YADIFA — Developed and used by the .eu zone managers. Can emit both AXFR and IXFR.

Now let's compare Unbound with the rest of them in a pro&con format.

Unbound gets a big plus because I've already installed it and have learned how to configure it, and where some of the skeletons are buried.
Comparing IXFR vs. AXFR, how bad is it to not emit IXFR? With my small zones, IXFR's benefit is microscopic. With complex and active zones like a TLD, the slave would benefit much more, but the design and execution difficulties for the master in creating the IXFR are considerable, and at least one package (NSD) known to be used on some TLD servers emits AXFR only.
Unbound has an alternative distribution method over HTTP. What's wrong with using that?
- It's not real DNS. But we should be picking mechanisms on technical merit, not purity or political correctness.
- It precludes incremental zone updates. We've already concluded that incremental updates are of little value in my use case and perhaps even in the TLD context.
- What about duplicate NOTIFY? If the slave has the master's IP, it will retrieve the SOA by DNS, and will skip the download if not newer. In my use case, duplicate NOTIFY never happens.
- There's some question whether it can be made to work over HTTPS. My data path is under my control (unlike some people's); the payload is public record so privacy is irrelevant; and the zones are signed so a breach of integrity could result in a denial of service but not insertion of fraudulent content.

My conclusion then is to use Unbound. I will put a slave server on all the directory server machines, and the collection of zone files via HTTP(S) will be the actual master. The slaves will not be told the master's IP (because there is no real DNS master site), so they will have to download the zones on every NOTIFY, but with my operating procedures the notifies are sent only when there actually is a new zone. To avoid chicken and egg issues, I will use the master's IP in the distribution URLs.

This design is working out: all nodes can retrieve authentic (AD, validated) SSHFP records for a host that has them, and the off-site test domains do or don't deliver authentic data as appropriate.

Setting Up SSHFP

The SSHFP record is governed by RFCs 4255, 6594 and 7479. See SSHFP on Wikipedia for a non-normative description of the record. Its text representation is two integers and a hex string as follows:

Key algorithm, currently 1 = RSA, 2 = DSA, 3 = ECDSA, 4 = Ed25519.
Hash algorithm, currently 1 = SHA-1 (end of life), 2 = SHA-256.
A hex string representation of the hash of the host's public key. Dig has a bug causing it to split this field into a field of 56 hex digits and one of 8 digits, for algo 4 (ED25519).
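To make the three fields concrete, the record can be rebuilt by hand from a public key file in OpenSSH's one-line format: the fingerprint is the hash of the base64-decoded key blob. The function below is an illustrative sketch only (sshfp_record is a hypothetical name, and it relies on the coreutils base64 and sha256sum commands); in practice ssh-keyscan -D or ssh-keygen -r does this job.

```shell
# sshfp_record OWNER PUBKEYFILE
# Print an SSHFP record with hash type 2 (SHA-256) for a public key
# stored in OpenSSH one-line format: "keytype base64blob comment".
sshfp_record() {
    keytype=$(awk '{print $1; exit}' "$2")
    case $keytype in
        ssh-rsa)      alg=1 ;;
        ssh-dss)      alg=2 ;;
        ecdsa-sha2-*) alg=3 ;;
        ssh-ed25519)  alg=4 ;;
        *) echo "unsupported key type: $keytype" >&2; return 1 ;;
    esac
    # The fingerprint is the SHA-256 of the decoded key blob, in hex.
    hash=$(awk '{print $2; exit}' "$2" | base64 -d | sha256sum | awk '{print $1}')
    printf '%s IN SSHFP %s 2 %s\n' "$1" "$alg" "$hash"
}
```

Comparing its output with the hash type 2 line from ssh-keygen -r for the same key file is a quick way to confirm what the hex string actually covers.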

To extract from a running SSH server an SSHFP record that you can put in your DNS zone file: (Remember that the zone file has to be signed, for the SSH client to believe in it.)

ssh-keyscan -D diamond > diamond.sshfp
ssh-keyscan -4 -D -p 2222 selen > selen.sshfp
Selen is an Android cellphone that runs Dropbear; it doesn't do IPv6 and Dropbear is listening on port 2222 (to avoid the need for root access).

Alternatively use your backup of the host keys:

ssh-keygen -r selen -f id_ed25519 >> selen.sshfp

It takes some reconfiguration to get SSH to actually use the SSHFP records.

First, the SSHFP record belongs to the target's FQDN, not to the target's 1-component name. If you intend to use 1-component names (and who doesn't?) you need the setting CanonicalizeHostname always, placed early in the applicable Host section of /etc/ssh/ssh_config (client configuration).
You also need the CanonicalDomains statement. Its value is a space separated list of domains to append to the non-canonical name; SSH doesn't use the searchlist from /etc/resolv.conf. I put ending dots on my domains.
CanonicalizePermittedCNAMEs *:* is also important if the 1-component name plus the domain ends up at a CNAME; e.g. you do ssh backup burn-the-cd where backup.cft.ca.us is a CNAME to the actual backup server, Diamond. *:* means all CNAMEs are allowed; see the man page if you want to be more paranoid.
CanonicalizeFallbackLocal yes means that if the hostname cannot be canonicalized, SSH should continue with the non-canonical name (and without being able to find SSHFP records). A value of no would mean that a botched canonicalization kills the session.
Following the canonical control statements, it is tempting to put Match canonical, meaning to process the rest of the configuration only when the canonicalization is finished. But suppose it has to CanonicalizeFallbackLocal? The given hostname would not be canonical, would not match, and the session would not get the important settings that follow.
Finally, set VerifyHostKeyDNS yes so SSH will look for SSHFP records, and if found will treat them as equivalent to entries in known_hosts.
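Put together, the SSHFP-related settings described above might look like this in /etc/ssh/ssh_config. This is a sketch rather than a copy of my file: the Host * pattern is an assumption (use whichever Host section applies), and CanonicalDomains lists only the one domain that appears in this article.

```
# Client configuration sketch; "Host *" is an assumption.
Host *
    CanonicalizeHostname always
    CanonicalDomains cft.ca.us.
    CanonicalizePermittedCNAMEs *:*
    CanonicalizeFallbackLocal yes
    VerifyHostKeyDNS yes
```

There is deliberately no Match canonical line, for the reason given above.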

In addition I made these changes not related to SSHFP:

StrictHostKeyChecking accept-new means to save a host key automatically in known_hosts (with a warning) if not there before. But if it's in known_hosts and is different, the session is killed with a lurid message. Formerly I had this at ask, requiring user interaction for any additions to known_hosts.
Since I want to encourage the client and the server to use ED25519, I moved these algorithms to the beginning ('^') of the respective lists.
HostKeyAlgorithms ^ssh-ed25519-cert-v01@openssh.com,ssh-ed25519,ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp256
PubkeyAcceptedKeyTypes ^ssh-ed25519-cert-v01@openssh.com,ssh-ed25519,ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp256
There's a nice new feature: you can have SSH solicit a list of all host keys that the server has, and missing ones will be added to known_hosts. Format: UpdateHostKeys yes. However, in the likely case that the server always sends the same key, there's no real saving from having the other keys, and I'd like to keep known_hosts empty, so I did not turn this on.

Now a user, with an empty or nonexistent known_hosts file, can successfully connect with SSH using these hostnames:

ssh diamond.cft.ca.us. uname -n — The FQDN
ssh diamond uname -n — The 1-component name
ssh backup uname -n — A CNAME for Diamond
ssh 192.9.200.194 uname -n — Not canonicalized
ssh 2600:3c01:e000:306::c2 uname -n — Not canonicalized

So this project has ended in success: known_hosts is no longer needed to validate an SSH server that has SSHFP records. The server's host key is not added to known_hosts if the SSHFP record was used to validate the server.

If you connect to an IP address, SSH does not look for a PTR record to canonicalize it, and per StrictHostKeyChecking=accept-new, the host key is added to known_hosts under the IP address with no user interaction but with a warning on stderr.

If UpdateHostKeys=yes, and you connect to a server not in known_hosts, client SSH asks for all the server's host keys, and adds all of them to known_hosts under the canonical hostname plus the IP used to connect, with no user interaction and no warning message. (For most of my use cases, this is very helpful to keep garbage out of log files.)

If UpdateHostKeys=yes and you connect to an IP address and it is not in known_hosts, the host key is added to known_hosts under the IP address with no user interaction but with a warning on stderr, same as without UpdateHostKeys. But the second time you connect, i.e. if the IP and key are already in known_hosts, the client will request all the server's host keys and will add them to known_hosts (except the one already known) under the IP address.
