
Unbound, DNSSEC and SSHFP

James F. Carter <jimc@jfcarter.net>, 2019-12-28

The goals of this project are:

Junking the Current DNS Kludge

I have been using Berkeley Bind (named, name daemon) as a DNS (Domain Name System) server for 32 years (since 1987). It has performed reliably (except for occasional exploits like the great Internet meltdown of 1988) and has grown to meet present day challenges. However, hostname issues have grown up in parallel with the global DNS, and integrating them has turned into a multi-layer kludge. Specifically on CouchNet, a DNS query goes through these stages (summarized):

So I decided to scrap the whole pile and redesign from the beginning. Basically this means that I'm giving up DNS from dnsmasq, losing the _gateway DNS name from systemd-resolved, and relying on avahi-daemon rather than systemd-resolved for multicast DNS (names like $HOST.local). I rarely if ever use these features.

The decommissioned services were systemd-resolved, dns-forward.J, and dnsmasq (DNS only).

Installing Unbound

My distro, OpenSuSE Tumbleweed, has recently introduced unbound, a validating, recursive, caching DNS resolver from NLnet Labs. It is a separate implementation, not derived from Bind. According to the product hype it has these good features:

Configuration Files for Unbound

These configuration issues were dealt with. I have a locally written configuration management system and when its control files are mentioned they are tagged with (LCM). The Unbound configuration directory is referred to as /etc/unbound, which is actually a symbolic link into the chroot jail, /var/lib/unbound/etc/unbound .

/etc/resolv.conf

Formerly this was a symbolic link to the one provided dynamically by systemd-resolved, attracting queries to that service. Resolv.conf was changed to refer directly to localhost port 53 where Unbound is listening.
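A quick way to confirm that resolv.conf really points only at the local Unbound is to parse its nameserver lines. The following is a minimal sketch, not part of the LCM; the sample file content (including the search line) is an assumption made for illustration.

```python
# Sketch: check that resolv.conf-style text lists only loopback nameservers.
LOOPBACK = {"127.0.0.1", "::1"}


def nameservers(resolv_conf_text):
    """Return the addresses on 'nameserver' lines, ignoring comments."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Comments start with '#' or ';' and run to the end of the line.
        for marker in ("#", ";"):
            line = line.split(marker, 1)[0]
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def uses_only_local_resolver(resolv_conf_text):
    """True if at least one nameserver is listed and all are loopback."""
    servers = nameservers(resolv_conf_text)
    return bool(servers) and all(s in LOOPBACK for s in servers)


# Hypothetical file content, for illustration only.
SAMPLE = """\
# written by the configuration management system
search cft.ca.us
nameserver 127.0.0.1
"""

if __name__ == "__main__":
    print(nameservers(SAMPLE))
    print(uses_only_local_resolver(SAMPLE))
```

To check a real host, read /etc/resolv.conf and pass its text to uses_only_local_resolver().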

/etc/dnsmasq.d/dns.conf

dnsmasq runs on Jacinth (master site) and Xena (laptop). Its DNS configuration was changed to suppress DNS entirely. port=0 was all that was needed.

/etc/unbound/unbound_{control,server}.{key,pem}

These are self-signed certificates and private keys, to be used by unbound-control to authenticate to the server. Create them with unbound-control-setup.

/etc/unbound/root.key

This is the trust anchor for the whole global Domain Name System. Every DNS resolver that verifies DNSSEC needs to obtain this trust anchor by some means other than the DNS that it is going to verify. The program unbound-anchor is this "other means"; it uses the procedures from RFC 5011 and RFC 7958 to get the trust anchor. When a trust anchor is revoked or replaced (e.g. on expiry), RFC 5011 gives a procedure by which the client can obtain the new trust anchor and can propagate trust from the old anchor to the new one. RFC 7958 gives the URL from which an XML file (and other formats) can be downloaded, the procedure to verify its authenticity, and the format of the contained data. From that data a DS record can be constructed to verify the DNSKEY that signed the root zone (see RFC 5011), without reference to an existing trust anchor that is actually trusted.

/etc/unbound/icannbundle.pem

This is a concatenated set of three certificates including the ICANN Root CA, used to sign the XML file that contains the DS record that can verify the root DNSKEY record saved in /etc/unbound/root.key . It can be downloaded from data.iana.org and the certs can be verified online by normal means from the IANA Certificate Authority. It is included with the unbound-anchor package.

/etc/unbound/root.hints

This is a set of NS (nameserver) records giving the names and IP addresses of the root nameservers. When Unbound starts up, with CouchNet configuration it would have recursive forwarders to which it could send queries for arbitrary DNS objects like the list of root server NS's, but if all the forwarder(s) were inoperative, Unbound would need to know where to send that query by other means, like the root.hints file. It can be downloaded from ftp.internic.net. (In plan B, root.hints is moved into the config directory.)

/etc/unbound/dlv.isc.org.key

This is the trust anchor for DNSSEC Lookaside Validation (DLV) (RFC 5074). It comes with the unbound-anchor package. But now that the root zones are all signed, DLV is no longer necessary, and it has been deprecated since 2017. Just ignore this file and leave it alone, until it is removed from the unbound-anchor package.

unbound.conf Line by Line

These were specific configuration issues in /etc/unbound/unbound.conf:

directory: "/var/lib/unbound/etc/unbound"

Where the configuration files are. Relative pathnames are relative to this directory. If you are doing chroot (recommended), the pathname should have the chroot jail path prepended, and since other programs look for configuration in /etc/unbound, that should be a symbolic link to the directory in the jail.

chroot: "/var/lib/unbound"

Path of the chroot jail directory. Mode 755 owned by unbound:unbound.

username: "unbound"

The user to drop privileges into, "" to not drop privileges. This is orthogonal to chroot; both are recommended.

verbosity: 2

0 = errors only; 1 = startup messages (and errors) only; 2 = more operational details; 3 = per-query reports; and on upward.

interface: 0.0.0.0
interface: ::0

Listen for queries on the interfaces having these addresses. You need separate lines for IPv4 and IPv6. The values shown cause listening on all interfaces: if a query can get through the firewall, Unbound will hear it, which I need in order to monitor whether both Unbound and the firewall are working properly. The default is to listen only on localhost.

access-control: 0.0.0.0/0 allow
access-control: ::0/0 allow
access-control: 127.0.0.1 allow_snoop

Anything that can get through the firewall can query recursively or can get local (authoritative) data; only localhost can query the cache.
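As I read the unbound.conf man page, when several access-control netblocks match a client, the most specific one wins, and a client that matches nothing is refused. The sketch below imitates that selection for the three rules above; it is an illustration of the rule, not Unbound's code, and the function name is made up.

```python
import ipaddress

# The three access-control lines from the configuration above.
RULES = [
    ("0.0.0.0/0", "allow"),
    ("::0/0", "allow"),
    ("127.0.0.1", "allow_snoop"),
]


def action_for(client, rules=RULES, default="refuse"):
    """Pick the action of the most specific netblock containing client."""
    addr = ipaddress.ip_address(client)
    best_len, best_action = -1, default
    for netblock, action in rules:
        net = ipaddress.ip_network(netblock)
        # Only compare addresses and netblocks of the same IP version.
        if net.version != addr.version:
            continue
        if addr in net and net.prefixlen > best_len:
            best_len, best_action = net.prefixlen, action
    return best_action


if __name__ == "__main__":
    for client in ("127.0.0.1", "192.168.1.7", "2001:db8::1"):
        print(client, action_for(client))
```

Localhost matches both 0.0.0.0/0 and 127.0.0.1, and the longer prefix gives it allow_snoop; everything else gets plain allow.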

interface-automatic: yes

UDP replies are made from the address to which the query was sent. Usually this is good practice, and is important with interface: 0.0.0.0 .

# port: 53

Listen port; this is the default. You can only have one.

outgoing-port-permit: 32768-65535
outgoing-port-avoid: 0-32767

Origin ports for outgoing recursive queries. These are what's in the distro's provided unbound.conf; the default is 1024 and upward.

root-hints: "root.hints"

See the discussion above of the root.hints file.

harden-glue: yes
harden-dnssec-stripped: yes
harden-below-nxdomain: yes
harden-referral-path: no
harden-algo-downgrade: yes
use-caps-for-id: no
max-udp-size: 3072 (resists DDoS exploit)
unwanted-reply-threshold: 10000000 (resists cache poisoning)

See the man page for unbound.conf, for what these defensive measures do and for the tradeoffs if you enable them. Mostly these are kept at either the compile-time default or the distro-provided setting.

private-address: 192.168.0.0/16 (and 6 others)

Enforce special use rules for these address ranges per RFC 1918 (private IPv4 addresses) and numerous supplements. The major rule is that such addresses must never be seen by off-site queriers, since the remote client would route the addresses to its own LAN, not our internal host. And if an off-site recursive query returns such an address, we will toss it to avoid sending possibly dangerous connections to our internal hosts. See the distro's unbound.conf for the recommended list.

private-domain: "cft.ca.us"
private-domain: "d.f.ip6.arpa" (ULA addresses, RFC 4193)

Allows private addresses in these zones.
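The interplay of private-address and private-domain can be sketched as a filter on the addresses in an answer: private addresses are removed unless the queried name lies in a private domain. This is only a model of the behavior described above. The address list is an assumption (the text names 192.168.0.0/16 "and 6 others" without listing them), and the example names and addresses are hypothetical.

```python
import ipaddress

# Assumed private-address list; check the distro's unbound.conf for the real one.
PRIVATE_ADDRESS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16",
    "fd00::/8", "fe80::/10", "::ffff:0:0/96",
)]
# The two private-domain lines from the configuration above.
PRIVATE_DOMAIN = ["cft.ca.us", "d.f.ip6.arpa"]


def is_private(address):
    """True if the address falls inside any private-address netblock."""
    addr = ipaddress.ip_address(address)
    return any(addr.version == net.version and addr in net
               for net in PRIVATE_ADDRESS)


def in_private_domain(name):
    """True if name equals, or is a subdomain of, a private-domain."""
    name = name.rstrip(".").lower()
    return any(name == d or name.endswith("." + d) for d in PRIVATE_DOMAIN)


def filter_answer(name, addresses):
    """Drop private addresses unless the name is in a private domain."""
    if in_private_domain(name):
        return list(addresses)
    return [a for a in addresses if not is_private(a)]


if __name__ == "__main__":
    print(filter_answer("www.example.com", ["192.168.1.5", "203.0.113.10"]))
    print(filter_answer("petra.cft.ca.us.", ["192.168.1.5"]))
```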

local-zone: "254.168.192.in-addr.arpa" transparent
local-zone: "1.168.192.in-addr.arpa" transparent
local-zone: "1.0.0.0.c.6.4.6.3.b.8.4.8.1.d.f.ip6.arpa" transparent
domain-insecure: "1.0.0.0.c.6.4.6.3.b.8.4.8.1.d.f.ip6.arpa"

This one is very touchy. CouchNet uses private IPv4 and ULA-type IPv6 addresses in these ranges and has corresponding zone files. local-zone allows them to be sent out despite the special use rules for private address ranges; transparent means that the zone content should be looked up and sent out in the normal way. domain-insecure turned out to be essential also, because the d.f.ip6.arpa zone cannot be signed, and Unbound needs permission to serve sub-zones without DNSSEC verification.
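The reverse zone names in these directives are easy to get wrong by hand, particularly the nibble-reversed IPv6 one. A small sketch that derives the zone name from a network prefix follows; the prefixes in the usage lines are read back from the zone names above, and the function name is made up for illustration.

```python
import ipaddress


def reverse_zone(prefix):
    """Return the reverse-lookup zone name for a network prefix.

    IPv4 prefixes must end on an octet boundary and IPv6 prefixes on a
    nibble boundary, since that is where reverse zones can be cut.
    """
    net = ipaddress.ip_network(prefix)
    if net.version == 4:
        if net.prefixlen % 8:
            raise ValueError("IPv4 prefix length must be a multiple of 8")
        octets = str(net.network_address).split(".")[: net.prefixlen // 8]
        return ".".join(reversed(octets)) + ".in-addr.arpa"
    if net.prefixlen % 4:
        raise ValueError("IPv6 prefix length must be a multiple of 4")
    # exploded gives all 32 hex digits; keep those covered by the prefix.
    nibbles = net.network_address.exploded.replace(":", "")[: net.prefixlen // 4]
    return ".".join(reversed(nibbles)) + ".ip6.arpa"


if __name__ == "__main__":
    print(reverse_zone("192.168.254.0/24"))
    print(reverse_zone("192.168.1.0/24"))
    print(reverse_zone("fd18:48b3:646c:1::/64"))
```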

minimal-responses: no

With this set to no, Unbound will fill the Additional section of its response with everything it knows about the query, some of which the client may not ever use. This obviates re-queries (for the RR's that are used), but it takes CPU time and net bandwidth to send the stuff: a tradeoff. The default is yes (make them re-query).

trust-anchor-file: "root.key"
auto-trust-anchor-file: "root.key"
# trust-anchor-file: "/var/lib/unbound/root.key"
# trusted-keys-file: "keys.d/*.key"
include: "local-anchors.incl"

The trust anchors required for DNSSEC verification. Pick one or the other root key directive; the unbound-anchor program looks for auto-trust-anchor-file, so use that. trusted-keys-file would give the trust anchors for my own zones' signatures, if I used it. Instead I include a concatenated list of DS records (as trust-anchor: "text" commands) generated from the DNSKEYs.
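For orientation, this is how a DS record relates to its DNSKEY under RFC 4034: the key tag is a checksum over the DNSKEY RDATA, and the digest is a hash over the owner name in wire format followed by that RDATA. The sketch below uses SHA-256 (digest type 2) and does not cover the special key tag rule for algorithm 1 (RSA/MD5). The key in the usage lines is a two-byte dummy, not a real key, and algorithm 8 is an assumption; in practice the key generation tools produce the DS record.

```python
import base64
import hashlib
import struct


def name_to_wire(name):
    """Encode a domain name in lower-cased DNS wire format."""
    wire = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.lower().encode("ascii")
            wire += bytes([len(raw)]) + raw
    return wire + b"\x00"


def dnskey_rdata(flags, protocol, algorithm, public_key_b64):
    """Assemble DNSKEY RDATA: flags, protocol, algorithm, public key."""
    return (struct.pack("!HBB", flags, protocol, algorithm)
            + base64.b64decode(public_key_b64))


def key_tag(rdata):
    """Key tag per RFC 4034 Appendix B (not valid for algorithm 1)."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def ds_record(owner, flags, protocol, algorithm, public_key_b64):
    """Return zone-file text of the SHA-256 DS record for a DNSKEY."""
    rdata = dnskey_rdata(flags, protocol, algorithm, public_key_b64)
    digest = hashlib.sha256(name_to_wire(owner) + rdata).hexdigest().upper()
    return f"{owner} IN DS {key_tag(rdata)} {algorithm} 2 {digest}"


if __name__ == "__main__":
    # Dummy KSK (flags 257) with a two-byte "key", for illustration only.
    print(ds_record("cft.ca.us.", 257, 3, 8, "AQI="))
```

The resulting text is what would go inside a trust-anchor: "..." line in the included file.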

val-log-level: 2

Makes it log an error message if it receives a response which fails DNSSEC verification. 0 = don't log; 1 = one line report; 2 = includes the reason and the bad IP.

remote-control:
control-enable: yes

Makes Unbound listen to the unbound-control program, for reloading Unbound, or for realtime jiggering of configuration parameters like the verbosity. See above for the required keys and certificates to authenticate.

include: "/etc/unbound/conf.d/*.conf"

In Plan A, CouchNet has local zone definitions in conf files in this directory, formerly different for the master site, slave DNS servers, and leaf nodes. In plan B I've simplified the design so the leaf servers have the same forwarders to the authoritative server instances, written in the main unbound.conf, and the authoritative servers are all slaves and have the local zone stanzas also in the main unbound.conf.

Plan A: Local Zone Files

(Plan A is doomed to failure. Mitigations in plan B are noted briefly.)

A local zone definition includes a zone type keyword, the zone's name in DNS, and information telling where Unbound should get data (Resource Records) for that zone. The type keyword is auth-zone:, forward-zone:, or stub-zone: (a section title, with no value). The name parameter always has the form name: its.name.tld (ending dot not required). The data source varies with the type:

More Stuff for the Chroot Jail

Some of this stuff is modified or deleted in plan B.

/var/lib/unbound/master

Contains the master site's zone files. Don't make a symlink elsewhere; this has to be in the chroot jail so Unbound can read it.

In plan B, all the dirsvrs are slaves, so this directory is gone.

/var/lib/unbound/slave

Unbound on the slave servers writes copies of the zone files here, so they can persist across restarts.

/var/lib/unbound/run

See /var/lib/unbound/dev/log below, which is a symlink to a socket in /run. Also, when exiting, the chrooted Unbound will not be able to remove the PID file unless /run is accessible, which makes trouble when you start Unbound again. So I bind-mount the real /run into the jail.

/var/lib/unbound/backup.pln

Back up everything except ./run

/var/lib/unbound/bin/unbound-start.J

Startup script; pretty simple but still too much stuff for systemd to deal with in unbound.service. It does these steps:

/var/lib/unbound/dev/random
/var/lib/unbound/dev/urandom

Random number source (blocking). There's a lot of discussion about whether /dev/urandom is just as good as /dev/random, on modern hardware. Unbound will use whichever is available. These are independent instances of character device 1,8 and 1,9, not bind-mounted from /dev.

/var/lib/unbound/dev/log

A symlink to the socket /run/systemd/journal/dev-log, which is another reason why /run is bind-mounted in the jail.

Procedure to Install Unbound

Once the LCM files and master configuration storage were gotten correct, the procedure to convert a host to Unbound went like this:

Testing Unbound

Unbound will do DNSSEC out of the box, unless you sabotage it in the configuration. For testing DNSSEC: With dig, +dnssec is needed to display the RRSIG. internetsociety.org (and lots of others) have validly signed data. dnssec-failed.org has an invalid signature. When you dig its 'A' record through a DNSSEC verifying server you get SERVFAIL (with or without +dnssec). If you specify +cdflag (checking disabled) then the server will deliver this site's 'A' record. Tidbit: if you specify +multi, dig will wrap long lines more readably and will show an interpretation of some of the arcane fields in the DNSSEC records.

Try at internet.nl to test IPv6 support. Conclusion: Hurricane Electric's forwarder does validate the test site's domain name signatures using IPv6 transport.

rootcanary.org tests most or all algorithm variants that are representable in DS and DNSKEY records. Our outcome: Hurricane Electric's forwarder can verify an RRSIG for every combination of a DS digest algorithm from SHA-1, SHA-256 and SHA-384 (but not GOST) with a signing algorithm from DSA, the RSA variants, ED25519, ED448 and the ECDSA variants (but not ECC-GOST or RSA-MD5). GOST is the Russian suite of crypto algorithms. The MD5 algo is deprecated due to known weaknesses.

To test the local Unbound itself, I temporarily turned off all forwarding (by changing the forward-zone's name to "su.") and reloaded. Outcome: local Unbound can verify internet.nl's signatures by itself, gives SERVFAIL as it should on dnssec-failed.org, and can verify using all the algorithms for which tests are provided on rootcanary.org, except not GOST or RSA-MD5.

Signing the Zone

Tutorials on zone signing:

Zone signing steps:

Testing Unbound's DNSSEC, and Why It Failed

This command line, executed on my host Petra, makes a simple test query for Petra's IPv4 address on Petra's leaf server, on which cft.ca.us. (plan A) is a stub zone forwarding to the master site Jacinth. (+multi folds long lines for easier reading.)

dig @localhost +dnssec +multi petra.cft.ca.us. A |& less

An 'A' record and an RRSIG (Resource Record set Signature) are returned. You will see in the header flags the AD bit, which means that Petra's leaf server certifies to the client (dig) that the RRSIG was made with the ZSK for that zone (DNSKEY record, not included), the ZSK was signed with the KSK, and there is a chain of trust from one of the various trust anchors which the leaf server has, to the KSK. In this case the trust anchor is the DS record for the KSK which I installed with the leaf server, but normally the chain of trust would start with the root key in /etc/unbound/root.key.

This outcome is a success. Now let's change localhost to Jacinth. The flags now include AA because Jacinth truly is authoritative for this zone, but AD is gone. The authoritative data will not be accepted as authentic, and in particular, non-authentic SSHFP records are not accepted for authenticating the server being connected to. This is a showstopper.
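The AA and AD flags that dig prints are single bits in the second 16-bit word of the DNS message header (RFC 1035, with AD and CD added by the DNSSEC RFCs). A minimal sketch that decodes them from raw response bytes is shown here; the sample headers are synthetic, built only to show the two cases discussed above.

```python
import struct

# Bit masks within the 16-bit flags word of the DNS header.
FLAG_BITS = {
    "qr": 0x8000,  # this is a response
    "aa": 0x0400,  # authoritative answer
    "tc": 0x0200,  # truncated
    "rd": 0x0100,  # recursion desired
    "ra": 0x0080,  # recursion available
    "ad": 0x0020,  # authentic data (validated by the responder)
    "cd": 0x0010,  # checking disabled
}


def header_flags(message):
    """Return (set of flag names, rcode) from a raw DNS message."""
    _msg_id, flags = struct.unpack("!HH", message[:4])
    names = {name for name, bit in FLAG_BITS.items() if flags & bit}
    return names, flags & 0x000F


if __name__ == "__main__":
    # Synthetic 12-byte headers: a validated answer, then an authoritative one.
    validated = struct.pack("!HH", 1, 0x81A0) + bytes(8)
    authoritative = struct.pack("!HH", 1, 0x8500) + bytes(8)
    print(sorted(header_flags(validated)[0]))
    print(sorted(header_flags(authoritative)[0]))
```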

So why is the authoritative data not authentic? I don't have a good reference to a discussion of this point, but I can provide some of my own hot air. The difference comes from the trust relation between the client (the one making the DNS query) and whichever DNS server turned on the AD bit. The main use case for DNSSEC is for the client to obtain DNS data that it can actually trust, such as the IP address of a bank, a brokerage, a mail server, or a VPN endpoint. The client has no trust relation with the foreign DNS server that provides this data, nor with the various forwarders that may provide the data out of their caches, so if any of those servers alleges that the data is authentic by turning on the AD bit, the client's software should not believe and should turn off AD again. The client needs software that it trusts, in my case an instance of Unbound running on my own machine and set up and supervised by me or by my I.T. staff who I trust to not have gone over to the Dark Side. So only my own recursive and validating DNS resolver should do the work to determine the validity of DNS data.

It is not normal for a local resolver to have authoritative data for anything. It is also not too easy for the resolver to know for sure that a particular query is coming from a local user who is going to trust an AD bit, or from elsewhere where the AD bit will be considered an attempt at fraud. I'm guessing here, but I would say that the Unbound developers (and similarly for other software) decided that these issues are a can of policy worms that they didn't want to deal with, given the rarity of the situation. Therefore authoritative (AA) data is always sent with the AD bit turned off.

Plan B: A Separate Authoritative DNS Server

How can I recover from this design problem? By doing what the offsite servers do: I'll provide the authoritative data from a server separate from the recursive and validating resolver. Leaf nodes will continue to have only that server (on port 53), and directory servers will have the same configuration, again on port 53. The dirsvrs will have a second DNS server on port 4253, configured as authoritative but never used recursively. Instead of stub zones, the leaf servers will forward queries for local data to all three dirsvrs on port 4253. Offsite queries will be forwarded to the master site's leaf server, so we get a local cache of all such queries.

The leaf server has trust anchor(s) configured by me by which it can validate data it receives from otherwise untrusted foreign (or local) servers. One of these trust anchors is the public KSK of the root zone; all validating resolvers need this, and there is a procedure (RFC 5011 and RFC 7958) by which they can obtain and validate the root key, implemented in Unbound's unbound-anchor helper program. In addition, the local server gets public keys (actually DS records) for zones in my island of trust, served by my organization's authoritative servers, which cannot be validated by reference to the global root. (The DNSSEC Lookaside Validation (DLV) service of RFC 5074 is an alternative, but it has been deprecated since 2017.)

Clients send their queries with the DO bit on, signifying that they want the DNSSEC records and that the local recursive server should validate the requested data working from the provided trust anchors, and the client hopes that the AD bit will be set in the response, meaning that all the signatures matched the payload data. The leaf server also sets the DO bit when it requests the data from untrusted sources, since it needs their RRSIG records to do the validation, but it does not expect the AD bit, which it would not trust, in their responses.
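The DO bit is not in the main header; it is carried in the EDNS0 OPT pseudo-record in the Additional section (RFC 3225, RFC 6891), in the top bit of the flags half of the field that would otherwise be the TTL. The sketch below builds such a query by hand to show where the bit sits. The query ID and the advertised UDP size of 1232 are arbitrary assumptions, and this is not how dig or Unbound are implemented.

```python
import struct


def build_query(name, qtype=1, do=True, query_id=0x1234, udp_size=1232):
    """Build a DNS query with an EDNS0 OPT record, optionally with DO set."""
    # Header: id, flags (RD only), 1 question, 0 answers, 0 authority,
    # 1 additional record (the OPT pseudo-record).
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".") if label
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # class IN
    # OPT: root name, type 41, "class" = UDP payload size, "TTL" holds
    # extended rcode, version and flags; DO is 0x8000 in the low 16 bits.
    ttl = 0x00008000 if do else 0
    opt = struct.pack("!BHHIH", 0, 41, udp_size, ttl, 0)
    return header + question + opt


if __name__ == "__main__":
    packet = build_query("petra.cft.ca.us.")
    print(len(packet), packet[-11:].hex())
```

Sending this packet over UDP to a resolver should return RRSIG records along with the answer, much like dig +dnssec.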

Setting Up the Authoritative Server

What software should I use for the authoritative servers? I'm relying on this Wikipedia article on Comparison of DNS Server Software. I limited the software to those that are authoritative, slave-capable, with DNSSEC, with IPv6, and free software. Then I read the detailed descriptions to determine whether they could emit AXFR and IXFR, and other noteworthy aspects. All packages that are slave-capable can read both AXFR and IXFR. Several of these packages can rely on a backend database, e.g. MySQL, with its own replication, so AXFR/IXFR are not used (though available).

Now let's compare Unbound with the rest of them in a pro&con format.

My conclusion then is to use Unbound. I will put a slave server on all the directory server machines, and the collection of zone files via HTTP(S) will be the actual master. The slaves will not be told the master's IP (because there is no real DNS master site), so they will have to download the zones on every NOTIFY, but with my operating procedures the notifies are sent only when there actually is a new zone. To avoid chicken and egg issues, I will use the master's IP in the distribution URLs.

This design is working out: all nodes can retrieve authentic (AD, validated) SSHFP records for a host that has them, and the off-site test domains do or don't deliver authentic data as appropriate.

Setting Up SSHFP

The SSHFP record is governed by RFC 4255, RFC 6594 and RFC 7479. See SSHFP on Wikipedia for a non-normative description of the record. Its text representation is two integers and a hex string as follows:

To extract from a running SSH server an SSHFP record that you can put in your DNS zone file: (Remember that the zone file has to be signed, for the SSH client to believe in it.)

Alternatively use your backup of the host keys:
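As a rough illustration of what such a record contains (this is a sketch, not the commands used on CouchNet): the first integer is the key algorithm (1 = RSA, 2 = DSA, 3 = ECDSA, 4 = Ed25519), the second is the fingerprint type (1 = SHA-1, 2 = SHA-256), and the hex string is the hash of the decoded public key blob. The key in the usage lines is a synthetic all-zero blob, and the function name is made up.

```python
import base64
import hashlib

# SSHFP algorithm numbers from RFC 4255, RFC 6594 and RFC 7479.
SSHFP_ALGORITHMS = {
    "ssh-rsa": 1,
    "ssh-dss": 2,
    "ecdsa-sha2-nistp256": 3,
    "ecdsa-sha2-nistp384": 3,
    "ecdsa-sha2-nistp521": 3,
    "ssh-ed25519": 4,
}


def sshfp_record(hostname, public_key_line):
    """Build a SHA-256 SSHFP record from one line of a host key .pub file."""
    key_type, key_b64 = public_key_line.split()[:2]
    blob = base64.b64decode(key_b64)
    # Fingerprint type 2: SHA-256 over the raw public key blob.
    fingerprint = hashlib.sha256(blob).hexdigest()
    return f"{hostname} IN SSHFP {SSHFP_ALGORITHMS[key_type]} 2 {fingerprint}"


if __name__ == "__main__":
    # Synthetic Ed25519-shaped blob (all-zero key), for illustration only.
    blob = b"\x00\x00\x00\x0bssh-ed25519" + b"\x00\x00\x00\x20" + bytes(32)
    line = "ssh-ed25519 " + base64.b64encode(blob).decode() + " root@petra"
    print(sshfp_record("petra.cft.ca.us.", line))
```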

It takes some reconfiguration to get SSH to actually use the SSHFP records.

In addition I made these changes not related to SSHFP:

Now a user, with an empty or nonexistent known_hosts file, can successfully connect with SSH using these hostnames:

So this project has ended in success: known_hosts is no longer needed to validate an SSH server that has SSHFP records. The server's host key is not added to known_hosts if the SSHFP record was used to validate the server.

If you connect to an IP address, SSH does not look for a PTR record to canonicalize it, and per StrictHostKeyChecking=accept-new, the host key is added to known_hosts under the IP address with no user interaction but with a warning on stderr.

If UpdateHostKeys=yes, and you connect to a server not in known_hosts, client SSH asks for all the server's host keys, and adds all of them to known_hosts under the canonical hostname plus the IP used to connect, with no user interaction and no warning message. (For most of my use cases, this is very helpful to keep garbage out of log files.)

If UpdateHostKeys=yes and you connect to an IP address and it is not in known_hosts, the host key is added to known_hosts under the IP address with no user interaction but with a warning on stderr, same as without UpdateHostKeys. But the second time you connect, i.e. if the IP and key are already in known_hosts, the client will request all the server's host keys and will add them to known_hosts (except the one already known) under the IP address.