SOA Serial does not reflect the version of the data being served #690
Comments
@buffrr and I have discussed this a lot as well, but the issue is how to handle reorgs (more on that in a sec). Your point about getting SOA from full nodes that have not yet finished syncing, however, seems far more important than what we were thinking about.
The reorg issue is: imagine block 36 is found; the Urkel tree and the HNS root zone will officially be updated. We can use the height
One thing we could do is just always use the current chain tip height as the serial, not just the height of the last tree update.
pro: reorgs handled automatically
Another thing we can use instead of chain height is the Median Time Past, which is the median of the last 11 block timestamps. It is guaranteed to always increase, unlike the individual block timestamps themselves. Again, we would have to update this every block to ensure that reorgs are properly handled.
So, what's the best tradeoff? Does the AXFR bridge repeatedly poll the SOA serial and only transfer when it's been updated? That would require more clever logic, or else you're going to be downloading the same root zone every ten minutes, probably. Or, is it "okay" to have invalid data for 6 hours and then just hope the next tree update goes smoothly?
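A rough sketch of the Median Time Past idea, assuming block timestamps are available as a plain array (oldest first). `medianTimePast` is a hypothetical helper for illustration, not a function from hsd.

```javascript
'use strict';

// Hypothetical helper: median of the last `window` block timestamps.
// `timestamps` holds block times in seconds, oldest first.
function medianTimePast(timestamps, window = 11) {
  const recent = timestamps.slice(-window);
  if (recent.length === 0)
    throw new Error('need at least one block timestamp');
  const sorted = [...recent].sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length / 2)];
}

// Individual timestamps may go backwards between blocks,
// but the median over the window does not decrease here.
const times = [100, 160, 150, 230, 220, 300, 360, 350, 420, 480, 470];
const before = medianTimePast(times);
const after = medianTimePast([...times, 460]); // new block is older than the tip
console.log(before, after); // 300 350
```

The new block's own timestamp (460) is lower than the previous tip's (470), yet the value that would be used as the serial still moves forward.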
Sorry about this, we are under-staffed and the developers that are working on HNS core software have higher priorities that they feel affect more users. This is why writing a PR is often more effective than pointing things out or opening issues. |
I have an open issue about this here trying to solve this in the plugin. The idea is to wait 12 blocks before serving the zone so that we can have a semi-globally consistent SOA serial across multiple hsd instances serving the exact same zone (assuming no reorgs larger than 12 blocks i think)
Ideally, hsd shouldn't really serve any zone data before it's fully synced. Simply restarting hsd will cause all kinds of unexpected issues and break lots of sites because it serves stale data that recursive resolvers will cache for a couple of hours. In the worst case, it will serve compromised DS keys that site owners have updated, but users will still be vulnerable because they will get the old key. |
Yes ok I like this a lot, and we were already discussing the disparity between hsd and hnsd -- a good solution is for hsd to ALSO use "safe height for resolution" (12 blocks) and then use the timestamp from the tree interval block as the SOA serial. There will be some confused users as we deploy this but it does seem like that covers everything. |
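The proposal above could look roughly like this. It is a sketch under assumptions: a tree interval of 36 blocks, a safe depth of 12 blocks, tree updates landing on multiples of the interval, and a hypothetical `getBlockTime(height)` lookup. None of these names are taken from hsd.

```javascript
'use strict';

// Assumed values for illustration, as discussed in this thread.
const TREE_INTERVAL = 36; // blocks between tree updates (about 6 hours)
const SAFE_DEPTH = 12;    // "safe height for resolution"

// Height of the most recent tree-interval block that is at least
// SAFE_DEPTH blocks below the chain tip.
function safeTreeHeight(tipHeight) {
  const safe = Math.max(0, tipHeight - SAFE_DEPTH);
  return safe - (safe % TREE_INTERVAL);
}

// `getBlockTime` is a hypothetical lookup: height -> block timestamp (seconds).
function soaSerial(tipHeight, getBlockTime) {
  return getBlockTime(safeTreeHeight(tipHeight)) >>> 0; // SOA serial is a uint32
}

// Two nodes at different tips serve the same zone and the same serial.
const fakeTime = (height) => 1000 + height * 600;
console.log(soaSerial(50, fakeTime), soaSerial(60, fakeTime));
```

With this scheme the serial only changes when the zone being served changes, which is the property the issue asks for.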
we can also use Lines 2877 to 2894 in df997a4
Lines 400 to 405 in df997a4
|
I would hope, in that situation the stale
TBH, for me that would rule that out - IMHO that's terrible
Yeah - that's fair - if you can do that, it would be the best
On a (non-blockchain) high volume DNS server I wrote in the past, which used a push journal stream update, you could keep journal blocks coming in while the DNS server was down, and it would rush-process them at start-up to catch up before serving any data. It would also ask the upstream for the latest journal serial number and wait until it had at least reached that serial before serving data. That meant there could be a small number of journal blocks that hadn't been processed when it started serving, but it would only be a few, and they'd get processed very quickly after serving started.
Have to say, it did slightly disturb me that a new install of
If
oh, sure, but as you can see from the detail of the discussion, there's no way I'd ever come up with anything suitable, just not enough background knowledge!
so long as there's near 100% chance of it always increasing - seems fine to me
Not sure about the bridge, but this is exactly what the slave will be doing - SOA polling over UDP looking for an increased Serial
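For illustration, the "increased Serial" check a secondary performs follows RFC 1982 serial number arithmetic, so a 32-bit serial may wrap around. A minimal sketch, where the function names are made up for this example:

```javascript
'use strict';

// RFC 1982 comparison for 32-bit SOA serials.
// Returns true when `remote` counts as newer than `local`.
function serialIsNewer(remote, local) {
  const diff = (remote - local) >>> 0; // difference modulo 2^32
  return diff !== 0 && diff < 0x80000000;
}

// Hypothetical poll step: only request a transfer when the serial advanced.
function shouldTransfer(localSerial, polledSerial) {
  return serialIsNewer(polledSerial, localSerial);
}

console.log(shouldTransfer(1700000000, 1700021600)); // true
console.log(shouldTransfer(1700021600, 1700000000)); // false
```

The second call is the case from this issue: a primary that reports an older serial is simply ignored by the secondary, rather than triggering a transfer.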
Any DNS s/w should be able to automatically cope with the serial number dropping from |
I assume this was installed on HDD, syncing up on an SSD shouldn't take longer than 3 hours. |
If an attacker was in the middle, they should have no problem giving DNS answers signed by the old key (acting as the TLD's authoritative server). Of course, I'm talking about the worst case here, and this only works on not-yet-synced hsd nodes.
Any chance of having a similar function to
isReady() {
  if (!this.synced)
    return false;
  return this.tip.time >= this.network.now() - 2 * 60 * 60;
}
Hmm is there a case where this could return false forever or take way too long? |
Block timestamps are only required to be greater than MTP (Median Time Past), which is the median of the last 11 blocks' timestamps and usually ends up being about an hour behind actual UTC time. They are also required to be no more than two hours ahead of actual UTC time. Sometimes (not often) blocks can take over an hour for the miners to find. So there is a case where an in-fact-fully-synced node will stop resolving because the chain tip timestamp is more than 2 hours old. I think 12 hours is probably OK for this, but we can still compromise on 6 hours, which makes sense anyway since that's the tree interval on mainnet. Even if we started resolving 24-hour-old data, the worst case is that a key is trusted that was revoked less than one day ago. Question for you DNS experts: how long do you normally expect a DNSSEC update to propagate through DNS anyway? |
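A sketch of the readiness check with the threshold relaxed to 6 hours, as suggested above. The standalone function and its `synced`, `tipTime` and `now` inputs are stand-ins for chain state and are assumptions for this example, not hsd's actual interface.

```javascript
'use strict';

// Assumed threshold: 6 hours, matching the mainnet tree interval
// mentioned in this thread.
const MAX_TIP_AGE = 6 * 60 * 60;

// All times are in seconds. Returns true when the node looks safe to resolve from.
function isReady({synced, tipTime, now}, maxTipAge = MAX_TIP_AGE) {
  if (!synced)
    return false;
  return tipTime >= now - maxTipAge;
}

// A synced node whose latest block is 3 hours old:
const now = 1700000000;
const state = {synced: true, tipTime: now - 3 * 60 * 60, now};
console.log(isReady(state));              // true with the 6 hour window
console.log(isReady(state, 2 * 60 * 60)); // false with the 2 hour window
```

The wider window trades a longer worst-case staleness for not refusing to resolve during an unusually slow stretch of blocks.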
yeah, HDD, but RAID - so not the worst case scenario - also 3 hrs of giving out incorrect
And as time goes on, this will only get longer & longer - currently participation in this project is relatively low - for example, ICANN ROOT servers get terabytes of queries each, every day.
It's a real shame the DNS data couldn't be separated from all the auction & $HNS transaction data - but I can see splitting off where proof-of-ownership actually occurs is tricky without all the supporting evidence.
if the scenario is you changed the
Or you give out But if you change a
If the zone changes their CloudFlare do it correctly & will flip almost immediately - most others wait for the TTL on validated keys to expire before dropping them. Cos Obviously it also depends on |
Yeah, even 24-hour is an improvement. If we can make it 6 hours, that's even better (assuming no issues)
Yeah, a proper key roll over should be performed. If For example, you have this DS record in the root zone:
To "roll" the DS, you should first add a new one (while still keeping the old DS).
Resolvers may still have the old DS RRSet cached for For a safe DS rollover:
Rolling a DS safely requires two updates to the root zone. Alternatively, you can always have an emergency standby DS added that you keep secure somewhere. If the active DS/DNSKEYs are compromised, you can just remove them and start using the new ones immediately. This requires one update to remove the old DS. Of course, this area is still improving so some better techniques may come up. |
I think adding the new keys before adding the new Adding the new I was trying to move from one DNSSEC signing provider to another (for a large client). In the end the conclusion was that the only way to do it was to go unsigned for a while!! ... although I think I've got a plan that would work now IMHO best thing is to do is add the new keys, then add the new One RFC says so long as there is any path to validate any The powers that be™ are aware of this contradiction and plan to issue a clarification. If you read the official methodology for changing external signing provider, you'll discover that there isn't a single piece of DNS s/w that supports it! Changing KSK algorithm is also a nightmare. I can totally see why a lot of well known sites just don't use DNSSEC - there's little tangible advantage (advantages an MBA could measure), but there are all sorts of nasty corner cases that can bring your site down. PowerDNS does a good job of making it a lot easier to implement. |
Having an additional DS without a corresponding DNSKEY is okay and this was mentioned from the very early DNSSEC RFCs but it may get tricky when changing algorithms. I like the DNSKEY first more actually because it allows your new DNSKEY(s) to propagate in resolvers cache while still waiting for Handshake root zone to update. So you can do both at the same time actually especially if your DNSKEYs will propagate faster (depends on TTL). RFC-7583 is dedicated to this and explains drawbacks of different techniques but doesn't cover algorithm changes.
Yup that should be the case.
This may be tricky when considering message digest algorithm and DNSKEY algorithm downgrades. I can see why this is useful. If two trust anchors are present, one with a stronger algorithm and one weaker, a validating resolver may want to favor the stronger. Mainstream resolvers don't do this though because they have to accept any valid path.
There's a small advantage to securing A/AAAA records. WebPKI threat model doesn't rely on DNSSEC. DANE is the killer app and what makes it worth it.
Yeah there's confusion there and some resolvers try to interpret the RFCs more strictly. I think what makes DNSSEC hard is having to think about those TTLs and the global cache. Validating resolvers should perhaps try to be more lenient and request new keys when validation fails although this could increase load or introduce new forms of denial of service since any bogus answer would trigger multiple queries. |
Right - with PowerDNS you can also propagate keys "inactive" first, which is nice - it's their recommended method for ZSK Rollover |
Someone should write a plugin for hsd to automate rollovers perhaps by querying CDS/CDNSKEY records ;) Since we can easily update the root zone, parent/child communication should really be automated. |
Ok so if I can boil this discussion down to a set of code changes we all agree on, I'll open a PR:
This will:
|
sounds like a typical techie comment 😆 |
Sounds good! We can use extended DNS errors RFC8914 edns option to indicate that the chain is syncing just to make it easier to differentiate other types of SERVFAIL |
Sounds like a fine plan, you would need to check the client asked using EDNS, of course Generally,
|
yup REFUSED is usually used if they don't want/not ready to answer (it really just depends on preference). For example, Knot DNS (an authoritative) will give a REFUSED answer if it's not ready to respond to an AXFR request (if transfers are temporarily paused). The REFUSED answer uses Extended DNS Errors with EDE error code 14 (Not Ready): 4.15. Extended DNS Error Code 14 - Not Ready
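As a rough illustration, the Extended DNS Error option described in RFC 8914 is a small EDNS option: option code 15, a 16-bit INFO-CODE, then optional UTF-8 EXTRA-TEXT. The `encodeEDE` helper and the extra text below are assumptions made for this sketch; it does not use hsd's or any DNS library's API.

```javascript
'use strict';

const EDE_OPTION_CODE = 15; // EDNS option code for Extended DNS Errors
const EDE_NOT_READY = 14;   // INFO-CODE 14: Not Ready

// Hypothetical encoder: OPTION-CODE, OPTION-LENGTH, INFO-CODE, EXTRA-TEXT.
function encodeEDE(infoCode, extraText = '') {
  const text = Buffer.from(extraText, 'utf8');
  const buf = Buffer.alloc(4 + 2 + text.length);
  buf.writeUInt16BE(EDE_OPTION_CODE, 0); // OPTION-CODE
  buf.writeUInt16BE(2 + text.length, 2); // OPTION-LENGTH
  buf.writeUInt16BE(infoCode, 4);        // INFO-CODE
  text.copy(buf, 6);                     // EXTRA-TEXT
  return buf;
}

const opt = encodeEDE(EDE_NOT_READY, 'chain is syncing');
console.log(opt.toString('hex'));
```

The option only belongs in the response when the query itself carried an OPT record, which matches the point made earlier about checking that the client asked using EDNS.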
|
yeah, sounds better - it's a fine line RFC1035 says
|
This is effectively a duplicate, but broader description, of #559
In DNS, the purpose of the SOA Serial is to tell the clients the version of the data currently being served.
This is NOT being fulfilled by simply serving out the current date & time as it fails to take into account when a server is out-of-date & is still catching up, e.g. due to maintenance downtime or connectivity issues.
This causes a problem when running two instances of hsd (for failover) in conjunction with Buffrr's AXFR plug-in to feed the merged ROOT zone to one or more slaves. If one hsd server is taken down for a day or two, then brought back up, it will immediately lie that it has the latest data, when in fact it is still catching up.
This can cause the data on a downstream slave to be rolled back to an earlier version, and the slave will then not be updated until the clock marches forward.
I've pointed this out to various devs at various times, but so far it's not fixed (v3.0.0).
The timestamp on the last block that was included in the most recent Urkel tree update seems a reasonable choice to me, or this timestamp could be converted into YYYYMMDDXX format, should you prefer, but many TLDs use unixtime as the SOA Serial these days.
Using any information that is always increasing, from the last block that was included in the most recent Urkel tree update, will ensure that only when two servers are serving the same version of information will they return the same SOA Serial.
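To make the two serial styles concrete, here is a small sketch that derives both from a block timestamp. The function names are invented for this example, and the date-based variant assumes a two-digit revision counter supplied by the caller.

```javascript
'use strict';

// Unix time of the block, used directly as the serial (uint32).
function unixSerial(blockTime) {
  return blockTime >>> 0;
}

// Date-based serial: UTC date as YYYYMMDD followed by a two-digit revision.
function dateSerial(blockTime, revision = 0) {
  if (revision < 0 || revision > 99)
    throw new RangeError('revision must be 0-99');
  const d = new Date(blockTime * 1000);
  const ymd = d.getUTCFullYear() * 10000
    + (d.getUTCMonth() + 1) * 100
    + d.getUTCDate();
  return ymd * 100 + revision;
}

const blockTime = 1609459200; // 2021-01-01T00:00:00Z
console.log(unixSerial(blockTime)); // 1609459200
console.log(dateSerial(blockTime)); // 2021010100
```

Because the tree can update several times in one day, the date-based form needs the revision counter to keep increasing within a day, while the unixtime form increases without any extra bookkeeping.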