Kitz Forum

Internet => General Internet => Topic started by: Weaver on January 27, 2017, 11:14:39 AM

Title: How big is the DNS?
Post by: Weaver on January 27, 2017, 11:14:39 AM
I'm just wondering how much space it would take to store the entire DNS? Any ideas?

It is highly compressible, and for my purposes the entire thing could be sorted and delta-encoded as one of several layers of heavy compression.
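
For illustration, a minimal sketch of the kind of delta step I mean - front-coding a sorted list of owner names so each entry carries only the new suffix. Plain Python, and the example names are invented:

# Front-code a sorted list of domain names: each entry stores the length
# of the prefix shared with the previous name plus the remaining suffix.
def front_code(sorted_names):
    prev = ""
    out = []
    for name in sorted_names:
        common = 0
        while common < min(len(prev), len(name)) and prev[common] == name[common]:
            common += 1
        out.append((common, name[common:]))
        prev = name
    return out

def front_decode(coded):
    prev, names = "", []
    for common, suffix in coded:
        prev = prev[:common] + suffix
        names.append(prev)
    return names

names = sorted(["www.example.com.", "www.example.net.", "www.example.org."])
coded = front_code(names)   # [(0, 'www.example.com.'), (12, 'net.'), (12, 'org.')]
assert front_decode(coded) == names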
Title: Re: How big is the DNS?
Post by: d2d4j on January 27, 2017, 11:29:10 AM
Hi weaver

It is not a storage issue; rather it is the speed and accuracy of the served records that count - the quicker the better, circa 0.2 ms or better.

Compressing DNS records just adds to the resolution time of a record, which is/has to be public, unless it's clustered and you're talking about DNS sync between nameservers.

Many thanks

John
Title: Re: How big is the DNS?
Post by: Weaver on January 27, 2017, 09:02:16 PM
Apologies, it's a design research idea, not a question about how the DNS works. How much storage is needed to fully pre-populate a complete DNS cache covering the entire space down to every leaf? And I would think later about packaging such a single database file into an optimised complete download.
Title: Re: How big is the DNS?
Post by: Weaver on January 27, 2017, 09:20:22 PM
Imagine transmitting the entire DNS by broadcast one-way satellite links, on several channels. On one group of channels you transmit simply everything, in a rotating fashion, in order to populate a DNS cache. On a second channel group you transmit only the frequently-used stuff; otherwise it's the same as the first. On a third group of channels, you transmit all the updates as they happen. One final channel group, group four, has only one channel in it and duplicates all of the group three content combined into a single channel.

Each channel is very highly compressed, and is pre-sorted so that delta compression can be one of the early compression layers. After every so-many records, the data stream has to be broken up in such a way that a listener can sync up with the compression scheme and start understanding it, having missed only a certain number of seconds' worth of entries before hearing a synchronisation symbol followed by enough information to get started with the compression scheme.
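
To make that concrete, a rough sketch (plain Python; the block size and marker are invented) of how the rotating stream could restart the delta chain at a sync point every N records, so a listener who tunes in part-way only has to wait for the next marker:

import os

SYNC = "\x00SYNC\x00"   # out-of-band marker a late joiner can scan for
BLOCK = 64              # restart the delta chain every 64 records

def emit_stream(sorted_names, block=BLOCK):
    # Each block begins with the sync marker; the first entry after it is
    # sent in full (common prefix 0), then entries are front-coded again.
    prev = ""
    for i, name in enumerate(sorted_names):
        if i % block == 0:
            yield SYNC
            prev = ""                      # delta chain restarts here
        common = len(os.path.commonprefix([prev, name]))
        yield (common, name[common:])
        prev = name

A listener simply discards everything until the first SYNC it hears and decodes from there.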

When you first become a client, you listen to as many channels in group two as you have radios, in order to get up and running with something useful as quickly as possible. If you have even more radios, you listen to group one channels as well, otherwise you switch to absorbing group one after group two. Group two content is not duplicated in group one, it is simply omitted silently, so a new client always has to listen to all of both channel groups eventually.

Groups one to three are partitioned so that a channel in a group carries only a known part of the DNS, which increases the speed of rotation and allows a faster startup time if you have multiple radios. A fast, cheap hash function tells you which channel in a group a particular domain name will be mentioned on. A list of parameters for the hash function and a map of the set of channels are transmitted frequently on every channel. Each channel simply rotates: when it gets to the end of the data it starts transmitting again from the beginning.
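
As a rough sketch of that channel-selection hash (the channel count and the salt standing in for the broadcast parameter block are invented):

import hashlib

def channel_for(domain, channels=16, salt=b"params-2017-01"):
    # Normalise the owner name, then hash it together with the broadcast
    # parameters so every listener computes the same channel number.
    key = salt + domain.lower().rstrip(".").encode("utf-8")
    digest = hashlib.blake2s(key).digest()
    return int.from_bytes(digest[:4], "big") % channels

print(channel_for("www.example.com"))   # same answer on every client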

This partially completed design has some outstanding problems, the main one currently being the synchronisation between group one and group three / four content. Including some data in group three that is slightly old - but how old? - bridges the gap between the initialisation-time ‘full’ database and the live updates’ stream. Actually though, getting it wrong is not critical, because this is only a cache and being incomplete simply means that an attached normal DNS server has to consult the real live DNS more often. Also, without an estimate of database sizes and channel throughput I don't know if this plan is useful or even feasible. Neither do I know what the rate of DNS updates is, so I don't know whether channel group four might be able to keep up with the rate of change of the entire world.

If the numbers are too large, however, I could always simply reduce the scope of the problem and deal with only part of the database, such as the well-known TLDs plus one or more ccTLDs.
Title: Re: How big is the DNS?
Post by: d2d4j on January 27, 2017, 09:55:28 PM
Hi weaver

Many thanks

I'm sorry, I am not sure what your idea is or its use in the real world.

Are you proposing to replace the DNS cache held on computers, or a new DNS server?

Many thanks

John
Title: Re: How big is the DNS?
Post by: Weaver on January 27, 2017, 10:31:39 PM
I was wondering if a performance booster for ISPs or smaller networks was feasible or even possible. It is a thought experiment. Having a hugely prepopulated cache locally would slash latency for DNS queries and cut startup times for web browsing down to near zero. But it would need a source of free broadcast bandwidth, something like satellite, to fill the cache before users need it.

I used to work on new designs professionally for many years. I keep coming up with new / stupid / impractical ideas, but some of them occasionally turn out to be very good. If no one checks out the stupid ideas to see if they are actually worth anything then things will never advance. It doesn't mean that I am optimistic, but that doesn't matter.
Title: Re: How big is the DNS?
Post by: Weaver on January 27, 2017, 10:33:52 PM
If anyone ever sees some rough stats on the size of the DNS then please let me know. I don't know how one might research such a thing.

Does anyone know if there are difficulties with crawling the DNS? I don't know enough about the subject. I realise that I have not considered the difficulties in obtaining the data to begin with. Perhaps that is a showstopper.
Title: Re: How big is the DNS?
Post by: d2d4j on January 27, 2017, 10:40:31 PM
Hi weaver

Sorry, I'm not having a go at you, or criticising you in any way.

I think you're proposing to replace the PC DNS cache, but this works fine from what I see/experience - my personal thoughts only.

The TTL is the determining factor for record caching, but it is unclear when the TTL countdown starts - from the DNS server record TTL or the cache TTL. I think the cache TTL.


You may be better off looking at DNS servers and how they can be improved. Large leaps forward have been made, but the overriding factors are speed and the QPS (queries per second) they can handle.

Just my thoughts, so apologies if I come across as being negative.

Many thanks

John
Title: Re: How big is the DNS?
Post by: Weaver on January 27, 2017, 10:50:00 PM
I'm not proposing to replace the PC DNS cache at all; indeed this could never do so. This is a design for an accelerator to add on the side of an upstream proxy cache, located one or two hops further up, solely in order to reduce the query round-trip time on many normal-cache misses to near zero.

And it is merely a thought experiment, as I have stressed. I am not trying to sell the merits of this idea, just to get a feel for whether or not it is even possible. That is what a thought experiment is. You perhaps are not used to living with theoretical physicists. :-)
Title: Re: How big is the DNS?
Post by: d2d4j on January 27, 2017, 11:14:50 PM
Hi weaver

Many thanks, and sorry, you're correct, I do not live in the theoretical world, only the real world, with real problems and real people. Please do not overreact to that, I mean no disrespect to you.

I have always been labeled as seeing only black and white, no grey areas, which is true, but this has never stopped me thinking of real-world applications, so I do try to think outside the box. But until the penny drops and I understand what you're trying to do, and against what is already in use, I cannot see what benefit it would produce.

Cached DNS gives near-zero lookups, unless its TTL has expired, at which point it should take less than 0.2 ms real-world lookup time, so please forgive me, but I fail to see the application other than, say, saving a few 0.01 ms.

Once again I apologise if I am wrong; it is just me and my black and white look on life.

Many thanks

John
Title: Re: How big is the DNS?
Post by: Weaver on January 27, 2017, 11:32:19 PM
No problem d2d4j, and nor was there ever any; your comments were taken as helpful and valid opinion, we were just thinking along different lines. And you are also probably not full of (a) crap, and (b) NHS drugs, which kitizens are asked to make allowance for.  ;D ;D
Title: Re: How big is the DNS?
Post by: Weaver on January 27, 2017, 11:33:12 PM
If anyone knows anything about crawling the DNS, then please comment.
Title: Re: How big is the DNS?
Post by: burakkucat on January 27, 2017, 11:38:21 PM
I'm just wondering how much space it would take to store the entire DNS?

I read that question and thought "Huh?"  ???  It gave me a very funny feeling which only subsided when I consulted the Wikipedia entry for Domain Name System (https://en.wikipedia.org/wiki/Domain_Name_System).

Having stated that, I don't think I can contribute anything to your thought experiment . . . though I will certainly read whatever you are prepared to post.  :)
Title: Re: How big is the DNS?
Post by: Weaver on January 27, 2017, 11:58:39 PM
If you pick a particular DNS server, can you suck the entire contents out of it, without particular privileged or non-standardised access?
Title: Re: How big is the DNS?
Post by: adrianw on January 28, 2017, 07:37:28 AM
If you pick a particular DNS server, can you suck the entire contents out of it, without particular privileged or non-standardised access?

Some, not all, DNS servers support "zone transfer" (AXFR and an incremental version IXFR) to pull all the entries for a domain (sanitised example extracts below for my home system). This is normally used to allow slave DNS servers to catch up with the master when a change is made. Normally only permitted from certain IP addresses, such as the slaves. See https://en.wikipedia.org/wiki/DNS_zone_transfer

I can envisage an organisation's security people having fits if all the publicly addressable machine names and IP addresses were available as a neat list to all and sundry.

I can also envisage them thinking themselves under attack if you were to start doing a lot of lookups against their name servers, with reactions ranging from rate limiting you to hunting you down.

Also:

While some ISPs have good DNS servers for their customers, some have very poor ones, with their wiser customers using Google, OpenDNS or other name servers. If an organisation is not prepared to run a few good customer name servers, are they likely to pay out to enhance what they have?

There is not a simple 1:1 mapping between a name and address. Consider CNAMEs (a sort of alias), pools (where one name is supported by several addresses), the games played by content delivery networks, ...

An interesting project though. I seem to recall scorn being poured on the idea of indexing and archiving the entire web, so don't be too put off.

Example zone transfer
Made on my home slave name server - a Beaglebone Black running FreeBSD and BIND.
[aw1@beaglebone ~]$ dig @localhost mydomain axfr

; <<>> DiG 9.10.3-P4 <<>> @localhost mydomain axfr
; (1 server found)
;; global options: +cmd
mydomain.     60      IN      SOA     swelter.mydomain. myemail. 2016123000 60 60 3600000 86400
mydomain.     60      IN      NS      titus.mydomain.
...
access4.mydomain. 60  IN      A       192.168.1.12
access4.mydomain. 60  IN      MX      1 access4.mydomain.
... (ad nauseam - A and often MX entries for many hosts and all DHCP pool addresses)
mydomain.     60      IN      SOA     swelter.mydomain. myemail. 2016123000 60 60 3600000 86400
;; Query time: 12 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sat Jan 28 06:04:27 GMT 2017
;; XFR size: 325 records (messages 1, bytes 6834)
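
The same transfer can also be scripted; a rough equivalent using the dnspython library (assuming it is installed and the server permits the transfer - the server address and zone name are placeholders as above):

import dns.query
import dns.zone

# Pull the whole zone from the local slave, as dig did above.
zone = dns.zone.from_xfr(dns.query.xfr("127.0.0.1", "mydomain"))
for name in zone.nodes.keys():
    print(zone[name].to_text(name))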

Title: Re: How big is the DNS?
Post by: Weaver on January 28, 2017, 07:50:28 AM
Yes, I thought that access restrictions and partial support would make crawling the DNS a very iffy affair. Overloading one server is hardly necessary, as you have the entire DNS to access, so you could and should interleave the accesses and spread the load across the servers queried.
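
A rough sketch of the kind of interleaving I mean - purely a scheduling toy, with the rate limit and data shapes invented, and the actual lookups left to the caller:

import time

def interleaved(queries_by_server, per_server_gap=1.0):
    # Yield (server, query) pairs round-robin across servers, never hitting
    # the same server more often than once per per_server_gap seconds.
    pending = {srv: list(qs) for srv, qs in queries_by_server.items() if qs}
    last = {srv: 0.0 for srv in pending}
    while pending:
        progressed = False
        for srv in list(pending):
            now = time.monotonic()
            if now - last[srv] < per_server_gap:
                continue                     # too soon for this server
            yield srv, pending[srv].pop(0)
            last[srv] = now
            if not pending[srv]:
                del pending[srv]
            progressed = True
        if not progressed:
            time.sleep(0.05)                 # everyone is rate-limited; wait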

I suspect the entire process of gathering the data would be a nightmare unless people signed up to contribute towards an acceleration project.

One other area that might be worth looking at some time might be the use of multicast in the DNS, but I am doubtful initially.
Title: Re: How big is the DNS?
Post by: adrianw on January 28, 2017, 08:14:33 AM
Yes, interleaving would make you less noticeable.

Do you feel an RFC coming on (list of DNS-related ones at http://www.bind9.net/rfc )?

Multicast DNS, limited to local networks, already exists - https://en.wikipedia.org/wiki/Multicast_DNS and https://tools.ietf.org/html/rfc6762
Title: Re: How big is the DNS?
Post by: Weaver on January 28, 2017, 08:18:42 AM
I was unclear - I am aware of mDNS and LLMNR. I meant servers pushing changes to downstream servers that subscribe, not clients using multicast to query servers or other hosts directly for the purposes of zero-config.
Title: Re: How big is the DNS?
Post by: adrianw on January 28, 2017, 08:59:18 AM
Ah. With you now, I think. No idea of how to achieve it.
Title: Re: How big is the DNS?
Post by: d2d4j on January 28, 2017, 09:07:31 AM
Hi weaver

Interworx already do this for DNS to a degree, where a publisher is set and a listener is set up.

Interworx can also use AXFR even though it is not BIND, as AXFR is needed for BIND DNS servers (and can be set per record, per domain(s) or server-wide for DNS zones).

I can post a picture of dns sync between 2 or more services

Many thanks

John
Title: Re: How big is the DNS?
Post by: d2d4j on January 28, 2017, 09:10:51 AM
Sorry, I should have also said, this can be one-way or two-way, as some of the DNS servers acting as listeners can be more than DNS slaves - they could be primary DNS as well.

Many thanks

John
Title: Re: How big is the DNS?
Post by: d2d4j on January 28, 2017, 09:33:28 AM
Hi

Sorry, iPhone updated

The sync is not immediate though, it is a timed sync, so if your idea was immediate sync then Interworx does not do that, but I could envisage big issues on high-volume DNS servers given the number of changes made.

Many thanks

John
Title: Re: How big is the DNS?
Post by: Weaver on January 28, 2017, 09:59:40 AM
I really, really need to read up more on this subject. My memory is also faulty.

I wonder what log2(size of a typical ISP's cache) is? That is a question someone could answer. It would also be an interesting way of getting round the problem of how to do crawling: instead, use a very large organisation's cache as the source for the broadcast.

But this idea, if it were to be of any use, has to be delivered over zero-cost, unused bandwidth. It is not supposed to slow things down by competing for downstream bandwidth with important activity on your main pipes.

A lot of this would really need to be done based on a classification of upstream servers as trusted or untrusted, and their identities ought to be checked. Lying servers, redirects, men-in-the-middle or whatever would not be good.

Yahoo, for example, has published quite a bit on web browser performance and stresses what a killer the startup delay is, due to DNS queries having to complete before a web browser can get anything else at all done in the very beginning. This suggests that this latency is never going to go away as bandwidth gets better and better, so its relative importance as a fraction of the initial webpage load time is in fact going to increase. Other items in a webpage, such as bitmaps or JS or ads on third-party sites, are going to be parallelisable if there are several of them and if the browser can get to parsing enough stuff to discover references to these domain names soon enough. It would be a performance boost if pages could give prefetch domain-name references very early, but that is often impossible, with dynamic stuff, rotating ads, and references hidden inside material yet to be fetched.

This could be avoided by having servers take copies of third-party objects and keep them on the same server, or at least on a server inside the same domain. That would stuff up ad blockers and defeat privacy champions - bad for us and good for the bad guys. Also, this kind of duplication would prevent multi-site caching of, say, shared JS and CSS, which is very bad for everyone. So I had better not mention it.

[Moderator edited to fix the damaged "Subject" line.]
Title: Re: How big is the DNS?
Post by: Weaver on January 28, 2017, 10:11:29 AM
A couple of other ideas:

1. Switching topics: cache-to-cache low-priority traffic delivered overnight over the conventional internet to populate caches from very fat upstream caches might be a useful idea, but only if the DNS entries have a long lifetime, and they are the ones that least need this help anyway. It is no use if the lifetime expires again halfway through the day.

2. (Unrelated.) I also wonder whether or not it would be useful for a cache to refresh its entries before the lifetime expires, i.e. prefetch, but only selectively. If a cache entry has been queried (scoring a hit) many times recently, then that means it is still useful to clients, or useful to many clients; either way, it is therefore worthy of being speculatively refreshed before it expires, which also gets you a lifetime extension. Something could perhaps be worked out from the complete time distribution of hits on a cache entry (see the rough sketch after this list).

3. I wonder if it would be worthwhile for intermediate upstream DNS caches to give out information about how often an entry is queried, as part of a response or as part of a sync transfer. That would give downstream servers extra information about how useful a possible prefetch might conceivably be. But the downstream caches know the pattern of their own clients' queries, and it is that, of course, which is the most valuable statistical information on which to base decisions, as it is more relevant, being client-behaviour-specific and driven by the distribution of clients present.
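
A rough sketch of the selective refresh in item 2 - the window, the threshold and the record shape are all invented for illustration:

import time

REFRESH_WINDOW = 30      # refresh this many seconds before expiry (invented)
MIN_RECENT_HITS = 5      # only entries this popular get refreshed (invented)

class CacheEntry:
    def __init__(self, name, rdata, ttl):
        self.name = name
        self.rdata = rdata
        self.expires = time.time() + ttl
        self.hits = []                     # timestamps of recent lookups

    def hit(self):
        self.hits.append(time.time())

    def wants_prefetch(self, now=None):
        # Refresh only if the entry is about to expire AND it has been
        # popular over the last five minutes.
        now = now or time.time()
        recent = [t for t in self.hits if now - t < 300]
        return self.expires - now < REFRESH_WINDOW and len(recent) >= MIN_RECENT_HITS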
Title: Re: How big is the DNS?
Post by: d2d4j on January 28, 2017, 10:13:08 AM
Hi weaver

I think I'm understanding a little more; I am rather slow sometimes, sorry.

Please remember the PC DNS cache, browser cache and ISP DNS cache, which all play a part in your idea.

If you go to pingdom.com and test a web page, it will show you DNS lookup times. There is another web page tester which we use, but I just cannot think of it right now; again, it shows where the slowness comes from - DNS lookup, file size, CDN etc.

Also, you may want to look at PowerDNS and Cloudflare.

I thought last night that, if I understood correctly, you have a new compression algorithm; perhaps this may be better aimed at browser compression for the internet, or even for live sat nav.

I know I have most likely fully misunderstood your idea, sorry, but it does take time for my single brain cell to understand.

Many thanks

John