OK, now that I am on my PC I can be clearer.
So without prefetch or serve-expired it's something like this.
The first DNS lookup is not in the cache, so the DNS cache/forwarder has to send the query upstream. This makes the query much slower, as it may have to query multiple authoritative DNS servers to get the result.
A DNS record has a TTL value: this value, in seconds, is how long the record should be cached for (we assume it is being honoured).
If another lookup is done before the TTL hits 0, then the cached result is served with no upstream requests required.
When TTL hits 0, it expires from the cache and the next lookup will need upstream queries.
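The plain cache-then-expire behaviour above can be sketched in a few lines of Python. This is a toy model, not resolver code; `resolve_upstream` stands in for the slow upstream query and the names are made up for illustration:

```python
import time

class DnsCache:
    """Minimal TTL-cache sketch: no prefetch, no serve-expired."""

    def __init__(self, resolve_upstream):
        self.resolve_upstream = resolve_upstream  # slow path: returns (answer, ttl)
        self.store = {}  # name -> (answer, expiry_timestamp)

    def lookup(self, name, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(name)
        if entry and now < entry[1]:
            # TTL has not hit 0 yet: serve cached, no upstream request
            return entry[0], "cache"
        # missing or expired: must go upstream, which is the slow case
        answer, ttl = self.resolve_upstream(name)
        self.store[name] = (answer, now + ttl)
        return answer, "upstream"

# fake upstream that always answers with a 30-second TTL
cache = DnsCache(lambda name: ("192.0.2.1", 30))
print(cache.lookup("example.com", now=0))   # ('192.0.2.1', 'upstream')
print(cache.lookup("example.com", now=10))  # ('192.0.2.1', 'cache')
print(cache.lookup("example.com", now=31))  # ('192.0.2.1', 'upstream') - expired
```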
With prefetch it is similar, except that if a client requests a DNS record when less than 10% of its TTL remains, the cached result is still served, but the forwarder/cache also makes another upstream request in the background to repopulate the cache with a full TTL, making it more likely that the next request will hit a cached result.
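That prefetch window can be added to the same kind of toy model. Again a sketch with invented names, and the refresh here is synchronous only for simplicity; a real resolver does it asynchronously so the client never waits:

```python
class PrefetchCache:
    """Sketch of prefetch: serve from cache, but also refresh in the
    background once remaining TTL drops below 10% of the original TTL."""

    def __init__(self, resolve_upstream):
        self.resolve_upstream = resolve_upstream  # returns (answer, ttl)
        self.store = {}  # name -> (answer, expiry, original_ttl)

    def lookup(self, name, now):
        entry = self.store.get(name)
        if entry:
            answer, expiry, ttl = entry
            remaining = expiry - now
            if remaining > 0:
                if remaining < 0.1 * ttl:
                    # inside the last 10% of the TTL: the client still gets
                    # the cached answer immediately, and the cache is
                    # repopulated with a full TTL behind the scenes
                    self._refresh(name, now)
                    return answer, "cache+prefetch"
                return answer, "cache"
        self._refresh(name, now)  # missing or fully expired
        return self.store[name][0], "upstream"

    def _refresh(self, name, now):
        answer, ttl = self.resolve_upstream(name)
        self.store[name] = (answer, now + ttl, ttl)

cache = PrefetchCache(lambda name: ("192.0.2.1", 100))
print(cache.lookup("example.com", 0))    # upstream, expiry 100
print(cache.lookup("example.com", 50))   # cache (50s left, > 10%)
print(cache.lookup("example.com", 95))   # cache+prefetch (5s left, < 10%)
print(cache.lookup("example.com", 150))  # cache - prefetch reset expiry to 195
```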
With serve-expired, the record stays in the cache even after the TTL has hit 0. The first request after that event is served from the cache with no delay, but it triggers the cache/forwarder to make a background upstream request to fetch a fresh result, repopulating the cache and resetting the TTL. This pretty much ensures you always get some kind of cached result; in newer versions of unbound the feature has been expanded so that it can also serve cached results when the upstream DNS server fails to respond.
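Serve-expired changes only what happens after expiry, so the sketch differs from the plain cache in just one branch. Same caveats as before: toy model, invented names, refresh shown synchronously where a real resolver answers the client first and refreshes in the background:

```python
class ServeExpiredCache:
    """Sketch of serve-expired: an expired record is still answered from
    cache while a background refresh repopulates it."""

    def __init__(self, resolve_upstream):
        self.resolve_upstream = resolve_upstream  # returns (answer, ttl)
        self.store = {}  # name -> (answer, expiry_timestamp)

    def lookup(self, name, now):
        entry = self.store.get(name)
        if entry:
            answer, expiry = entry
            if now < expiry:
                return answer, "cache"
            # TTL hit 0: answer immediately from the stale entry,
            # then fetch a fresh result to reset the TTL
            self._refresh(name, now)
            return answer, "stale+refresh"
        self._refresh(name, now)  # never seen this name before
        return self.store[name][0], "upstream"

    def _refresh(self, name, now):
        answer, ttl = self.resolve_upstream(name)
        self.store[name] = (answer, now + ttl)

cache = ServeExpiredCache(lambda name: ("192.0.2.1", 30))
print(cache.lookup("example.com", 0))   # upstream
print(cache.lookup("example.com", 40))  # stale+refresh - expired at 30, no delay
print(cache.lookup("example.com", 50))  # cache - refresh reset expiry to 70
```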
As an example, my firewall (pfSense) runs the unbound DNS resolver with both prefetch and serve-expired enabled, and all my LAN DNS queries go via this resolver. Even if an app or piece of software tries to hardcode a different DNS server, I have a firewall rule that forcefully redirects DNS queries to my LAN's DNS resolver.

Browsers and operating systems often also have their own cache systems to mitigate DNS latency, but when these are empty, and they often will be since many DNS records now have TTLs of under 30 seconds, the query hits the cache on my firewall and the result is served without going out to the internet.

If queries do go upstream, my upstream DNS servers are private ones run by myself over dnscrypt, so my ISP (Sky) cannot sniff my DNS traffic, and as a backup I use Cloudflare's 1.1.1.1 public DNS service over DNS over TLS.
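The unbound side of that setup boils down to a few lines of unbound.conf. This is a hedged sketch rather than my exact config: the option names (`prefetch`, `serve-expired`, `serve-expired-ttl`, `forward-tls-upstream`) are real unbound options, but the values are illustrative, and the dnscrypt upstream would typically be handled by a separate dnscrypt-proxy instance rather than by unbound itself, so only the DoT fallback is shown:

```
server:
    prefetch: yes              # refresh records requested in the last 10% of TTL
    serve-expired: yes         # answer from stale cache, refresh in background
    serve-expired-ttl: 86400   # how long (seconds) an expired record may be served

forward-zone:
    name: "."
    forward-tls-upstream: yes  # DNS over TLS to the fallback resolver
    forward-addr: 1.1.1.1@853#cloudflare-dns.com
```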
I would be surprised if Google, Cloudflare etc. don't employ any of these techniques on their own DNS forwarders.
But what limits the number of calls that go through to the authoritative server?
Well, for a start, I think prefetch was deliberately designed to only fire when someone makes a request inside that last-10%-of-TTL window, precisely to restrict upstream requests. If prefetch were proactive and constantly auto-refreshed the cache, then yes, you might consider that excessive.
Generally speaking, a higher TTL value decreases DNS traffic. A decade ago it was normal for a DNS record to have a TTL of at least 2 hours, and not too uncommon for it to be 24 hours. But now, in the world of geo-routing, DDoS mitigation and load balancing, the fashion is extremely low TTLs so traffic can be rerouted quickly if required. This inevitably increases traffic to authoritative servers, and since the admins are the ones making these decisions, it is reasonable to assume they are making sure their authoritative DNS servers can handle the load.
But one of the most popular DNS servers (BIND), as an example, is able to rate-limit queries, so you can e.g. allow an IP or an IP block to make only X queries per Y seconds; if the limit is exceeded, they get blocked. You can also add multiple extra authoritative servers to lower the load on each individual server.
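For concreteness, BIND's Response Rate Limiting feature is configured with a `rate-limit` block in named.conf. The block and its option names are real BIND 9 configuration; the numbers here are purely illustrative:

```
options {
    rate-limit {
        responses-per-second 10;  // at most 10 identical responses per second
        window 5;                 // averaged over a 5-second window
        // by default limits apply per IPv4 /24 (ipv4-prefix-length 24)
    };
};
```

Clients that exceed the limit get dropped or truncated responses rather than a hard block, which pushes legitimate resolvers to retry over TCP while starving reflection attacks.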