Kitz ADSL Broadband Information
Author Topic: 0-RTT and ordinary TCP  (Read 2667 times)

Weaver

  • Senior Kitizen
  • ******
  • Posts: 11459
  • Retd s/w dev; A&A; 4x7km ADSL2 lines; Firebrick
0-RTT and ordinary TCP
« on: November 14, 2022, 05:02:25 AM »

I have been reading about 0-RTT. I had a vaguely similar, or perhaps I should say ‘analogous’, idea a long time ago. Say you have a transport layer and you cache some of the parameters associated with the performance characteristics of the link, in order to reuse those numbers when a new transport is created, without needing to re-determine the values. There are limits to the relevance of the old data: the link or path could have changed, as could the destination machine’s characteristics. Indeed, the traffic level at the local machine’s nearest bottleneck could well have changed, and some of the numbers might depend on that. But two extremes could be chosen for deciding when cached information is a relevant-enough match to enable reuse. One is ‘narrowest’, where the destination address has to be the same; ‘widest’ would match only on the pair of the local machine’s interface plus the gateway used.

I don’t see any reason for discarding the cached data on a timeout, though, unless it’s for management of RAM consumption. That is, an expiry date isn’t going to be anything more than arbitrary, and non-reuse because of age will kill the benefit of the proposed change.

The idea is that with for example plain old TCP, slow start could be avoided. This would be a wondrous thing, no?

Coming back to expiry, which I’m against: what you could usefully do, however, is invalidate, or better, alter cached data when the performance characteristics of the first hop or nearest local bottleneck change, or when the local interface or the gateway used changes. Better still, to prevent such flushing, a cache entry indexed by destination address could contain multiple sub-records for alternative paths. For example, the cache entry for wikipedia.com contains several sub-records: one for wired ethernet + site "home", one for WLAN + site "home", one for mobile + 4G and one for mobile + 5G. A change of route would then not cause flushing, just a switch sideways to a different sub-record.
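As a sketch of that sub-record idea: the cache could be keyed by destination and then by an attachment-path tuple, so a path change selects a sibling sub-record instead of flushing. Everything here (the `PathParams` fields, the path keys, the class itself) is hypothetical, just to make the shape concrete:

```python
# Hypothetical transport-parameter cache: entries are indexed by
# destination, each holding sub-records keyed by the local attachment
# path, so a change of route switches sideways rather than flushing.

from dataclasses import dataclass

@dataclass
class PathParams:
    cwnd_hint: int    # cached congestion window, in segments
    rtt_ms: float     # cached smoothed round-trip time

class TransportCache:
    def __init__(self):
        self._entries = {}  # dest -> {path_key: PathParams}

    def store(self, dest, path_key, params):
        self._entries.setdefault(dest, {})[path_key] = params

    def lookup(self, dest, path_key):
        # Return cached parameters for this destination *and* path,
        # or None so the caller falls back to ordinary slow start.
        return self._entries.get(dest, {}).get(path_key)

cache = TransportCache()
cache.store("wikipedia.com", ("home", "ethernet"), PathParams(40, 18.0))
cache.store("wikipedia.com", ("home", "wlan"), PathParams(20, 25.0))
cache.store("wikipedia.com", ("mobile", "5g"), PathParams(30, 35.0))

# Switching from wired to Wi-Fi at the same site selects a different
# sub-record instead of invalidating the whole entry.
print(cache.lookup("wikipedia.com", ("home", "wlan")).cwnd_hint)  # 20
print(cache.lookup("wikipedia.com", ("cafe", "wlan")))            # None
```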

Messing about with randomised addresses in IPv6 on the part of a destination could be a real problem, messing up the indexing of a cache design. Using domain names instead wouldn’t help, as not every transport connection has an associated domain name. You would need something like an L2 address of the destination machine’s relevant interface, but even MAC addresses can be bogus. Maybe the discoveries made when connection mobility was researched for QUIC would help, as the QUIC designers needed to get the handling of their connection IDs right.

Seems that getting the cache design 100% right would be non-trivial, but maybe 100% wouldn’t matter and the idea could still give strong benefits even if this aspect is imperfect.

It seems to me that even if the whole thing goes wrong and inappropriate, old data from the cache gets reused, it wouldn’t be the end of the world. The transport would just have to adapt, which as far as I can see is no worse than experiencing a sudden, unpredicted high traffic level out of nowhere: you just get caught out, with the transport’s internal behaviour-governing parameters suddenly wrong.
Logged

XGS_Is_On

  • Reg Member
  • ***
  • Posts: 479
Re: 0-RTT and ordinary TCP
« Reply #1 on: November 20, 2022, 04:47:54 PM »

Slow start can already be disabled in TCP stacks so that they move straight into congestion avoidance. Combined with high initial windows and something like HSTCP it ensures links are maxed out rapidly and requires no caching of any information.
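A back-of-envelope model of why a high initial window makes the ramp-up cheap: in classic slow start the congestion window doubles each RTT, so the number of round trips needed for a transfer falls quickly as the initial window grows. This is a toy calculation, ignoring losses, receive windows and congestion avoidance:

```python
# Rough model of classic slow start: cwnd doubles each RTT until the
# data is covered. Compares the common default initial window of
# 10 segments (RFC 6928) with a large initial window.

def rtts_to_send(total_segments, initial_window):
    cwnd, sent, rounds = initial_window, 0, 0
    while sent < total_segments:
        sent += cwnd
        cwnd *= 2          # exponential growth phase only
        rounds += 1
    return rounds

# A 1 MiB object is roughly 724 segments of 1448 bytes.
print(rtts_to_send(724, 10))    # 7 round trips
print(rtts_to_send(724, 724))   # 1 round trip
```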

Path characteristics can only really be assured within a network through control of the hops. Out on the public Internet, unless the networks are very simple with a minimal hop count, characteristics can vary per flow and even per packet.

https://en.wikipedia.org/wiki/Equal-cost_multi-path_routing
https://en.wikipedia.org/wiki/Link_aggregation
Logged
YouFibre You8000 customer: symmetrical 8 Gbps.

Yes, more money than sense. Story of my life.

Weaver

  • Senior Kitizen
  • ******
  • Posts: 11459
  • Retd s/w dev; A&A; 4x7km ADSL2 lines; Firebrick
Re: 0-RTT and ordinary TCP
« Reply #2 on: December 02, 2022, 04:03:39 PM »

Referring also back to https://forum.kitz.co.uk/index.php/topic,23994.0.html.

The slow start calculation could possibly use certain parameters that rarely change their values. For a host on a small/medium business or domestic LAN, the maximum bandwidth in/out may very well be limited by the first link upstream of the LAN gateway router, so it’s hop 2 that matters, where hop 1 is across the LAN or wireless LAN only. I keep thinking about the idea of a network info server, which could live in the gateway router (if the software is upgraded, or a new standard module is plugged into it) or else in some other server discoverable by various zeroconf methods, perhaps using multicast with a well-known address. This is not a million miles away from the idea of DHCP. One could query such an info server for an enum value describing the current network use-case or topology, which might come back as "internet access bottlenecked by link, at hop 2", bandwidth = 1 Gbps inbound / 0.2 Gbps outbound. Those bottleneck values would be the limiting values regardless of the path to the remote destination, provided we are using NIC xx = LAN i/f or WLAN i/f. In the case of a wireless LAN, though, hop 1 might be the true bottleneck, so one would have to combine the two potential bottlenecks: the ever-changing wifi performance and the throughput of the internet access link at hop 2. A flag telling us that hop 1 is very variable-rate, as with a WLAN, and therefore that hop 2 is not the true bottleneck, would mean it is not easy to use hop 2 or hop 1 rates as guides in a slow-start equation.
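To make the info-server reply concrete, here is one possible wire encoding. The protocol, message fields and enum values are entirely invented for illustration; nothing like this exists or is standardised:

```python
# Sketch of a hypothetical "network info server" reply as described
# above: a use-case enum plus "cannot exceed" bandwidth limits and a
# flag for a highly variable first hop (e.g. Wi-Fi).

import json
from enum import Enum

class Topology(Enum):
    BOTTLENECK_AT_HOP2 = 1   # LAN host behind an internet access link
    DIRECT_ATTACH_SERVER = 2

def encode_reply(topology, down_bps, up_bps, hop1_variable):
    return json.dumps({
        "topology": topology.value,
        "down_bps": down_bps,           # limits, not current free
        "up_bps": up_bps,               # bandwidth
        "hop1_variable": hop1_variable  # True for a Wi-Fi first hop
    }).encode()

def decode_reply(data):
    msg = json.loads(data)
    return (Topology(msg["topology"]), msg["down_bps"],
            msg["up_bps"], msg["hop1_variable"])

wire = encode_reply(Topology.BOTTLENECK_AT_HOP2,
                    1_000_000_000, 200_000_000, False)
print(decode_reply(wire))
```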

I don’t know how intelligent one can be if one knows that the throughput up/downstream cannot exceed some value, but the current value is otherwise unknown, or has to be obtained immediately either by some probing procedure (for which I don’t have an algorithm) or as an instantaneous current value from the info server. In the latter case there’s the problem of how up-to-date such dynamic performance figures might be; presumably they would be very iffy. It might be best to only use the "cannot exceed x" type of information, and that might or might not be useful, but I don’t know of algorithms that could take advantage of such "limit" or "max possible rate" figures. It could prevent a very aggressive, far-from-slow slow start from dumping far too much data into an upstream link, only for it all to get dropped, possibly forming a big queue whilst tying up an outbound link. But you presumably could still be too aggressive and cause a problem with an initial overestimate of the upstream rate in your not-very-slow slow start. The alternative, in the case of wifi, is to forget any algorithmic ‘enhancements’ and just do normal slow start without parameter caching from older sessions.

That scenario / use-case is enum value #1, where you are a host on a domestic or similar LAN or WLAN, attached indirectly to the greater internet by an internet access link. The other major scenario / use-case, enum #2, is ‘direct-attach server’. Here there is no internet access link like the FTTP / FTTx or xDSL of the first scenario; we assume you are a server whose throughput is limited by its own NIC(s) plus the bandwidth of the LAN the server lives on. This has to be combined with the load of all the traffic coming into and out of the NIC.

So that very variable rate limits the performance of the whole path. Either the server at the far end is ‘free’, or it’s also having to deal with a lot of other traffic from other client machines, whose traffic could be taking up an unknown fraction of the server’s capacity at its NIC.

So does that latter observation mean that cached parameters, and their use in informing slow start, are only doable if you have an algorithm that can make use of limits, that is, "max possible" values?

One other kind of parameter that might be a candidate for exploitation in some algorithm informing slow start from cached info is latency. To be completely clear, we probably want one-way trip time, not round-trip time, here. The destination machine cannot be closer than our own ISP’s first router encountered when going upstream to, say, The Isle of Dogs, where all known servers live (assume that as the use-case). From that point we have to cross n other fast, low-latency links to get to our server on another network. Or, in my case, we could be coming all the way back up Britain to Skye again, which is where the remote machine lives: it’s Janet my wife’s machine. So that’s two light travel times up/down the north-south length of Britain for just one half of an RTT. That’s why I need to bribe AA and BT into having POPs in Edinburgh at the internet exchange, or even setting up POPs in Inverness. The latter probably isn’t worth doing just to get the lesser improvement in latency of Skye-Inverness vs Skye-Edinburgh. It’s about 210 miles from Edinburgh to Broadford in Skye according to Apple Siri, and about 85 miles from Broadford to Inverness. So I make that an improvement of quite a bit less than 1 ms, roughly a saving of 0.67 ms latency for the one-way trip.
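Checking that arithmetic: the ~0.67 ms figure corresponds to light in a vacuum over the 125-mile difference. In fibre, where signals propagate at roughly two-thirds of c, the one-way saving would be nearer 1 ms, and real cable routes are longer than road distances anyway:

```python
# One-way latency saving from moving a POP from Edinburgh to
# Inverness, using the straight road distances quoted above.

KM_PER_MILE = 1.609344
C_VACUUM_KM_S = 299_792              # speed of light in vacuum
C_FIBRE_KM_S = C_VACUUM_KM_S * 2 / 3 # typical propagation in fibre

miles_saved = 210 - 85               # Skye-Edinburgh vs Skye-Inverness
km_saved = miles_saved * KM_PER_MILE

print(round(km_saved / C_VACUUM_KM_S * 1000, 2))  # 0.67 ms one way
print(round(km_saved / C_FIBRE_KM_S * 1000, 2))   # 1.01 ms one way
```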

The way a latency figure might be used: if the latency is high then you’re safe to dump a larger amount of initial data into the network when doing a protocol CONNECT at the very beginning of a slow-start-type conversation. Latency is also reliable; unless you change NIC, and with it possibly the method you’re using to connect to the internet, the latency never decreases below its floor. Having a safe minimum latency is excellent, and if the actual latency increases because of traffic or some path change, then no harm is done to our algorithmic considerations.
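One hedged way such a latency floor might be combined with a "cannot exceed" rate from the cache is to cap the initial burst at one bandwidth-delay product: the pipe physically cannot hold more than that, so anything beyond it must queue or be dropped. This is a sketch of the reasoning, not an established algorithm:

```python
# Bound an initial burst by the bandwidth-delay product, computed
# from the cached *minimum* one-way delay and the known bottleneck
# ceiling. Purely illustrative; the function name is invented.

def max_initial_burst(min_one_way_delay_s, rate_ceiling_bps, mss=1448):
    # The round trip is at least twice the one-way floor; the pipe
    # holds at most rate * RTT bytes in flight.
    bdp_bytes = rate_ceiling_bps / 8 * (2 * min_one_way_delay_s)
    return max(1, int(bdp_bytes // mss))

# 10 ms one-way floor on a 200 Mbit/s upstream ceiling:
print(max_initial_burst(0.010, 200_000_000))  # 345 segments
```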

There is a general point here. Cached info has to be stored in multiple-instance objects, one for each tuple of (NIC + ISP + etc.), at a minimum the method of attachment to the internet, i.e. the internet access link/path. There may need to be some other fields in that tuple that I’ve forgotten; changing ISP might have all kinds of effects on the significant parameters. I ought to have included something like per-WLAN or per-SSID, but really SSID values should be grouped into sets of attachment/location/site classes. Moving to a different site (home / work / hotel / café / public wifi service or some such) will involve different wifi performance limits and dynamic congestion traffic loads, plus differing ISPs too. Do changes to that tuple mean switching to a different entry in the cache of slow-start-informing parameters, or no cached info at all? I seem to remember that MS Windows had classes of wireless LANs, but they were three in number and only to do with trust levels, not identifying ISPs, internet access link types or performance types. That was a worthwhile design of Microsoft’s: straight away it made some righteous security decisions for the user, and it was easy to understand. The enum class values were something like "home" vs "work" vs "public-something-service" (untrusted), though perhaps a few additional types were needed. I’m thinking of Windows 7, which was when I gave up on the WinNT family for good and became a semi-bedbound Apple person.

I do think it’s not necessary to be totally pessimistic and assume that slow start cannot be somewhat improved given imperfect "limit" / "max" throughput info and the latency of the outbound one-way trip (i.e. half an RTT).

I would love to do some more work on the info-server idea. Maybe people have already said ‘just use DHCPv4’ and add extensions to it, but this has to work in IPv6 and I’m not a huge fan of DHCP and I’m not keen on the idea of stretching it that far, way way beyond what it was initially designed for. I’m not a fan of DHCPv6 either, plus the fact that few people use it. So I’d like to do a clean new design, and to keep sysadmins happy, I don’t want to be loading up the network with traffic on a large LAN. So initial ideas involve the use of multicast.
« Last Edit: December 02, 2022, 04:13:52 PM by Weaver »
Logged

XGS_Is_On

  • Reg Member
  • ***
  • Posts: 479
Re: 0-RTT and ordinary TCP
« Reply #3 on: December 14, 2022, 11:48:19 AM »

If you're saying what I think you're saying, how would you manage when the remote side may be taking thousands of connections per second from devices all wanting to transmit at their max or limit? For an ingress server this would be lethal. For a server that's mostly outbound, the server is probably better served running a slow start and ramping up than managing thousands of telemetry connections per second to work out the throughput. On a well-connected CDN you're going to have servers within 25-50 ms of the vast majority of clients, so they would be better served slow starting rather than sending a query to a local info server, which in turn would presumably have to go to every info server in the chain in a hierarchical manner similar to DNS, else everyone has to hold the entire table.

One of the reasons slow start begins at a conservative value is to allow a new client to ramp up to the level of existing ones as time goes along, rather than jumping in and immediately degrading them all, then having its own throughput degraded by an endless cycle of new clients, making the exercise somewhat pointless as congestion control ensures no one has a happy time on anything other than a very brief connection.

To properly manage bandwidth you need to control both sides of the link, as without outbound shaping there will be congestion; in the case of many clients accessing the same resource, a lot of congestion, if flows aren't starting off slow and being managed.

This is ignoring that the database of 'max' and 'limit' values would be enormous: either every route between every node, including nodes behind NAT (in turn exposing them), or at the very least a selected 'primary' route, requiring integration with routing protocols across the entire length of the path to detect path changes. The database would also expose ISPs to security issues, as it would be trivial to iterate through IP space, form a map of a network and find where the bottlenecks are to attack.

Also: how do you handle cloud or virtual shared networks? These don't have a NIC to themselves; they have virtual NICs per host, with various bandwidths depending on what the tenant has paid for. You're going to need not just an entry for every possible route between each IP address (or at least a main route, with convergence triggered by a routing protocol) but an entry for every possible route between every virtual NIC on every cloud and on-premises VM.
Logged

Chrysalis

  • Content Team
  • Addicted Kitizen
  • *
  • Posts: 7409
  • VM Gig1 - AAISP CF
Re: 0-RTT and ordinary TCP
« Reply #4 on: December 14, 2022, 08:02:09 PM »

I think what you described can happen with Steam on narrow pipes: its multi-threaded downloads are not long-lived connections, they constantly stop and start, only downloading short bursts of data at a time. One of the reasons it caused me problems on DSL is that they would initially burst high when first opened, which sounds like the problem you just described that can happen when being too aggressive on low-latency sessions?

One way I used to help tame it was to pick a Steam location with higher latency.
Logged

Weaver

  • Senior Kitizen
  • ******
  • Posts: 11459
  • Retd s/w dev; A&A; 4x7km ADSL2 lines; Firebrick
Re: 0-RTT and ordinary TCP
« Reply #5 on: December 14, 2022, 10:43:19 PM »

@XGS_Is_On I agree with all you say. The database design is a nightmare. About info servers: these would be consulted rarely by hosts and the info cached. If it turns out that the numbers are useful, they could report a lot of stuff, such as the type of pipe going out and the local-end bottleneck’s egress and ingress rates too, in case there might also be some use for that. Dynamic ‘free bandwidth’ rate figures are not necessarily doable or useful, and indeed they could be very bad news. The security aspects need careful handling, by rejecting bogus info-answer packets coming from outside the LAN (or a slightly larger AS-like topological unit). One thing I’m unsure about: restricting it to link-local addresses only would perhaps help with security, but might be a shame. Mind you, if you did that, would there be any need for an IPv4 implementation, since everyone can do link-local IPv6 anyway? What do you think? (Thinking of 15 years ago, when Microsoft made their Windows instant-messaging application IPv6-only, even going so far as to build in auto-Teredo for people who didn’t have IPv6.)

I wasn’t intending to put such info servers in or near LANs in the core of the internet; they are intended for domestic users and enterprises of different sizes. The info server idea originally came up as a way to spread other general, not rapidly changing info, and is a competitor to DHCPv4, which already does this kind of thing, but extending DHCPv4 has its limits and I’m not a huge fan of DHCPv6. It would seem more sane for there to be one info-server access protocol for both IPv4 and IPv6. I might have discussed it in an old thread somewhere, I forget. The idea would be to have a service located by multicast, while making sure we allow multiple servers for load handling and reliability. I was also wondering about an L2-only third version of the protocol (no IP required).
Logged

XGS_Is_On

  • Reg Member
  • ***
  • Posts: 479
Re: 0-RTT and ordinary TCP
« Reply #6 on: December 15, 2022, 01:25:21 AM »

So telemetry on end-to-end bandwidth is a thing, but only in overlay networks, and it is between what could be described as routers, not individual hosts. The overlay is controlled by a single entity, with all the network nodes running either in a logical mesh or hub-and-spoke.

The link between two nodes will have both a maximum value, based on the bandwidths of the interfaces either side, and a current value that may be probed for.

Monitoring capacity and signalling to peers their fair share can be done easily enough. Probing across variable-capacity links that we control on either side can be done by periodically initiating a transfer at a low QoS priority using something like LEDBAT (RFC 6817) and measuring throughput. LEDBAT backs off when latency rises, potentially before loss kicks in, so it is a useful congestion canary.
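The core of the LEDBAT idea can be sketched in a few lines: estimate queueing delay as the current one-way delay minus the base (minimum) delay seen, and grow or shrink the window according to the distance from a target. This is a simplification of RFC 6817 (real LEDBAT scales its gain by bytes acked per RTT, among other details):

```python
# Minimal LEDBAT-style window update: gate growth on estimated
# queueing delay so the flow backs off before loss occurs.

TARGET_S = 0.100   # RFC 6817 suggests a 100 ms target queueing delay
GAIN = 1.0         # simplified; the RFC's gain is per-bytes-acked

def ledbat_step(cwnd, base_delay_s, current_delay_s, mss=1448):
    queuing_delay = current_delay_s - base_delay_s
    off_target = (TARGET_S - queuing_delay) / TARGET_S
    # Grow while under target, shrink once queueing delay exceeds it,
    # never dropping below one segment.
    return max(mss, cwnd + GAIN * off_target * mss)

cwnd = 10 * 1448
cwnd = ledbat_step(cwnd, 0.020, 0.030)   # 10 ms of queue: grows
cwnd = ledbat_step(cwnd, 0.020, 0.180)   # 160 ms of queue: shrinks
```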

The issue is how you get the telemetry from one side to another and, indeed, what you are trying to get from one side to the other.

This is potentially a comprehension fail on my part, but just getting maximums end to end without probing is problematic.

NIC bandwidths aren't going to help: I've 25G links which then go into a 2 × 10G LAG, then a 10G link to an ONT, where they go onto a 10G (less about 15% FEC) shared PON. A neighbour has slightly faster Internet than LAN. We would need the ONT to participate in such a protocol.

To get accurate data needs participation across the path, not just at the end stations. If an intermediate hop is the bottleneck, that's a problem: the only way to know is for all the ASNs involved to exchange telemetry, which would have to be over BGP and then disseminated into each ASN.

The ISP would have to have a telemetry flow to every customer to signal bottlenecks. The remote ASN isn't going to talk with you, and intermediate transit ASNs aren't either; they talk with the ASNs they connect to.

Which brings me on to another issue: paths within a network can be very dynamic. Segment based routing is a thing. ECMP is a thing, different packets in the same flow taking different paths. LAGs are a thing, different flows between the same hosts taking different paths. Network nodes can specify the route they want their traffic to take. Traffic engineering is a thing: paths can change depending on load. The A&A network is very simple however some are quite complex.

Anycast is another issue. Cloudflare DNS is always on the other side of 1.1.1.1 but you can go to many, many different DCs to get there with many different latencies and throughputs.

I'm not quite clear on what the use case is. To remove slow start you need to know the link capacity end to end when you start to transmit. To know that, you need to probe. To probe, given this is between end-user equipment rather than routers, you would use a slow start of some kind, as anything else would basically involve an iPerf.

If slow start is going to make much of an impression on the transfer time, you don't have much data to shift to begin with, so you may as well just start the transfer rather than sending a burst to measure the link.

If what you're after is the maximum bandwidth, there's no use in eliminating slow start: the standard congestion control and avoidance, LEDBAT or something similar are fine. You'll find the maximum at some moment during the course of the transfer.

Again I suspect this is a comprehension failure on my part. My experience is more in the network than on the individual hosts and there my competency lies: among many other things getting the information for path selection, finding the right path for traffic depending on needs and either avoiding congestion entirely or mitigating its effects through quality of service. This absolutely leaves me lacking in many other ways!  :)
Logged

Weaver

  • Senior Kitizen
  • ******
  • Posts: 11459
  • Retd s/w dev; A&A; 4x7km ADSL2 lines; Firebrick
Re: 0-RTT and ordinary TCP
« Reply #7 on: December 15, 2022, 01:59:37 AM »

I ought to find the old thread where all this started. Broadcasting general information was the original sole idea, and the range of information published was not intended to include more ‘difficult’, rapidly changing info; in fact that might be ruled out to avoid thrashing servers or creating too much LAN traffic. Then the second idea came about, solely concerned with how we get rid of slow start, if such a thing is even possible, which is doubtful. I wondered if there could be a middle option, where slow start is far, far more aggressive and completes fast without going wildly OTT, because it would at least know one very important bottleneck, and so would not overload the internet access link unless there was competing traffic at the same time, which is something not accounted for in the protocol. Mind you, one could ask the info server (assuming it is in the gateway router, or can query it) what the current traffic level is like, either getting a pair of numbers for traffic in both directions, or a single bit: quiet or busy. If busy, then we just use more conservative, traditional slow start.
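That single-bit scheme might look something like this sketch; the thresholds, the 256-segment cap and the function name are all invented for illustration:

```python
# Pick an initial window from the cached bottleneck rate only when
# the gateway reports the access link as quiet; otherwise fall back
# to the standard initial window.

STANDARD_IW = 10   # segments, per RFC 6928

def choose_initial_window(link_busy, rate_ceiling_bps, min_rtt_s, mss=1448):
    if link_busy:
        return STANDARD_IW          # competing traffic: be conservative
    # Quiet link: size the burst to the pipe, capped defensively.
    bdp_segments = int(rate_ceiling_bps / 8 * min_rtt_s // mss)
    return max(STANDARD_IW, min(bdp_segments, 256))

print(choose_initial_window(True, 200_000_000, 0.020))   # 10
print(choose_initial_window(False, 200_000_000, 0.020))  # 256
```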
Logged

burakkucat

  • Respected
  • Senior Kitizen
  • *
  • Posts: 38300
  • Over the Rainbow Bridge
    • The ELRepo Project
Re: 0-RTT and ordinary TCP
« Reply #8 on: December 15, 2022, 03:49:12 PM »

Have you ever considered writing an RFC for your idea? Or are you still in the contemplation phase?  :-\
Logged
:cat:  100% Linux and, previously, Unix. Co-founder of the ELRepo Project.


XGS_Is_On

  • Reg Member
  • ***
  • Posts: 479
Re: 0-RTT and ordinary TCP
« Reply #9 on: December 20, 2022, 12:39:27 PM »

Quote from: Weaver on December 02, 2022
> To be completely clear, we probably want one-way trip time, not round trip time here. So given that the destination machine cannot be closer than our own ISP’s router first encountered when we’re going upstream to The Isle of Dogs, say, where all known servers live, and this is now assumed to be use-case. From that point we have to cross n other fast, low latency links to get to our server of some sort on another network. Or in my case we could be coming all the way back up Britain to Skye again, which is where the remote machine lives, and it’s Janet my wife’s machine. So two light travel times up/down the north-south length of Britain for just one half of an RTT. So that’s why I need to bribe AA and BT into having POPs in Edinburgh at the internet exchange or even setting up POPs in Inverness.

I take from this that Janet's machine and your own are on different ISPs.

It'd take a pretty big bribe: there just aren't the people and in turn the data there to justify the infrastructure build out. Is the round trip really that problematic? We're talking perhaps 20-25 ms?
Logged