Kitz ADSL Broadband Information
Author Topic: Upload performance whine - part 97 - TCP protocol traffic capture analysis  (Read 3847 times)

Weaver

  • Senior Kitizen
  • ******
  • Posts: 11459
  • Retd s/w dev; A&A; 4x7km ADSL2 lines; Firebrick

I keep on whining on and on about poor upload performance; this is a real thing, though, because backing up an iPad to Apple iCloud can take over half an hour, which is just stupid, and uploading photos can take several minutes.

I had a thought. If I do an upload to a test server and then do a packet capture (which AA’s servers can do, and I think my WAPs can too, iirc), I could look for answers in the detailed behaviour of TCP. I am hopeless at reading TCP dumps though; they make my eyeballs swivel, and all the drugs that I’m on don’t help one bit. Anyone any good at decoding TCP behaviour?

I did an upload test using https://testmy.net/upload and captured the results. The attached files, zipped up below, are firstly in pcap format and secondly pre-decoded into plain text, so the latter can be read with eyeballs. I’m not sure whether running tcpdump or Wireshark myself would improve their readability, but AA decoded the results for me anyway, probably using tcpdump I would suppose.

The IPv6 addresses under 2001:8b0:: are me. I don’t think any IPv4 traffic that may be seen is relevant, but 81.187.* will be me; ask if you would like the true IPv4 range, though as I said I don’t think IPv4 is relevant. Unfortunately PPP LCP pings and the like are noise polluting the results.

If I am reading this right, it doesn’t look very good and it’s all TCP’s fault, as expected, unless I am getting either corruption or packet loss upstream; but I need to read this through carefully, not just glance at it.
« Last Edit: October 12, 2019, 01:28:00 AM by Weaver »
Logged

aesmith

  • Kitizen
  • ****
  • Posts: 1216

I've done this in the past and it's quite labour-intensive.  One technique I used was an export from Wireshark showing the timestamp, segment size, and Seq and Ack values.  It's quite easy to paste in a formula tracking these to pick up missed segments and the amount of data "in flight", so to speak, and you can also get an instantaneous rate on a packet-by-packet basis.   You need to include the very start of the TCP connection in the trace, as window scaling is set at that time and can't be determined by reading subsequent packets.
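If a spreadsheet doesn't appeal, the same idea can be sketched in a few lines of Python with scapy. This is a sketch only: it assumes the capture is saved as upload.pcap, treats the 2001:8b0: prefix mentioned above as the sending end, and ignores sequence-number wraparound, SACK and retransmissions.

# Rough sketch only: walk the capture and, packet by packet, track the bytes
# of TCP data "in flight" (highest sequence sent minus highest ACK returned)
# plus a crude instantaneous send rate. Assumes scapy is installed.
from scapy.all import rdpcap
from scapy.layers.inet import TCP
from scapy.layers.inet6 import IPv6

LOCAL_PREFIX = "2001:8b0:"          # the uploading end, per the earlier post

highest_sent = None                 # highest (seq + payload length) sent
highest_acked = None                # highest ACK value seen from the far end
prev_time = prev_len = None

for pkt in rdpcap("upload.pcap"):   # file name is a placeholder
    if not (pkt.haslayer(IPv6) and pkt.haslayer(TCP)):
        continue
    ip, tcp = pkt[IPv6], pkt[TCP]
    payload = len(tcp.payload)
    rate = None

    if ip.src.startswith(LOCAL_PREFIX):
        # Outbound data: advance the high-water mark of sent bytes and work
        # out an instantaneous rate from the gap since the previous send.
        highest_sent = max(highest_sent or 0, tcp.seq + payload)
        if prev_time is not None and pkt.time > prev_time:
            rate = prev_len * 8 / float(pkt.time - prev_time)
        prev_time, prev_len = pkt.time, payload
    elif tcp.flags.A:
        # Inbound ACK: anything below this value is no longer in flight.
        highest_acked = max(highest_acked or 0, tcp.ack)

    if highest_sent and highest_acked:
        line = f"{float(pkt.time):.6f}  in flight: {highest_sent - highest_acked:7d} B"
        if rate:
            line += f"  inst rate: {rate / 1000:8.1f} kbit/s"
        print(line)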

At a cruder level, Wireshark can flag duplicate ACKs and other drop or retransmission indications. However, in any trace of traffic that has crossed the Internet there will be loads of these, and they don't necessarily affect throughput.
Logged

ejs

  • Kitizen
  • ****
  • Posts: 2078

The TCP MSS in the upstream direction appears to be 1348, which seems unnecessarily low. Is there any reason for it to be set to that?
Logged

Weaver

  • Senior Kitizen
  • ******
  • Posts: 11459
  • Retd s/w dev; A&A; 4x7km ADSL2 lines; Firebrick

The IPv6 MTU is 1408, artificially low and set by me, because the stupid 3G dongle that I would be using with the Firebrick in a failover situation has a low MTU, and I need traffic to survive a transition to failover mode without getting fragmented by the dongle. I have not got round to doing a test at full MTU to see what would actually happen. I have to use proto 41 tunnelling for IPv6 in 3G failover mode, and as things are it does seem to work beautifully. I ought to do a proper test that involves an already-established IPv6 TCP connection going into failover mode.
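That MTU also accounts exactly for the 1348 MSS that ejs spotted, since it is just the standard IPv6 and TCP headers taken off the top. A trivial check of the arithmetic (assuming no TCP options):

# Sanity check: the TCP MSS implied by an IPv6 MTU of 1408 bytes.
mtu = 1408
ipv6_header = 40        # fixed IPv6 header
tcp_header = 20         # TCP header without options
print(mtu - ipv6_header - tcp_header)   # 1348, matching the capture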
Logged

Weaver

  • Senior Kitizen
  • ******
  • Posts: 11459
  • Retd s/w dev; A&A; 4x7km ADSL2 lines; Firebrick

What file formats will Wireshark accept if I feed a captured traffic file into it? I have a .pcap format file attached to the earlier post.

Sounds as if Wireshark would be able to help with the interpretation.

What is so weird is that the performance with these same speed testers hasn’t always been bad. I got 1.56 Mbps, not 1.25-1.3, with https://speedtester2.aa.net.uk on 2019-07-31. I just don’t understand how it can vary so much when the sync rates haven’t varied; well, in that case they have varied, to tell the truth, but in the opposite direction! The lowest sync rates are now much higher than they were back then.

Back then, at a 96.5% modem loading factor, the expected total IP PDU throughput was 1640384 bps (or 1699880 bps at 100% mlf), which means the measured performance was 95.10% of the theoretical maximum. That is incredibly good, because something has to be taken off for the overhead of TCP and IP headers, so it is roughly perfection. (The rate used for egress limiting = modem loading factor * protocol efficiency factor (i.e. an allowance for bloat, additional bytes) * sync rate.)
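For anyone who wants to check the arithmetic, here is a quick sketch that just reproduces those figures from the numbers quoted above (the variable names are mine, nothing clever):

# Reproduce the "95.10% of theoretical maximum" figure from the 2019-07-31 test.
measured_bps = 1.56e6            # speedtester2.aa.net.uk result
theoretical_at_mlf = 1640384     # expected total IP PDU throughput at 96.5% mlf
mlf = 0.965                      # modem loading factor used at the time

print(f"{theoretical_at_mlf / mlf:.0f} bps at 100% mlf")                  # ~1699880
print(f"{measured_bps / theoretical_at_mlf * 100:.2f}% of theoretical")   # 95.10%

# The egress limit itself is: sync_rate * mlf * protocol_efficiency_factor,
# the last term allowing for per-packet overhead below the IP layer.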

Current situation:

The Firebrick's internet access links have the following _upstream speeds_ (expressed as IP PDU rates) right now:
  #1: 542777 bps
  #2: 459596 bps
  #3: 416745 bps
  #4: 421786 bps
Total combined rate: 1.840904 Mbps

- These are the data rates used by the Firebrick on each link, expressed as IP PDU rates. They are always well below the upstream sync rate of each link because overheads from protocol layers below L3 have been accounted for. The reduced upstream rate given here is the theoretical maximum calculated for each link in the most optimistic scenario, based on sending a large packet of a certain chosen size. A further reduction, the so-called 'modem loading factor', has also been applied. That reduction is based on experience and has been tuned to get the best upstream performance whilst avoiding overloading the modems.

Modem loading factor of 95% was used on each link.

Fractional speed contributions:
  #1: 29.484%  [█████████████ ‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒]
  #2: 24.966%  [███████████ ‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒]
  #3: 22.638%  [██████████ ‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒]
  #4: 22.912%  [██████████ ‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒]
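(A throwaway script like this reproduces the combined rate and the percentages straight from the per-link figures above:)

# Recompute the combined upstream rate and each link's fractional contribution.
link_rates_bps = {1: 542777, 2: 459596, 3: 416745, 4: 421786}

total = sum(link_rates_bps.values())
print(f"Total combined rate: {total / 1e6:.6f} Mbps")     # 1.840904 Mbps

for link, rate in link_rates_bps.items():
    print(f"  #{link}: {rate / total * 100:.3f}%")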




I tried an experiment: I set all the rates to the slowest rate above, and only got a 1.28 Mbps figure from speedtest2.aa.net.uk. So having uneven egress rates doesn’t seem to be the cause. However, that doesn’t mean that the arrival times of the packets will be ‘right’ to make a stupid TCP receiver happy. I’m assuming that unequal actual line speeds (not egress rates) will still look like bad ‘jitter’ at the receiving end? (Because the end of an incoming packet will arrive earlier if the link speed is faster, and I am only controlling the times at which packets are sent, i.e. controlling the arrival time of the start of each packet, not the end.) I need either a more intelligent receiving TCP in every machine I ever talk to, or more intelligent scheduling of the load splitting so that the arrival time of the end of each packet can be set so as to keep stupid receivers happy?
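To put a rough number on the "end of the packet arrives earlier on a faster link" point, here is a back-of-envelope sketch of the serialisation time of a full-sized 1408-byte packet on each link. It uses the per-link IP PDU rates above as a stand-in for the true line rates, so the figures are only indicative:

# Rough serialisation delay of one 1408-byte packet per link, using the
# per-link IP PDU rates above as a proxy for the real line rates.
packet_bits = 1408 * 8
link_rates_bps = {1: 542777, 2: 459596, 3: 416745, 4: 421786}

delays_ms = {link: packet_bits / rate * 1000 for link, rate in link_rates_bps.items()}
for link, d in delays_ms.items():
    print(f"  #{link}: {d:.1f} ms")
print(f"spread: {max(delays_ms.values()) - min(delays_ms.values()):.1f} ms")

The spread of several milliseconds between the fastest and slowest link is what the far end would see as jitter on a packet-by-packet basis.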
« Last Edit: October 14, 2019, 12:14:57 AM by Weaver »
Logged

burakkucat

  • Respected
  • Senior Kitizen
  • *
  • Posts: 38300
  • Over the Rainbow Bridge
    • The ELRepo Project

> What file formats will Wireshark accept if I feed a captured traffic file into it? I have a .pcap format file attached to the earlier post.

The version of Wireshark that I have available will accept either pcap or pcapng files.
Logged
:cat:  100% Linux and, previously, Unix. Co-founder of the ELRepo Project.


burakkucat

  • Respected
  • Senior Kitizen
  • *
  • Posts: 38300
  • Over the Rainbow Bridge
    • The ELRepo Project

I wonder if you are over-thinking the set-up?  :-\

Everything originating from your LAN, with a destination "somewhere on the Internet", will go via your FB2900. Far away from the "Weaving Shed", somewhere in A&A-land, is another Firebrick device. Your Firebrick and the A&A-land Firebrick "talk" to each other via bonded ADSL2 links.

Either those two Firebricks know what to do, or they are nothing but expensive pieces of junk. Which is it?  :angel:
Logged
:cat:  100% Linux and, previously, Unix. Co-founder of the ELRepo Project.


Weaver

  • Senior Kitizen
  • ******
  • Posts: 11459
  • Retd s/w dev; A&A; 4x7km ADSL2 lines; Firebrick

> over-thinking the set-up
 :)
I’m certainly whining on insufferably and obsessing over it.  ;D It’s ridiculous behaviour. It comes and goes; that’s what drives me crazy.

Check my thinking for me. It’s true that there are two Firebricks talking to one another directly, as there has to be some device that handles PPP. But as for the ‘two Firebricks know what to do’, I would have thought only one is relevant here, as the receiving end just passes on whatever IP packet it receives without looking at it, no? Surely it’s the two TCP communicants that are relevant here, if it is TCP that is being used?

If I were using a speed tester that was properly designed, one that measures the speed of the link rather than the speed of one particular software implementation of one transport protocol, say TCP, then I would have nothing to whine and moan about. If, say, I just sent IP packets (say distinctively marked UDP) faster and faster, very gradually, until the link started to fail, then I could get the real link speed and I would be measuring what I’m supposed to be measuring. That would be an accurate speed tester that does what it’s supposed to do. But it wouldn’t be very real-world-relevant, and the latter is probably what people want because of the prevalence of TCP. I think speed testers should use both methods: an accurate raw link measurement algorithm that is independent of particular protocols, and a TCP test with both one connection and many simultaneous connections, with both figures quoted.
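Something like the following is the kind of sender I have in mind. It is only a sketch: the host, port, packet size and rate steps are made up, and it would need a matching receiver that counts the numbered datagrams and reports the rate at which loss starts.

# Sketch of the "raw" tester idea: send sequence-numbered UDP datagrams at a
# gradually increasing rate; the receiving end reports where loss begins.
import socket, struct, time

TARGET_HOST, TARGET_PORT = "test.example.net", 9000   # hypothetical receiver
PAYLOAD_SIZE = 1348                  # roughly match the path MSS discussed above
START_BPS, MAX_BPS = 500_000, 2_500_000
STEP_BPS, STEP_SECONDS = 100_000, 5  # raise the offered rate every 5 seconds

# Resolve once; works over IPv6 or IPv4 depending on what the host offers.
family, _, _, _, addr = socket.getaddrinfo(
    TARGET_HOST, TARGET_PORT, type=socket.SOCK_DGRAM)[0]
sock = socket.socket(family, socket.SOCK_DGRAM)

seq, rate = 0, START_BPS
while rate <= MAX_BPS:
    gap = PAYLOAD_SIZE * 8 / rate            # seconds between datagrams
    step_end = time.monotonic() + STEP_SECONDS
    while time.monotonic() < step_end:
        # Each datagram carries its sequence number and the offered rate, so
        # the receiver can spot gaps and say at which rate loss began.
        sock.sendto(struct.pack("!QQ", seq, rate).ljust(PAYLOAD_SIZE, b"\0"), addr)
        seq += 1
        time.sleep(gap)
    rate += STEP_BPS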

In the upstream direction though, when the stream of outbound IP packets gets split up by the Firebrick, in the case of TCP the communicating TCP entities get timings from incoming packets that are always ‘wrong’, because the timing depends on which pipe each packet traversed, a fast pipe or a slow pipe, and that could confuse the behaviour of TCP by putting bad values into the equations that govern it. Also, things such as duplicate ACKs could slow things down, and packets received in the ‘wrong order’ could slow things down too if the TCP entity is unforgiving. That has been my assumption. Sound reasonable?
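To make that concrete: the standard RTT estimator (RFC 6298) keeps a smoothed RTT plus a variance term, and the retransmission timeout is driven by both, so samples that bounce about purely because packets took different pipes inflate the variance and hence the timeout. A minimal sketch of the update rules, fed with two invented sample streams:

# RFC 6298-style RTT estimation: smoothed RTT plus a variance term drive the
# RTO. The sample streams below are invented, just to show how path-dependent
# timing (fast pipe vs slow pipe) inflates the variance and hence the RTO.
ALPHA, BETA, K = 1/8, 1/4, 4

def rto_after(samples):
    srtt, rttvar = None, None
    for r in samples:
        if srtt is None:
            srtt, rttvar = r, r / 2
        else:
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - r)
            srtt = (1 - ALPHA) * srtt + ALPHA * r
    return srtt + K * rttvar

steady  = [0.045] * 20              # one pipe, constant RTT
bounced = [0.040, 0.055] * 10       # alternating fast/slow pipe
print(f"steady:  RTO ~ {rto_after(steady) * 1000:.0f} ms")
print(f"bounced: RTO ~ {rto_after(bounced) * 1000:.0f} ms")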

The other crazy thing, though, is that there is no problem in the other direction: downstream is really, really efficient and I just don’t understand why, unless the software itself behaves differently at the lower bit rate, or the lines are more closely matched in link speed in the downstream direction. (I did do a test artificially setting the egress rates to be identical, to the lowest value, but that doesn’t make the link speeds identical in the sense that the end of a packet will still arrive sooner at the receiver on the faster links, so it doesn’t truly make the links appear identical.)
Logged

Chrysalis

  • Content Team
  • Addicted Kitizen
  • *
  • Posts: 7382
  • VM Gig1 - AAISP L2TP

The sender controls the congestion window, which is what controls the transfer rate; that may explain the better behaviour for downloads.

--edit--

The above was a quick reply on mobile; here is more detail.

The sender has a send window and the receiver has a receive window.  The lower of the two acts as an upper limit for the congestion window; the higher is effectively ignored.
The congestion window controls the transfer rate, and won't grow larger than that limit.
There are different congestion control algorithms, which affect things like how quickly the window grows and how it recovers from dropped packets.  Whether selective ACKs (SACK) are enabled is also a factor: with SACK the sender avoids having to resend a full segment of data if only part of it goes unacknowledged; without it, the sender doesn't know which part failed, so it has to resend the whole segment.

There is also the consideration that, for downloads, the sender probably isn't bonding across multiple links, so there are fewer packet-ordering issues affecting the congestion window as well.
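Put simply, the steady-state rate of one TCP connection is bounded by the window actually in use divided by the round-trip time, and the window in use can't exceed the smaller of the congestion window and the advertised receive window. A toy illustration with invented numbers:

# Upper bound on single-connection TCP throughput: min(cwnd, rwnd) / RTT.
# The figures here are purely illustrative.
cwnd_bytes = 32_000      # sender's congestion window at some instant
rwnd_bytes = 65_535      # receiver's advertised window
rtt_seconds = 0.060      # round-trip time

window = min(cwnd_bytes, rwnd_bytes)
print(f"at most ~{window * 8 / rtt_seconds / 1e6:.2f} Mbit/s with this window and RTT")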
« Last Edit: October 15, 2019, 06:13:41 PM by Chrysalis »
Logged

burakkucat

  • Respected
  • Senior Kitizen
  • *
  • Posts: 38300
  • Over the Rainbow Bridge
    • The ELRepo Project

I guess what I was trying to express is that when some process at the A-end interacts with another process at the B-end, neither process is concerned with how the A-end and B-end devices communicate with each other.

In your case, which we are considering, the A- and B-end devices are Firebricks. Let's just say that both Firebricks have a LAN-side and a WAN-side, to simplify the description. You start a process LAN-side of the Firebrick in the Weaving Shed (perhaps it is a throughput speed test). Whatever goes into your Firebrick eventually comes out of the Firebrick in A&A-land. It comes out "LAN-side" and is routed to its destination (say, an Ookla server).

Whatever your Firebrick and the A&A-land Firebrick do with however many links there may be between their respective WAN-sides is immaterial to the two LAN-side processes. The Firebricks (between their respective WAN-sides) transport the data stream in a co-ordinated fashion between themselves, ensuring that the egress data stream at the receiving end is identical to the ingress stream at the transmitting end.

If the two Firebricks are unable to perform as I have stated, in the last sentence of the preceding paragraph, then they are expensive pieces of junk. If they are able to perform . . . blah, blah . . . then all is well.

ASCII art.

Weaving Shed LAN <----> Firebrick <--- WAN-side link(s) ---> Firebrick <----> A&A-land "LAN"

      ^                     ^                 ^                   ^                ^
 Your domain           Your domain     Not your concern      A&A domain       A&A domain
 Your concern          Your concern    Not A&A concern       A&A concern      A&A concern
                                       The two respective
                                       Firebricks' concerns
Logged
:cat:  100% Linux and, previously, Unix. Co-founder of the ELRepo Project.


Weaver

  • Senior Kitizen
  • ******
  • Posts: 11459
  • Retd s/w dev; A&A; 4x7km ADSL2 lines; Firebrick
Re: Upload performance whine - part 97 - TCP protocol traffic capture analysis
« Reply #10 on: October 16, 2019, 02:42:24 AM »

We are in agreement.  ;D
Logged

niemand

  • Kitizen
  • ****
  • Posts: 1836
Re: Upload performance whine - part 97 - TCP protocol traffic capture analysis
« Reply #11 on: December 31, 2019, 08:01:03 PM »

> The sender controls the congestion window, which is what controls the transfer rate; that may explain the better behaviour for downloads.

Reminder: the receiver controls flow control, and flow control, alongside congestion avoidance, controls the transfer rate.
Logged
 
