Kitz ADSL Broadband Information
Author Topic: Upstream performance (again) - either Firebrick packet scheduling or line 3  (Read 290 times)

Weaver

  • Kitizen
  • ****
  • Posts: 3858
  • Retd sw dev; A&A; 3 × 7km ADSL2; IPv6; Firebrick

This has been discussed in earlier threads; maybe I could find some of them. All comments in the following apply only to upstream.

As far as I can see, I either have a Firebrick problem or a problem with upstream performance on line 3. The combined upstream throughput through my Firebrick looks really poor: only 77% of the expected three-line total, even after taking out all the overheads due to protocol bloat such as AAL5, ATM and so on.

This has held true despite every kind of tweak I can think of, and days wasted fiddling with parameters in the config file, because you are supposed to be able to rate-limit each sub-pipe to each individual PPPoE modem. But if you don't want to do that - if you just want it to do the right thing, run each line as fast as it can go and schedule packets correctly according to the differences in line speed between the pipes - then I think you should be able to do that too. I don't know what you are supposed to do if you just want intelligent, maximised performance with no configuration.

I've tried simply setting all the rate limits to huge values to see if that helps. It doesn't break anything, but it doesn't get you any more performance than using explicit numbers that look plausible and are derived from the reported DSL sync rates by some sort of guesswork. I'm assuming that the rates you are supposed to put in the config file are actual L3 bit rates (that is, [e.g. IP] headers plus payload), related to the ATM bit rate for PPPoEoA by ATM_bitrate = L3_PDU_bitrate / 0.85, where the fudge factor 0.85 accounts for the ATM overhead of expanding 1500 bytes of IP PDU into 33 ATM cells' worth of bytes = 33 * 53 bytes. But fiddling about with the factor used never gets me any more throughput: pushing it harder doesn't help, although I assume that a really low number would succeed in crippling it. That is all old news.
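For what it's worth, the 33-cell figure and the ~0.85 factor can be sanity-checked with a few lines of arithmetic. The sketch below assumes LLC/SNAP bridged encapsulation with no Ethernet FCS carried inside the AAL5 frame; that is an assumption about the encapsulation, not a statement about this particular setup.

```python
# Rough sanity check of the ~0.85 fudge factor for IP over PPPoEoA over ATM.
# Per-packet overhead assumes RFC 2684 LLC/SNAP bridged encapsulation with no
# Ethernet FCS carried; adjust if the encapsulation differs.
import math

IP_MTU       = 1500          # bytes of IP header + payload (the L3 PDU)
PPP_HDR      = 2
PPPOE_HDR    = 6
ETH_HDR      = 14
LLC_SNAP_PAD = 10            # LLC/SNAP bridged header + 2-byte pad
AAL5_TRAILER = 8
CELL_PAYLOAD = 48
CELL_SIZE    = 53

aal5_pdu   = IP_MTU + PPP_HDR + PPPOE_HDR + ETH_HDR + LLC_SNAP_PAD + AAL5_TRAILER
cells      = math.ceil(aal5_pdu / CELL_PAYLOAD)   # padded up to whole cells
wire_bytes = cells * CELL_SIZE

print(f"{cells} cells, {wire_bytes} bytes on the wire")   # 33 cells, 1749 bytes
print(f"efficiency = {IP_MTU / wire_bytes:.3f}")          # ~0.858, hence the /0.85
```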

* But there's something new: because of this week's outage I've discovered that the speedtester-reported upstream throughput with three lines is only about 15% better than it is with two lines active. Note that with only two lines up - cwcc@a.1 and @a.4 - the measured performance is 100% of that predicted! So could this be an n_lines == 3 thing? Or n_lines > 2? Or is it just that cwcc@a.3 is bad?

Now a word of warning: this could be down to the behaviour of the speed tester. Perhaps it simply doesn't like something about the three-line 'packet pattern' and copes badly with out-of-order packet delivery, or something like that. If this is a general, representative effect, and normal servers taking an upload over the usual kind of single TCP connection will be unhappy with this particular behaviour, then that is not ideal, and perhaps something that could be improved in the Firebrick's scheduler to make it friendlier per TCP connection, or in general more 'flow aware'. (Using the word flow in the sense in which it is used when discussing the IPv6 flow label: the six-tuple of addresses + L4 ports + IP protocol + IP version.) This might involve timing: ordering the arrival times correctly by considering line rates and packet lengths and choosing packet departure times appropriately, while of course trying to keep everything busy. And if multiple upstream flows are in use, then the scheduling becomes a lot easier, because it can try to assign packets to the relevant modems according to the flow they belong to. I suppose it might be interesting to try the two-line vs three-line ratio test with a different speed tester.
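To illustrate what 'flow aware' could mean in practice, here is a minimal sketch of per-flow hashing; this is not the Firebrick's actual algorithm, just the general idea that every packet of one flow leaves via the same line so that an individual TCP connection never sees reordering. The line names are taken from the post above; the packet representation and hash choice are made up for illustration.

```python
# Sketch of flow-aware line selection: hash the flow tuple so all packets of
# one TCP/UDP flow go down the same PPPoE sub-pipe. Plain hashing ignores the
# differing line sync rates, so a real scheduler would also have to weight
# lines by speed; this only shows the per-flow stickiness.
import hashlib

LINES = ["cwcc@a.1", "cwcc@a.3", "cwcc@a.4"]   # the three upstream sub-pipes

def flow_tuple(pkt: dict) -> tuple:
    # The 'six-tuple' mentioned above: addresses, L4 ports, protocol, IP version.
    return (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"],
            pkt["proto"], pkt["ipver"])

def pick_line(pkt: dict) -> str:
    digest = hashlib.sha256(repr(flow_tuple(pkt)).encode()).digest()
    return LINES[digest[0] % len(LINES)]

# Example: every packet of this flow maps to the same line.
pkt = {"src": "192.0.2.1", "dst": "198.51.100.7",
       "sport": 50000, "dport": 443, "proto": "tcp", "ipver": 4}
print(pick_line(pkt))
```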

Now, in case it isn't a server-end ordering-sensitivity thing but a real case of bubbles in some of the pipes, that would be a scheduler bug. I could try to look at the utilisation of each link using clueless. Reading it off a packet capture would be confusing, but I suppose that would be the hard-core option.

Really I need a test method where I do a really big, long upload so that I can look at it in clueless; then I could simply time a flat-out file upload over TCP, say. The tests speedtesters run are far too short to be usable with inspections in clueless, which only takes a snapshot every n seconds, where n is far too large. That way I could check whether it is ignoring an entire modem.
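One way to get such a long, inspectable upload, assuming there is a host at the far end you control: a trivial TCP sink that counts bytes and prints the rate once a second, so the transfer can run for minutes rather than the few seconds a speedtester allows. The port number below is arbitrary.

```python
# Minimal byte-counting TCP sink for a long upload test. Run on the remote
# host, then push data at it from behind the Firebrick and watch the rate.
import socket, time

PORT = 5001

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("", PORT))
srv.listen(1)
print(f"listening on port {PORT} ...")
conn, addr = srv.accept()
print("upload from", addr)

total = 0
window = 0
last = time.monotonic()
while True:
    data = conn.recv(65536)
    if not data:
        break
    total += len(data)
    window += len(data)
    now = time.monotonic()
    if now - last >= 1.0:
        print(f"{window * 8 / (now - last) / 1e6:.2f} Mbit/s")
        window = 0
        last = now

print(f"total received: {total / 1e6:.1f} MB")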
Logged

Weaver

  • Kitizen
  • ****
  • Posts: 3858
  • Retd sw dev; A&A; 3 × 7km ADSL2; IPv6; Firebrick

What's the easiest way of getting a big upload going so I can time it? I could fire up an ftp server. I'm thinking of something even lazier.
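One lazier option than an FTP server, sketched on the assumption that a byte-counting sink like the one above is already listening on a server you control: stream zero-filled buffers over a raw TCP connection for a fixed time and report the average rate. The hostname below is a placeholder.

```python
# Lazy upload timer: push zeroes over TCP for DURATION seconds and report the
# average upstream rate. HOST is a placeholder for a server running a sink.
import socket, time

HOST, PORT = "test-server.example.net", 5001   # hypothetical sink
DURATION = 120                                  # seconds; long enough for clueless
CHUNK = b"\x00" * 65536

s = socket.create_connection((HOST, PORT))
sent = 0
start = time.monotonic()
while time.monotonic() - start < DURATION:
    s.sendall(CHUNK)
    sent += len(CHUNK)
elapsed = time.monotonic() - start
s.close()
print(f"{sent * 8 / elapsed / 1e6:.2f} Mbit/s average over {elapsed:.0f} s")
```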
Logged

Chrysalis

  • Content Team
  • Kitizen
  • *
  • Posts: 4256

I have only messed with bonding once, and that was bonding multiple gigabit uplinks together on a server. I found it problematic not only for performance but also because I had to be careful not to compromise stability; I have certainly never tried it at home.

My gut feeling is you have a packet-ordering issue; I expect that the more lines are being bonded, the higher the tolerance to out-of-order packets needs to be. If the configured tolerance is exceeded then retransmits will occur.

Have you ever asked these types of questions in the aaisp irc channel where you might find other people with similar setups?
Logged
Sky Fiber Pro - Billion 8800NL bridge & PFSense BOX running PFSense 2.4 - ECI Cab

Weaver

  • Kitizen
  • ****
  • Posts: 3858
  • Retd sw dev; A&A; 3 × 7km ADSL2; IPv6; Firebrick

No, I haven't asked about the upstream thing. I just thought that I was asking too much of it. But recently I realised how many odd things there are. Firstly, two channels work well and the third doesn't, unless that is just a bad pipe, which I should test next.

Secondly, downstream works superbly well, so it is not that AA's routers are no good; I'm assuming it is the exact same software, just running in a 6000-series Firebrick, doing the scheduling in the downstream direction.

The third thing is that if there is some kind of ordering phenomenon - and then only with three links, not two - the effects would presumably depend on the design of the tester, or on its TCP stack if it is using TCP. The goals for a tester could be either: firstly, realism - be like normal life, so use TCP, and perhaps no cheating by using multiple TCP streams to see if a bit more oomph can be squeezed out of the pipe. Or secondly, measure the link, not the software design of the tester - try to remove all software-design-dependent aspects, so do not use TCP, just keep firing more and more packets down the link until you can't push any more through. Anyway, you might expect different figures depending on TCP vs non-TCP, single vs multiple connections, or good stacks that tolerate out-of-order arrival well vs stacks that perform badly.
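For the second kind of tester, a rough sketch of what "just keep firing packets" might look like: blast fixed-size UDP datagrams at a sink as fast as the socket allows, so out-of-order arrival can never trigger retransmissions, and let the receiver's byte count be the measured throughput. The hostname and port are placeholders, and the receiving end would have to do the counting.

```python
# Sketch of a non-TCP "measure the link" sender: offer as much UDP traffic as
# possible for DURATION seconds. The receiver's count, not the sender's, is
# the real throughput figure, since excess packets are simply dropped.
import socket, time

HOST, PORT = "test-server.example.net", 5002   # hypothetical UDP sink
PAYLOAD = b"\x00" * 1400                        # stay under the path MTU
DURATION = 60

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sent = 0
start = time.monotonic()
while time.monotonic() - start < DURATION:
    s.sendto(PAYLOAD, (HOST, PORT))
    sent += len(PAYLOAD)
elapsed = time.monotonic() - start
print(f"offered load: {sent * 8 / elapsed / 1e6:.2f} Mbit/s")
```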

I thought that networks were supposed to handle out-of-order packets, but maybe they only tolerate it rather than actually liking it. Perhaps no-one reacts well, I don't know. Anyway, accusing the FB 2x00's scheduler of causing problems doesn't make sense given that downstream does well, unless it is something to do with having some server OS as a sulky receiver for the upstream tests, and in my case Apple iOS, or the iOS tester app, as a happy receiver for the downstream tests.

All very confusing.
Logged

Chrysalis

  • Content Team
  • Kitizen
  • *
  • Posts: 4256

Out-of-order packets can be tolerated, but only to a limited extent. Both Linux and FreeBSD have tunables to make it more lenient; I am unaware of such a setting on Windows.
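For example, one of the Linux tunables meant here is net.ipv4.tcp_reordering, the number of out-of-order segments the stack initially tolerates before assuming loss. A trivial check of its current value (Linux only; the path won't exist elsewhere):

```python
# Read Linux's initial TCP reordering tolerance, net.ipv4.tcp_reordering
# (in segments). This only inspects the value; raising it is a separate step.
from pathlib import Path

path = Path("/proc/sys/net/ipv4/tcp_reordering")
if path.exists():
    print("net.ipv4.tcp_reordering =", path.read_text().strip())
else:
    print("not a Linux host, or the sysctl is unavailable here")
```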

Really you need to ask either aaisp themselves (since it's their hardware doing the bonding) or in the aaisp IRC channel, as the number of people bonding 3 lines at home is probably really tiny, especially on Firebrick units.

Think about it: if the out-of-order limit were set really high, then the network stack would have to wait for XXX packets before it could mark a segment as missing, meaning retransmissions and so forth would be delayed.

Also, I think the tunables only affect ingress traffic, so you are sort of at the mercy of the recipient as to how well upstream bonding works: if the recipient only tolerates low numbers of out-of-order packets, then you will have to retransmit when it's excessive and hence get lower throughput.
« Last Edit: March 24, 2017, 02:11:59 PM by Chrysalis »
Logged
Sky Fiber Pro - Billion 8800NL bridge & PFSense BOX running PFSense 2.4 - ECI Cab

Weaver

  • Kitizen
  • ****
  • Posts: 3858
  • Retd sw dev; A&A; 3 × 7km ADSL2; IPv6; Firebrick

The thing is, I get no performance loss on downstream, so iOS (for example) and any other client-end receiving OS would have to be out-of-order friendly, while the receiving speed-tester system would have to be very unfriendly, and I would have to be generating an out-of-order sequence (bad scheduling). What is perplexing is that the loss of performance is not consistent, i.e. it is upstream-only. However, perhaps some systems tolerate two links but not three.
Logged