Kitz Forum

Broadband Related => ADSL Issues => Topic started by: Weaver on May 22, 2019, 12:32:24 AM

Title: Uneven Latency - Imbalanced Rates
Post by: Weaver on May 22, 2019, 12:32:24 AM
See the following graph of an upload:

Note the upload at 23:00, the dark red line. The bright green (max latency), bright blue (mean latency) and dark blue (min latency) traces below it are way up; they should look like the other lines'.

The latency on line #3 is massive, especially compared with the others. What should I do to balance them out?


My queuing model spreadsheet says (very roughly) that at this bit-rate, run at 90% of 0.884434 of the sync rate, there should be a minimum of ~2 full-size 1500-byte packets in the tx queue ahead of you when you want to transmit (corresponding to the min latency figure), a mean of ~5 packets and a max of ~8.8 packets. So that's the tx queuing time that gets added to the natural RTT once a packet reaches the tx modem proper. (Poisson distribution model.)
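To make the arithmetic concrete, here is a minimal sketch (Python) converting those queue depths into added serialisation latency. The 333,520 bps rate is an assumption for illustration (an example per-line IP PDU rate); the queue depths are the spreadsheet model's outputs.

```python
# Convert queue depth (packets ahead of you) into added tx-queue latency.
# The 333_520 bps rate is illustrative; the depths 2 / 5 / 8.8 come from
# the Poisson queuing model described above.

PACKET_BITS = 1500 * 8          # one full-size 1500-byte packet
rate_bps = 333_520              # illustrative upstream IP PDU rate

def queue_delay_ms(packets_ahead, rate=rate_bps):
    """Serialisation delay for the packets queued ahead, in milliseconds."""
    return packets_ahead * PACKET_BITS / rate * 1000

for label, depth in [("min", 2), ("mean", 5), ("max", 8.8)]:
    print(f"{label}: {queue_delay_ms(depth):.0f} ms")
# min: 72 ms, mean: 180 ms, max: 317 ms
```

At a few hundred kbps upstream, even a handful of queued full-size packets adds hundreds of milliseconds, which is consistent with the latency spikes in the graph.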

There could be queuing in the modem or in my Firebrick router, but a personal communication from RevK has ruled the latter out. He told me that the Firebrick (unfortunately, imho) doesn't maintain egress queues; it just ships everything immediately (or else drops it, presumably, according to the rate limiters' state). So if my understanding is correct, I must be overdriving that modem, and the queue in question is at the modem. Is this idea correct?

Those numbers are only approximate, but the model is not that 'delicate' or 'sensitive'. I'm not at all sure I have the rates quite right. The problem might be that I somehow had the rates set wrong anyway, which would make my spreadsheet a bit off, but not massively.

Real combined upstream throughput has dropped by 16% compared with previous tests: 1.26 Mbps as against 1.5 Mbps. (As measured by …)

[Moderator edited to concatenate the first three posts of this topic into one.]
Title: Re: Uneven Latency - Imbalanced Rates
Post by: Chrysalis on May 22, 2019, 02:42:42 AM
In Linux there are multiple places where packets can queue: on the interface, within the networking stack, and on the shaper if shaping is enabled.

The way to ensure that the modem isn't queuing is really to make something else the bottleneck in front of it.  There are kernel tunables which might help, but I'm not sure whether they apply in bridge-mode operation.  They will also need reapplying on every modem reboot.

Shell commands to view the current sizes:

'cat /proc/sys/net/core/netdev_max_backlog'

Also 'ifconfig' to check the interface queues, though my Zyxel defaults to a queue length of 0. Look for a line like this in the ifconfig output; there will be one for each interface: 'txqueuelen:0'

Just be aware, though, that jitter and higher latency are generally preferable to dropped packets.   If you are shaping, you can choose which packets get dropped and have drops occur only on the low-importance streams.
Title: Re: Uneven Latency - Imbalanced Rates
Post by: Weaver on May 22, 2019, 06:39:16 AM
My question is: why that line and not the others? Which way do you think things have gone wrong?
Title: Re: Uneven Latency - Imbalanced Rates
Post by: Chrysalis on May 22, 2019, 12:25:09 PM
Line 3 has the lowest sync rate?
Title: Re: Uneven Latency - Imbalanced Rates
Post by: Weaver on May 22, 2019, 12:51:45 PM
Yes, line three has the lowest upstream sync rate; downstream is very good, however. That seems to be part of it. But all was OK a few weeks ago, when line three had a much lower sync rate. Since then the line 3 upstream SNRM has been lowered, with the effect that line 3's sync rate is now much closer to the others'.

In the Statsfest thread earlier (,23401.msg396707.html#msg396707) there is a complete set of stats, and when that was taken there was no problem. Since then I have lost 250 kbps (of TCP-payload throughput? who knows) from the combined effective upstream speed, which was reported as 1.5 Mbps before.

Right now line 3 is set at a tx rate (IP PDUs) of 333520 bps, which is hopefully 90% of 0.884434 × the 419 kbps sync rate. The other modems have sync rates of 499 kbps or more.

Before that, the egress rate was much lower: also at 90%, but of a much lower upstream sync rate, because the upstream SNRM was way above 9 dB.

Current sync rates:
    Link #1: down 2803 kbps, up 525 kbps
    Link #2: down 2843 kbps, up 499 kbps
    Link #3: down 2916 kbps, up 419 kbps
    Link #4: down 3118 kbps, up 499 kbps

Firebrick upstream rate limiters' tx speeds:
    #1: 441111 bps
    #2: 419265 bps
    #3: 333520 bps
    #4: 419265 bps

Total combined rate: 1.613161 Mbps

Fractional speed contributions:
    #1: 27.345%   [██████████████ ‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒]
    #2: 25.990%   [█████████████ ‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒]
    #3: 20.675%   [██████████ ‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒]
    #4: 25.990%   [█████████████ ‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒]

Firebrick's upstream links: modem loading factors:
  Link #1: 95%
  Link #2: 95%
  Link #3: 90%
  Link #4: 95%
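As a cross-check, the limiter speeds listed above follow from sync rate × 0.884434 × loading factor. A quick sketch in Python (the truncation to whole bps is my assumption about how the figures were rounded):

```python
# Reproduce the Firebrick rate-limiter speeds from each link's upstream
# sync rate, the fixed protocol-overhead factor 0.884434, and its modem
# loading factor (MLF).
OVERHEAD = 0.884434

sync_up_kbps = {1: 525, 2: 499, 3: 419, 4: 499}
mlf = {1: 0.95, 2: 0.95, 3: 0.90, 4: 0.95}

limits = {n: int(sync_up_kbps[n] * 1000 * OVERHEAD * mlf[n])
          for n in sync_up_kbps}
print(limits)                # {1: 441111, 2: 419265, 3: 333520, 4: 419265}
print(sum(limits.values()))  # 1613161, i.e. 1.613161 Mbps combined
```

All four computed limits match the configured figures, so the limiters really are set at the stated fractions of the overhead-adjusted sync rates.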
Title: Re: Uneven Latency - Imbalanced Rates
Post by: Weaver on May 22, 2019, 01:13:12 PM
Line 3's current stats today are below. Something has gone wrong: the line 3 upstream SNRM is actually right down to 3.2 dB from a 6 dB target. I thought it was supposed to be a 9 dB target according to the AA clueless server. This problem has arisen, I think, because I commanded the modem to restart the link using a CLI command, and this was done at a time of day when the up-down square-wave SNRM was in its high/good state; it later dropped to the bad/low state, down from 6 dB. This was why I had been running it at a 9 dB upstream target: so that it would have 9 dB from which to drop to around 5 dB during the bad part of the day.

Am I reading the following correctly? It looks like I am getting away with it, despite the low upstream SNRM. Normally I would expect problems, based on past experience.

But I cannot see any reason why line 3 upstream would be slow. I would expect to find evidence that it actually cannot keep up upstream in reality, i.e. that it is slower upstream than its sync rate would suggest.

xdslctl: ADSL driver and PHY status
Status: Showtime
Last Retrain Reason:   8000
Last initialization procedure status:   0
Max:   Upstream rate = 241 Kbps, Downstream rate = 3260 Kbps
Bearer:   0, Upstream rate = 419 Kbps, Downstream rate = 2916 Kbps

Link Power State:   L0
Mode:         ADSL2 Annex A
TPS-TC:         ATM Mode(0x0)
Trellis:      U:ON /D:ON
Line Status:      No Defect
Training Status:   Showtime
      Down      Up
SNR (dB):    3.3       3.2
Attn(dB):    65.0       41.1
Pwr(dBm):    18.2       12.4

         ADSL2 framing
         Bearer 0
MSGc:      52      12
B:      33      5
M:      4      16
T:      3      8
R:      10      16
S:      1.4673   7.1680
L:      796      125
D:      2      4

         Bearer 0
SF:      17702971   994832
SFErr:   109      12
RS:      770079254      4119412
RSCorr:      6822      188468
RSUnCorr:   281      0

ReXmt:      10706      0
ReXmtCorr:   10158      0
ReXmtUnCorr:   306      0

         Bearer 0
HEC:      454      15
OCD:      3      0
LCD:      3      0
Total Cells:   1956678951      281504381
Data Cells:   32954672      8233824
Drop Cells:   0
Bit Errors:   12591      1290

ES:      36      11
SES:      13      0
UAS:      130      119
AS:      284479

         Bearer 0
INP:      26.00      2.00
INPRein:   0.00      0.00
delay:      8      7
PER:      15.95      16.12
OR:      29.07      8.92
AgR:      2932.65   426.90

Bitswap:   60695/60765      6595/6595

Total time = 3 days 11 hours 23 min 44 sec
FEC:      7380      188468
CRC:      579      12
ES:      36      11
SES:      13      0
UAS:      130      119
LOS:      1      0
LOF:      9      0
LOM:      0      0
Latest 15 minutes time = 8 min 44 sec
FEC:      1      469
CRC:      0      0
ES:      0      0
SES:      0      0
UAS:      0      0
LOS:      0      0
LOF:      0      0
LOM:      0      0
Previous 15 minutes time = 15 min 0 sec
FEC:      7      775
CRC:      0      0
ES:      0      0
SES:      0      0
UAS:      0      0
LOS:      0      0
LOF:      0      0
LOM:      0      0
Latest 1 day time = 11 hours 23 min 44 sec
FEC:      399      14691
CRC:      0      1
ES:      0      1
SES:      0      0
UAS:      0      0
LOS:      0      0
LOF:      0      0
LOM:      0      0
Previous 1 day time = 24 hours 0 sec
FEC:      1690      48967
CRC:      1      5
ES:      1      5
SES:      0      0
UAS:      0      0
LOS:      0      0
LOF:      0      0
LOM:      0      0
Since Link time = 3 days 7 hours 1 min 17 sec
FEC:      6822      188468
CRC:      109      12
ES:      25      11
SES:      2      0
UAS:      0      0
LOS:      0      0
LOF:      0      0
LOM:      0      0
NTR: mipsCntAtNtr=0 ncoCntAtNtr=0
Title: Re: Uneven Latency - Imbalanced Rates
Post by: Weaver on May 22, 2019, 11:40:44 PM
Complete stats for all lines attached, in case this helps. I can't think straight: in which direction should I move line 3's rate, up or down?
Title: Re: Uneven Latency - Imbalanced Rates
Post by: Weaver on May 23, 2019, 01:17:29 AM
I did a brief experiment with line three set at an 80% modem loading factor instead of 90%. The results seem encouraging: an upload speed test gave 1.39 Mbps upstream at 80%, instead of the 1.26 Mbps before at 90%. Those numbers are very accurate, as I did a huge number of tests. The figure quoted is the max, not the arithmetic mean.

All very weird. You slow it down and it speeds up.

I haven't done a long enough test yet to see what's what really clearly on the CQM graphs, but I will do so later.
Title: Re: Uneven Latency - Imbalanced Rates
Post by: Weaver on May 23, 2019, 03:55:49 AM
By the way, what's the noise spike on all modems at bins ~125-127?
Title: Re: Uneven Latency - Imbalanced Rates
Post by: Weaver on May 23, 2019, 05:30:14 AM
A result. After a lot of speed testing, I had a weirder breakthrough still. As far as I can tell, going all the way down to 70% is the answer for maximum throughput. The speed tester reports 1.50 Mbps upstream combined when set at the following:
Firebrick's upstream links: modem loading factors:
  Line #1: 95%
  Line #2: 95%
  Line #3: 70%
  Line #4: 95%

So that's the route back to full speed. But I have no idea what is going on here.

After having reduced the L3 modem loading factor (though perhaps at 80%, not down to 70% yet), I did manage to level out the latency figures so that they were much more nearly balanced. For example, comparing the problem line 3 and line 2 during a sustained flat-out upload, I read figures of
    Min    42.9
    Avg    112.7
    Max    161.6
versus
    Min    34.5
    Avg    57.6
    Max    95.3
Before this, the L3 latency figures were really just uncomfortably high. I would rather not have things set up such that all other tasks, such as routine web browsing or DNS lookups, are a complete pain during any upload. And certainly not when there isn't even any good reason why I absolutely need that for upstream performance.

So current state of play now:
=== * Current upstream speed rates * ===

The Firebrick's internet access links have the following _upstream speeds_ (expressed as IP PDU rates) right now:
    #1: 441111 bps
    #2: 419265 bps
    #3: 259404 bps
    #4: 419265 bps
Total combined rate: 1.539045 Mbps

- These are the data rates used by the Firebrick on each link, expressed as IP PDU rates. They are always below each modem's 'upstream sync rate' because overheads from protocol layers below L3 have been accounted for. The reduced rate given here is the theoretical maximum calculated for each modem in the most optimistic possible scenario, based on sending a large upstream packet of a certain chosen size. In addition a second reduction, the so-called 'modem loading factor', has been applied. This factor is based on experience and has been tuned for best upstream performance while avoiding overloading the modems.

Fractional speed contributions:
    #1: 28.661%   [██████████████ ‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒]
    #2: 27.242%   [██████████████ ‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒]
    #3: 16.855%   [████████ ‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒]
    #4: 27.242%   [██████████████ ‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒‒]
Title: Re: Uneven Latency - Imbalanced Rates
Post by: burakkucat on May 23, 2019, 03:06:24 PM
By the way, what's the noise spike on all modems at bins ~125-127?

That's 543 to 544 kHz. I'm not aware of any transmitter operating at that frequency.  ???
Title: Re: Uneven Latency - Imbalanced Rates
Post by: Weaver on May 23, 2019, 06:13:42 PM
I did a search and saw an old thread mentioning a radio station in Ireland, but it sounds like that one should be too far away, and it's not incredibly powerful. That spike is, what, 10-12 dB high, and it has quite a broad base spread: it looks like the result of white noise intermodulating with the core signal? Is that right, about its shape? Like a resonance.

I was saying that I don't recall that spike being there before, but I found it in an old noise graph. See the attached spreadsheet (in Apple iOS Numbers format and in Excel format, zipped up).

Last summer's graph shows a zero bit-loading on tone 125, instead of 3 bits. Despite its title, that document shows both noise and bit loading. It seems to be a comparison of some sort; I have forgotten what vs what…
Title: Re: Uneven Latency - Imbalanced Rates
Post by: Weaver on June 12, 2019, 10:45:40 AM
Recap: I've defined 'modem loading factor' (MLF) as the upstream speed-limiter IP PDU rate set by the Firebrick, expressed as a fraction of the theoretical maximum possible IP PDU rate when protocol overheads are removed from the sync rate; I am taking the latter to be 0.884434 × sync rate. It turns out that a modem loading factor of 100% is not a good idea, and possibly not even workable, as the theoretical maximum is calculated solely on the basis of protocol overheads (total bytes-bloat), and there may be other limiting factors that I am unaware of. I'm using 97%, based on experiment, as a practical maximum, because any higher and the latency figures go through the roof, which I take to be evidence of ingress queuing at the modem. Very high latency figures are undesirable anyway, so there is no way I would want to chase those last few percentage points of throughput even if there actually is extra true throughput to be had from pushing harder and harder towards 100%.

I've now completed a program to assign modem loading factors to all links dynamically, based on the relative sync rates. If one modem is substantially slower than the others, that link is assigned an MLF of 70% (a number chosen by experiment) and 95% is used for all the others. If the sync rates are all the same, then it's MLF = 95% for all. If one modem is only a little slower than the others, then its MLF is set dynamically according to the size of the sync rate gap, where the gap is defined as the distance between the normalised speed of the slowest link and the normalised average of the speeds of the rest, with all sync rates pre-normalised (rescaled) such that 1.0 = the fastest line's sync rate.

I understand so little about what is going on here, and I just don't know why the particular set of parameters discovered experimentally is the right one. The optimal recipe amounts to making a link go slower [!] by enforcing a lower rate-limiter figure, and yet this has the effect of making the overall thing faster in total effective throughput terms. What the blinky, blonky blimey.

So the code will now do the right thing if (i) a different line becomes the slowest one, or (ii) the lines all have the same speed. Case (iii), where the lines have nearly the same speeds, so the slowest one is only slightly slower than the rest, has not been tested, and it is not known what the right strategy is there. In case (iii), the closer the slowest line's sync rate is to the others', the higher the MLF the current implementation assigns, rising until it approaches the MLF value used for all the other lines.

The chosen function that maps gap size to MLF is as follows: it has a linear region, outside which the output is clamped to a minimum MLF of 0.70 and a maximum of 0.97. So it looks something like
   y = MLF = f(x)
   MLF = max( clamp_min, min( clamp_max, m * x + c ) )
where x is the normalised sync rate gap size and clamp_min, clamp_max, m and c are constants.

others = sync_rates[] \ { min( sync_rates[] ) }
gap = avg( others[] ) - min( sync_rates[] )
gap_normalised = gap / max( sync_rates[] )

It turned out to be more convenient to calculate avg( others ) directly as
    avg( others ) = ( sum( sync_rates ) - min( sync_rates ) ) / ( n( sync_rates ) - 1 )
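The scheme above can be sketched as runnable Python. The clamp bounds 0.70 and 0.97 and the 95% base MLF are taken from the posts; the slope k is an illustrative assumption, since the actual m and c constants aren't given.

```python
# Sketch of the dynamic modem-loading-factor (MLF) assignment described
# above. clamp_min/clamp_max and the 0.95 base are from the thread; the
# slope k = 1.5 is a hypothetical choice for illustration only.

def normalised_gap(sync_rates):
    """Gap between the slowest link and the average of the rest,
    rescaled so that 1.0 = the fastest line's sync rate."""
    slowest = min(sync_rates)
    avg_others = (sum(sync_rates) - slowest) / (len(sync_rates) - 1)
    return (avg_others - slowest) / max(sync_rates)

def slowest_line_mlf(sync_rates, base_mlf=0.95, k=1.5,
                     clamp_min=0.70, clamp_max=0.97):
    """Linear map from gap size to MLF, clamped to [clamp_min, clamp_max]."""
    x = normalised_gap(sync_rates)
    return max(clamp_min, min(clamp_max, base_mlf - k * x))

# Upstream sync rates (kbps) from earlier in the thread:
rates = [525, 499, 419, 499]
print(round(normalised_gap(rates), 4))   # 0.1689
print(slowest_line_mlf(rates))           # clamps to 0.70 for this gap
```

With the sync rates from earlier in the thread, the gap is large enough that the slowest line lands on the 70% floor, matching the experimentally discovered setting; as the gap shrinks towards zero, the assigned MLF rises towards the 95% used on the other lines.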