Kitz Forum

Broadband Related => ADSL Issues => Topic started by: aesmith on January 23, 2016, 05:41:56 PM

Title: Why does DLM slow down an erroring line?
Post by: aesmith on January 23, 2016, 05:41:56 PM
I'm battling a rotten line at the moment, so I'm experiencing DLM sanctions first hand. The thing that occurred to me is to wonder what's actually achieved by slowing down a line with a high error rate. In general, higher-level protocols are pretty good at handling errors, so this slow-down doesn't help end-user response times or speeds. Is it done to preserve BTW backhaul bandwidth, to reduce the number of packets retransmitted?

To put some figures in place: on a 6dB target we connected at just under 4.5meg, with an error rate of around 30-40 CRC per minute and a speed-test download of 3.3meg. On a 12dB target the sync speed is just under 3meg, with 3-5 CRCs per minute and a download speed of 1.9meg.

Tony S
Title: Re: Why does DLM slow down an erroring line?
Post by: Chrysalis on January 23, 2016, 07:34:05 PM
The answer is in the data: your CRC rate has plummeted because your line now has a higher noise margin (which is a buffer against problems).
Title: Re: Why does DLM slow down an erroring line?
Post by: ejs on January 23, 2016, 08:39:14 PM
The thing is, the 30-40 CRC per minute isn't really that bad for normal web browsing and downloading; it would hardly affect that kind of usage at all. There's not really much point in having fewer CRC errors just for the sake of having fewer CRC errors.
Title: Re: Why does DLM slow down an erroring line?
Post by: Chrysalis on January 23, 2016, 08:42:34 PM
Regarding DLM, the important data is the ES rather than the CRC.

Do you have the ES data to hand for both situations?

Also SES.

30-40 CRC per minute works out at approx 43,200-57,600 CRC per day, which to me is on the high side.

There would be two ways to interpret that data. If it was averaging say 1-5 CRC per ES then I agree there's probably no noticeable impact, but that would also mean a very high ES count, easily exceeding DLM limits. If instead it was perhaps 1,000 ES a day tied in with 30,000 CRC a day, that's on average 30 CRC per ES, which is a lot.
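To put rough numbers on those two interpretations, here's a back-of-envelope sketch. The CRC rate is from this thread; the per-ES splits are illustrative assumptions, not measurements from the line:

Code:
# Rough arithmetic for the two interpretations above. The CRC rate is
# from the thread; the per-ES splits are illustrative only.
crc_per_min = 30                          # low end of the observed 30-40/min
crc_per_day = crc_per_min * 60 * 24       # 43,200 CRCs/day if sustained

# Interpretation 1: thinly spread at ~3 CRCs per errored second.
es_per_day_spread = crc_per_day // 3      # ~14,400 ES/day - way over DLM limits

# Interpretation 2: bunched into ~1,000 errored seconds per day.
crc_per_es_bunched = crc_per_day / 1000   # ~43 CRCs per ES - severe bursts

print(crc_per_day, es_per_day_spread, crc_per_es_bunched)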
Title: Re: Why does DLM slow down an erroring line?
Post by: ejs on January 23, 2016, 09:01:20 PM
Well, yeah, 18 or more CRCs would make it a SES.
Title: Re: Why does DLM slow down an erroring line?
Post by: Black Sheep on January 24, 2016, 08:19:13 AM
I'm certain I've read info somewhere quite recently that stipulated 10 ES = 1 SES?
Title: Re: Why does DLM slow down an erroring line?
Post by: ejs on January 24, 2016, 08:38:10 AM
I'd like to know where that's from, because, in a way, it's nonsense: you can't have 10 seconds in 1 second.

There are definitions of Errored Second and Severely Errored Second in the ITU-T G.997.1 PDF, sections 7.2.1.1.2 and 7.2.1.1.3. That's where I got the 18 or more CRCs from.

Something that makes more sense would be if it was about the DLM considering 1 SES as bad as 10 ES. Or it might be about classing a problem as REIN or SHINE based on the ratio of ES to SES.
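As a rough illustration of those definitions, here is a minimal sketch that classifies one-second bins of CRC counts. The 18-CRC SES threshold is the G.997.1 figure mentioned above; the function name and sample data are invented for the example, and the real counters also include seconds containing defects (e.g. LOS):

Code:
def classify_seconds(crc_per_second):
    # ES  = a second with one or more CRC anomalies (G.997.1 7.2.1.1.2)
    # SES = a second with 18 or more CRC anomalies  (G.997.1 7.2.1.1.3)
    # Simplified: the standard also counts seconds containing defects.
    es = sum(1 for c in crc_per_second if c >= 1)
    ses = sum(1 for c in crc_per_second if c >= 18)
    return es, ses

# Hypothetical minute of per-second CRC counts: mostly clean, one bad burst.
sample = [0, 0, 2, 0, 25, 1] + [0] * 54
print(classify_seconds(sample))   # (3, 1): three ES, one of which is a SES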
Title: Re: Why does DLM slow down an erroring line?
Post by: Black Sheep on January 24, 2016, 09:01:13 AM
Found it. It's from paper documentation sent to engineers who underwent a swap on their JDSU Hand-Held Testers, from an Infineon Chipset to a Broadcom one.

As with all things sent to us, the authors have to take into account the fact that their audience consists of new starters through to seasoned engineers, so they put quite a lot into the documentation. One such example is this 'Chipset update' info, which has a page that covers 'Common error-code abbreviations', and a one-line explanation of what each means when viewed on the JDSU.

Written here word-for-word .............. SES - Severely Errored Seconds - after 10secs of ES - SES are counted.

So you are right, I got the wording wrong from memory. It isn't 10 ES = 1 SES ...... it's 10 seconds of ES.
Title: Re: Why does DLM slow down an erroring line?
Post by: aesmith on January 24, 2016, 10:00:09 AM
Just for fun I've dug out the statistics for three of the steps ..

Target NM  Synch   FEC/Min  CRC/Min  ES/Hour
   6dB      4256    2000     10-40     700
   9dB      3680    1000     10-40     600
  12dB      3104    1000      0-5      100
 

These are as reported by the 582N router via the DSLstats software. It's interesting to see that the first speed reduction made essentially no change to uncorrected errors, then the second change made no difference to FECs. No SES were ever recorded, and there were no resyncs other than those forced by DLM resetting the target NM.

I don't have data for 15dB because once we were knocked back to 12dB I swapped to a router where I can adjust noise margin, giving me a more useful connection at around 3.5meg (just fast enough for the 3.0meg profile), and I can hang onto that even when DLM is against the stops with a 15dB target as it has been for the last few days.

Edited to add that I get virtually no packets lost on my PRTG ping monitor. That suggests that although those error numbers are high in terms of count, they in fact affect only a tiny fraction of the frames (cells?), the vast majority of which pass without error.
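For what it's worth, a crude order-of-magnitude check of that intuition, under the loud assumption that one CRC anomaly spoils just one ATM cell (in reality a CRC covers a whole codeword, so the true fraction of user data affected will be somewhat higher):

Code:
# Crude sanity check: high error counts can still be a tiny fraction of
# the traffic. Assumes ATM framing and one spoiled cell per CRC anomaly,
# so treat the result as an order-of-magnitude figure only.
sync_kbps = 3104                                  # sync at the 12dB target
cells_per_min = sync_kbps * 1000 * 60 / (53 * 8)  # ~440,000 cells/min
crc_per_min = 40                                  # worst observed rate
print(f"{crc_per_min / cells_per_min:.4%} of cells affected")   # ~0.0091%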
Title: Re: Why does DLM slow down an erroring line?
Post by: Chrysalis on January 25, 2016, 12:57:46 AM
700 ES an hour is a lot, and it's no surprise DLM took action.

Even 100 an hour is still high (by DLM standards) and gives approx 2400 per day.
Title: Re: Why does DLM slow down an erroring line?
Post by: aesmith on January 25, 2016, 08:58:19 AM
Cheers. I more or less understand the DLM triggers; what I was musing about was whether anyone or anything is expected to benefit from these DLM-invoked slowdowns. I could understand it if a line were losing sync, as for some users a stable connection at low speed might be better than a fast line with random interruptions. In the case of just errors, with no resets, the effect on the end user is simply a slower link.
Title: Re: Why does DLM slow down an erroring line?
Post by: Weaver on January 26, 2016, 06:55:34 AM
> In general, higher-level protocols are pretty good at handling errors

The thing is, BT cannot assume the existence of, say, TCP or SCTP. If the user is watching IP broadcast telly, then dropping a packet is bad news. I don't know anything about VoIP, but that may be another important case.

I think BT's choices are quite rightly biased in favour of the cases where TCP won't save the day, as those 'X+1 play' opportunities are the justification for predictions of greater revenue than just email and web.
Title: Re: Why does DLM slow down an erroring line?
Post by: Weaver on January 26, 2016, 07:07:13 AM
> what's actually achieved by slowing down a line with a high error rate.

The intention is not to slow the line down. (On ADSL1 20CN the data throughput typically does not decrease for small sync-rate decreases, because the true throughput is in fact lower than the effective value computed from the sync rate once overheads are removed.)

The line is slowed down when the overheads increase or when the target SNRM is increased. In the former case, the extra overhead guards against errors by allowing the receiver to recover the true data sent. In the latter case, the increased SNRM allows a greater amount of bitswap to be performed, protecting against unfortunate changes in the noise pattern across bins. The greater SNRM is achieved by loading fewer bits per bin than strictly necessary, so bits from a bin in crisis can be moved by bitswap to one with spare headroom.
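A toy illustration of that trade-off, using the usual gap-approximation bit-loading rule; the 9.75dB coding gap and the per-bin SNR are invented for the example, so this is a sketch of the principle rather than what the DSLAM actually computes:

Code:
import math

def bits_per_bin(snr_db, target_margin_db, gap_db=9.75):
    # Bits loadable in one bin under the gap approximation:
    # b = floor(log2(1 + SNR / (gap * margin))), working in dB.
    # A higher target margin means fewer bits loaded per bin (lower
    # sync), leaving headroom that bitswap can use for a bin in crisis.
    effective_db = snr_db - gap_db - target_margin_db
    return max(0, math.floor(math.log2(1 + 10 ** (effective_db / 10))))

snr = 40   # hypothetical SNR (dB) in one bin
for margin in (6, 9, 12, 15):
    print(margin, "dB target ->", bits_per_bin(snr, margin), "bits in this bin")
# 6 -> 8 bits, 9 -> 7, 12 -> 6, 15 -> 5: each 3dB of margin costs ~1 bit/bin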
Title: Re: Why does DLM slow down an erroring line?
Post by: Chrysalis on January 26, 2016, 04:04:39 PM
There is also the fact that DLM is not intelligent enough to know how the connection is actually affected, so it resorts to thresholds over a 24-hour period.

I don't like DLM, but since BT don't want to micro-manage lines it's what we are stuck with on FTTC.
Title: Re: Why does DLM slow down an erroring line?
Post by: aesmith on January 27, 2016, 07:19:01 PM
It looks like it might be a while before OR complete whatever they need to do on our line.

Meanwhile our DSL has been knocked back by DLM to a 15dB target NM. Is that the worst that it can do? I'm running a bit faster by tweaking the router, and since the situation may drag on I was considering speeding up further to see if I can get the profile up another notch. However there's no point in doing this if it just provokes DLM to knock us down to an 18dB target or take some other slowdown action. So is 15dB the limit, or does it have more slowdowns up its sleeve? This is ADSL Max on 20CN.
Title: Re: Why does DLM slow down an erroring line?
Post by: ejs on January 27, 2016, 07:25:12 PM
15dB is the limit, and I don't think there's anything else it can do; there's no banding on 20CN.
Title: Re: Why does DLM slow down an erroring line?
Post by: aesmith on January 27, 2016, 07:53:26 PM
Thanks, will see how far I can rev it up tomorrow. If I can get over 4000k I should get the 3.5meg profile.
Title: Re: Why does DLM slow down an erroring line?
Post by: sevenlayermuddle on January 27, 2016, 09:19:48 PM
Getting back to the original question, DLM does not intend to 'slow down a line'. 

Its aim is to reduce the rate of errors, which is achieved by raising the SNR margin.  A side-effect of the increased margin is that the modem will connect at a lower speed.

One reason it wants to reduce errors is that errors cause retransmissions, which can produce delays noticeable as pauses or glitches, which are problematic for the user. Also, TCP retransmissions are end to end, which increases traffic on the entire network, potentially leading to unnecessary congestion and reduced bandwidth for everybody else.
Title: Re: Why does DLM slow down an erroring line?
Post by: sevenlayermuddle on January 27, 2016, 09:40:21 PM
Incidentally, tweaking the router is likely to be counter-productive, as it will lead to a higher error rate, and in turn to further punitive action by DLM.

One problem is that DLM includes a kind of hysteresis in its algorithms, such that the error rate to trigger a reduction has to be much lower than the error rate that triggered the increase.

What can sometimes help, counter-intuitive as it may seem, is to 'adversely' tweak the router so you get an even slower connection speed, but the error rates are also much lower.   With luck, DLM may interpret that ultra-low error rate as cause to reduce the target.   Even so, it can still take forever.    :(
Title: Re: Why does DLM slow down an erroring line?
Post by: aesmith on January 28, 2016, 12:44:16 PM
Quote
Incidentally, tweaking the router is likely to be counter-productive, as it will lead to a higher error rate, and in turn to further punitive action by DLM.

That's why I was asking whether it had any more nasties up its sleeve. If not, we're on a 15dB target already, so it doesn't appear there's any downside to tweaking. It's been running like that since last Thursday and gives us just over 50% more application-level throughput than leaving it slowed down even to the 12dB target.

Next update from OR is due Monday, so fingers crossed the underlying fault will eventually be looked at. On the other hand, I had a call yesterday direct from Openreach asking me to confirm that the fault was cleared; the guy seemed quite taken aback when I said not, and he couldn't find any record of any further work being planned after the visit on the 18th. In fact he didn't even have notes from that visit, and asked me whether it had in fact taken place. I'm hoping that Plusnet have the correct information, and this caller was the one with the wrong end of the stick.
Title: Re: Why does DLM slow down an erroring line?
Post by: WWWombat on February 06, 2016, 10:06:20 PM
On SES:
My understanding was that a SES was a second-long period where the failure count (ie CRC errors) was really bad: 30% of all blocks.

Then, when a modem encounters 10 consecutive seconds that are all SES, the link is considered unavailable, and the UAS counter starts up.
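For what it's worth, a sketch of my reading of that rule, heavily simplified (the real counters per G.997.1 carry more detail):

Code:
def uas_seconds(ses_flags):
    # Unavailability begins at the onset of 10 consecutive SES (those 10
    # count as unavailable) and ends at the onset of 10 consecutive
    # non-SES (those 10 count as available again).
    uas, unavailable, streak = 0, False, 0
    for is_ses in ses_flags:
        if not unavailable:
            streak = streak + 1 if is_ses else 0
            if streak == 10:
                uas, unavailable, streak = uas + 10, True, 0
        else:
            uas += 1                        # provisionally unavailable
            streak = streak + 1 if not is_ses else 0
            if streak == 10:
                uas, unavailable, streak = uas - 10, False, 0
    return uas

print(uas_seconds([True] * 10 + [False] * 10))   # -> 10 unavailable seconds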

My understanding of these terms predates ADSL. These terms were common in the old (but still digital) transmission mechanisms I worked with in the eighties, though they measured slightly different underlying failures.

On the reason for taking action on an erroring line.
The principle is purely to reduce errors, on the basis that these are not helpful.

For UDP protocols, there is no protection. For most of us, that means anything streamed is lossy ... and too many errors make for audible and visible artifacts in video. Not a contender for "broadcast quality".

TCP protocols can cope with packet loss, but at the expense of slowdowns. See figure 2
https://blog.thousandeyes.com/a-very-simple-model-for-tcp-throughput/

What would you prefer? A 10% drop in line speed, or a 50%+ drop in throughput?
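For reference, the "very simple model" at that link is, I believe, the Mathis formula; a quick sketch with invented line parameters (MSS, RTT) showing how fast the throughput ceiling collapses as loss rises:

Code:
import math

def mathis_cap_bps(mss_bytes, rtt_s, loss_rate):
    # Steady-state TCP throughput ceiling from the Mathis et al. model:
    # rate <= (MSS / RTT) * (C / sqrt(p)), with C ~= 1.22.
    return (mss_bytes * 8 / rtt_s) * (1.22 / math.sqrt(loss_rate))

mss, rtt = 1460, 0.040            # typical MSS; 40ms RTT (assumed)
for p in (0.0001, 0.001, 0.01):
    print(f"loss {p:.2%}: cap ~{mathis_cap_bps(mss, rtt, p) / 1e6:.1f} Mbps")
# 0.01% -> ~35.6 Mbps, 0.1% -> ~11.3 Mbps, 1% -> ~3.6 Mbps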
Title: Re: Why does DLM slow down an erroring line?
Post by: ejs on February 07, 2016, 07:48:26 AM
It seems like everyone is just trotting out the standard theory which states some tiny number of errors will be horrifically bad.

Quote
What would you prefer? A 10% drop in line speed, or a 50%+ drop in throughput?

Quote
If not, we're on a 15dB target already, so it doesn't appear there's any downside to tweaking. It's been running like that since last Thursday and gives us just over 50% more application-level throughput than leaving it slowed down even to the 12dB target.

Quite a lot of web-based streaming can also be done over TCP; many sites will have a range of different protocols available, and pre-recorded video often uses RTMP (TCP) or just HTTP.

The trouble is, the DLM will take action against ES counts that have minimal impact on web browsing or even bulk downloads.
Title: Re: Why does DLM slow down an erroring line?
Post by: aesmith on February 07, 2016, 09:38:32 AM
I was trying to think why real-world performance doesn't follow that theory. To recap, here are the error stats with the rough real-world speed-test throughputs added ..

Target NM  Synch   FEC/Min  CRC/Min  ES/Hour  Throughput/Meg
   6dB      4256    2000     10-40     700         3.3
   9dB      3680    1000     10-40     600         2.8
  12dB      3104    1000      0-5      100         2.3

It would be interesting to have Wiresharked each of those, and given the current situation I might try to do that, since I can easily put our line into a high-error state (making sure it's not for long enough to attract the wrath of DLM, of course).

In the absence of such decodes I can speculate on a few reasons. One could be that the error numbers seem high but are actually low in percentage terms. The DLM "poor" threshold is given as MTBE < 5 seconds. At around 300 pps that could mean only one packet in 1,500 being lost.
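Putting numbers on that (the ~300 pps is my assumption for a busy ~3meg line, not a measured figure):

Code:
mtbe_threshold_s = 5       # DLM "poor" if mean time between errors < 5s
pps = 300                  # assumed packet rate
packets_per_error = mtbe_threshold_s * pps
print(packets_per_error, f"packets per error -> {1 / packets_per_error:.3%} loss")
# 1500 packets per error -> 0.067% loss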

Another factor may be that modern, or even half-modern, TCP stacks now use such large window sizes that they can handle quite a few fast retransmits without slowing down at all.

In terms of IP voice, I'm not sure you'd notice the loss of an RTP packet every few seconds, though it's not ideal.
Title: Re: Why does DLM slow down an erroring line?
Post by: aesmith on February 07, 2016, 09:43:37 AM
Another anomaly is that the targets appear to be absolute error counts per unit time. So slowing the line down may reduce those counts to acceptable values while leaving the percentage data loss, and therefore the number of retransmissions, unchanged.
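A sketch of that anomaly: if a fixed fraction of packets is hit, the absolute count per hour scales with line speed, so a slower sync can duck under an absolute threshold while the loss percentage is unchanged. All figures here are invented to illustrate the point:

Code:
loss_fraction = 0.0005                  # hypothetical constant 0.05% loss
for sync_kbps, pps in ((4256, 410), (3104, 300)):   # rough pps at each sync
    print(sync_kbps, "kbps ->", round(loss_fraction * pps * 3600), "errors/hour")
# 4256 kbps -> 738 errors/hour, 3104 kbps -> 540 errors/hour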
Title: Re: Why does DLM slow down an erroring line?
Post by: WWWombat on February 10, 2016, 10:08:35 PM
Quote
I was trying to think why real-world performance doesn't follow that theory.

My one piece of good real-world experience came from my first FTTC line. On day 1, it synced at full speed, but encountered about 4% packet loss, permanently. Throughput tests (speed tests, downloads etc.) came out about 2-3Mbps lower than they ought to have.

48 hours later, DLM intervened with standard FEC and interleaving settings, and solved the packet loss. IIRC, that knocked the sync speed down by 2.5Mbps. Those throughput speeds ... stayed the same.

As luck would have it, the line needed to be regraded the next day, to change the upstream from 2Mbps to 10Mbps. This entailed a DLM reset, so the line went through this exact pattern a second time.

Quote
It seems like everyone is just trotting out the standard theory which states some tiny number of errors will be horrifically bad.

Quote
Quite a lot of web-based streaming can also be done over TCP; many sites will have a range of different protocols available, and pre-recorded video often uses RTMP (TCP) or just HTTP.

My comment on video quality was focussed on the type of video BT cares about: their multicast TV service, where they need a quality that competes with Sky & VM. I'm sure they don't really care about YouTube, but catch-up services are probably thought of quite highly now, which probably helps focus their minds nowadays.

Older ADSL DLM mechanisms were probably more focussed on keeping stable lines, reducing "truck rolls" as the Americans would put it.

Quote
The trouble is, the DLM will take action against ES counts that have minimal impact on web browsing or even bulk downloads.

The trouble is that DLM cannot distinguish what impact an ES count has, because it has no way to correlate it with a higher level. It could be highly significant, of little importance, or have no effect whatsoever.

BT are conservative, so will tend to err towards the "highly significant" more than otherwise.

Quote
Another anomaly is that the targets appear to be absolute error counts per unit time. So slowing the line down may reduce those counts to acceptable values while leaving the percentage data loss, and therefore the number of retransmissions, unchanged.

Aren't the targets more about "time in error" per unit time? The use of "ES" takes away the dependency on the absolute error count, and turns monitoring into a study of how many bursts of errors occurred rather than the absolute number of errors.

IIRC, a burst of errors is likely to create one batch of TCP retransmissions, so the two correlate. I think it would be very hard to eradicate most ESs but keep the same volume of data loss.
Title: Re: Why does DLM slow down an erroring line?
Post by: aesmith on February 11, 2016, 04:06:50 PM
Quote
My one piece of good real-world experience came from my first FTTC line. On day 1, it synced at full speed, but encountered about 4% packet loss
The DLM threshold of MTBE < 5 sec corresponds to a loss rate more like 0.06%.