Kitz Forum
Broadband Related => ADSL Issues => Topic started by: Weaver on March 30, 2016, 05:09:29 AM
-
Started last Thursday evening around 19:15 very suddenly, packet loss on one line out of three, as indicated by loss of responses to PPP LCP "ping" Echo Requests issued by the ISP's CQM system. See attached image, the red 'dripping blood' hanging down from the top is the indicator.
Why should it start so suddenly, and what might be behind it?
-
This line is approximately 4.55 miles long (est.) by road. The BT DSL availability website gives the line length as ~6200m.
-
I have been very out of it, so I will ask my wife to email Andrews and Arnold support tomorrow to get the ball rolling. I did a ‘copper line test’ at the weekend, which did not reveal the source of the problem.
-
I've done obvious things such as power-cycling the modem. I haven't left it turned off across a DLM fifteen minute boundary period.
-
The simple thing to do if you suspect the line is the cause is to monitor the error stats on the modem.
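A minimal sketch of what "monitor the error stats" amounts to in practice: take two snapshots of the modem's counters some minutes apart and see which ones are climbing. The counter names below (ES, SES, CRC, FEC) are typical ADSL stats; the exact names and how you read them off depend entirely on the modem's firmware, so treat this as an assumed data shape.

```python
# Sketch: compare two snapshots of modem error counters and report which
# counters are still climbing. Counter names here are illustrative; read
# the real values from the modem's status page or CLI.

def rising_counters(before, after):
    """Return {name: increase} for every counter that grew between snapshots."""
    return {name: after[name] - before[name]
            for name in before
            if after.get(name, before[name]) > before[name]}

# Example: two readings taken fifteen minutes apart (made-up numbers).
snapshot_1 = {"ES": 120, "SES": 3, "CRC": 4510, "FEC": 90210}
snapshot_2 = {"ES": 145, "SES": 3, "CRC": 4890, "FEC": 90210}

print(rising_counters(snapshot_1, snapshot_2))  # ES and CRC are climbing
```

A healthy long line can still accumulate some FEC corrections; it's steadily rising errored seconds (ES/SES) that point at a real problem.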
-
I have plenty of spare preconfigured modems as well, that might be worth trying, just as a rule-out.
-
@weaver... is that the phone number in the top RHS of the graph? We do know you only have DSL on the line, but...
Ian
-
It is. (There's no voice service on the line. (I think.) It used to be an option as to what Andrews and Arnold would do with voice, but now I don't really understand what the story is. AA are desperate to stop all voice services and prevent users getting charged for any such thing.)
-
Weaver, I wonder if this is related? I decided to have a read of RevK's blog, and it seems they have had to do some work to relieve capacity issues, although the most recent post suggests it is now resolved.
http://www.revk.uk/2016/03/growing-pains-next-step.html
-
There was a nasty two hour outage, not properly warned, early that Thursday morning. Then that evening one line went suddenly bad and has remained bad ever since. Why it has stayed bad, and why only the one line, these are perplexing questions. The lines all seem to go to the same BRAS in ?Falkirk, so no obvious difference there.
Is it my imagination or is there some sort of repeating temporal pattern in the dripping blood?
-
There have been a few errored seconds per day some days, which is most unusual. The other two lines are zero errored seconds in each day, as expected.
-
Had two faults this year so far: 1) a piece of wire or solder or something fallen down somewhere it shouldn't be in the exchange, and 2) knackered wiring shallowly buried near the fank at Harrapul, possibly after a lorry went over it, shorting wires out against one another or something, iirc.
-
Andrews and Arnold had a brief look at the line this morning. Did an SNR reset to 6dB d/s instead of my dangerous 3dB, a change that has had no effect on the dripping blood at all.
And it didn't explain the sudden change last Thursday evening: the line had worked fine on a really, really low target SNRM before then, just like the other two lines.
The problem is that I don't think anything is showing up in BT's tests.
Here's the CQM graph for last Thursday 2016-03-24. Scroll to the right to see the sudden onset of the problem:
-
We agreed that the d/s target SNRM 6dB was a waste of time, I think it's back to 3dB now. Listened to the line on a POTS phone, was all good.
Then swapped lines @a.1 and a.3 over at the wallsocket link stage, to see if the visible dripping blood would move from 1 to 3. An equipment rule-out.
Agreed to leave it in swapped over state until tomorrow morning when we reconvene.
-
The dripping blood 'moved' when the lines were swapped over. Thank goodness. Now hopefully we can make some actual progress tomorrow.
It's my fault that this has gone on so long; I have been really out of it, very ill with overwhelming fatigue, so I didn't notice the dripping blood until Sunday, and then there was the bank holiday. AA didn't pick up on my requests for support delivered via Twitter, so I lost more time, as I had to get my wife to email tech support, which again I did belatedly because I was so under the weather.
-
I wonder if someone (me) could write a program to get at clueless' live CQM data and continuously be on the lookout for dripping blood, sounding an alarm in some way: sending an email, an SMS or a tweet, or an alert via an SNMP trap, etc.
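A rough sketch of what the alarm logic might look like. I don't know of a documented machine-readable feed for the clueless CQM graphs, so this assumes a list of per-interval LCP echo loss percentages has been obtained somehow; the fetch side is deliberately left out.

```python
# Sketch of a "dripping blood" watcher. Input is assumed to be a list of
# per-interval loss percentages scraped or fetched from the CQM data by
# some means not shown here.

def should_alarm(loss_percentages, threshold=1.0, min_intervals=3):
    """Alarm if at least `min_intervals` intervals each exceed `threshold`
    percent LCP echo loss, so one-off blips don't page anyone."""
    bad = [p for p in loss_percentages if p > threshold]
    return len(bad) >= min_intervals

# A clean line versus a bleeding one (made-up numbers):
print(should_alarm([0.0, 0.0, 0.2, 0.0]))        # no alarm
print(should_alarm([5.0, 12.0, 0.0, 8.0, 6.0]))  # alarm
```

The actual alert delivery (email, SMS, SNMP trap) would hang off the True result; the thresholds here are arbitrary and would want tuning against what a normal day's graph looks like.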
-
On Wednesday, the dripping blood ‘moved’ when the DSL cables from two of the wall sockets to the modems were swapped. Fair enough. So it's the case that we have a bad modem #1, sending its output now into line @a.3. Surely?
But - I put things back the way they were originally yesterday afternoon and the dripping blood is back on line @a.1 - no surprise there.
This morning I swapped out modem #1 for a known good spare. And the dripping blood continues. So we have a bad line 1. Sanity check.
-
I've even swapped out the RJ11-RJ11 cable that was in use on line 1, just in case. No change, as expected.
(Don't have any idea why Wednesday's perplexing result happened. Everything else seems to say “line1 bad” unless I'm going crackers, which is a distinct possibility.)
-
Second brain needed.
-
Have now even replaced the straight-through RJ11 faceplate with a dangly microfilter plugged straight into the test socket.
-
Andrews and Arnold have finally completed all the rule out tests that need to be done at my end. They've submitted something to BT.
===
Andrews and Arnold Ltd
BT Knowledge Based Diagnostics (auto reload on update)
Line BBEU20700042   01/04/2016 14:25:02   QDMNN   Today 14:24:45   andy@a   CLOSED COMPLETED
Problem:
The End User has a working session.
This is a BT diagnostics tool. If you need any help with understanding this please contact support. Where we go on to report a fault based on this we expect you to have answered all the questions accurately. If we are charged by BT as a result of incorrect answers we will charge you.
Today 14:25:42: No fault identified in BTW network. Do you wish to continue with further KBD diagnostics? (Note: To see the KM lite Analysis go to the Drill Down view. If you wish to continue with KBD diagnostics please note intrusive testing may be involved and please ensure the End User Modem is Switched-On and connected.)
: Yes
Today 14:25:42: Please confirm the type of problem being experienced with the service.
: Connection
Today 14:27:58: Do you have your own End User set up checks process? Please note: You must complete all CP and End User checks before (continuing with KBD diagnostics). Please confirm this have been carried out and no customer fault identified. If not please select the 'NO' option below.
: YES
Today 14:27:58: Has the issue been resolved?
: NO
Resolution: ISP15
Please confirm all CP and End User checks have been completed. KBD tests indicate no BTW network fault. This is a CCSFI enabled outcome.
Test Results / Notes
Product Info WBC End User ACCESS
Profile Info WBC 160K - 24M Medium delay (INP 1) 3dB Downstream, UC Medium delay (INP 2) 6dB Upstream (ADSL2+)
BRAS Profile adsl2000-b
RRT:
RRT:Prognosis for a period of 14 days from 18-Mar-2016 to 31-Mar-2016. Line operated in Low Power consumption mode (L2) for 0% of analysis period. The circuit was in sync throughout the specified analysis period. Please refer to the other sub tests within the KBD including the Status Check to confirm whether the circuit is currently in sync and logged on. If the circuit is currently out of sync please carry out internal wiring, filters, modem/router checks with the End User at the master socket where possible. The circuit has no dropping syncs. Please carry out internal wiring, filters and modem/router checks with the End User at the master socket where possible. Also, refer to the other sub tests within the KBD and use the Performance Tester. This Line is not flapping today (01-Apr).
Down:The line rate has varied by a small amount and frequently on most of the days during the analysis period. The line rate is within acceptable limits. Please see the average value. MIN=2699 AVG=2800 MAX=2848
Down:The noise margin is constant throughout the analysis period. The average margin for this line is at the bare minimum, which can cause dropping connections. MIN=1 AVG=1 MAX=3
Down:There have been insignificant errors on the line during almost all parts of the day. This behaviour happens on almost all days during the analysis period. This is normal behaviour for a DSL product and is not affecting the service in any way. MIN=78 AVG=1859 MAX=3600
Down:This is an extremely long line, and lower line rates can be expected. MIN=65 AVG=65 MAX=65
Down:There have been a few initializations on the line during just a few parts of the day. This behaviour happens on most of the days during the analysis period. MIN=0 AVG=1 MAX=5
Down:This circuit is up for an average 85.0% of the time. MIN=900 AVG=73549 MAX=86400
Up:The line rate has varied by a small amount and frequently on almost all days during the analysis period. The line rate is very high (good). Please see the average value. MIN=525 AVG=530 MAX=534
Up:The noise margin is constant throughout the analysis period. The Noise margins are low. Please see the average value. MIN=5 AVG=6 MAX=7
Up:There have been insignificant errors on the line during almost all parts of the day. This behaviour happens on almost all days during the analysis period. This is normal behaviour for a DSL product and is not affecting the service in any way. MIN=102 AVG=2681 MAX=3600
Up:This is an extremely long line, and lower line rates can be expected. MIN=41 AVG=42 MAX=42
RADIUS:WORKED CWCC@A.1 2016-03-30T03:31:08.000
Status:Circuit In Sync NTE /PowerOn MUX Up LL=41.9 SNR=5.9 522kb/s Down LL=64.5 SNR=1.8 2809kb/s
Copper:Line Test OK - End User Equipment detected ACap=nF BCap=nF DPDist=m DNDist=m
-
b*cat asks the question --
Did you swap out the modem's power supply as part of your hardware checking?
-
No, I didn't swap out the power supply. I'd better do so ASAP. This will mean calling upon my long-suffering assistant once more.
-
No, I didn't swap out the power supply. I'd better do so ASAP.
Ah . . .
This will mean calling upon my long-suffering assistant once more.
Nods knowingly. ;)
I have read through the A&A information (listed above) and am at a loss as to: (1) what it means (2) what they are requesting that BT (in whichever guise) do . . . Very confused. ???
-
I am assuming that this stuff I've quoted is a report by BT's 'intelligent' case analysis system sent to AA. I was told that AA have booked a BT engineer visit.
AA now have tech support formally available on Saturdays. Hurray! Before, there would often be someone hanging around in IRC just in case though.
-
The thing that strikes me is that you're reporting packet loss, but the BT blurb seems to be saying very low error rates. Is that the case? I'm not completely sure; for example, if your drops were in bursts, and BT is averaging over 24 hours, then maybe the two do tie up. If that's not the case, and you have a high drop rate but a low error rate, then it doesn't sound like a line fault, does it?
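The arithmetic behind the "bursts averaged over 24 hours" point is worth spelling out: a burst that looks dramatic on a live loss graph nearly vanishes in a daily average. The numbers below are made up for illustration.

```python
# A five-minute burst of total LCP echo loss, averaged over a whole day:
burst_seconds = 300          # five minutes of 100% loss (illustrative)
day_seconds = 24 * 60 * 60   # 86400 seconds in a day

daily_average_loss = 100 * burst_seconds / day_seconds
print(daily_average_loss)    # about 0.35% -- easily read as "insignificant"
```

So a tool reporting against a 24-hour window could genuinely describe a line as having "insignificant errors" while the user sees unmistakable red bursts on a fine-grained graph.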
-
It doesn't have to be a DSL fault at all. What if it's further upstream?
-
Indeed, I was jumping to the conclusion that the BT engineer was booked to look at the local line. Could it be a DSLAM card issue?
-
My feeling, knowing A&A's reputation, is that an appropriate support request has been made to attend to the appropriate entity, but the feedback, as seen by Weaver and reproduced here, is garbled. :-\
-
One of my earlier faults this year was in the exchange, iirc.
-
Packet loss in PPP LCP pings continues just the same, bright red ‘dripping blood’ from the top of the graph down.
-
I've now swapped lines @a.4 and @a.1 at the wallsocket points. The fault has now moved to graph 4 in clueless. Sanity check: so this means a bad modem #1, doesn't it? (Brain failing due to exhaustion, apol.)
Status now: graph 1 currently clear, graph 4 dripping blood.
Suggestion: the earlier perplexing results would just mean that either it's something to do with duff PSUs, which have not been swapped out (thanks, Burakkucat, for pointing this out), or there's more than one bad modem in the spares pool?
-
Hi weaver
Why not save time and swap the power supply from a known-good router onto the perceived bad router?
If the issue then appears on the good router running on the original power supply, you know it's the PSU.
If the issue stays with the perceived bad router, swap in the good router to see if that resolves the issue (keep it on the same line, though).
If the issue remains, can you factory-default the bad router, set it back up, and then test?
Your theory about a bad spare router seems OK, but I would test it by swapping a good router for that spare and seeing if the issue arises.
Many thanks
John
-
@d2d4j good point. Will do, delayed because I have been waiting a short while to let A&A collect a little more data.
-
FYI: I must not put these devices back to factory settings, as they need to be in modem-only mode (bridge mode), which is a non-default setting preconfigured by AA before delivery.
(Btw, as routers they are shite, a security hazard, and in any case unusable, as what I need is a straight PPPoE modem, not a router. See the earlier thread investigating the innards of the D-Link DSL-320B-Z1:
http://forum.kitz.co.uk/index.php/topic,17065.msg313922.html#msg31392
)
-
This afternoon, I changed the modem out yet again, for yet another spare. (I have a lot of fairly new spares.) I also swapped out the PSU along with it.
I then swapped the lines back at the Wallsockets to the way they were originally.
-
As long as you are keeping notes of each experiment performed and the result obtained, then you should be able to rule out (or in) any particular item of local hardware.
I have to confess that now I am completely ker-fuddled as to "what's what". ???
-
The next step was to question the Firebrick itself, as its port operations haven't been investigated. I swapped the Ethernet cables between modems #1 and #3 (line @a.4) where they go into the Firebrick's ports.
Amazingly, this has cured about 99% of the problem, leaving a microscopic amount of dripping blood just before every hour. This time pattern suggests perhaps a software bug.
I looked back at the Firebrick software upgrade status and noted that there had been a software upgrade a few days before the onset of the fault, but there's no obvious way to explain the time gap between the installation and the start of the fault.
One other thing I should look at is the network cables between Firebrick and modems just in case they could have been damaged.
So the current state is that the dripping blood is spectacularly reduced, down to acceptable levels. Because the cables are into the wrong ports compared with the line identifiers @a.1, @a.3 etc, I will have to clean up somehow.
At the moment the Firebrick config is wrong, because it specifies an upstream traffic limit in bps for each line, and these have got scrambled. Line 1 has traditionally had a rather higher upstream traffic allowance than the others; this is now incorrectly assigned to line @a.4, and vice versa. I wonder if this upstream rate-limiting system could have something to do with it. It would have to have been hacked in during the most recent release.
-
All of this work carried out by my beautiful assistant, thanks Janet.
-
It's very hard to guess remotely what might be going on, but the fact that cable swapping improves things would suggest to me that the line isn't at fault.
Swapping the cables on the Firebrick would also perhaps serve to re-initiate the connections, flushing buffers and so forth, meaning it could be something like a memory leak which got resolved when the port was reset. So another idea: if it starts getting worse, don't swap any cables but just reboot the Firebrick, to see if that also yields a temporary improvement.
-
I think the A&A monitoring is by LCP ping, so I was wondering whether that would preempt any traffic shaping on your PPPoE router. It depends how the shaping and queuing are done on the Firebrick. Does the A&A graph show packet loss if you max out your links with real traffic?
-
No, it doesn't seem to show bright red in the period of a speed test.
-
And it's back. Bad as ever, after about 22 hrs. It came back, showing on line @a.4, at around 15:30. This line is the one that was originally connected to port #3 of the Firebrick; now it's connected to port #1 on the FB.
So it seems (sanity check pls?) that the trouble follows Firebrick port #1 around.
-
So it seems (sanity check pls?) that the trouble follows Firebrick port #1 around.
For the first 24 hour period, connect the equipment as --
- Line 1 > Port 1
- Line 2 > Port 2
- Line 3 > Port 3
For the second 24 hour period, connect the equipment as --
- Line 1 > Port 2
- Line 2 > Port 3
- Line 3 > Port 1
Then, finally, for the third 24 hour period, connect the equipment as --
- Line 1 > Port 3
- Line 2 > Port 1
- Line 3 > Port 2
For each of the three 24 hour periods note which line shows the problem. If it is Line 1, Line 3 and Line 2 then that does appear to point to Port 1 on the Firebrick.
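The deduction in that rotation plan can be sketched as a few lines of code, just to make the logic explicit. The rotations below mirror the three wirings above; which graph bleeds in each period is the observation you feed in.

```python
# Three 24-hour periods, each with a different line-to-port wiring,
# matching the rotation plan described above.
rotations = [
    {1: 1, 2: 2, 3: 3},   # period 1: line -> Firebrick port
    {1: 2, 2: 3, 3: 1},   # period 2
    {1: 3, 2: 1, 3: 2},   # period 3
]

def diagnose(bleeding_lines):
    """bleeding_lines: which line's graph bled in each of the three periods.
    Returns ('line', n) if one line is always bad, ('port', n) if the fault
    follows one Firebrick port, else None (inconclusive)."""
    if len(set(bleeding_lines)) == 1:
        return ("line", bleeding_lines[0])
    ports = {rot[line] for rot, line in zip(rotations, bleeding_lines)}
    if len(ports) == 1:
        return ("port", ports.pop())
    return None

print(diagnose([1, 3, 2]))  # the Line 1 / Line 3 / Line 2 pattern: port 1
print(diagnose([1, 1, 1]))  # same line bleeding every period: line 1 itself
```

Any other observed pattern comes back inconclusive, which would itself be useful information: it would suggest the fault isn't pinned to a single line or a single port.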
Perhaps you should suggest to Adrian Kennard that he might like to spend a long weekend at Skye Shepherd Huts and, whilst in the vicinity, take a look at your Firebrick FB2500?
-
> Perhaps you should suggest to Adrian Kennard that he might like to spend a long weekend at Skye Shepherd Huts and, whilst in the vicinity, take a look at your Firebrick FB2500?
That's an excellent idea. I'm told it's very agreeable there!
-
This week I've only been awake a few hours every day, so my fault-finding abilities and the stamina-reserves of my beloved assistant have become quite limited.
-
Did you reboot after it came back, as I advised?