Andrews and Arnold core packet loss; central badness Sunday 16:00 / 17:00

All my DSL links and my fixed-location 3G link went ‘up’ and ‘down’ like a yo-yo yesterday (Sunday 2020-09-06, from about 16:00), although not all at the same time. This would seem to be due to PPP LCP packet loss, not the DSL links themselves really dropping. That is, if my router detects packet loss, defined as the absence of replies to its quality-monitoring PPP LCP echo requests, then it drops the link. Likewise, if AA sees the same thing from their end then they drop the link and send me a report. The control and monitoring server also lost, for a critical part of the badness period, the constant quality monitoring data that forms the graphs of line activity and reliability.
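A minimal sketch of that drop rule, assuming the router simply counts consecutive unanswered LCP echo requests and drops the link once a fixed threshold is reached. The names `EchoMonitor` and `first_drop` and the threshold of three are made up for illustration; they are not taken from any particular router or from AA's equipment.

```python
from dataclasses import dataclass


@dataclass
class EchoMonitor:
    """Toy model of an LCP echo keepalive check (hypothetical, for illustration)."""
    max_missed: int = 3   # assumed threshold of consecutive missed replies
    missed: int = 0       # consecutive echo requests with no reply so far
    link_up: bool = True

    def on_echo_result(self, reply_received: bool) -> bool:
        """Record one echo attempt and return whether the link is still up."""
        if not self.link_up:
            return False
        if reply_received:
            self.missed = 0          # any reply resets the counter
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.link_up = False  # treat the PPP session as dead
        return self.link_up


def first_drop(replies, max_missed=3):
    """Return the index of the echo at which the link is dropped, or None."""
    monitor = EchoMonitor(max_missed=max_missed)
    for i, ok in enumerate(replies):
        if not monitor.on_echo_result(ok):
            return i
    return None
```

Under this model the DSL sync can stay perfectly healthy while the PPP session is torn down: a burst of loss in the core that swallows enough consecutive echoes is sufficient, which would match links flapping without the modems losing sync.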

Even the 3G link into my router apparently went bad, from 16:00 to 17:00, although that time period is not the same as the period of badness on the other links, which are all via DSL.

AA issued this report:

AAISP Incidents: [MSO] Broadband: Some BT/TT lines dropping and packetloss since 4PM
Started: 2020-09-06 16:00:00
We're investigating line drops that started at around 4PM and is affecting various broadband lines.

Update 6 Sep 2020 18:31:40 We're seeing BT lines on three out of nineteen LNSs having high packetloss and PPP drops, we're still investigating.

Update 6 Sep 2020 18:55:52 We've just disabled one of our hostlinks to BT, this would have caused a PPP drop, and we're waiting to see if this improves service.

Update 6 Sep 2020 19:00:14 We're still seeing loss and are still investigating.

Update 6 Sep 2020 19:09:58 A large number of lines (Both BT and TalkTalk) would have just lost connection, and should reconnect shortly. Please to bear with us as we continue to investigate this fault.

Update 6 Sep 2020 19:38:06 We currently have some BT lines with packetloss and some TalkTalk lines which have PPP failures when trying to reconnect, we're still working on this.

Update 6 Sep 2020 19:52:30 Quick summary of where we are at the moment: We're having problems with our networking in our Equinix datacentre which has interconnects to BT and TalkTalk. We also BT and TalkTalk have interconnects in our Telehouse datacentre too - which are ok. The problem is, is that the Equinix side is half not working, which is causing these problems.

Update 6 Sep 2020 20:00:59 Having disabled our TalkTalk interconnect in Equinix, the affected TalkTalk lines are reconnecting now. We still have some BT lines with packet loss, which we are still investigating.

Update 6 Sep 2020 20:07:03 BT lines which were affected by packet loss are now looking better (ie normal!)

Update 6 Sep 2020 20:13:50 Service should be restored for everyone. We have moved all our BT and TalkTalk traffic away from our Equinix datacentre and service is all being routed via Telehouse (Usually traffic would be split roughly 50/50). This is a temporary fix whilst we investigate the actual cause of this evenings problems. The problem is to do with our network and not related to to datacentre or BT/TalkTalk problems specially, we will continue with our investigations. Update expected: 2020-09-07 13:00:00

First central badness of this type in - what? - nine years with AA. I have probably seen core badness due to BT, and I seem to recall an AA MSO due to a DDoS attack, but I don’t recall a period of badness where it was AA’s own fault. Good for them that they immediately owned up to it. Unfortunately I had already emailed them before I got the email quoted above, so I was wasting their time and adding to the load on support.

I was in IRC just after the TalkTalk situation exploded, as I had been booted from an online game.

As I was told in the channel, it was initially a packet loss problem on some BT lines, but AAISP couldn't figure out the cause so they started rebooting stuff. They expected TT lines to reconnect after they were booted, but many then failed to reconnect and it exploded into a TTB problem as well. It's at that point that my line went down.

I am a BTW user not TTB. It was nasty.

That is a downside of having all lines connect to the same provider, and it is why there is a place for the technology I work with, as an overlay taking in multiple circuits.

If I could I would be using entirely different networks for my two services but it is what it is.


