Kitz ADSL Broadband Information
Topic: Andrews and Arnold core packet loss; central badness Sunday 16:00 / 17:00

Weaver

  • Addicted Kitizen
  • *****
  • Posts: 8946
  • Retd sw dev; A&A; 4 × 7km ADSL2; IPv6; Firebrick

All my DSL links and my fixed-location 3G link went ‘up’ and ‘down’ like a yo-yo yesterday (Sunday 2020-09-06, from 16:00), although the DSL and 3G links were not affected at the same time. This would seem to be due to PPP LCP packet loss, not the DSL links themselves really dropping. That is, if my router detects packet loss, defined as the absence of replies to its quality-monitoring PPP LCP echo requests, then it drops the link. Likewise, if AA sees the same thing from their end then they drop the link and send me a report. The clueless.aa.net.uk control and monitoring server also lost the constant quality-monitoring data that forms the graphs of line activity and reliability for a critical part of the badness period.

Even the 3G link into my router went apparently bad from 16:00 to 17:00, although the time period is not the same as the badness of the other links that are via DSL.
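
As a rough illustration of the LCP echo mechanism described above, here is a minimal Python sketch of an endpoint that declares a PPP link down after several consecutive unanswered LCP echo requests. The function name `link_drop_points` and the threshold `max_missed` are hypothetical, chosen only for this example; the thresholds actually used by the Firebrick or by AA's LNSs are not stated in this thread. The idea is similar in spirit to pppd's `lcp-echo-failure` option.

```python
from typing import Iterable, List


def link_drop_points(replies: Iterable[bool], max_missed: int = 3) -> List[int]:
    """Return the indices of the echo requests at which the link is declared down.

    replies    -- one bool per LCP echo request sent: True if a reply came back
    max_missed -- hypothetical threshold of consecutive missed replies
    """
    drops = []
    missed = 0
    for i, got_reply in enumerate(replies):
        if got_reply:
            missed = 0  # any reply resets the counter
            continue
        missed += 1
        if missed >= max_missed:
            drops.append(i)  # PPP session torn down here
            missed = 0       # assume the link restarts and counting begins afresh
    return drops


# A burst of three lost echoes drops the link; isolated losses do not.
print(link_drop_points([True, False, False, False, True, False, True]))  # [3]
```

Under this model, bursts of packet loss in the ISP's core would be enough to trip the threshold repeatedly even while DSL sync stays up, which would be consistent with the yo-yo behaviour reported here.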

AA issued this report:



AAISP Incidents: [MSO] Broadband: Some BT/TT lines dropping and packetloss since 4PM
Posted: 06 Sep 2020 12:19 PM PDT
Started: 2020-09-06 16:00:00
We're investigating line drops that started at around 4PM and are affecting various broadband lines.

Update 6 Sep 2020 18:31:40 We're seeing BT lines on three out of nineteen LNSs having high packetloss and PPP drops, we're still investigating.

Update 6 Sep 2020 18:55:52 We've just disabled one of our hostlinks to BT, this would have caused a PPP drop, and we're waiting to see if this improves service.

Update 6 Sep 2020 19:00:14 We're still seeing loss and are still investigating.

Update 6 Sep 2020 19:09:58 A large number of lines (both BT and TalkTalk) would have just lost connection, and should reconnect shortly. Please bear with us as we continue to investigate this fault.

Update 6 Sep 2020 19:38:06 We currently have some BT lines with packetloss and some TalkTalk lines which have PPP failures when trying to reconnect, we're still working on this.

Update 6 Sep 2020 19:52:30 Quick summary of where we are at the moment: We're having problems with our networking in our Equinix datacentre, which has interconnects to BT and TalkTalk. BT and TalkTalk also have interconnects in our Telehouse datacentre, which are OK. The problem is that the Equinix side is half not working, which is causing these problems.

Update 6 Sep 2020 20:00:59 Having disabled our TalkTalk interconnect in Equinix, the affected TalkTalk lines are reconnecting now. We still have some BT lines with packet loss, which we are still investigating.

Update 6 Sep 2020 20:07:03 BT lines which were affected by packet loss are now looking better (ie normal!)

Update 6 Sep 2020 20:13:50 Service should be restored for everyone. We have moved all our BT and TalkTalk traffic away from our Equinix datacentre and service is all being routed via Telehouse (usually traffic would be split roughly 50/50). This is a temporary fix whilst we investigate the actual cause of this evening's problems. The problem is to do with our network and not related to datacentre or BT/TalkTalk problems specifically; we will continue with our investigations. Update expected: 2020-09-07 13:00:00
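
The move from a roughly 50/50 split to everything via Telehouse, as described in the last update, can be pictured with a small Python sketch. It assumes an even split across whichever interconnects are in service; the function name `traffic_shares` is hypothetical, and how AA actually steers traffic between datacentres is not described in the report.

```python
from typing import Dict


def traffic_shares(interconnect_up: Dict[str, bool]) -> Dict[str, float]:
    """Split traffic evenly across the interconnects that are in service."""
    in_service = [name for name, up in interconnect_up.items() if up]
    if not in_service:
        # Nothing left to carry traffic: every share is zero, i.e. an outage.
        return {name: 0.0 for name in interconnect_up}
    share = 1.0 / len(in_service)
    return {name: (share if up else 0.0) for name, up in interconnect_up.items()}


# Normal operation, then the temporary fix with Equinix taken out of service.
print(traffic_shares({"Equinix": True, "Telehouse": True}))   # {'Equinix': 0.5, 'Telehouse': 0.5}
print(traffic_shares({"Equinix": False, "Telehouse": True}))  # {'Equinix': 0.0, 'Telehouse': 1.0}
```

The sketch also shows the cost of the temporary fix: with only one interconnect in service, anything that takes that one down leaves no share anywhere.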

Weaver

Re: Andrews and Arnold core packet loss; central badness Sunday 16:00 / 17:00
« Reply #1 on: September 07, 2020, 02:15:53 AM »

First central badness of this type in - what? - nine years with AA. I have probably seen core badness due to BT, and I seem to recall an AA MSO due to a DDoS attack, but I don’t recall a period of badness where it was AA’s fault. Good for them that they immediately owned up to it. Unfortunately I had already emailed them before I got the email quoted above, so wasting their time and overloading support.

Chrysalis

  • Content Team
  • Addicted Kitizen
  • *
  • Posts: 6295
Re: Andrews and Arnold core packet loss; central badness Sunday 16:00 / 17:00
« Reply #2 on: September 07, 2020, 01:09:59 PM »

I was in IRC just after the TalkTalk situation exploded, as I was booted from an online game.

As I was told in the channel, it was initially a packet loss problem on some BT lines, but AAISP couldn't figure out the cause so they started rebooting stuff. They expected TT lines to reconnect after they were booted, but many failed to reconnect and it then exploded into a TTB problem as well. It's at that point that my line went down.
AAISP - Billion 8800NL bridge & PFSense BOX running PFSense 2.4 - ECI Cab

Weaver

Re: Andrews and Arnold core packet loss; central badness Sunday 16:00 / 17:00
« Reply #3 on: September 07, 2020, 11:20:54 PM »

I am a BTW user not TTB. It was nasty.

CarlT

  • Kitizen
  • ****
  • Posts: 1672
  • Next generation network design and deployment
Re: Andrews and Arnold core packet loss; central badness Sunday 16:00 / 17:00
« Reply #4 on: September 07, 2020, 11:40:12 PM »

A downside of having all lines connecting to the same provider, and why there is a place for the technology I work with as an overlay taking in multiple circuits.

If I could I would be using entirely different networks for my two services but it is what it is.
WiFi: Nighthawk® AX12 RAX120 - 5Gb uplink
Routing: pfSense VM - 10Gb in and indeed out
Switching: 2 * Mikrotik CRS305-1G-4S-IN, 10Gb uplinks, various cheap and cheerful
Exchange: Wakefield
ISP: BT Full Fibre 900. Zen Full Fibre 900. Zoom, zoom.

Weaver

Re: Andrews and Arnold core packet loss; central badness Sunday 16:00 / 17:00
« Reply #5 on: September 08, 2020, 06:04:42 AM »

@CarlT - Indeed. AA has suggested to customers that they consider having one TTB line and one BTW line where this is an option. Not for me because there is no TTB service here. Could you use failover to 4G/5G and tunnel over that? (Damned MTU.)

There is another way of looking at it, which is the way I have actively chosen to go. I have chosen to put all of my eggs into one basket because then there is no buck-passing between ‘component’ service providers. Whatever goes wrong, as far as possible it is AA’s problem. Even BTW / OR faults and faults with Three/AQL, AA has to manage them for me. I don’t have to diagnose them, assign blame and watch while the guilty wriggle out and try to accuse someone else.

From the evidence of the last ten years, AA is more reliable than the electricity here (which is really really good, I might say). I can use 4G/EE if all else goes pear-shaped.
« Last Edit: September 08, 2020, 06:07:33 AM by Weaver »

Weaver

Re: Andrews and Arnold core packet loss; central badness Sunday 16:00 / 17:00
« Reply #6 on: September 08, 2020, 06:08:11 AM »

I hope that AA will let us know what the story was.

burakkucat

  • Global Moderator
  • Senior Kitizen
  • *
  • Posts: 30433
  • Over the Rainbow Bridge
    • The ELRepo Project
Re: Andrews and Arnold core packet loss; central badness Sunday 16:00 / 17:00
« Reply #7 on: September 08, 2020, 04:25:33 PM »

Quote
I hope that AA will let us know what the story was.

Yes, indeed. I'm sure we could all learn from it.
:cat:  100% Linux and, previously, Unix. Co-founder of the ELRepo Project.

Chrysalis

Re: Andrews and Arnold core packet loss; central badness Sunday 16:00 / 17:00
« Reply #8 on: September 15, 2020, 11:29:30 AM »

Another problem last night; again, judging by the blip graph, I wasn't the only one. I had no service until I manually reinitialised about 20 minutes ago, which made the outage 8 hours for me.

I have also had another couple of PPP outages (again during the night) which AAISP couldn't provide an explanation for, and which I assume were maybe TTB maintenance work. So they are tallying up; I am hoping things get back to how they were, as before the last month or so my service was fine.

It's logged on the status page, info here.

https://aastatus.net/37299

Alex Atkin UK

  • Kitizen
  • ****
  • Posts: 1524
    • My Broadband History
Re: Andrews and Arnold core packet loss; central badness Sunday 16:00 / 17:00
« Reply #9 on: September 15, 2020, 03:33:58 PM »

Quote
The initial cause of this outage was planned work on our TalkTalk interconnect in our Telehouse datacentre by TalkTalk. This would normally be OK, except that our second interconnect in a different datacentre, Equinix, was taken out of service by ourselves last week due to a separate incident: https://aastatus.net/37182 (aggggghhhhhhhhhhhhhhh)

Oh dear. The ongoing saga also sounds like a huge cock-up by TalkTalk.
Exchange: INTAKE (ECI) ISP/Modems: Zen (Home Hub 5A running OpenWrt) + Plusnet (VMG-3925-B10B) + Three (Huawei B535-232)
Router: pfSense (i5-7200U) WiFi: Ubiquiti nanoHD

Chrysalis

Re: Andrews and Arnold core packet loss; central badness Sunday 16:00 / 17:00
« Reply #10 on: September 15, 2020, 05:56:40 PM »

I have since chatted to Andrew over IRC. It was unfortunate: as can be seen on the status page, this wouldn't have happened if the issue last week hadn't made them disable the second interconnect (I expect there would have been just a 1 minute drop and reconnect instead). I gave him some info on my setup to help them work out why some people didn't connect back properly.

I have a 20 minute outage to look forward to later this month as well, as there is planned work listed for TT covering my exchange. Good to know this in advance.
« Last Edit: September 15, 2020, 05:59:46 PM by Chrysalis »

Weaver

Re: Andrews and Arnold core packet loss; central badness Sunday 16:00 / 17:00
« Reply #11 on: September 16, 2020, 04:05:33 PM »

Do I take it from https://aastatus.net/37312 that AA knows that there is a lot more to this than the TalkTalk thing though? I am not a TalkTalk user after all so I don’t have an explanation.

burakkucat

Re: Andrews and Arnold core packet loss; central badness Sunday 16:00 / 17:00
« Reply #12 on: September 16, 2020, 05:30:29 PM »

Quote
Do I take it from https://aastatus.net/37312 that AA knows that there is a lot more to this than the TalkTalk thing though?

I am under the impression that the listed work is related to the earlier, not the latest, outage. But then A&A will never make a simple statement, preferring to quote links to other events to add to the overall, general, confusion. (Why make a simple statement when one can make it confusingly complicated?)  :-X

Weaver

Re: Andrews and Arnold core packet loss; central badness Sunday 16:00 / 17:00
« Reply #13 on: September 17, 2020, 06:10:36 PM »

AA staff were doing more investigative work, and some fixes in the early hours of this morning (Thursday morning). See https://aastatus.net/37312 :
Quote
“There are a couple of outstanding things we still need to work on, but on the whole this has been successful. The main thing being the reboot of a core switch which has fixed most of the problems we've had recently.”

AA-Andrew was kind enough to email me and he said:
Quote
“We were able to resolve some of the problems that had been part of the recent outages. In short, one of our switches was somehow upset and was made happy by rebooting it. This was the last thing to try after we had checked over configurations and tried other things, but in the end it was a turn it off and back on again.”

This morning’s maintenance work sent everything completely crazy here - 42 emails [!] from clueless 04:21-04:43 relating to bogus ‘line-down/up’ events, triggered by detection of packet loss; however failover kicked in (often ;) ) and 3G took over seamlessly, albeit unnecessarily.

Alex Atkin UK

Re: Andrews and Arnold core packet loss; central badness Sunday 16:00 / 17:00
« Reply #14 on: September 17, 2020, 07:02:50 PM »

Quote
AA staff were doing more investigative work, and some fixes in the early hours of this morning (Thursday morning). See https://aastatus.net/37312 :
AA-Andrew was kind enough to email me and he said:
This morning’s maintenance work sent everything completely crazy here - 42 emails [!] from clueless 04:21-04:43 relating to bogus ‘line-down/up’ events, triggered by detection of packet loss; however failover kicked in (often ;) ) and 3G took over seamlessly, albeit unnecessarily.

Living up to its name I see.  ::)