In the early hours of Saturday the 15th there was a clap of thunder which woke us up. Janet scrambled to pull modems out, but it was too late. One ethernet cable was blackened, and the VLAN switch that muxes the four modems together was wiped out. Had to wait until Monday to talk to AA. AA sent us a new switch, no joy, then another new switch and a new Firebrick FB2900 which arrived yesterday, Friday 28th. Still no joy. All ethernet cables and RJ11 cables were replaced with brand new ones straight out of their packets but no luck.
All week we had been running on 3G USB ‘dongle’ failover plus 4G AA iPad and EE iPhone SIMs, and racked up a bill of £112, which AA refunded. Tonight I got an email from AA - the engineer who had been working with us all week had found the answer: AA had sent us two MUX switches with bad config in them. I don’t quite understand how the small MUX switch config came to be wrong, because there is a config file on the AA support website and I would have thought that they could just use that?
So last night the mystery was resolved. The bad news was a week of semi-downtime, or at any rate restricted use, i.e. having to be very careful with 4G/3G. I had accidentally downloaded a movie before I realised that we were running in 3G failover mode, racking up a huge bill. AA are very good, though, and send out a warning email when your expenditure goes over a certain limit.
All week I just couldn’t work out what was going on. I told myself that AA couldn’t possibly have sent me two bad MUX switches, but that was indeed what it was. I tried my FB2500 too, with appropriate configuration tweaks, but no joy with that either, so it seemed nothing worked. The modems were known to be good because BT remote modem status tests executed by clueless.aa.net.uk showed the modems were up, but I couldn’t ping the modems (this was because of the bad MUX switch’s config). So it was driving me mad - a known good FB2500 (bar any possible FB2500 config mistakes of mine), good modems, all new cables throughout and two new AA MUX switches.
Somehow our AA engineer remotely fixed the MUX switch config - they must have been using NAT in the Firebrick, as the switch has an RFC1918 admin IP address and so cannot be reached directly from the internet. I had created a remote admin login account for AA’s use so they could administer the Firebrick remotely (and I have set up a firewall hole for AA staff to allow them to access anything in my world).
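For anyone wondering what counts as an RFC1918 address, here is a minimal Python sketch that tests an IPv4 address against the three private blocks from RFC 1918. The function name `is_rfc1918` and the sample addresses are my own illustrative choices; the switch's real admin address is not shown here.

```python
import ipaddress

# The three private IPv4 blocks defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address lies in one of the RFC 1918 blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in RFC1918_NETWORKS)


if __name__ == "__main__":
    # Hypothetical example addresses, not the real ones on my network.
    for addr in ("192.168.1.1", "172.32.0.1", "8.8.8.8"):
        print(addr, is_rfc1918(addr))
```

An address in one of these blocks is not routable over the public internet, which is why some form of NAT or port mapping on the Firebrick would be needed to reach the switch from outside.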
Janet wants to send back the new Firebrick FB2900 which arrived today - we don’t have any money for such things.
At least the mystery is finally solved. But what a nightmare. How many lightning strikes with damage does that make it so far this year?
Later on Friday evening, I managed to unfix the fix in the MUX switch’s config. Speculation: perhaps Janet power-cycled the unit, and the amended config had not been ‘saved’, that is, not committed to non-volatile storage? In support of this speculation, Burakkucat tells me this behaviour is not unknown: changes made to a unit’s config are not committed until some special ‘save’ command is issued. It is a design that helps prevent you locking yourself out through config-editing mistakes, because you can recover by simply rebooting the unit.
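As a rough illustration of that running-versus-saved behaviour, here is a toy Python model. It is only a sketch of the general idea; the class `ToySwitch`, its methods and the `modem_vlan` key are all made up for the example and have nothing to do with the real switch's firmware or commands.

```python
class ToySwitch:
    """Toy model of a device with a volatile running config and a saved config."""

    def __init__(self, saved_config):
        # Non-volatile copy: survives a power cycle.
        self._saved = dict(saved_config)
        # Running copy: what the device is actually using right now.
        self.running = dict(saved_config)

    def set(self, key, value):
        """Change the running config only; nothing is written to storage."""
        self.running[key] = value

    def save(self):
        """Commit the running config to non-volatile storage."""
        self._saved = dict(self.running)

    def power_cycle(self):
        """Reboot: the running config is reloaded from the saved copy."""
        self.running = dict(self._saved)


if __name__ == "__main__":
    switch = ToySwitch({"modem_vlan": "bad"})  # hypothetical setting
    switch.set("modem_vlan", "good")           # remote fix, never saved
    switch.power_cycle()                       # someone pulls the plug
    print(switch.running["modem_vlan"])        # the fix has gone again
```

If `save()` had been called before the power cycle, the fix would have survived. The same mechanism is what rescues you after a bad edit: reboot and you are back on the last saved config.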
I also discovered that one of the modems is bad, so I replaced it. That leaves a small job for me on Saturday: configuring a new ZyXEL VMG1312-B10A.