Kitz Forum

Internet => General Internet => Topic started by: Weaver on February 09, 2019, 02:40:31 PM

Title: 3G failover problem
Post by: Weaver on February 09, 2019, 02:40:31 PM
I have been trying to debug two problems plus something that is possibly a red herring.

A while back there was a thunderstorm in the distance, about 15 mi south of me. I heard the thunder. Stupid hardware lightning alarm unit didn't sound an alert and I didn't see it flash (whether or not it did so). My lightning alert app was not running, which is my own stupid fault plus Sod's law. So lucky that I just heard it.

Being very nervous about such things, I asked the poor sleeping Mrs Weaver if she would kindly unplug the DSL lines to protect them. At this point my Firebrick router should have failed over to 3G via a USB dongle automatically, but it seemed that for some unknown reason the 3G link was down, so the main internet connection was down as a result. I checked a few things over and decided to force reinitialisation of the 3G link by rebooting the Firebrick, which, although extreme, was the quickest and easiest way. This fixed the problem: 3G link up, failover working and main internet connection restored.

But the question then was why the 3G link had been down anyway. Talking to AA, my ISP, from the evidence of logs it seems that the 3G link had been down for several days and I had not noticed, somehow.

So that was the first problem: why had the link failed?

Other questions were: how best to detect such a problem in future? AA's clueless server should alert me, in theory anyway, so that should be fine. For extra insurance I thought about adding something into the Firebrick config to continually ping-test the 3G link, but I have no idea how to do that.
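As a rough illustration of the "continually ping-test" idea, here is a minimal Python sketch. It is not Firebrick configuration; it assumes a separate machine that can run the system ping command with Linux iputils flags. The names `ping_once` and `LinkWatcher`, and the threshold of three failures, are made up for this example. It only reports a change of state after several consecutive failures, so that one lost packet does not raise an alarm.

```python
import subprocess


def ping_once(host, timeout_s=2):
    """Send one ICMP echo using the system ping (Linux iputils flags assumed)."""
    cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


class LinkWatcher:
    """Track consecutive probe failures and report state changes (hypothetical helper)."""

    def __init__(self, fail_threshold=3):
        self.fail_threshold = fail_threshold
        self.failures = 0
        self.is_down = False

    def record(self, probe_ok):
        """Feed one probe result; return "down", "up", or None if nothing changed."""
        if probe_ok:
            self.failures = 0
            if self.is_down:
                self.is_down = False
                return "up"
            return None
        self.failures += 1
        if not self.is_down and self.failures >= self.fail_threshold:
            self.is_down = True
            return "down"
        return None


# Example use (address is a documentation placeholder, not a real link):
#   watcher = LinkWatcher()
#   event = watcher.record(ping_once("192.0.2.1"))
#   if event: print("3G link went", event)
```

Calling `record` once a minute from a scheduler would be one way to drive it; what to do on a "down" event (email, text message) is left open.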

I set up external monitoring using the ping test server (which uses a Firebrick ping tester box), thanks to a wonderful tip in another thread. This monitored the WAN IPv4 address of the 3G link.

Now it turned out that ICMP-pinging the WAN IPv4 address of the 3G link using the server or from other external screen addresses just failed.

The question is why? Is the link not really 'up'?

AA staff and I tested failover to 3G by faking all the DSL lines going down, by tampering with the config file temporarily. Failover worked ok.

So the next question is: do I really need to worry about the mystery of not being able to ping the 3G dongle? Does this inability have anything to do with the 3G link actually not working when it comes to a failover situation?

I suppose I should ask if this is something unknown about Firebrick behaviour or behaviour of AA's servers at their end, or both. Since my Firebrick and the AA routers know (or at least could in theory know) that the 3G link is meant to be used in failover only, then perhaps one or the other end is either dropping the link or disabling downstream routing to it after the usage period during failover has ended.

So at the moment, for reasons unknown, I cannot use the excellent ping monitoring facility as a double check that the 3G link is really working.

I'm worried that the 3G link might go down again at some point for reasons unknown. And what if it should happen without me knowing about it, and possibly without AA's server spotting it and warning me? The server should, I think, be continuously PPP LCP-ping testing that link, and that proves that the link is really working, not just claiming to be up. If that system is all good then I need have no worries about missing out on alerts.

It may be that I didn't spot an alert concerning the 3G link going down because I confused it with alerts relating to DSL modem links dropping, and those are all too frequent and tend to get casually binned sometimes. I perhaps need to think of a way of conditionally highlighting any specific emails from AA's monitoring systems that are about that one particular link.
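One possible way to do that highlighting is sketched below in Python. The format of AA's alert emails is not shown in this thread, so it is purely an assumption here that the subject line contains a recognisable word such as "3G" or "dongle". The marker list and the function name `is_backup_link_alert` are invented for the example and would need adjusting to whatever the real alerts contain.

```python
import re
from email import message_from_string

# Hypothetical markers; adjust to the wording the real alert emails use.
BACKUP_LINK_PATTERN = re.compile(r"\b(?:3g|dongle)\b", re.IGNORECASE)


def is_backup_link_alert(raw_message, pattern=BACKUP_LINK_PATTERN):
    """Return True if the message's Subject header mentions the backup link."""
    msg = message_from_string(raw_message)
    subject = msg.get("Subject") or ""
    return pattern.search(subject) is not None
```

A mail client rule or a small script over the inbox could then flag or file the matching messages separately from the routine DSL drop alerts.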

The other remaining problem is that if I find out that the 3G link really is down, then how do I debug it? And also how do I capture enough information about what badness it was that made it go down at that time?
Title: Re: 3G failover problem
Post by: burakkucat on February 09, 2019, 03:46:46 PM
I understand your problem but I am having trouble trying to think of a reliable solution.

The 3G link. Should that really be "up" when it is not required? Would that then be consuming the allowance? Perhaps other members, who have a failover backup link for their own situation, will be able to advise in general terms.  :-\
Title: Re: 3G failover problem
Post by: Ronski on February 09, 2019, 04:16:47 PM
Do you have a public IP address on the 3G link? Aren't most behind CGNAT?
Title: Re: 3G failover problem
Post by: j0hn on February 09, 2019, 04:19:20 PM
I'm not sure how Firebrick work the 3G failover.
On 2 routers I own that have 3G failover the 3G link is completely down until required.
It would not respond to any ICMP pings until activated.

Are you able to tell if the 3G failover is connected when the DSL lines are up?
Title: Re: 3G failover problem
Post by: Weaver on February 09, 2019, 07:22:42 PM
@j0hn understood about link being down until required. Perhaps that is indeed just what it does.

> Are you able to tell if 3G failover is connected when the DSL lines are up?

No. But only because I don't have a means of testing it and because I don't exactly understand the status info that I am seeing. The Firebrick leads me to believe that the link is 'up', whatever that might mean; it says the interface is up and we have a 3G PPP connection established.

Code:
Attached USB devices
Socket Vendor Product Name    Functions
1.4 12d1 1003 Dongle-AA Memory-stick 3G(AT-ppp)
1 1a40 0101 Hub
3G/PPP Dongle Sessions
Socket T Name MTU Status
1.4 0 Dongle-AA 1440 Up tcp-fix
You do not have any 4G/eth sessions

@Ronski - I have a global routable static IPv4 address assigned to the WAN 3G dongle i/f. It is assigned by PPP NCP, not statically configured. Sanity check: I know that the IP address is correctly set up and recognised because I tried pinging it through the Firebrick from the main LAN and got a response. Actually I wonder if that is weak evidence that the 3G link state is up? Because the IP address assignment only happens once PPP NCP has done its thing, seeing as I did not hard-configure that address. It's never mentioned in the config file.

Indeed I think a lot of 4G/3G carriers only give out CGNAT addresses, from what I've heard. I'm using an AA 4G SIM though, and so I've got a real IPv4 address assigned to it (AA / AQL / Three). That is supposed to be permanently routed to the SIM/dongle, not conditionally fallback-routed.

Also of course, my main LAN IPv4 address block (a routable static /26) plus an IPv6 /64 is also fallback-routed to that SIM.

During failover I handle IPv6 traffic by putting it through the 3G link using an AA 6in4 proto 41 tunnel. This is necessary unfortunately because the stupid AQL / Three service doesn't speak IPv6. I just can't understand why AA has not got this fixed, as it has been going on for years and years and it's a bit of an embarrassment, surely. The Firebrick is configured with the tunnel endpoint IPv4 address and somehow magically knows to use the tunnel when needed. (God only knows how.)
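For anyone wondering what "6in4 proto 41" means on the wire, here is a small Python sketch. It says nothing about how the Firebrick decides when to use the tunnel, which this thread does not explain. It only shows that each IPv6 packet is prefixed with a plain 20-byte IPv4 header whose protocol field is 41, and what that costs against the 1440-byte MTU reported for the dongle session above. The function names and the example addresses (documentation ranges) are assumptions for illustration.

```python
import socket
import struct

IPPROTO_IPV6_IN_IPV4 = 41  # IPv4 protocol number used by 6in4
IPV4_HEADER_LEN = 20       # IPv4 header without options


def tunnel_mtu(link_mtu):
    """Largest IPv6 packet that fits once the IPv4 header is added."""
    return link_mtu - IPV4_HEADER_LEN


def ipv4_checksum(header):
    """Standard ones'-complement sum over 16-bit words of the header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encapsulate_6in4(ipv6_packet, src_v4, dst_v4, ttl=64):
    """Prefix an IPv6 packet with an IPv4 header carrying protocol 41."""
    total_len = IPV4_HEADER_LEN + len(ipv6_packet)

    def pack(checksum):
        return struct.pack(
            "!BBHHHBBH4s4s",
            0x45, 0, total_len,   # version 4, header length 5 words, TOS, length
            0, 0,                 # identification, flags/fragment offset
            ttl, IPPROTO_IPV6_IN_IPV4, checksum,
            socket.inet_aton(src_v4), socket.inet_aton(dst_v4),
        )

    # Compute the checksum over the header with the checksum field zeroed.
    header = pack(ipv4_checksum(pack(0)))
    return header + ipv6_packet
```

With the 1440-byte session MTU shown earlier, `tunnel_mtu(1440)` gives 1420 bytes for the IPv6 packets inside the tunnel.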
Title: Re: 3G failover problem
Post by: aesmith on February 11, 2019, 08:58:46 AM
Did the 3G IP address respond to ping during your failover testing? I always prefer a failover path to be live at all times, so you know it's ready for when it's needed; however, I wonder whether your dongle only brings up the link when there's outbound traffic.
Title: Re: 3G failover problem
Post by: Weaver on February 19, 2019, 06:39:43 PM
Agreed. I want the failover path to be live at all times just as you do, so that I can test it. I can't think of any reason why it would be dropped but maybe it is and it isn't documented. Grrr.