A new SIM arrived. My wife, having now understood the size issue, fitted the large pop-out SIM into the dongle’s removable SIM-carrier tray; it filled the tray and fitted correctly. Straight away the Firebrick showed a PPP connection.
A lot of faffing about ensued.
1. I had to change a few things in clueless, having not got the routing-of-IPs-to-links tickboxes set correctly.
2. I also realised that I needed to hand the dongle one IPv4 address for its local endpoint all the time, to keep it happy during PPP setup. This has to be a full-time thing: the dongle’s PPP link is not just fired up when it is time for failover, it gets fired up when the Firebrick boots. This slows down the Firebrick’s lightning-fast boot time by a few seconds.
3. There were also various things in the Firebrick XML config examples that turned out to be absolutely essential, and you can’t tell from the documentation what is essential and concerned with dongles and what is just accidental stuff that happens to be part of the scenario depicted. Why some of these things are essential is a total mystery. I would like to fix the documentation, if only I knew what was going on.
4. The XML config needed nat="false" on the dongle element, which is a non-default setting; otherwise, err, it does NAT and mucks up all the source IPv4 addresses of the clients on your LAN. Well, who knew. Luckily I woke up to this and fixed it. Of course it still worked either way, but it introduced a subtle bug: my identity was wrong and wouldn’t match firewall ‘allow’ rules, since it had altered the IPv4 source address of (say) my own iPad.
5. The MTU of the dongle is only 1440! This is insane. It is advertised as 1500. Not happy. I noticed the same thing with the AA 3G SIM in my iPad itself. I ought to check the AA 3G SIM in Janet’s iPad, as it is much older, in case this is something to do with a new batch.
To get IPv6 working, unfortunately, you have to use 6in4 (protocol 41) tunnelling, which the Firebrick and AA’s routers can take care of. A mysterious command somehow activates the tunnel only when it is needed, during failover. A <route ip="::/0" gateway="81.187.81.6" /> is the magic incantation, but how does it know to involve the dongle? And how does it know not to do the tunnelling thing during the normal, non-failover state?
6. It turns out to be essential to reduce the main MTU to make tunnelled IPv6 work. Thinking about it, this is not too surprising: in IPv6 there is no option to fragment at an intermediate router, so just letting the Brick fragment IPv6 after failover is impossible. So this needed to be fixed. Since the 3G MTU is ridiculously low for some reason, I needed to pick a permanently reduced main LAN MTU of at most 3G_MTU - 20 (20 bytes for the proto 41 IPv4 header) = 1440 - 20 = 1420. I picked 1408, which is less than 1420 and is of the form n * 48 - 32, where 32 bytes is the DSL PPPoEoA overhead. So 1408 + 32 = 1440 = 30 * 48, a whole number of 48-byte ATM cells, which means that when using DSL PPPoEoA as normal it has maximum ATM efficiency.
It is a real shame to have to be on a reduced MTU all the time, when I have been enjoying MTU 1500 for everything. Ironic. This is the cost of the stupid AA 3G / 4G not supporting IPv6 so that you need the header overhead of tunnelling. If they would just fix this, then this nonsense would go away. Years have gone by and no news about sorting it out.
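The arithmetic in item 6 can be sketched in a few lines of Python. This is only an illustration of the sums above, not anything the Firebrick runs: the function name and constants are made up for the example, and the 32-byte PPPoEoA overhead and 20-byte tunnel header are the figures quoted in the text.

```python
ATM_CELL_PAYLOAD = 48   # payload bytes carried per ATM cell
DSL_OVERHEAD = 32       # PPPoEoA overhead per packet (figure from the text)
TUNNEL_HEADER = 20      # outer IPv4 header added by 6in4 (protocol 41)


def cell_aligned_mtu(link_mtu, tunnel_header=TUNNEL_HEADER,
                     overhead=DSL_OVERHEAD, cell=ATM_CELL_PAYLOAD):
    """Largest MTU that fits inside the tunnel over the backup link and
    also fills a whole number of ATM cells on the DSL line."""
    ceiling = link_mtu - tunnel_header       # room left after the tunnel header
    cells = (ceiling + overhead) // cell     # whole cells that fit under the ceiling
    return cells * cell - overhead


if __name__ == "__main__":
    mtu = cell_aligned_mtu(1440)             # 1440 = measured 3G MTU
    print(mtu, (mtu + DSL_OVERHEAD) // ATM_CELL_PAYLOAD)
```

With a 3G MTU of 1440 this gives 1408, carried in 30 cells. If the 3G link really did 1500 as advertised, the same sum would give 1456.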
7. For some reason I found another magic incantation to be essential. (I think. I need to retest to be certain.) The main LAN subnet needed <subnet ra-dns="2001:8b0::2020 2001:8b0::2021" /> for reasons not understood. I don’t know where it was getting its IPv6 DNS servers from before, presumably from PPP. You cannot specify IPv4 addresses here, it won’t let you. The documentation suggests that this is the value that the Firebrick sends out in RAs. I hope this is not true, as the Firebrick, I believed, is itself a local caching DNS relay, so it should surely be advertising itself to clients, not telling them to go straight to AA. That would defeat the whole point of the cache, losing the performance gain from DNS results being cached and shared between the clients on the LAN. Maybe it is just a confusing name for the attribute. Clients would always be better off using IPv4 for DNS anyway, as it’s slightly faster, and in this case especially you do not want any clients to be using DNS over IPv6 because of the 6in4 tunnel overhead. So I’m totally confused. If you didn’t need this before, why do you need it now? The only thing that I can think of is that the dongle does not obtain IPv6 DNS server addresses by PPP at all, because it does not speak IPv6CP, which is a bit surprising, but maybe the 3G carrier gets in the way and the clients when using the dongle are not initially speaking PPP direct to AA. But the Firebrick would also have to not give out the IPv4 DNS server addresses to IPv6 users, though maybe that’s a basic protocol limitation. Maybe it starts to make sense after all. Hoping the attribute is wrongly named, maybe you have to use this technique whenever you do not have IPv6CP. I just hope the Firebrick still advertises itself even when this is in use. I could perhaps test this by somehow checking what DNS values the iPads hear. Not sure how.
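One possible way to check what DNS values the clients hear is to capture a router advertisement on any machine on the LAN and pull out the DNS server list. The sketch below assumes that the ra-dns value ends up in the standard RDNSS option (type 25, RFC 8106) of the RA, which I have not confirmed for the Firebrick. The function name is made up for the example, and the sample packet is hand-built rather than a real capture.

```python
import ipaddress
import struct

ND_ROUTER_ADVERT = 134   # ICMPv6 type of a router advertisement
OPT_RDNSS = 25           # Recursive DNS Server option (RFC 8106)


def rdnss_from_ra(icmp6):
    """Return the DNS server addresses carried in an RA.

    `icmp6` is the raw ICMPv6 message, starting at the type byte
    (i.e. without the IPv6 header in front of it)."""
    if len(icmp6) < 16 or icmp6[0] != ND_ROUTER_ADVERT:
        return []
    servers = []
    pos = 16                                  # options follow the 16-byte RA header
    while pos + 2 <= len(icmp6):
        opt_type, opt_len = icmp6[pos], icmp6[pos + 1]
        end = pos + opt_len * 8               # option length is in units of 8 bytes
        if opt_len == 0 or end > len(icmp6):
            break                             # malformed option, stop parsing
        if opt_type == OPT_RDNSS:
            # Skip type, length, reserved (2) and lifetime (4), then 16 bytes per address.
            for off in range(pos + 8, end - 15, 16):
                servers.append(str(ipaddress.IPv6Address(icmp6[off:off + 16])))
        pos = end
    return servers


def sample_ra(addresses, lifetime=3600):
    """Build a minimal RA with one RDNSS option, for trying the parser out."""
    header = bytes([ND_ROUTER_ADVERT, 0, 0, 0, 64, 0, 0, 0]) + bytes(8)
    body = b"".join(ipaddress.IPv6Address(a).packed for a in addresses)
    option = bytes([OPT_RDNSS, 1 + 2 * len(addresses), 0, 0])
    option += struct.pack("!I", lifetime) + body
    return header + option


if __name__ == "__main__":
    print(rdnss_from_ra(sample_ra(["2001:8b0::2020", "2001:8b0::2021"])))
```

If the list from a real RA contains the two AA addresses rather than the Firebrick’s own LAN address, then the documentation is right and clients really are being sent straight to AA for DNS over IPv6. The raw RA bytes would have to come from a packet capture on the LAN; the parser itself does no capturing.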
Anyway, in the end, absolutely everything worked. Even IPv6. The speed of 3G was awful, only 2.5 Mbps downstream and 0.25 Mbps upstream according to the AA speedtester. This is perhaps due to the position of the dongle, stuck in the Firebrick on a shelf at the side of the office. I am hoping it will improve a lot if I move the dongle to the window, where it has a direct line of sight to the base station. I have a USB extension cable on order so that I will be able to do this, and the Firebrick can stay where it is. My iPad gets 8 Mbps downstream and 3.5 Mbps upstream even in a very bad position at the back of the bedroom when I am in bed, and it occasionally picks up 4G. The dongle that I have does not speak 4G. There are 4G-capable dongles that sort-of work with the Firebrick, but they force NAT on you, which is pretty vile: source IPv4 addresses would change, so they can’t do a transparent, seamless switchover that keeps existing connections unmolested. It might be possible to fix all that nonsense by using an L2TP tunnel for everything, IPv4 and IPv6, to hide the NAT and restore full routed-block capability. I don’t know how to set the Firebrick up for that and I don’t know if anyone else has successfully done this; there is no example with detailed setup for this in the docs. It’s ridiculous that 4G dongles are such a pain and so different to 3G ones. Why do we even need to know or care about this? We just want a modem and that’s it.