Kitz Forum

Computers & Hardware => Networking => Topic started by: Chrysalis on December 28, 2021, 03:03:10 AM

Title: One for you guys to solve? Internet caused broadcast storm?
Post by: Chrysalis on December 28, 2021, 03:03:10 AM
So yesterday I discovered an odd problem: every 30 seconds I was getting packet loss.  At first I thought it was just the internet, but it was my entire LAN.

In addition it coincided with a spike of traffic I saw on my main desktop. It was only a spike to around 700kB/sec, but noticeable over idle traffic.

I spent ages trying to get to the bottom of it. A few key things:

When I disconnected my second openwrt switch from my LAN, the packet loss stopped, although spikes were still going to my PC.
I also discovered disconnecting my firewall from the main switch, also stopped the packet loss but in addition also stopped the spikes.
I looked at my firewall (pfsense) and the culprit was upnp, something known to be a security risk, but it was enabled in the past as it was the only way I could get UNO to work online multiplayer.  As soon as I disabled upnp, the spikes stopped, and everything is fine again with all equipment connected.

So the conundrum I have been left with is how such a small spike of internet traffic can kill a gigabit LAN. Can internet traffic cause a broadcast storm somehow?  I didn't inspect the traffic after I discovered it was upnp, so I have no information on what the payload was.  I briefly enabled it again today and the spikes have stopped, but I have disabled it again given what happened.

Welcome any thoughts.
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Weaver on December 28, 2021, 05:36:18 AM
How did you see the traffic spikes ? That is, what tool did you use to see them.

It occurs to me that 700kB/s could easily be 1Gbps if the total amount of burst traffic is averaged out over a suitable time quantum: 1Gbps is transmitted for a short period of time and then ceases, and the two kinds of measurement are made at different time resolutions.
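
A rough sketch of that arithmetic, assuming the meter averages over a one-second window and that "700kB/s" means 700,000 bytes per second (both are assumptions, the thread does not say how DU Meter or Task Manager sample):

```python
# How long must a line-rate burst last to show up as 700 kB/s
# when the monitoring tool averages over a fixed window?

LINK_RATE_BPS = 1_000_000_000    # gigabit link, in bits per second
OBSERVED_BYTES_PER_S = 700_000   # the reported spike (decimal kB assumed)
WINDOW_S = 1.0                   # assumed averaging window of the meter


def burst_duration(observed_bytes_per_s: float, window_s: float, link_bps: float) -> float:
    """Seconds of line-rate traffic that average out to the observed rate."""
    bits_in_window = observed_bytes_per_s * 8 * window_s
    return bits_in_window / link_bps


duration = burst_duration(OBSERVED_BYTES_PER_S, WINDOW_S, LINK_RATE_BPS)
print(f"{duration * 1000:.1f} ms of line-rate traffic per {WINDOW_S:.0f} s window")
```

Under those assumptions, a burst that saturates the link for only a few milliseconds would read as a modest spike on a per-second graph.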
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Reformed on December 28, 2021, 01:15:29 PM
You've a switching loop somewhere that caused a storm. UPnP among other things uses multicast and if your switch is either not smart or not configured properly you can have issues.

Assuming your switches are smart, enable RSTP or STP on them, and check your equipment to make sure nothing connected to both switches is bridging them.
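
To illustrate why a loop matters for flooded traffic such as broadcast or unfiltered multicast, here is a toy model, not a capture of the network in this thread. It assumes each switch forwards a flooded frame out of every port except the one it arrived on, and that Ethernet frames have no hop limit. The switch names and topologies are made up, and hosts, link speed and queue limits are ignored.

```python
from collections import defaultdict


def flood(links, source, hops):
    """Count copies of one flooded frame on the wire at each hop.

    links  -- iterable of (switch_a, switch_b) pairs
    source -- switch where the frame is first injected
    hops   -- number of forwarding rounds to simulate
    """
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Each in-flight copy is recorded as (sender, receiver).
    in_flight = [(source, peer) for peer in neighbours[source]]
    history = []
    for _ in range(hops):
        history.append(len(in_flight))
        next_round = []
        for sender, receiver in in_flight:
            # Flood out of every port except the ingress port.
            for peer in neighbours[receiver]:
                if peer != sender:
                    next_round.append((receiver, peer))
        in_flight = next_round
    return history


chain = [("sw1", "sw2"), ("sw2", "sw3")]                     # no loop
triangle = chain + [("sw3", "sw1")]                          # one loop
mesh = [("sw1", "sw2"), ("sw1", "sw3"), ("sw1", "sw4"),
        ("sw2", "sw3"), ("sw2", "sw4"), ("sw3", "sw4")]      # several loops

print("chain:   ", flood(chain, "sw1", 5))
print("triangle:", flood(triangle, "sw1", 5))
print("mesh:    ", flood(mesh, "sw1", 5))
```

In this model the frame dies out in the loop-free chain, circulates indefinitely in the single loop, and multiplies every hop in the mesh, so a small amount of flooded traffic can occupy the links for as long as the loop exists.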
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Chrysalis on December 28, 2021, 11:12:37 PM
How did you see the traffic spikes ? That is, what tool did you use to see them.

It occurs to me that 700kB/s could easily be 1Gbps if the total amount of burst traffic is averaged out over a suitable time quantum: 1Gbps is transmitted for a short period of time and then ceases, and the two kinds of measurement are made at different time resolutions.

I have dumeter on my desktop, also confirmed in task manager. (although task manager lags).
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Chrysalis on December 28, 2021, 11:13:45 PM
You've a switching loop somewhere that caused a storm. UPnP among other things uses multicast and if your switch is either not smart or not configured properly you can have issues.

Assuming your switches are smart, enable RSTP or STP on them, and check your equipment to make sure nothing connected to both switches is bridging them.

Those are my thoughts too, that it's switch related.  Both switches are openwrt and support STP, but I read STP has a performance hit. Should I ignore that and just turn it on?

There is nothing connected to both switches directly.  However, my modem (in bridge mode) is connected to my firewall directly (which is connected to the first switch) and also to the second switch. The link to the second switch is a LAN connection for collecting stats (since I moved to PPPoE, stat collecting via the WAN cable has broken), and the cable to my firewall is the WAN bridge.
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: sdawson35 on December 29, 2021, 06:29:52 PM
Spanning Tree won't cause that much of a performance hit (especially if your network is small).

My home network is quite complex, with multiple VLANs, switches, router, firewall, devices on multiple operating systems, IoT etc., all running with spanning tree enabled and no issues.

I did however have an oddity quite a while back that saw big spikes (I use netxms to monitor my network) intermittently and randomly. It turned out to be Windows Update: one of the settings was set to update from other Windows devices (or some such), and every so often it would poll my network for other Windows devices and then try to do an update. Turned that setting off and no more issues.

Just for context, I have a Cisco xDSL router, Cisco switches, TP-Link WiFi access points, a Palo Alto firewall, 8 IoT devices (security) and 4 servers.
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Alex Atkin UK on December 29, 2021, 08:32:58 PM
I did however have an oddity quite a while back that saw big spikes (I use netxms to monitor my network) intermittently and randomly. It turned out to be Windows Update: one of the settings was set to update from other Windows devices (or some such), and every so often it would poll my network for other Windows devices and then try to do an update. Turned that setting off and no more issues.

That's funny, as I always keep that option on and it NEVER ONCE updated from other machines on the network, though I have no idea if it actually looks or not.
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Chrysalis on December 29, 2021, 11:22:36 PM
Ok thank you, I will enable it on both switches and report back. :)
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Reformed on December 29, 2021, 11:48:42 PM
STP will consume notable CPU once as it builds a topology of the network. After that the resource consumption is minimal and periodic. A single path to each destination in the network is selected and other ports to the same destination are not used removing loops on the switches.
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Chrysalis on December 30, 2021, 05:38:28 AM
All seems good, copying to truenas 984mbit, copying from 977mbit.

The second switch only needed about two minutes for the CPU load to go back to idle, the main switch about five minutes, I think because that one has two bridges.
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Alex Atkin UK on December 30, 2021, 12:43:21 PM
And from the logs you can figure out which ports are bridging and fix it.
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Reformed on December 30, 2021, 10:43:34 PM
Only one device is attached to both switches to complete the loop. That device is a bridge that has to be connected to both as far as I have read so STP should keep it sane.
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Chrysalis on December 31, 2021, 04:49:06 AM
What has confused me a little: in openwrt the STP toggle is only present on interfaces that have a bridge configured. So e.g. I enabled it for both LAN and guest on my main switch/AP, but on the second switch I could only enable it for LAN, as the guest there is just the VLAN and not also bridged to an AP.

I am hoping this is globally enabling STP rather than just on the internal openwrt bridges, ultimately if none of these internal bridges were configured there would be no means of turning STP on.
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Alex Atkin UK on January 01, 2022, 04:38:45 AM
Seeing as STP is a way to prevent network loops over switches/bridges, why would you need it on anything that isn't a switch/bridge?
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Chrysalis on January 02, 2022, 02:24:47 AM
Seeing as STP is a way to prevent network loops over switches/bridges, why would you need it on anything that isn't a switch/bridge?

My first LAN broadcast storm had a bridge not on the switch.

The bit that confused me is that you can still create a loop without these internal bridges. But potentially with no means of enabling STP.

I could only enable STP because the ethernet switch is bridged with wifi interfaces on 3 of the 4 interfaces. (On the second switch there is no guest AP as I don't use wifi on that switch, hence not being able to turn on STP on its guest LAN interface.)

I have probably misunderstood the full way STP works, so I will just leave it turned on and be happy with that.
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: aesmith on January 02, 2022, 04:43:29 PM
Modern spanning tree algorithms are miles better than the original, really nothing to be worried about.  If available look for Rapid Spanning Tree, or Multiple Spanning Tree if you have lots of VLANs.  They still give the LAN a little thump if you do something gross like change the STP root, but nothing like the old 90 seconds or so it used to take.

Which reminds me, if you have a non-trivial LAN it's worth hard coding the STP root, defaults won't necessarily put it in a sensible place.
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Alex Atkin UK on January 02, 2022, 05:21:40 PM
Prior to STP existing you should never EVER create a loop. Even with it, unless you've deliberately created one for redundancy (if one link goes down it will switch to the other), a loop shouldn't be happening, and I'd want to know why and where it's occurring.

Ethernet was designed to only have a single path to reach each MAC address on the network.
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Reformed on January 02, 2022, 11:07:02 PM
Chrysalis - I'm not sure about OpenWRT, sorry. There's only one place where a loop could be happening, as you mentioned. Your call if you want to invest the time in finding the exact cause.

Which reminds me, if you have a non-trivial LAN it's worth hard coding the STP root, defaults won't necessarily put it in a sensible place.

Truth. If you don't set priority to ensure the switch you want becomes root, whatever has a port with the lowest MAC address wins. If there are only a pair of switches involved, it's not an issue.
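
A minimal sketch of that election rule, assuming the standard bridge ID comparison (priority first, then MAC address, lowest wins) and the usual default priority of 32768. The switch names and MAC addresses are invented for illustration.

```python
from dataclasses import dataclass

DEFAULT_PRIORITY = 32768  # usual default bridge priority


@dataclass(frozen=True)
class Bridge:
    name: str
    mac: str                          # colon-separated hex, hypothetical values below
    priority: int = DEFAULT_PRIORITY

    def bridge_id(self):
        """Bridge ID as a comparable tuple: priority first, then MAC."""
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(bridges):
    """The bridge with the numerically lowest bridge ID becomes root."""
    return min(bridges, key=Bridge.bridge_id)


# With defaults everywhere, the lowest MAC wins, wherever that switch sits.
defaults = [
    Bridge("core-switch", "00:1a:2b:00:00:30"),
    Bridge("edge-switch", "00:1a:2b:00:00:10"),
]
print(elect_root(defaults).name)   # edge-switch

# Lowering the priority on the intended root overrides the MAC tie-break.
tuned = [
    Bridge("core-switch", "00:1a:2b:00:00:30", priority=4096),
    Bridge("edge-switch", "00:1a:2b:00:00:10"),
]
print(elect_root(tuned).name)      # core-switch
```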

Ethernet was designed to only have a single path to reach each MAC address on the network.

It was also designed originally to have all devices in a single broadcast domain sharing the same bus cable so no need to worry too much about the original design of Ethernet.
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Alex Atkin UK on January 03, 2022, 10:00:21 AM
It was also designed originally to have all devices in a single broadcast domain sharing the same bus cable so no need to worry too much about the original design of Ethernet.

I still say if there's a loop and you didn't deliberately create it, you've done something fundamentally wrong that may have other unforeseen consequences.

Remember a bridge on OpenWRT is software driven, a glitch in the configuration could cause unnecessary CPU overhead.  STP probably solves it, but if you don't know how it started in the first place, how can you be sure?
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Reformed on January 03, 2022, 12:47:45 PM
I would be sure by replacing the software switch with a £13.99 piece of hardware. ;)
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Alex Atkin UK on January 03, 2022, 10:52:31 PM
I would be sure by replacing the software switch with a £13.99 piece of hardware. ;)

Nah, my Torrent box has 8 ethernet ports and they're all bridged so it doubles as a switch with a 5Gbit USB uplink (3.6Gbit due to USB limitation).  Even my NAS has its 10Gbit NIC bridged to the on-board Gigabit ports so my Topaz AI Upscaler box plugs straight into the server.  I'd prefer a 16 port switch instead of 10 there, but that doesn't seem to be an option for multi-Gigabit switches right now and going full 10Gbit with PoE+ would cost a fortune.

All WiFi Access Points and Virtual Machines are inherently software switches.
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Reformed on January 04, 2022, 12:09:26 AM
Hypervisors certainly have virtual switches in them for good reason. Not sure if relevant to connecting networks unless using VNFs, which I do routinely. Very basic configuration is good for those.

Think for most folks a £20 8 port switch is probably a more viable option than a dodgy downloads box with 8 bridged Ethernet ports. For those needing VLANs, RSTP, etc, 8 ports for £40.

Software bridging that comes attached to other hardware is a slightly different thing from something running OpenWRT. WiFi access points depending how complex they are can function as routers and switches. The switches and APs I work with on the daily are pretty smart.

The switches https://www.arubanetworks.com/resource/aruba-cx-10000-with-pensando-at-a-glance/ and APs https://www.arubanetworks.com/products/wireless/access-points/ are very clever, and no need to worry about internal bridges in the AP equipment looping, it can't, it's an access node - it sends to RF or an Ethernet port. Loop creation possible only with work to create one and external intervention.

The switches are hardware switching planes with a bunch of intelligence on top. They will not loop on their internal bridges, STP prevents them contributing to external loops.

My own NAS has 4 ports. 3 of them are in an Etherchannel, one is in a DMZ, both routed ports, no bridging required.

The three go to a switch, the DMZ goes to a router. Will build a bridge on there, on the switching hardware, for port density.

This is just how I work. I try and keep as much as I can modular and don't use excess ports on other kit for network functions beyond VNF.

Current switches in use at home are 3 or 4 basic 8 port smart switches, 3 4 SFP+, 1G, with 10G out of each and those serving 8 port GE a 10G port in, a 2 Gb Etherchannel to the switch. 3 core 8 SFP+, 1G ports aggregating the above resiliently and linked themselves in a ring, and a 2 SFP+, 24 GE switch that actually does quite a bit of work and is also in the core ring.

To get to my router from the modem is a 3 switch journey. Router to WiFi AP 3 switches then across an Etherchannel to the AP.

To get to the second AP is a mere one switch journey.

Currently 7 VLANs on there with more to come.

Produces a somewhat convoluted path but works well. I like to keep things as modular as possible. Just my personal preference and despite high device count makes it simple to isolate any routing or switching loop.

APs are on edge ports, only a single Ethernet port active on each. All switches in my snowflake run RSTP if dual uplinked.

Maybe my methods are out of touch. @aesmith?
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Alex Atkin UK on January 04, 2022, 06:00:15 AM
I'm not suggesting it's something you should routinely do where a hardware switch is an easier option.

The reason I use bridging so much is, why have another device plugged in when you don't have to?  Plus a software bridge is smarter than an unmanaged switch.

I have no spare outlets nor space to put a second switch where the NAS is and, as I mentioned, upgrading from a 10 port multi-gig switch is not even an option. As far as I can tell they don't exist, so you have to go up to 10Gbit, and with PoE+ functionality we're talking thousands of pounds and likely a lot more noise than my existing one.  Power could be handled by a PoE powered switch, but actual physical space is a trickier one.

The torrent box also has no spare outlet, it has my Mac Mini and Litebeam plugged into it.  Though I do plan to run the Litebeam to the main switch as I bought a PoE converter for it.  The downsides to software bridging are basically zero, when that box is always on and never has its CPU maxed out.

As for OpenWRT, there's the Home Hub 5A where the LAN is bridged to a VLAN so it's a single-cable solution.

The pfSense box also operates a bridge; I have both a bonded LAN connection and my VoIP box plugged into it.  That was a little nerve-racking, seeing as you lose access to the GUI if you do it wrong, and I'm not familiar enough with FreeBSD or the awful single-file XML configuration of pfSense to fix it, but I got there in the end.
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Chrysalis on January 07, 2022, 03:45:15 PM
I still say if there's a loop and you didn't deliberately create it, you've done something fundamentally wrong that may have other unforeseen consequences.

Remember a bridge on OpenWRT is software driven, a glitch in the configuration could cause unnecessary CPU overhead.  STP probably solves it, but if you don't know how it started in the first place, how can you be sure?

It was, if I remember right, caused on my proxmox (this was a while ago and only temporary, not related to the internet sourced storm mentioned in this thread). It has two cables connected to my LAN, and usually one is disabled. However, on one occasion I actually enabled both, and it looped my network.

The misunderstanding is on my part: I believed STP made a LAN immune to loops that could exist anywhere on the network, rather than just on the bridge that STP is running on.

Not that big a deal, I turned on STP and have moved on to other things now.
Title: Re: One for you guys to solve? Internet caused broadcast storm?
Post by: Reformed on January 07, 2022, 10:32:57 PM
STP is a per-device thing. All bridges running STP have a chat, elect a root bridge and calculate a single path to that root bridge, not sending any traffic over the other, redundant links.

A loop-free network via STP needs every switch to run STP, with no devices doing any bridging other than those switches running STP.
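
As a simplified picture of the end result, the sketch below picks one path from each bridge to the root and marks every other link as blocked. It is a plain breadth-first search with equal link costs and name-order tie-breaks. That is an assumption made for brevity: real STP exchanges BPDUs and compares path costs and bridge/port IDs. The topology and switch names are hypothetical.

```python
from collections import deque


def spanning_tree(links, root):
    """Split links into (forwarding, blocked) sets for a given root bridge."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    forwarding = set()
    visited = {root}
    queue = deque([root])
    while queue:
        bridge = queue.popleft()
        # Sorted so the tie-break is deterministic (lowest name first).
        for peer in sorted(neighbours.get(bridge, ())):
            if peer not in visited:
                visited.add(peer)
                forwarding.add(frozenset((bridge, peer)))
                queue.append(peer)

    blocked = {frozenset(link) for link in links} - forwarding
    return forwarding, blocked


# Three switches cabled in a triangle, i.e. one physical loop.
triangle = [("sw1", "sw2"), ("sw2", "sw3"), ("sw3", "sw1")]
forwarding, blocked = spanning_tree(triangle, "sw1")

print("forwarding:", sorted(sorted(link) for link in forwarding))
print("blocked:   ", sorted(sorted(link) for link in blocked))
```

With sw1 as root, the sw2 to sw3 link ends up blocked, which leaves a loop-free tree. A device that bridges two segments without taking part in this calculation is outside the model, which is why every bridging device needs to participate.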

I don't really need it on every bridge as mine is a spine-leaf with a ring in the middle but it's there for test purposes, as in I may do things that'll cause transitory loops and I'd rather avoid that as it's a bad use of my time.