Kitz ADSL Broadband Information

Topic: Goodness indicator from modems’ stats

Weaver

  • Senior Kitizen
  • ******
  • Posts: 11459
  • Retd s/w dev; A&A; 4x7km ADSL2 lines; Firebrick
Goodness indicator from modems’ stats
« on: March 22, 2021, 09:44:38 PM »

I’m wondering if I can write a quick program that takes full Broadcom stats and derives a summary from it all that gives an "all is well" or "all is not well" indicator. I can already extract stats from all my modems and am thinking of applying regexes to tweeze out the numbers I need. So the program would either come out with a binary outcome or maybe failing that a percentage on a goodness scale.

Clearly I want to look at ES and SES counts, but I’m unsure about what number of events per unit time to use as a threshold value between well and unwell. Any suggestions? That or again a sliding scale value of some sort.

I should perhaps also check the current SNRMs too.

g3uiss

  • Kitizen
  • ****
  • Posts: 1151
  • You never too old to learn but soon I may be
    • Midas Solutions
Re: Goodness indicator from modems’ stats
« Reply #1 on: March 22, 2021, 09:46:57 PM »

Sounds like a useful tool!
Cerebus FTTP 500/70 Draytec 2927 VOXI 4G fallback.

Weaver

Re: Goodness indicator from modems’ stats
« Reply #2 on: March 22, 2021, 10:02:17 PM »

Any ideas about what number (per unit time) of ES or SES is reasonable? Zero? Or should I use a count of CRCs instead?

g3uiss

Re: Goodness indicator from modems’ stats
« Reply #3 on: March 22, 2021, 10:28:14 PM »

CRCs don’t always produce ES, so ES is a better measurement. For me, ES over a certain figure per hour, or over say 3 hours, might indicate a danger of the DLM taking action. I appreciate there is now some debate over what that figure might be, but on a “speed” line it used to be 120/hr. Maybe a caution at 60/hr? SES should be really low; I don’t see any very often here on either VDSL or ADSL. Of course DSLstats has options for alerting at various triggers, but not everyone runs that 24/7.
Cerebus FTTP 500/70 Draytec 2927 VOXI 4G fallback.

Weaver

Re: Goodness indicator from modems’ stats
« Reply #4 on: March 22, 2021, 11:35:25 PM »

So a threshold of 60 ES/hr might light up a severe warning light? What about a secondary warning if the count is non-zero?

I thought I might ignore the SES value as it implies there will be at least that many ES and it is the latter that I’m triggering on. Sound reasonable?

Do you think it’s worth watching for abnormal SNRM? Both too low and too high? It’s a nuisance but I would have to configure the system to tell it what the expected target SNRM is.

Since I have four modems to check and I have to either mess about with a web browser, or run an existing all-modems report tool, and then scan through masses of pages of irrelevant detail, I’m hoping this will be a useful timesaver.
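The two-level warning idea discussed in this post could be sketched roughly as below. This is only an illustration: the function name `classify_es_rate` and the level names are made up here, and treating exactly 60 ES/hr as "severe" is an assumption.

```python
# Minimal sketch of a two-level ES warning, assuming the 60 ES/hr figure
# suggested in the thread. Names and level labels are hypothetical.
SEVERE_ES_PER_HOUR = 60.0


def classify_es_rate(es_per_hour: float) -> str:
    """Return 'severe', 'warning' or 'ok' for a normalised ES rate."""
    if es_per_hour >= SEVERE_ES_PER_HOUR:
        return "severe"   # primary warning light
    if es_per_hour > 0:
        return "warning"  # secondary warning: any non-zero count
    return "ok"


for rate in (0, 44, 72):
    print(rate, classify_es_rate(rate))
```

Keeping the threshold in one named constant makes it easy to tighten later, for example to match a particular DLM profile.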

Weaver

Re: Goodness indicator from modems’ stats
« Reply #5 on: March 23, 2021, 08:20:56 AM »

I’ve written part of the code. It collects the raw stats from all the modems, parses them crudely and pulls out the ES up/down records, then converts these counts into ES per hour so that the ES rates are normalised. In my code, each ES-counting period, whether one of the 15-minute (or shorter) periods or one of the up-to-24-hour periods, is called a “bucket”.

Here’s some of the debugging output it produces:

Code:
FEC: 6 5
CRC: 0 0
ES: 0 0
SES: 0 0
UAS: 0 0
LOS: 0 0
LOF: 0 0
LOM: 0 0
modem=[1]; bucket=[Latest 15 minutes time]; dir=[down]; ES_count=[0]; bucket duration =[308]; ES/hr = [0]
modem=[1]; bucket=[Latest 15 minutes time]; dir=[up]; ES_count=[0]; bucket duration =[308]; ES/hr = [0]
FEC: 27 49
CRC: 0 0
ES: 0 0
SES: 0 0
UAS: 0 0
LOS: 0 0
LOF: 0 0
LOM: 0 0
modem=[1]; bucket=[Previous 15 minutes time]; dir=[down]; ES_count=[0]; bucket duration =[900]; ES/hr = [0]
modem=[1]; bucket=[Previous 15 minutes time]; dir=[up]; ES_count=[0]; bucket duration =[900]; ES/hr = [0]
FEC: 104 49
CRC: 0 0
ES: 0 0
SES: 0 0
UAS: 0 0
LOS: 0 0
LOF: 0 0
LOM: 0 0
modem=[2]; bucket=[Latest 15 minutes time]; dir=[down]; ES_count=[0]; bucket duration =[323]; ES/hr = [0]
modem=[2]; bucket=[Latest 15 minutes time]; dir=[up]; ES_count=[0]; bucket duration =[323]; ES/hr = [0]
FEC: 304 59
CRC: 0 0
ES: 0 0
SES: 0 0
UAS: 0 0
LOS: 0 0
LOF: 0 0
LOM: 0 0
modem=[2]; bucket=[Previous 15 minutes time]; dir=[down]; ES_count=[0]; bucket duration =[900]; ES/hr = [0]
modem=[2]; bucket=[Previous 15 minutes time]; dir=[up]; ES_count=[0]; bucket duration =[900]; ES/hr = [0]
FEC: 11 0
CRC: 0 0
ES: 0 0
SES: 0 0
UAS: 0 0
LOS: 0 0
LOF: 0 0
LOM: 0 0
modem=[3]; bucket=[Latest 15 minutes time]; dir=[down]; ES_count=[0]; bucket duration =[127]; ES/hr = [0]
modem=[3]; bucket=[Latest 15 minutes time]; dir=[up]; ES_count=[0]; bucket duration =[127]; ES/hr = [0]
FEC: 98 13
CRC: 0 0
ES: 0 0
SES: 0 0
UAS: 0 0
LOS: 0 0
LOF: 0 0
LOM: 0 0
modem=[3]; bucket=[Previous 15 minutes time]; dir=[down]; ES_count=[0]; bucket duration =[900]; ES/hr = [0]
modem=[3]; bucket=[Previous 15 minutes time]; dir=[up]; ES_count=[0]; bucket duration =[900]; ES/hr = [0]
FEC: 11358 80
CRC: 3 0
ES: 3 0
SES: 0 0
UAS: 0 0
LOS: 0 0
LOF: 0 0
LOM: 0 0
modem=[4]; bucket=[Latest 15 minutes time]; dir=[down]; ES_count=[3]; bucket duration =[305]; ES/hr = [35.40983606557377]
modem=[4]; bucket=[Latest 15 minutes time]; dir=[up]; ES_count=[0]; bucket duration =[305]; ES/hr = [0]
FEC: 31316 241
CRC: 11 0
ES: 11 0
SES: 0 0
UAS: 0 0
LOS: 0 0
LOF: 0 0
LOM: 0 0
modem=[4]; bucket=[Previous 15 minutes time]; dir=[down]; ES_count=[11]; bucket duration =[900]; ES/hr = [44]
modem=[4]; bucket=[Previous 15 minutes time]; dir=[up]; ES_count=[0]; bucket duration =[900]; ES/hr = [0]
--
{"modem":1,"bucket label in stats":"Latest 15 minutes time","dir":"down","ES per hr":0}
{"modem":1,"bucket label in stats":"Latest 15 minutes time","dir":"up","ES per hr":0}
{"modem":1,"bucket label in stats":"Previous 15 minutes time","dir":"down","ES per hr":0}
{"modem":1,"bucket label in stats":"Previous 15 minutes time","dir":"up","ES per hr":0}
{"modem":2,"bucket label in stats":"Latest 15 minutes time","dir":"down","ES per hr":0}
{"modem":2,"bucket label in stats":"Latest 15 minutes time","dir":"up","ES per hr":0}
{"modem":2,"bucket label in stats":"Previous 15 minutes time","dir":"down","ES per hr":0}
{"modem":2,"bucket label in stats":"Previous 15 minutes time","dir":"up","ES per hr":0}
{"modem":3,"bucket label in stats":"Latest 15 minutes time","dir":"down","ES per hr":0}
{"modem":3,"bucket label in stats":"Latest 15 minutes time","dir":"up","ES per hr":0}
{"modem":3,"bucket label in stats":"Previous 15 minutes time","dir":"down","ES per hr":0}
{"modem":3,"bucket label in stats":"Previous 15 minutes time","dir":"up","ES per hr":0}
{"modem":4,"bucket label in stats":"Latest 15 minutes time","dir":"down","ES per hr":35.40983606557377}
{"modem":4,"bucket label in stats":"Latest 15 minutes time","dir":"up","ES per hr":0}
{"modem":4,"bucket label in stats":"Previous 15 minutes time","dir":"down","ES per hr":44}
{"modem":4,"bucket label in stats":"Previous 15 minutes time","dir":"up","ES per hr":0}
--
Severely bad (>=60ES / hr):

--
Non-zero:
{"modem":4,"bucket label in stats":"Latest 15 minutes time","dir":"down","ES per hr":35.40983606557377}
{"modem":4,"bucket label in stats":"Previous 15 minutes time","dir":"down","ES per hr":44}

Currently the program looks only at the two most recent buckets of 15 minutes (max) duration, and doesn’t inspect the longer ones, e.g. the 24-hour buckets. I think I should do something about this. I had initially thought that "most recent is most relevant" but now I’m having my doubts. What do you think?
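For readers following along, the parse-and-normalise step behind the debugging output could be sketched like this. It is not the actual program: the regex, the function names, and the assumption that each counter line is "NAME: downstream upstream" are inferred from the output shown above.

```python
import re

# Matches counter lines such as "ES: 3 0". The column order (downstream
# first, then upstream) is an assumption inferred from the debug output.
COUNTER_RE = re.compile(
    r"^(?P<name>[A-Z]+):[ \t]+(?P<down>\d+)[ \t]+(?P<up>\d+)[ \t]*$",
    re.MULTILINE,
)


def parse_counters(text: str) -> dict:
    """Map each counter name to a (downstream, upstream) tuple of ints."""
    return {
        m["name"]: (int(m["down"]), int(m["up"]))
        for m in COUNTER_RE.finditer(text)
    }


def es_per_hour(es_count: int, bucket_seconds: int) -> float:
    """Scale an ES count from a bucket of any length to a per-hour rate."""
    if bucket_seconds <= 0:
        return 0.0  # avoid dividing by zero on an empty bucket
    return es_count * 3600 / bucket_seconds


sample = """FEC: 11358 80
CRC: 3 0
ES: 3 0
SES: 0 0
"""
down_es, up_es = parse_counters(sample)["ES"]
print(es_per_hour(down_es, 305))  # modem 4, latest bucket, downstream
print(es_per_hour(up_es, 305))    # modem 4, latest bucket, upstream
```

Because the rate is scaled by the bucket's real duration, a part-filled "latest" bucket of 305 seconds and a full 900-second bucket can be compared against the same threshold.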

It has already proven its worth because it has detected a problem with line 4; the downstream SNRM is down to ~1.4 dB which appears to be a bad thing. You can see the problem in the ES downstream for line 4.

I need to distill the output down further as well, producing an additional summary that clearly shows an “action vs no action” indicator to the user, plus individual warning light-type indicators per modem. I could do with some guidance on this. Anyone up for helping me out?
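One possible shape for that summary, offered only as a sketch: reduce the per-bucket records to one "light" per modem (worst case wins) plus a single action flag. The record keys `modem` and `ES per hr` are taken from the JSON output above; the colour names, the function names, and the choice that any non-green light means "action" are assumptions.

```python
# Hypothetical per-modem warning lights plus an overall action flag.
SEVERE_ES_PER_HOUR = 60.0
RANK = {"green": 0, "amber": 1, "red": 2}


def light_for(es_per_hour: float) -> str:
    """Map one bucket's ES rate to a light colour."""
    if es_per_hour >= SEVERE_ES_PER_HOUR:
        return "red"
    if es_per_hour > 0:
        return "amber"
    return "green"


def summarise(records):
    """Return ({modem: worst light}, action_needed) for a list of records."""
    per_modem = {}
    for rec in records:
        light = light_for(rec["ES per hr"])
        current = per_modem.get(rec["modem"], "green")
        # Keep the worst light seen so far for this modem.
        per_modem[rec["modem"]] = light if RANK[light] > RANK[current] else current
    action_needed = any(light != "green" for light in per_modem.values())
    return per_modem, action_needed


records = [
    {"modem": 1, "dir": "down", "ES per hr": 0},
    {"modem": 4, "dir": "down", "ES per hr": 35.40983606557377},
    {"modem": 4, "dir": "down", "ES per hr": 44},
]
lights, action = summarise(records)
print(lights, "ACTION" if action else "no action")
```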
« Last Edit: March 23, 2021, 08:24:51 AM by Weaver »

Weaver

Re: Goodness indicator from modems’ stats
« Reply #6 on: March 23, 2021, 08:45:21 PM »

I’ve stripped it down to remove all the debugging output and this is what the output looks like now:

Code:
Severely bad (>=60ES / hr):
{modem: 3, Previous 15 minutes time, dir: up, ES per hr: 72}

--
Non-zero:
{modem: 1, Latest 15 minutes time, dir: down, ES per hr: 4}
{modem: 1, Previous 15 minutes time, dir: down, ES per hr: 32}
{modem: 2, Previous 15 minutes time, dir: down, ES per hr: 8}
{modem: 3, Latest 15 minutes time, dir: up, ES per hr: 45}
{modem: 3, Previous 15 minutes time, dir: down, ES per hr: 8}
{modem: 3, Previous 15 minutes time, dir: up, ES per hr: 72}
{modem: 4, Previous 15 minutes time, dir: down, ES per hr: 28}

I still have to write something to assess the overall state and assess the SNRMs down and upstream.

Weaver

Re: Goodness indicator from modems’ stats
« Reply #7 on: March 24, 2021, 10:23:05 PM »

Extended it and cleaned up the clutter in this v1.00 beta. Note that it has found a problem with line 3.

* Summary of DSL links’ wellbeing and error counts
---------------------------------------------------

*** There is some badness; all is not well ! ***

* Modems with severe error problems:  ≥ 60 ES / hr:  None

--
* Modems with SNRM too low/high:  None
(Assuming an expected target SNRM of 6 dB downstream, 6 dB upstream)

--
* Modems with a few errors:
modem: 3 downstream, ES per hr: 1.4, Latest 15 minutes time
modem: 3 upstream, ES per hr: 8.6, Latest 15 minutes time
modem: 3 downstream, ES per hr: 4.4, Previous 15 minutes time
modem: 3 upstream, ES per hr: 11.2, Previous 15 minutes time

Weaver

Re: Goodness indicator from modems’ stats
« Reply #8 on: March 25, 2021, 08:30:18 PM »

In that last example output, I had written the code to assess the SNRM figures as too low or too high. There are warnings if below 67% or above 150% of the target SNRM in each case for down or upstream. Do those percentages seem reasonable?

So if the target d/s SNRM is 6dB and the current SNR drops below 4 then you get a warning listed. And if the SNR goes up from 6 dB to 9 then you also get a warning.

For the low threshold, I just picked some number that seems reasonable from experience with 3dB d/s: if the level drops from 3dB down to below 2dB then that’s when I start to sometimes see CRC errors, hence the 67%. The low threshold must not be so high that it triggers in routine daily variation just because it’s nighttime unless there is some kind of abnormal variation going on due to a fault or interference. Does 67% sound reasonable for the daily variation thing? What’s the minimum daily downward variation you see that’s reasonable when all is normal? (Such as that in the droop during the nighttime.)

I’m also not so sure about the latter high threshold ratio of 1.5; I just picked a number out of thin air and I could do with some guidance. The way I wanted to look at it for the latter was: when is the SNR too high because of DLM having taken action? - because that is something that you do want to be alerted to. I might reduce this ratio to 1.2 because it occurs to me that going from normal 9dB to 12dB could be the next step up due to DLM, which is a ratio of 1.333 so would not trigger a warning, which is the wrong behaviour.

Any guidance as to how much higher than the target the SNRM values are seen to go at certain times? For example, after a resync, or when you had a resync in the night and then later it’s daytime and conditions have improved so the SNR goes right up. I don’t want to report a warning on this latter type of normal variation.
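The ratio test described in this post can be written down in a few lines, which also makes the 9 dB to 12 dB concern easy to check. This is a sketch only: the function name `snrm_status` and the use of strict "below"/"above" comparisons are assumptions; the 0.67 and 1.5 ratios are the ones quoted above.

```python
# Sketch of the SNRM window check: warn below 67% or above 150% of target.
LOW_RATIO = 0.67
HIGH_RATIO = 1.5


def snrm_status(current_db, target_db, low_ratio=LOW_RATIO, high_ratio=HIGH_RATIO):
    """Return 'too low', 'too high' or 'ok' relative to the target SNRM."""
    if current_db < target_db * low_ratio:
        return "too low"
    if current_db > target_db * high_ratio:
        return "too high"
    return "ok"


print(snrm_status(3.9, 6.0))                    # below 67% of a 6 dB target
print(snrm_status(12.0, 9.0))                   # ratio 1.333, inside the 1.5 limit
print(snrm_status(12.0, 9.0, high_ratio=1.2))   # the tighter 1.2 limit catches it
```

With the 1.5 ratio the step from 9 dB to 12 dB passes as "ok", and with 1.2 it is reported as "too high", which matches the reasoning given for reducing the ratio.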
« Last Edit: March 27, 2021, 02:46:20 AM by Weaver »

Weaver

Re: Goodness indicator from modems’ stats
« Reply #9 on: March 30, 2021, 06:15:49 AM »

I’m wondering if I should be interested in the "since link time" error count ? What do you think?

It would be interesting to be alerted to past serious problems that have since gone away and so don’t show up in the two most recent 15-minute (max) collection buckets. I thought about using the "since link time" count rather than the 1-day counts, because the 1-day count might include a short period of intense errors caused by a link dropping or being forcibly brought down, and I don’t want to include errors that arose only from the link drop and won’t be relevant later on.

Weaver

Re: Goodness indicator from modems’ stats
« Reply #10 on: April 05, 2021, 11:40:20 PM »

At the moment I’m only looking at ES counts and not SES. See however Chrysalis’ opinion at https://forum.kitz.co.uk/index.php/topic,25752.msg433233.html#msg433233

Quote
To me the SES is what stands out. In my experience of years and years on DSL, my rule of thumb is ES on their own are not usually service affecting (unless they trigger DLM), but SES usually are, and yep you have SES on that upstream.

If you were getting a steady flow of ES but each ES was maybe just 1 or 2 CRC, you probably would have no red on your graph and wouldnt notice it, but looks like it’s coming in large bursts.

So should I really be considering SES too? I’m assuming that wherever there are SES there are also ES. (By definition??) But I have a ‘more severe’ category for an ES count above a certain threshold, and I am wondering whether any non-zero SES count should automatically generate a report in that same ‘more severe’ category.

kitz

  • Administrator
  • Senior Kitizen
  • *
  • Posts: 33879
  • Trinity: Most guys do.
    • http://www.kitz.co.uk
Re: Goodness indicator from modems’ stats
« Reply #11 on: April 07, 2021, 11:44:35 PM »

The main item that I am personally concerned about on my line is E/Secs.   
As to what is acceptable, then I'd use the figures applicable to your relevant DLM profile as a guideline and then take a bit more off for your own safety margin - See the coloured table DLM - categorising your line 
 
SES indicates a concern that the line is struggling with >30% packet loss.  It's not unusual for a line to drop out after a few consecutive SES.    By the time you get a warning about SES (unless it's the odd one from burst noise) then the line may have already lost sync.

IIRC your line has re-tx, so you may want to monitor for LEFTRS, which is kind of the G.INP version of E/S.

Warnings about SNRM are useful.  What level you set it at depends upon how your line performs.   It's probably of interest if the line is consistently swinging or if it changes by several dB and stays there - indicating that a manual resync may be of benefit.

Weaver

Re: Goodness indicator from modems’ stats
« Reply #12 on: April 08, 2021, 12:47:35 AM »

Thanks for getting back to me, Lesley. I don’t have LEFTRS, presumably because I only have PhyR, not the full standard G.INP.

1. I wondered if I should count situations where there is an SES event, but no ES event. Is that even possible? I don’t count SES events now, so such an event would go unnoticed.

2. I also wondered if I should consider either the weighted totals, for some value of k (but I have no idea what), of either (ES + SES * k),  where k ≥ 0 and k ≠ 1 or where k ≥ 0

burakkucat

  • Respected
  • Senior Kitizen
  • *
  • Posts: 38300
  • Over the Rainbow Bridge
    • The ELRepo Project
Re: Goodness indicator from modems’ stats
« Reply #13 on: April 08, 2021, 04:10:53 PM »

Quote
1. I wondered if I should count situations where there is an SES event, but no ES event. Is that even possible?

I don't think that can be possible. If there is an SES event then there must also be ES.

Quote
2. I also wondered if I should consider either the weighted totals, for some value of k (but I have no idea what), of either (ES + SES * k),  where k ≥ 0 and k ≠ 1 or where k ≥ 0

Surely you are just suggesting where k != 1 . . . b*cat is a little confused.  ???
:cat:  100% Linux and, previously, Unix. Co-founder of the ELRepo Project.


Weaver

Re: Goodness indicator from modems’ stats
« Reply #14 on: April 09, 2021, 02:24:48 AM »

> I don't think that can be possible. If there is an SES event then there must also be ES.

Agreed. An SES event does not involve anything other than what is already recorded as an ES.

The idea is that some constant k ≠ 1 might be used to prioritise an SES event as being more important than a mere ES. This is taken from a suggestion by Chrysalis; see Chrysalis’ opinion at https://forum.kitz.co.uk/index.php/topic,25752.msg433233.html#msg433233
Quote
To me the SES is what stands out. In my experience of years and years on DSL, my rule of thumb is ES on their own are not usually service affecting (unless they trigger DLM), but SES usually are, and yep you have SES on that upstream.

A weighted function, something like w = badness/goodness = (ES + SES * k), where k > 1. The event collection buckets would then be sorted by whether their value of w falls in the range 0 < w < w1, then w1 ≤ w < w2, and finally w ≥ w2, and the report would draw attention to the presence or absence of any buckets in each of those ranges.
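As a rough sketch of that weighted score and its three ranges, assuming purely illustrative values for k, w1 and w2 (none of these numbers come from the thread, and the band names are made up):

```python
# Hypothetical weighted badness score: w = ES + SES * k, with k > 1 so
# that an SES counts for more than a plain ES.
def weighted_badness(es: int, ses: int, k: float = 10.0) -> float:
    return es + ses * k


def band(w: float, w1: float, w2: float) -> str:
    """Place a bucket's score w into one of the ranges described above."""
    if w <= 0:
        return "none"    # no errors at all in this bucket
    if w < w1:
        return "low"     # 0 < w < w1
    if w < w2:
        return "medium"  # w1 <= w < w2
    return "high"        # w >= w2


W1, W2 = 20.0, 60.0  # illustrative band edges only
for es, ses in ((0, 0), (3, 0), (5, 2), (40, 3)):
    w = weighted_badness(es, ses)
    print(es, ses, w, band(w, W1, W2))
```

Here w is computed from raw counts in a bucket; if buckets differ in length, the counts could be normalised to per-hour rates first so that the same w1 and w2 apply to every bucket.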
« Last Edit: April 09, 2021, 02:39:01 AM by Weaver »