In many situations I have seen how SNRM, maybe especially downstream, goes down over a certain fairly long period, with increasing error rates, until either DLM kicks in and takes action because of the error rate, or the SNRM gets so low that the modem decides to give up and drops the link, at a point depending on the modem.
Questions:
1. what’s your experience of this problem?
2. How long does it take to get down to the point of death - ie ridiculously low SNRM where it drops the link?
3. Is upstream less vulnerable or is there no difference? (Although the target SNRM (ie initial) values may differ between up- and downstream.)
I’m wondering if I could look through the log file in my ZyXEL modem that is constantly written by the custom Mr Kitizen Johnson ZyXEL firmware that is installed and watch the downstream SNRM, or maybe up- and downstream, and try and detect any slow decline. I’m up for some guidance about the algorithm for this.
There’s a lot of data, so it would perhaps not be good if I had to search through it all, and I’m wondering if I could get away with only looking at every nth sample. I’m assuming that I start from the latest time sample and go backwards in time, no? A couple of awkward things though: I only have 24 hrs’ worth of data, and that isn’t ideal.
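To make the idea concrete, here’s a minimal sketch in Python of the thinning step (the real thing would be Shortcuts actions, so this is purely illustrative; `thin_newest_first`, `samples` and the stride value are names I’ve made up):

```python
# Sketch: thin the data by keeping only every nth sample, walking backwards
# from the newest reading, as described above. Names are illustrative.
def thin_newest_first(samples, n):
    """Start from the latest sample and keep every nth one going back in time."""
    return samples[::-n]

# Hypothetical SNRM readings in dB, oldest first:
samples = [6.2, 6.1, 6.3, 6.0, 5.9, 5.8, 6.0, 5.7]
print(thin_newest_first(samples, 3))  # newest first: [5.7, 5.9, 6.1]
```

One nice property for a slow runtime like Shortcuts: the loop count drops by a factor of n, which matters far more than the per-sample arithmetic.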
I would like to find the start of the downward slide, but it might be before the start of the data set. I don’t know what the target SNRM values up/down are: although I do have config file information about this, it’s not necessarily up to date, and my code has to work in all circumstances, or else say honestly that it can’t make a determination; it must never deliver bogus error reports. Maybe it’s not the end of the world if the start point of a decline is ‘off the screen to the left’, so to speak. I think absolute levels aren’t that important; it’s all about diagonal average slopes, averaged out over a very wide range.
Another tricky thing is that I need to deal with day-night variation. Perhaps a wide-enough range averaging process will do the job. Alternatively, linear regression (fitting the best straight line using the least-squares method) would do the job. I wrote this kind of code, but much more sophisticated, as one of my first mainframe programs 42 years ago, but I can’t remember how to do it now.
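For reference, the least-squares slope needs only a handful of running sums, so it maps onto a small, fixed number of arithmetic actions per sample. A minimal Python sketch (the function name, the sample data, and the dB-per-hour framing are my own illustrative assumptions):

```python
# Sketch: best-fit straight-line slope via the classic least-squares
# formulas, using only running sums -- cheap per-sample arithmetic.
def slope(times, values):
    """Slope of the least-squares line fitted to (time, value) pairs."""
    n = len(times)
    sum_t = sum(times)
    sum_v = sum(values)
    sum_tt = sum(t * t for t in times)
    sum_tv = sum(t * v for t, v in zip(times, values))
    return (n * sum_tv - sum_t * sum_v) / (n * sum_tt - sum_t * sum_t)

# Hypothetical SNRM readings (dB) at 4-hourly intervals:
hours = [0, 4, 8, 12, 16, 20]
snrm = [6.3, 6.1, 6.0, 5.8, 5.7, 5.5]
print(slope(hours, snrm))  # a small negative number: droop in dB per hour
```

A steadily negative slope over a wide range is the droop signal; the sign test is then a single comparison against a threshold.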
It absolutely has to be a cheap fast algorithm, as this code is being written in iPadOS Shortcuts which is horribly slow, but arithmetic is cheap as that will be done by machine code in the runtime library and the only thing that matters as regards slowness is the number of lines of Shortcuts’ code we run through (the number of so-called ‘actions’). So loop count is everything.
I think that in certain situations we will need to apply a simple averaging noise filter that smooths out all the random variation. We can’t apply this everywhere for reasons that will become clear below, and also we don’t want to do runs of two different averaging processes over the same data range, if we can help it, as it’s a waste of CPU time.
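The filter itself can be as simple as a centred moving average; a Python sketch of the idea (the window size is a tuning assumption, and `smooth` is just an illustrative name):

```python
# Sketch: a simple centred moving average, shrinking the window at the
# edges so the output has the same length as the input.
def smooth(values, window):
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

print(smooth([1, 2, 3, 4, 5], 3))  # [1.5, 2.0, 3.0, 4.0, 4.5]
```

This is one pass over the data, which keeps the Shortcuts action count down, but note it is still one loop iteration per sample, so it should only be run over the patches that actually need it.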
One more tricky thing is that the code has to detect discontinuities - rising step functions. Sharp vertical jumps in the SNRM of a height greater than some value to be worked out are to be counted as resyncs, provided they are not spikes - that is, provided the upward jump is not immediately followed by a downward jump. A jump that is immediately undone we define as a spike and classify as very bad noise. We shouldn’t be seeing too many spikes like that, or else we need to do something about our noise filtering. Also, since we are looking for a rising step function, we need to check that before and after the step the derivative of the heavily smoothed data (with short-interval averaging applied in a patch before and after the step) has a really low value, so the function is nearly flat on both sides of the discontinuity. My line 2 upstream has a lot of steps every day, so the algorithm won’t work on upstream for this line. The presence of falling steps tells us to forget it: the value has to drop so that it can jump up again. Why this happens has been a mystery for a long time. Something to do with a noise source being switched on or off, or perhaps more likely something bad in that line is a noise detector which somehow gets disturbed and flicks between something like enabled and disabled, or sensitive and insensitive, states.
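The step-vs-spike classification described above could be sketched like this in Python (the `JUMP` and `FLAT` thresholds are placeholder values still to be worked out, and the slopes are assumed to come from short smoothed patches either side of the jump):

```python
# Sketch: classify a sharp upward jump as a resync step, a spike, or neither.
# Threshold values are placeholders, to be tuned against real log data.
JUMP = 3.0   # dB: minimum upward jump to count as a candidate step
FLAT = 0.05  # dB/sample: max slope of smoothed data either side of the step

def classify_jump(values, i, pre_slope, post_slope):
    """values[i] -> values[i+1] is the candidate jump; pre_slope/post_slope
    are derivatives of smoothed patches before and after it."""
    rise = values[i + 1] - values[i]
    if rise < JUMP:
        return "no step"
    # A spike: the jump up is immediately undone by a jump back down.
    if i + 2 < len(values) and values[i + 1] - values[i + 2] >= JUMP:
        return "spike"
    # A genuine resync step: nearly flat on both sides of the discontinuity.
    if abs(pre_slope) < FLAT and abs(post_slope) < FLAT:
        return "resync step"
    return "unclear"
```

So for example a trace like 6, 6, 12, 12, 12 with flat sides would come back as a resync step, while 6, 6, 12, 6, 6 would come back as a spike.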
We need to detect discontinuities first, before we do any long-range smoothing or linear regression. Firstly, we don’t want to try to smooth out a vertical jump, a step. And secondly, we are looking at potentially two different diagonal (or not) lines: one for the situation before the step and one for after it, ie after the resync.
So I need to give some more thought to the order of operations, since getting that wrong would, as already mentioned, be a disaster. I think down-step detection might be a good idea to run first, to rule out a mad situation such as my line 2 upstream, as the presence of a nasty like that would confuse the algorithm of the resync detector / rising step detector phase.
When I find a resync, I then have two situations: a before and an after period. Do I then split the whole range into two, build up a list of ‘inter-resync situations/periods’, and ultimately have a loop looking at each one? Then I would have to report multiple judgement decisions, one per period. I don’t think I want to go down this path: too much code and complexity, and splitting the data range into, say, two 12-hr periods (or an even worse case where they are of unequal length) means one period could easily be way too short to run the algorithms properly, as they need a wide range to work well. So perhaps keep it simple and give up if there’s a resync. Or only do the after period, and only then if the length of that range is above some minimum, so that we have plenty of data available and the algorithms can work well over a wide enough time range.
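The ‘keep it simple’ option could look something like this (a Python sketch; `MIN_SAMPLES` and the function name are my own placeholder choices, and the honest give-up is signalled by returning nothing):

```python
# Sketch: after a resync step is detected, keep only the data after it,
# and give up honestly if what's left is too short to analyse.
MIN_SAMPLES = 60  # placeholder: e.g. several hours at the log's sample rate

def usable_range(values, step_index):
    """step_index is the position of a detected resync step, or None."""
    after = values if step_index is None else values[step_index + 1:]
    return after if len(after) >= MIN_SAMPLES else None
```

Returning ‘no determination’ rather than a guess is in keeping with the rule above that the code must never deliver bogus error reports.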
If I do find SNRM droop, I report the fact, provided it isn’t merely 24-hr cycling, which can be identified because the values go down but then rise again later, and the period has to be exactly 24 hrs. Also, very wide-range smoothing might erase a 24-hr-wavelength sine wave. (Are we sure we want to make it quite that powerful though?)
Is it right that some modems have SNRM droop over time because they have a problem with the operation of bitswap? Or am I just misremembering that, or simply making something up? I seem to remember from when Kitz taught me about the ‘monitored tones’ hardware feature in modems that a cheap and nasty modem that doesn’t support monitored tones could be bad news with bitswap in unlucky circumstances. The noise spectrum shifts over time and the original bitloading chosen during training becomes no longer suitable for the new spectrum, so bitswap tries to do its thing, but if the new spectrum is far too different and the noise is so bad on some tones that they are marked as ‘unusable’, then they get knocked out of consideration. Without the monitored tones feature, by which the modem keeps listening to those unusable tones to see if they ever recover, we have the situation where some tones have been knocked out and will never get reinstated, while over time even more tones get knocked out. It’s a one-way process: without the monitored tones ‘rehabilitation’ to add tones back into the available set, it’s a highway straight down to hell. If any of that is correct, it sounds like a perfect cause of long-term SNRM decline: the combination of a cheapo modem’s lack of monitored tones plus a greatly varying noise spectrum.
Will a higher target SNRM help fix these problems? Presumably it depends on how far down the decline will ever go. If it stops at some lower level, then you carry on like that with a high error rate, which is not great, and nothing happens unless DLM chooses to kick in and force a retrain. Regarding this monitored tones theory, I don’t see why the downward slide would ever stop, not unless the spectrum variation ceases. The cure in that case is to go out and buy a proper modem, as I’m not certain that a higher target SNRM will save you. Does that sound right?
An understanding of the why behind a downward slide shapes the advice that my program will spit out when various different subtypes of error are detected. One thing I would like to think about is whether or not it’s possible to be selective and either advise or not advise raising the target SNRM. Mind you, if uncertain, I can always recommend a list of multiple ‘possibles’ to try.
My god, I cannot believe how long this post has become. Must be some kind of record, even for me.
If you made it this far, then many thanks. All advice and experience greatly welcomed.