It may be that attempting to recover from very slowly harvested data is actually the problem. A better solution might be to ditch the slowly obtained data & only work with quickly obtained data.
I presume you mean the EXE version? I could give it a go and see what happens.
I was really suggesting that maybe the scripts could be edited to stop trying to allow for samples taking more than 1 minute to harvest, or with "too many" attempts.
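To illustrate the idea of simply discarding slow samples, here is a rough Python sketch rather than an edit to the real batch scripts. The `fetch` callable, the 60 second budget and the limit of 3 attempts are all assumptions made up for the example; the actual scripts may count attempts quite differently.

```python
import time

def harvest_sample(fetch, max_seconds=60.0, max_attempts=3, clock=time.monotonic):
    """Try to harvest one sample, giving up instead of recovering a slow one.

    fetch is a hypothetical callable that returns the sample data,
    or raises an exception when the modem does not answer.
    """
    start = clock()
    for _attempt in range(max_attempts):
        try:
            data = fetch()
        except Exception:
            data = None  # treat any failure as "no data this attempt"
        if clock() - start > max_seconds:
            return None  # too slow: discard the sample, even if data arrived
        if data is not None:
            return data
    return None  # "too many" attempts: discard the sample
```

A caller would skip the log entry whenever `None` comes back, so only quickly obtained data ever reaches modem_stats.log.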
Just a snippet from last night, in the order the entries were appended to modem_stats.log:-
07/10/2012 20:12
07/10/2012 20:13
07/10/2012 20:14
So far so good......
07/10/2012 20:18
07/10/2012 20:18
07/10/2012 20:20
07/10/2012 20:16
07/10/2012 20:19
07/10/2012 20:22
07/10/2012 20:24
07/10/2012 20:21
07/10/2012 20:26
07/10/2012 20:23
07/10/2012 20:25
07/10/2012 20:31
07/10/2012 20:33
07/10/2012 20:34
07/10/2012 20:30
07/10/2012 20:28
07/10/2012 20:27
07/10/2012 20:15
07/10/2012 20:29
& then back to looking correct again..............
07/10/2012 20:35
07/10/2012 20:36
07/10/2012 20:37
As the great, late Eric Morecambe might have said, we have all the right times—but not necessarily in the right order.
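For anyone wanting to spot these automatically, a small Python sketch along the following lines could flag the misplaced entries. It assumes every log line begins with a `dd/mm/yyyy hh:mm` stamp as in the snippet above; the rest of each real log line is not shown here, so that layout is a guess.

```python
from datetime import datetime

STAMP_FORMAT = "%d/%m/%Y %H:%M"  # assumed layout, e.g. 07/10/2012 20:12

def parse_stamp(line):
    """Parse the first 16 characters of a log line as a timestamp."""
    return datetime.strptime(line[:16], STAMP_FORMAT)

def out_of_order(lines):
    """Return (line_number, text) for each entry older than the one before it."""
    problems = []
    previous = None
    for number, line in enumerate(lines, start=1):
        stamp = parse_stamp(line)
        if previous is not None and stamp < previous:
            problems.append((number, line.strip()))
        previous = stamp
    return problems
```

If the order alone is the problem, `sorted(lines, key=parse_stamp)` would put a copy of the entries back into time order without touching the original file.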
TestStats2 runs at 30 seconds past every hour. Getstats is taking about 18 seconds to complete; Teststats2 takes about 14 seconds from start until it starts plotting the graphs.
The EXE versions take only a fraction of those times & seem able to cope somewhat better with the PC slowing down due to "other" things running.
First occurrence happened on the 23 September, then 4 October, then last night 7 October.
I can't recall now, does that tie in with using the new getstats.BAT that harvests more data?
Just ran a virus scan, just before 20:00, which slowed things a lot (multiple CMD processes etc.), but once the scan completed things caught up. Looking at the modem stats log, two entries were written out of order (20:01 & 19:59), with 20:00 being completely missed. Also, the current stats folder had various TXT files left in it.
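Missed samples like that 20:00 one can be listed with a few lines of Python. This is only a sketch: it assumes one sample per minute and the same `dd/mm/yyyy hh:mm` stamp layout as the log snippet, and it does not care what order the entries were written in.

```python
from datetime import datetime, timedelta

def missing_minutes(stamps, fmt="%d/%m/%Y %H:%M"):
    """Return every minute between the first and last stamp that has no entry."""
    seen = {datetime.strptime(s, fmt) for s in stamps}
    if not seen:
        return []
    current, last = min(seen), max(seen)
    missing = []
    while current < last:
        if current not in seen:
            missing.append(current.strftime(fmt))
        current += timedelta(minutes=1)
    return missing
```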
That suggests that either various processes have completely hung, or that the Teststats2.BAT script had terminated abruptly, before deleting its temporary files.
Both getstats.BAT & Teststats2.BAT use the sleep.exe program to ensure suitable pauses in the data harvesting.
Part of getstats.BAT's error correction is to kill "hung" processes, sleep.exe being one of them.
Maybe killing sleep.exe from getstats.BAT to fix errors actually causes Teststats2.BAT to throw a wobbler & quit too early, & a circle of increasing errors commences.
Looking at the modem stats log for last night, things went out of order at 20:14, so it is quite possible that something is slowing the process down until the modem gives up and keels over. It's similar for the previous occasions.
I was considering going completely back to basics: flash the standard firmware, even use the Plusnet supplied router, and see if the problem re-occurred; if not, then re-introduce one thing at a time. After reading the above, though, I think I'll just disable the stats logging for the moment and see what happens, or if you were hinting at trying the EXE version I'd be willing to give it a go.
As to what's potentially slowing down my server, I have no idea and am not really sure where to start looking - nothing obvious to me in event viewer.
For now I've disabled both scripts, as it seems to go wrong when TestStats2 isn't running.
I would personally try running with just getstats.BAT running at its 1 minute schedule, which might confirm that running both scripts together causes the issue.
It should be noted that the larger ES.TXT, Error.LOG & modem_stats.log become, the slower the scripts run anyway.
This is apparently a known issue when using large text files in that the whole file has to be read just to discover where the last line starts. This is an even slower process when controlled by batch script files.
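Outside of batch scripts, the last line can be found without reading the whole file by seeking to the end and working backwards. The Python sketch below shows the general technique only; it is not how the scripts or the EXE version do it, and the block size and the latin-1 decoding are assumptions for the example.

```python
import os

def last_line(path, block_size=4096):
    """Return the final non-empty line of a file, reading backwards in blocks."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        position = handle.tell()
        tail = b""
        while position > 0:
            step = min(block_size, position)
            position -= step
            handle.seek(position)
            tail = handle.read(step) + tail
            # Stop once the buffer holds a newline in front of the final line.
            if b"\n" in tail.rstrip(b"\r\n"):
                break
    text = tail.rstrip(b"\r\n")
    # latin-1 never fails to decode; the real log encoding is not known here.
    return text.split(b"\n")[-1].decode("latin-1")
```

With this approach the time taken depends on the length of the last line, not on the size of the log.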
Maybe splitting the logs into smaller chunks or starting with blank logs again would resolve matters.
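One possible way to split the logs is to rename the current file aside once it passes a size limit, so that the next sample starts a fresh, small log. The following Python sketch is a suggestion only; the 5 MB limit and the archive naming scheme are made up for the example.

```python
import os
from datetime import datetime

def rotate_if_large(path, max_bytes=5_000_000, now=None):
    """Rename the log aside once it exceeds max_bytes so a fresh one can start.

    Returns the archive path, or None if nothing was rotated.
    """
    if not os.path.exists(path) or os.path.getsize(path) <= max_bytes:
        return None
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    root, ext = os.path.splitext(path)
    archive = f"{root}_{stamp}{ext}"  # e.g. modem_stats_20121007_201400.log
    os.replace(path, archive)
    return archive
```

Anything that plots from the full history would need to read the archived chunks as well, so this only helps the per-minute append.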
At my end, the EXE version of getstats.BAT is run every minute, appending even more data than the script version to modem_stats.log which is currently over 61000KB in size.
Error.LOG is over 48000KB in size (full of debugging data).
Once completed, modem_stats.log is then copied from where the EXE version updates it into the original Ongoing_Stats folder.
This all usually takes less than 2 seconds from start to finish.
When my virus checker (AVG) runs, it can take up to 40 seconds or so.
This morning, just one sample was missed - at 03:08 - there is currently no attempt made to try again whenever a sample is missed.
The EXE version isn't really ready for public release yet & it may be quite some time before it is ready, but it may be worth a try for testing purposes.
I'll have a think about how best to use it temporarily on your setup.
It is currently started via Task Scheduler running a batch file script as a temporary measure, but the single EXE program does all the work.
I do still think the issue may really be caused by your server running slowly at times, possibly due to "other" processes running at the same time, so maybe a much quicker harvesting process will overcome that limitation?