OK, the results of some (late-at-night
) testing .... all times are for my ageing server.
1) DSLstats v3.94 nothing 'unusual' of note in the log (thank you Eric)
01 Sep 2013 16:57:47 Average errors saved as 'errors43.txt' in snapshot directory
01 Sep 2013 16:57:47 Average error data reset
01 Sep 2013 16:58:02 Recording started
01 Sep 2013 16:58:29 IP address is now ....
02 Sep 2013 04:58:25 Auto snapshots taken
2) Last night I obtained BE's latest debugging modules, and decided to swap them in after the normal 23:58 Ongoing stats report for the day.
3) Stopped all modem_stats's logging using the settings editor at about 00:15
4) at ~00:30 ran current stats manually, with (it turns out) just the right timing to discover something ....
But before some results: here are some timings:
1) stats.exe (BE's harvesting) starts on the minute and takes ~3secs on my server
2) DSLstats starts sampling at ~10s past the minute and takes about 13-14 secs. Please note Eric that AFAICS (and I've tried running the new DSLstats a number of times now) it
always seems to run at 10s past, no matter
when you start the programme (e.g. even at say 30-45secs past)
3) Current stats (which fortunately did run sucessfully this morning at 06:00) is started after the normal harvesting has completed (at about 3 secs past, but then with a further 10s delay, so ~13secs past in reality) and runs until about 31 secs past, i.e. across the 10s-24s past slot when DSLstats sampling occurs, and on this occasion right at the start of the modem sampling period of DSLstats.
So, some results:
********************************************************************************************
02/09/2013 0:30:04.12 - In [HG612_current_stats.exe] - At the start of the main() function
02/09/2013 0:30:15.46 - **** [HG612_current_stats.exe] - reply(display login) O.K. Status = 0.
02/09/2013 0:30:15.68 - **** [HG612_current_stats.exe] - get_login_data() OK! Status = 0.
=============================
User Name : admin
Login IP : 192.168.1.64
Login Time : 2013-09-02 00:30:14
Login TimeLen :
=============================
User Name : admin
Login IP : 192.168.1.64
Login Time : 2013-09-02 00:30:11
Login TimeLen : 3(Secs)
i.e. I started current stats at around 4 seconds past the minute & it was 14 seconds past when it logged in to the modem.
(Login Time : 2013-09-02 00:30:14)
(There is a configurable (default 10s) delay in BE's module to try to avoid a clash with DSLstats when it previously ran on the minute,
and in case a previous instance of harvesting has yet to complete due to CPU load, for example)
But we can also see that DSLStats had logged in at 11 seconds past the minute:-
(Login Time : 2013-09-02 00:30:11)
(and as I have recorded above runs until ~24secs past, but not all of that time is logged into the modem, we know)
The net effect was to cause current stats to fail - apparently because it did not get the expected reply to one of it's xdslcmds (perhaps similar to the occasional failure to get tone data that happens to DSLstats, I don't know?)
**************************************************
* *
* Gathering Snapshot Statistics from the modem *
* *
**************************************************
The local time is: 00:30:04.138
There are 0 instances of HG612_stats.exe running. Please wait for it to complete
HG612_stats.exe is NOT running
HG612_stats.exe is NOT running
Temp File CURRENT-ISRUNNING-003004-138 was created
Creating log file Plink_20130902-0030.log
Delaying snapshot data harvesting by 10 seconds to avoid clashes with any other
scheduled data harvesting tasks
# 1 of 16
## 2 of 16
### 3 of 16
In reply(admin)
In reply(admin)
In reply(display login)
rxbuf = User Name : admin
Login IP : 192.168.1.64
Login Time : 2013-09-02 00:30:14
Login TimeLen :
=============================
User Name : admin
Login IP : 192.168.1.64
Login Time : 2013-09-02 00:30:11
Login TimeLen : 3(Secs)
In reply(sh)
#### 4 of 16
In reply(xdslcmd info --stats)
##### 5 of 16
###### 6 of 16
In reply(xdslcmd info --pbParams)
####### 7 of 16
######## 8 of 16
In reply(xdslcmd info --Bits)
######### 9 of 16
########## 10 of 16
In reply(xdslcmd info --linediag)
########### 11 of 16
And that's the point at which it 'stuck'. Had it been running automatically with these same (relative) timings, then it's likely things would have stacked up as I saw yesterday, and Kitz has seen frequently.
These are just the facts. The relevant logs have been sent to BE for further analysis, but in the meantime perhaps Eric might want consider a) why his synch delay doesn't seem to be as he expects b) if it needs to be much longer than that e.g. say 35secs or c) if he could have detected that e.g. current stats was already logged in when his sampling started, and delayed until it finished. At worst (c) might lose one sample every so often, but might be easier to solve than these timing issues. I don't know?
Similarly perhaps I need to reconsider the default (configurable) delay that Paul's programme uses to avoid clashes, because since they are
both in effect delaying 10s past the minute, they end up running at ~ the same time again!!!