Kitz Forum

Chat => Chit Chat => Topic started by: Weaver on August 14, 2022, 07:35:34 PM

Title: Regex speed-up
Post by: Weaver on August 14, 2022, 07:35:34 PM: I have the following data file, courtesy of mr johnson’s custom code in my ZyXEL modems. It is the SNR per tone data:
Code:
xdslctl: ADSL driver and PHY status
Status: Showtime
Last Retrain Reason: 8000
Last initialization procedure status: 0
Max: Upstream rate = 286 Kbps, Downstream rate = 3064 Kbps
Bearer: 0, Upstream rate = 352 Kbps, Downstream rate = 2676 Kbps
Tone number SNR
0 0.0000
1 0.0000
2 0.0000
3 0.0000
4 0.0000
5 0.0000
6 0.0000
7 31.0000
8 34.5000
9 36.5000
10 35.5000
11 33.5000
12 32.5000
13 31.5000
14 29.5000
15 28.5000
16 27.5000
17 26.5000
18 24.5000
19 24.0000
20 23.5000
21 21.5000
22 21.0000
23 19.5000
24 20.0000
25 18.5000
26 17.0000
27 18.0000
28 15.5000
29 14.5000
30 15.0000
31 14.5000
32 0.0000
33 29.1875
34 31.5625
35 33.1875
36 34.0000
37 34.8750
38 35.9375
39 36.6250
40 37.4375
41 37.9375
42 38.6250
43 39.3125
44 39.6875
45 39.4375
46 33.9375
47 39.0000
48 38.5625
49 37.5625
50 37.0000
51 37.0000
52 36.8125
53 36.8750
54 37.0000
55 37.1875
56 37.3750
57 37.5000
58 36.5000
59 36.5625
60 37.2500
61 37.6875
62 37.5625
63 37.5625
64 37.0625
65 37.3750
66 37.1250
67 35.4375
68 36.7500
69 36.3750
70 36.0625
71 35.8125
72 35.3125
73 35.0000
74 34.3750
75 34.3125
76 34.0625
77 33.7500
78 33.4375
79 32.8750
80 32.6875
81 32.1250
82 31.5625
83 31.5000
84 31.0000
85 30.5625
86 30.1250
87 29.6875
88 29.4375
The file is truncated because the rest isn’t relevant to what I’m doing.

Let us call the first field (ASCII decimal number) on a line x and the second y. I’m searching for a given x and then I return the associated y. I use the following regex to do it:
replace( /^\X*\n[ \t]*<x>[ \t]+([0-9.]+)\X+$/, "$1" );
Here "<x>" stands for the literal ASCII decimal value of the search x, written without the < >.
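For anyone who wants to try the pattern outside Shortcuts, here is a minimal Python sketch of the same lookup. It is an illustration only, not the Shortcuts implementation: Python's built-in `re` module has no `\X`, so `[\s\S]` (any character, newlines included) stands in for it, and the names `lookup_snr` and `SAMPLE` are made up for this example. The sample rows reuse a few values from the data above.

```python
import re
from typing import Optional

# Hypothetical excerpt in the same "tone SNR" layout as the modem output.
SAMPLE = (
    "Tone number SNR\n"
    "38 35.9375\n"
    "39 36.6250\n"
    "40 37.4375\n"
    "41 37.9375\n"
)


def lookup_snr(data: str, tone: int) -> Optional[str]:
    """Return the SNR text for `tone`, or None if no such line exists."""
    # Same shape as the forum regex, with <x> replaced by the decimal tone
    # number and [\s\S] used where the original has \X (an assumption made
    # because Python's re does not support \X).
    pattern = r"^[\s\S]*\n[ \t]*" + str(tone) + r"[ \t]+([0-9.]+)[\s\S]+$"
    match = re.match(pattern, data)
    return match.group(1) if match else None


print(lookup_snr(SAMPLE, 40))
print(lookup_snr(SAMPLE, 4))
```

Because the pattern demands white space straight after the number, a search for 4 does not match the lines for 40 or 41. A replace call would normally hand back the input unchanged when nothing matches, whereas this sketch returns `None`.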

* My question: do you think I can speed this up by chopping off the first part of the file, everything before the x=35 line?

This is possible because the lowest x-valued query I ever make is around x=40 and the search x is certainly always greater than 35. That’s safe.

I certainly can test the speed myself. I’m writing this in iOS Shortcuts and would need to make two enclosing comparison loops with a large loop-count that search the original vs chopped-off data. But I wanted to hear your opinions before I waste some time.
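The comparison described above could be prototyped off-device along these lines. This is a rough Python sketch, not a Shortcuts workflow: the data is synthetic (made-up SNR values, a one-line header), `[\s\S]` replaces `\X` because Python's `re` lacks it, and `make_data`, `chop_head` and `lookup_snr` are hypothetical helper names. Timings from Python's regex engine will not necessarily carry over to the engine Shortcuts uses.

```python
import re
import timeit


def make_data(n_tones: int = 512) -> str:
    """Build synthetic 'tone SNR' text; the SNR values are invented."""
    rows = "".join(f"   {t} {30 + (t % 16) / 16:.4f}\n" for t in range(n_tones))
    return "Tone number SNR\n" + rows


def lookup_snr(data, tone):
    """Forum-style lookup, with [\\s\\S] standing in for \\X."""
    pattern = r"^[\s\S]*\n[ \t]*%d[ \t]+([0-9.]+)[\s\S]+$" % tone
    match = re.match(pattern, data)
    return match.group(1) if match else None


def chop_head(data, first_tone):
    """Drop everything before the line for first_tone.

    The newline in front of that line is kept, because the pattern
    requires a newline before the tone number.
    """
    match = re.search(r"\n[ \t]*%d[ \t]" % first_tone, data)
    return data[match.start():] if match else data


if __name__ == "__main__":
    full = make_data()
    chopped = chop_head(full, 35)
    for name, data in (("full", full), ("chopped", chopped)):
        secs = timeit.timeit(lambda: lookup_snr(data, 88), number=1000)
        print(f"{name}: {secs:.4f} s for 1000 lookups")
```

One detail to watch when chopping: the pattern needs a newline before the target line, so a line that ends up as the very first line of the chopped text, with nothing in front of it, can no longer be found.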
Title: Re: Regex speed-up
Post by: burakkucat on August 14, 2022, 10:28:26 PM: In your example, above, sub-carriers (tones) 7 to 31 (inclusive) are the US band and sub-carriers (tones) 33 upwards are the DS band of your ADSL2 circuit. Why would you not want to consider the entire range?
Title: Re: Regex speed-up
Post by: Weaver on August 15, 2022, 02:00:34 AM: Because this is only used in HCD detection and the upstream is not considered. The lowest downstream tone ever searched for is something around tone 40, and the highest so far is 88. I am not sure whether or not I’d get a very minor speed increase from chopping off the highest tones, say above 100 (a bit higher than 88 just to be safe in case of future changes). Since the highest part is not searched, only discarded, I’m not sure there’s much benefit to be gained.

Coming back to the lowest tones, I’m assuming the time cost of searching for a line with tone=x=nn is the cost of finding each newline, skipping any white space, checking for a digit, and then matching the decimal x value. How quickly you can scan forwards and reject non-matching lines determines the performance, so I’m thinking that chopping off the first part decreases the number of times around the loop. Even with this initial truncation, one would still have to go round about ( 88 - 35 ) = 53 times to find the line tone=x=88 if 35 is the first line. That compares with 88 + n_initial_non_record_lines = 88 + 8 = 96 for the untruncated file.
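The loop-count estimate above can be checked with a small Python sketch of that forward, line-by-line cost model. It models the reasoning in the post rather than what a regex engine actually does: a backtracking engine given a greedy leading `\X*` may well consume the whole input first and work back from the end, so real timings could differ. The function name and the placeholder input (8 non-record lines, then tones 0 to 99 with dummy SNR values) are assumptions for illustration.

```python
def lines_rejected_before(data: str, tone: int) -> int:
    """Count the lines a forward scan rejects before reaching `tone`."""
    rejected = 0
    for line in data.splitlines():
        fields = line.split()
        # A record line has exactly two fields: tone number and SNR.
        if len(fields) == 2 and fields[0] == str(tone):
            return rejected
        rejected += 1
    raise ValueError(f"tone {tone} not found")


# Placeholder input: 8 non-record lines (the count given in the post),
# followed by tones 0..99 with dummy SNR values.
header = "".join(f"header line {i}\n" for i in range(8))
full = header + "".join(f"   {t} 0.0000\n" for t in range(100))

# Truncated input whose first line is tone 35.
chopped = "".join(f"   {t} 0.0000\n" for t in range(35, 100))

print(lines_rejected_before(full, 88))     # 96 = 88 + 8
print(lines_rejected_before(chopped, 88))  # 53 = 88 - 35
```

Under this model the truncated file needs a little over half the iterations for tone 88 (53 against 96), and the saving shrinks as the target tone number rises.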
Title: Re: Regex speed-up
Post by: burakkucat on August 15, 2022, 04:17:25 PM: I now see your reasoning. Thank you.

33 to 100 or 35 to 88? Your choice. :)