Kitz ADSL Broadband Information
Topic: Problem with Zoom app: horrible inbound audio > silence > connection drops

Weaver

  • Senior Kitizen
  • ******
  • Posts: 11459
  • Retd s/w dev; A&A; 4x7km ADSL2 lines; Firebrick

I’m using the iPadOS Zoom app in my internet-based language-learning classes. Zoom has been fine for weeks then this week twice the sound went really bad for periods of 15-30 mins, then silence (inbound), then connection dropped with an error message. Then repeat ad nauseam. Then later on the problem just goes away. I was the only student in my class who was having problems, and it was only audio.

Does anyone have experience with problems like this with Zoom? Or advice about the care and feeding of the iOS app?

First time it happened was on Saturday afternoon, then again on Monday morning. I first assumed that it was a bandwidth hog on my LAN, so banned all other users while my classes are in progress. Since it occurred again after the start of the ban, that rules out that theory. I can also see on the traffic pictures on A&A’s server that there’s no high level of other traffic coming in. Nor is there any DoS attack in progress hitting my firewall.

I have asked AA for some advice and a sanity check. We’re talking about doing traffic capture but there are lots of problems with this idea - can’t get it to happen on demand and interpretation of the results will be a nightmare for me without some help.

I have done every rule-out that I can think of. Measured real internet connection performance with speedtester.aa.net.uk, which gives 10.3 / 1.67 Mbps, which is good. Looking at the AA server’s traffic pictures, Zoom doesn’t seem to be using all the bandwidth available to it: it only seems to be using about 5-6 Mbps downstream, while upstream is difficult to read but doesn’t look unusual. So there is plenty of unused downstream bandwidth, or so it appears from the pictures. There is also no packet loss between me and my ISP, AA.

Don’t understand why it’s sporadic, and why now when I haven’t had any problems from the start of January until last Saturday. And I don’t understand why the audio is failing but the video keeps going. People at the other end can hear me, I think, although there seems to be a long delay always. Like talking to the moon. And why starting part-way through a class? That’s why I thought about the evil bandwidth hog on my LAN, but such are now banned and I would see rogue users/software in the traffic pictures.

I have also asked the college running my classes, citylit.ac.uk in Central London, for some advice, as they must have a lot of experience with Zoom and all their students are now on it due to COVID-19. They said they can’t see anything I’ve missed in my debugging and rule-out procedures thus far, and their senior internet / server-side gods are getting back to me.

If it was congestion further out on the internet then I wouldn’t be able to see that, but I would expect other users to be affected too, no?

Any opinions? Does anyone who is a Zoom app expert know anything about optimum app settings for robustness? (Is that a word?)

tubaman

  • Senior Kitizen
  • ******
  • Posts: 12507

It is a bit bizarre that just the sound is going but the video stream continues ok. Have you tried muting your own mic? It might make a difference. Also, are you running the latest version of Zoom, as it's updated quite regularly?
I'm also wondering if you are maxing out the upstream side of the connection - perhaps run that on audio only if not already doing so?
 :)
BT FTTC 55/10 Huawei Cab - Zyxel VMG8924-B10A

Weaver

  • Senior Kitizen
  • ******
  • Posts: 11459
  • Retd s/w dev; A&A; 4x7km ADSL2 lines; Firebrick

Thanks, tubaman, some good points.

Yes, it is bizarre that the sound alone goes bad. It is even more bizarre that the problem disappears and reappears - why isn’t it permanent?

The point about the upstream is a very astute one. Yes, it could well be, but again why does it later work ok? It must be able to manage somehow because it does so at a later time, and people can always hear me, I think.

> Have you tried muting your own mic
That’s a good tip. When the app disconnects and then reconnects, I have noticed that my mic always is initially muted. Perhaps that explains why the app temporarily works for a short while after a reconnection until the audio badness soon starts again. I’ll try that tip if/when it recurs, so thank you.

> perhaps run that on audio only

Ah, excellent tip. I’m not sure how to do that - I think I can work it out, one of the icons on the top line.

The why does it come and go thing remains as a problem that defeats all the worthy candidates for explanation.

Coming back to the maxed out upstream thing. On the AA servers’ traffic pictures, the red line representing upstream is not a straight high horizontal line as it would be when doing a flat out max upload, but even if not maxed out, the upload must be running close to the limits. I can’t buy any more upstream for any money and adding a fourth line didn’t seem to effectively increase measured TCP combined upstream total, at least not at some times. This is seemingly something to do with the behaviour of TCP, and I don’t understand what’s going on. Upstream of line 3 is also really slow, 20% slower than the other lines, so perhaps that has something to do with the TCP thing. I also don’t understand why at times the measured combined upstream total is indeed better than the three line combined total, and sometimes not.

Just to add further confusion, the current measured total TCP upstream reported by speedtest.aa.net.uk is for some reason the highest ever seen, at 1.67 Mbps. Normally it reports something like 1.15 to 1.3. Don’t know why it goes up and down - this is with repeated measurements made in the dead of night so no other users generating competing traffic - carefully measured with lower results discarded and only the max taken.

Now since it has been working ok since the start of January why does it fail sometimes now when upstream is the highest ever seen? Seems the least likely time to fail. But then who knows why the speed tester does what it does? We don’t understand what’s going on with TCP and don’t understand why the results vary over a long period and I don’t mean short term statistical noise in a group of repeated back-to-back measurements.

So thank you to Tubaman for a very intelligent and helpful analysis. Still left with overriding questions that seem to have no answers.

parkdale

  • Reg Member
  • ***
  • Posts: 597

Latency springs to mind :-\ Have you tried monitoring to see if it occurs? Also, is there any correlation between your current line problems and Zoom?

I could be talking complete B***ks, just my 5cents worth :fingers:
Vodafone FTTC ECI cab 40/10Mb connection / Fritz!box7590

Weaver

  • Senior Kitizen
  • ******
  • Posts: 11459
  • Retd s/w dev; A&A; 4x7km ADSL2 lines; Firebrick

Parkdale makes another very good point. Latency here is never great but we need a change as once again we’re back to the ‘why has it worked before’ thing. I can see latency on the AA server traffic pictures. I didn’t notice anything funny but I should look again now you’ve made that point.
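One way to follow up on the latency suggestion would be to log round-trip times during a class and summarise them afterwards. The following is only a minimal Python sketch: the function name `summarise_rtts` and the sample values are made up for illustration, and "jitter" here is simply the mean absolute difference between consecutive samples, not any particular standard's estimator.

```python
from statistics import mean


def summarise_rtts(rtts_ms):
    """Summarise round-trip times given in milliseconds.

    Returns (mean, max, jitter), where jitter is the mean absolute
    difference between consecutive samples.
    """
    if len(rtts_ms) < 2:
        raise ValueError("need at least two samples")
    diffs = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    return mean(rtts_ms), max(rtts_ms), mean(diffs)


# Hypothetical samples: a quiet period with a latency spike in the middle.
samples = [60.0, 62.0, 61.0, 180.0, 175.0, 63.0]
avg, worst, jitter = summarise_rtts(samples)
print(f"mean={avg:.1f} ms  max={worst:.1f} ms  jitter={jitter:.1f} ms")
```

Comparing such a summary for a good session and a bad one would show whether latency or jitter actually changes when the audio goes bad.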

Picture for Monday, bad period is 10:30-11:00 maybe longer, certainly was good after 11:15 maybe a fair bit earlier. You can see the drops in the traffic at around 10:45 and 11:05 where Zoom had dropped the connection and then I reconnected a short while later. I may have been away from Zoom, doing some diagnostics during a short period there. Latency is the bright green, bright blue and very dark blue. Key on the right; may need to scroll. This is for line 3, the slowest line.



Note the flat-out download from Netflix at around 4:30 in the morning to see what maxed out traffic looks like.


This is Zoom session on Saturday, 14:00-16:25. The bad period is around 16:00 to 16:25 when things got so bad that I gave up. There is something weird going on in part of the session and especially after it has finished - see the dark red horizontal line for flat-out (or nearly so) upload going on for ages. Looks exactly like an iPad backup to the iCloud, from experience. There do seem to be other points where the upload traffic is higher though, so that flat line seemingly isn’t the max. I don’t understand why a backup would not be the max though.



Perhaps the iPad backup started during the Zoom session, but the timing doesn’t look right. Also if upstream was the problem, why was it inbound audio that failed, not outbound?

The pattern of upstream during the good part of the Zoom session after 14:00 looks all over the place. Downstream seems very low, btw.

* Need to find out how to turn off upstream video if need be, as a desperate workaround.
« Last Edit: March 03, 2021, 02:19:11 AM by Weaver »

tubaman

  • Senior Kitizen
  • ******
  • Posts: 12507

...

> * Need to find out how to turn off upstream video if need be, as a desperate workaround.

I don't have an iPad so can't verify, but hopefully this assists - https://osxdaily.com/2020/12/08/how-turn-off-camera-microphone-zoom/
 :)
BT FTTC 55/10 Huawei Cab - Zyxel VMG8924-B10A

Weaver

  • Senior Kitizen
  • ******
  • Posts: 11459
  • Retd s/w dev; A&A; 4x7km ADSL2 lines; Firebrick

I’ve found something! It’s ‘sporadic’ and inbound-related, so it would seem to be a good candidate. It’s about line three which I now see is not just slow, but sickly.

Here we go again :-)

Yet again, the performance of cwcc@a.3 is deteriorating slowly. The downstream speed is now down to ~75% of what it was. AA keeps fixing this problem temporarily, somehow, by magic, by means not understood, but the fix never lasts long term.

I have the detailed stats from the modem. I notice in the ‘Previous 1 day time’ section that there are some serious errors reported in the downstream. There is a possible link with my sporadic Zoom problem: today is Wednesday, so I don’t know if that period covered Monday morning when I had problems with Zoom inbound. Even if it’s not that period, maybe Monday did have some error events anyway and, speculating, if Zoom doesn’t use TCP then that would mean real data loss, as there would be no retransmitting error-recovery transport protocol. I would think though that the corrupted packets behind the ES and SES counts might show up as packet loss in the AA clueless pictures, but who knows. Perhaps the PPP LCP echoes they use don’t get corrupted because they’re too short? So nothing shows up in the graphs.

DLM has increased the downstream SNRM from 3dB to 6dB, which has slowed things down further but is essential.

I think it’s worth getting it fixed anyway, before it gets even worse.

Here are the detailed modem stats:



xdslctl: ADSL driver and PHY status
Status: Showtime
Last Retrain Reason:   8000
Last initialization procedure status:   0
Max:   Upstream rate = 318 Kbps, Downstream rate = 2452 Kbps
Bearer:   0, Upstream rate = 389 Kbps, Downstream rate = 2228 Kbps

Link Power State:   L0
Mode:         ADSL2 Annex A
TPS-TC:         ATM Mode(0x0)
Trellis:      U:ON /D:ON
Line Status:      No Defect
Training Status:   Showtime
      Down      Up
SNR (dB):    6.1       5.9
Attn(dB):    64.0       40.3
Pwr(dBm):    18.0       12.4

         ADSL2 framing
         Bearer 0
MSGc:      59      11
B:      13      5
M:      16      16
T:      5      8
R:      16      16
S:      3.1579      7.7241
L:      608      116
D:      1      4

         Counters
         Bearer 0
SF:      1304410      257878
SFErr:      0      0
RS:      26495837      2271738
RSCorr:      914      1847
RSUnCorr:   0      0

ReXmt:      1376      0
ReXmtCorr:   1376      0
ReXmtUnCorr:   0      0

         Bearer 0
HEC:      0      0
OCD:      0      0
LCD:      0      0
Total Cells:   110382590      19289897
Data Cells:   1982394      584159
Drop Cells:   0
Bit Errors:   0      0

ES:      18      0
SES:      10      0
UAS:      110      100
AS:      21010

         Bearer 0
INP:      28.00      2.00
INPRein:   0.00      0.00
delay:      8      8
PER:      16.03      16.41
OR:      32.42      8.28
AgR:      2250.94   396.16

Bitswap:   601/601      110/110

Total time = 3 days 5 hours 33 min 34 sec
FEC:      13462      5775
CRC:      448      0
ES:      18      0
SES:      10      0
UAS:      110      100
LOS:      1      0
LOF:      8      0
LOM:      0      0
Latest 15 minutes time = 3 min 34 sec
FEC:      7      0
CRC:      0      0
ES:      0      0
SES:      0      0
UAS:      0      0
LOS:      0      0
LOF:      0      0
LOM:      0      0
Previous 15 minutes time = 15 min 0 sec
FEC:      24      1293
CRC:      0      0
ES:      0      0
SES:      0      0
UAS:      0      0
LOS:      0      0
LOF:      0      0
LOM:      0      0
Latest 1 day time = 5 hours 33 min 34 sec
FEC:      888      1843
CRC:      0      0
ES:      0      0
SES:      0      0
UAS:      0      0
LOS:      0      0
LOF:      0      0
LOM:      0      0
Previous 1 day time = 24 hours 0 sec
FEC:      9937      1157
CRC:      444      0
ES:      14      0
SES:      10      0
UAS:      60      50
LOS:      1      0
LOF:      8      0
LOM:      0      0
Since Link time = 5 hours 50 min 9 sec
FEC:      914      1847
CRC:      0      0
ES:      0      0
SES:      0      0
UAS:      0      0
LOS:      0      0
LOF:      0      0
LOM:      0      0
NTR: mipsCntAtNtr=0 ncoCntAtNtr=0
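The per-period counters above can be scanned mechanically for downstream errors. This is only a rough Python sketch, assuming the stats text has the layout shown above (a `<name> time = ...` header followed by `KEY: down up` lines); the function names `parse_sections` and `noisy_sections` are invented for illustration, and the sample is an excerpt of the figures above.

```python
import re

# A section header looks like "Previous 1 day time = 24 hours 0 sec".
SECTION_RE = re.compile(r"^(?P<name>.+?) time = (?P<span>.+)$")
# A counter line looks like "CRC:      444      0" (downstream, upstream).
COUNTER_RE = re.compile(r"^(?P<key>[A-Za-z]+):\s+(?P<down>\d+)\s+(?P<up>\d+)\s*$")


def parse_sections(text):
    """Map each section name to its {counter: (down, up)} pairs."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = SECTION_RE.match(line)
        if m:
            current = sections.setdefault(m.group("name"), {})
            continue
        m = COUNTER_RE.match(line)
        if m and current is not None:
            current[m.group("key")] = (int(m.group("down")), int(m.group("up")))
    return sections


def noisy_sections(sections, keys=("CRC", "ES", "SES")):
    """Names of sections whose downstream CRC, ES or SES count is non-zero."""
    return [name for name, counters in sections.items()
            if any(counters.get(k, (0, 0))[0] > 0 for k in keys)]


SAMPLE = """\
Latest 1 day time = 5 hours 33 min 34 sec
FEC:      888      1843
CRC:      0      0
ES:      0      0
SES:      0      0
Previous 1 day time = 24 hours 0 sec
FEC:      9937      1157
CRC:      444      0
ES:      14      0
SES:      10      0
"""

sections = parse_sections(SAMPLE)
print(noisy_sections(sections))
```

FEC counts are deliberately left out of the check, since those are errors that were corrected; only CRC, ES and SES indicate data that actually arrived damaged.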




« Last Edit: March 03, 2021, 02:49:24 PM by Weaver »

Weaver

  • Senior Kitizen
  • ******
  • Posts: 11459
  • Retd s/w dev; A&A; 4x7km ADSL2 lines; Firebrick

In the light of this, I checked the other modems and found downstream CRCs and ES sporadically on all of them, so I have now increased the downstream target SNRM to 6dB on every one. That has cut my measured combined downstream speed as reported by speedtest.aa.net.uk by 10%, from 10.3 to 9.3 Mbps. Who cares; it’s all as naught if the network doesn’t work, and I’m not trying for the best mere numbers. If only using TCP always, then the risk of using 3dB SNRM is not critical, but there is the possible performance slowdown and cost impact of TCP retransmissions even then.

Much depends on whether or not Zoom uses TCP.

Alex Atkin UK

  • Addicted Kitizen
  • *****
  • Posts: 5260
    • Thinkbroadband Quality Monitors

AFAIK UDP is absolutely essential for any video conferencing software as you don't want the video/audio to slowly lag as it falls behind real-time due to re-transmits and buffering.  It needs to stay as close to real-time as possible.
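To illustrate the point: over UDP a corrupted or dropped packet is simply gone, and a real-time receiver typically just notices a gap in the sequence numbers and carries on. Below is a small Python sketch of that gap counting, assuming each packet carries an increasing sequence number (as RTP packets do). The function name and the arrival order are made up, and this is not a description of how Zoom itself works.

```python
def count_lost(seqs):
    """Count sequence numbers that never arrived in time for playback.

    Late or duplicate packets (sequence number not above the highest
    seen so far) are ignored, as a real-time player would discard them.
    """
    lost = 0
    highest = None
    for seq in seqs:
        if highest is None:
            highest = seq
            continue
        if seq <= highest:
            continue  # late or duplicate: too old to play
        lost += seq - highest - 1
        highest = seq
    return lost


received = [1, 2, 3, 6, 7, 5, 10]  # hypothetical arrival order
print(count_lost(received))
```

With TCP the receiver would instead stall at the first gap until the missing data was retransmitted, which is the lag-behind-real-time effect described above.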
Broadband: Zen Full Fibre 900 + Three 5G Routers: pfSense (Intel N100) + Huawei CPE Pro 2 H122-373 WiFi: Zyxel NWA210AX
Switches: Netgear MS510TXUP, Netgear MS510TXPP, Netgear GS110EMX My Broadband History & Ping Monitors

parkdale

  • Reg Member
  • ***
  • Posts: 597

I use Skype and Zoom, but the former nearly always (though not quite always) sounds like the voice scrambler has been switched on :-X so I end up having to drop the connection and try again.
The other half uses Zoom for most of her drama group meets, seems to work quite well :fingers:
Vodafone FTTC ECI cab 40/10Mb connection / Fritz!box7590

Weaver

  • Senior Kitizen
  • ******
  • Posts: 11459
  • Retd s/w dev; A&A; 4x7km ADSL2 lines; Firebrick

I agree with Alex, I would assume this kind of app uses something like UDP. The penny only dropped regarding this today though, and when I thought about this, I started down this train of investigation.

In January when things were all frozen, that’s when the incredibly high downstream sync rates, in fact record rates, were established by all the modems. The weather then warmed up and those sync rates were maybe part way towards becoming untenable. Line 3 has been deteriorating due to disease and who knows what has been going on with error patterns over time this last week - my logs don’t go back far enough.

In an email from AA, they said that they had been thinking this about line 3 and that was the reason they suggested to me that we disable it for a Zoom test. I heard them propose this, but they didn’t explain the reason, and I had other ideas about why this might be a good idea, to do with jitter or even packet reordering. So a lack of communication. In any event, I haven’t had the opportunity to do another test as I can’t just do one whenever I like.