I recently got some pretty decent improvements to my QoS, but it took a long time to get to where I am.
I found basic fq_codel to be pretty much as good as it gets for improving my egress during busy periods, but it wasn't so good for ingress. Given that I do lots of downloading, watching streams etc. and not that much uploading (maybe sending emails, and uploading data to a cloud drive on occasion), I decided to go back to an optimal setup for ingress.
For that I found nothing better than HFSC, which I have implemented via ALTQ on pfSense.
Now, the original setup I had a while back set the overall size of the shared (parent) queue about 5% below my max possible throughput, but left little headroom between that and the child queues. I also didn't take advantage of the service curve feature, where you can have the traffic run at a temporary rate limit before it goes to an ongoing rate limit.
Take a look at this pic.
https://imgur.com/a/Dm3YDmh (also attached to post)
What I did was set a much lower rate limit for the first 10 ms, which makes slow start and congestion control behave differently, so they are less aggressive. You can see that ultimately the throughput is still almost at full speed; it just has some breathing space, as shown in the DU Meter graph, which makes all the difference in QoS. That graph was a 32-thread Steam download as well. Also, the limit of the parent queue is now much closer to my max throughput (98%), and I reduced the child queues by the 3% I saved, so there is more space for higher priority queues like ICMP.
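For anyone who wants to see the idea in numbers, here is a rough Python sketch of a two-piece HFSC service curve (the m1 / d / m2 fields in the pfSense queue settings). Only the 10 ms duration and the 98% parent limit come from the post. The 100 Mbit/s line rate, the 30% initial rate, the 90% ongoing rate and the function names are assumptions made up for illustration, not the actual config behind the graph.

```python
# Sketch of a two-piece linear service curve: rate m1 for the first d ms,
# then rate m2 afterwards. Time is counted from when the queue becomes
# backlogged (a simplification of how HFSC tracks it).

def allowed_bits(t_ms, m1_bps, d_ms, m2_bps):
    """Cumulative bits the curve permits t_ms after the queue gets busy."""
    if t_ms <= d_ms:
        return m1_bps * t_ms / 1000.0
    return (m1_bps * d_ms + m2_bps * (t_ms - d_ms)) / 1000.0


def average_rate_bps(t_ms, m1_bps, d_ms, m2_bps):
    """Average rate over the first t_ms under the curve."""
    return allowed_bits(t_ms, m1_bps, d_ms, m2_bps) * 1000.0 / t_ms


LINE_RATE_BPS = 100_000_000               # ASSUMED line rate (100 Mbit/s)
PARENT_BPS = LINE_RATE_BPS * 98 // 100    # parent queue at 98% (from the post)
M1_BPS = PARENT_BPS * 30 // 100           # ASSUMED initial rate (30% of parent)
D_MS = 10                                 # initial period of 10 ms (from the post)
M2_BPS = PARENT_BPS * 90 // 100           # ASSUMED ongoing rate (90% of parent)

if __name__ == "__main__":
    for t in (5, 10, 100, 1000):
        rate = average_rate_bps(t, M1_BPS, D_MS, M2_BPS)
        print(f"{t:>5} ms: average {rate / 1e6:.1f} Mbit/s")
```

With numbers like these the average rate over a full second ends up very close to the ongoing rate, which matches what the graph shows: the slow initial segment only costs a little throughput but takes the edge off the start of each burst.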
Now the other issue is that when you're dealing with many download threads in applications like Steam, you ideally need a ton of extra headroom for better shaping, so I borrowed an idea from Carl, which was to route my Steam traffic through an external network and shape it on that device, so that before it reaches my network it's already "tamed" traffic. I don't know if the above service curve trick would have removed the need for that taming, but that result is with both things combined.
Obviously, when you're doing things like this you need an ISP which provides consistent performance, e.g. shaping at say 90% of your line rate won't do squat if congestion puts attainable throughput at say 60%.
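The arithmetic behind that last point can be sketched like this. The shaper only controls the queue if its rate is below what the link can actually deliver at that moment. The function name and the 100 Mbit/s line rate are hypothetical; the 90% and 60% figures are the ones from the example above.

```python
def effective_bottleneck(line_rate_bps, shape_pct, attainable_pct):
    """Return which side limits throughput, and the resulting rate in bit/s.

    shape_pct: shaper limit as a percentage of line rate.
    attainable_pct: what the ISP actually delivers, as a percentage of line rate.
    """
    shaper_bps = line_rate_bps * shape_pct // 100
    attainable_bps = line_rate_bps * attainable_pct // 100
    if shaper_bps <= attainable_bps:
        # Queueing happens in your own shaper, so your QoS rules apply.
        return ("shaper", shaper_bps)
    # Queueing happens upstream, outside your control.
    return ("upstream", attainable_bps)


LINE_RATE_BPS = 100_000_000  # ASSUMED line rate (100 Mbit/s)

if __name__ == "__main__":
    print(effective_bottleneck(LINE_RATE_BPS, 90, 100))  # uncongested ISP
    print(effective_bottleneck(LINE_RATE_BPS, 90, 60))   # congested ISP
```

In the congested case the bottleneck moves upstream, so the queue builds somewhere you can't prioritise anything.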