Did some quick testing.
Initial results are a very nice improvement.
So I can do basic fq_codel with no issue on direct shaping, but actual classification and weighting of packets needs a large margin between the pipe size and the connection's throughput limit.
I then set a basic limiter on the tun0 device on my VPN server, capping it about 5 Mbit/s below my line rate, and set the pipe on my pfSense unit to 2.5 Mbit/s below line rate. That gives a 2.5 Mbit/s margin on the WAN plus a further 2.5 Mbit/s margin down to the VPN limit. With the Steam traffic routed to a lower-weighted codel flow, the following works pretty smoothly.
Single-threaded FTP download runs at max pipe speed.
Start a Steam download (24 threads across 6 IPs) at the same time and it hits about 400 kB/s as reported by the Steam client.
The single-threaded FTP download drops by about 3 Mbit/s out of 67.
No jitter or loss on SSH packets.
If I stop the FTP download, Steam automatically grabs the freed-up bandwidth and fills up to the rate set on the VPN shaper.
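For anyone wanting to reproduce the two margins, here is a rough sketch of the arithmetic in shell. The line rate of 70000 kbit/s is purely a placeholder (the post doesn't state the real figure), and the variable names are made up for illustration; only the 2.5 Mbit/s steps come from the setup described above.

```shell
#!/bin/sh
# Hypothetical line rate in kbit/s; substitute your own measured figure.
LINE_KBIT=70000
# Margin used at each step in the post: 2.5 Mbit/s.
MARGIN_KBIT=2500

# pfSense WAN pipe: one margin below line rate.
PIPE_KBIT=$((LINE_KBIT - MARGIN_KBIT))
# VPN tun0 limiter: a second margin below the pipe (5 Mbit/s below line rate).
VPN_KBIT=$((PIPE_KBIT - MARGIN_KBIT))

echo "WAN pipe:  ${PIPE_KBIT} kbit/s"
echo "VPN limit: ${VPN_KBIT} kbit/s"
```

The point is just that the VPN limit always sits below the pfSense pipe, which in turn sits below the real line rate, so neither shaper ever hands the bottleneck back to the ISP.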
This didn't require any packet marking on the VPN server. The config on that side was really simple: I just rate limited everything outbound from it to my side of the VPN tunnel, and only sent Steam traffic through it.
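The post doesn't say which limiter was used on the VPN server, so treat the following as one possible way to do it rather than the actual config: a token bucket filter on tun0 egress with Linux tc. The 65mbit rate, the burst and the latency values are all assumed placeholders, and this needs root plus tbf support in the kernel.

```shell
# Assumed values: pick a rate about 5 Mbit/s below your own line rate.
# Shapes everything leaving the server through tun0 towards the VPN client.
tc qdisc replace dev tun0 root tbf rate 65mbit burst 64kb latency 50ms

# Show the qdisc and its counters to confirm it is attached and seeing traffic.
tc -s qdisc show dev tun0
```

Since only the Steam traffic is routed through the tunnel, a single flat limiter like this is enough and no per-packet classification is needed on the server.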
Afterwards I did try some iptables marking, but it wasn't working. Linode don't allow loading kernel modules on their Linux VPS images, and I don't know if it's statically compiled into the kernel; all I know is it wasn't working. To do weighted classification on the VPN side, marking would need to work. But classifying on the pfSense side and just routing traffic that needs "taming" through the VPN is effective.
--edit--
After some sleep I realised I had forgotten the restore-mark mangle rule; classification now also works, should I choose to use it.
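For reference, a typical mark/save/restore pattern in the mangle table looks something like the sketch below. This is not the exact ruleset from my server: the match on TCP port 80 and the mark value 1 are placeholders, so swap in whatever actually identifies the traffic you want to classify.

```shell
# Copy the connection's saved mark back onto each packet.
# Without this, reply packets arrive unmarked and never get classified.
iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark

# Mark new connections arriving from the tunnel (placeholder match and mark value).
iptables -t mangle -A PREROUTING -i tun0 -p tcp --dport 80 \
    -m conntrack --ctstate NEW -j MARK --set-mark 1

# Save the packet mark onto the connection so later packets inherit it.
iptables -t mangle -A PREROUTING -j CONNMARK --save-mark
```

The restore rule is the easy one to forget, because the set-mark rule only ever sees the first packet of a connection and everything after that relies on the saved connection mark.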