Kitz ADSL Broadband Information
Topic: Caching friendliness - reducing traffic + load on Kitz's servers  (Read 1061 times)

Weaver

  • Addicted Kitizen
  • *****
  • Posts: 6376
  • Retd sw dev; A&A; 4 × 7km ADSL2; IPv6; Firebrick

I am hoping that well-tuned, cache-friendly HTTP headers might cut a lot of the traffic on Kitz's servers, both to save money and to keep the site snappy.

There are limits, though: some information that changes all the time gets included even in an old page. When an old, supposedly unchanged page is viewed again, these auxiliary bits can spoil cacheability, because the page's total content has changed. I am thinking of "Total time logged in", "Recent topics" and "Currently logged-in users", which are shown on some pages. Those destroy cacheability because they are calculated dynamically. That could be fixed, though, by putting them into an external file and pulling that in via an iframe, so that the main page would not change and would then be cache-friendly. Also, in old pages, the blurb about particular users could have changed since the page itself was last really altered.

Adding calculated ETags to the headers is easy and does not require any brain power - I did it once, so that proves it :-). Adding a Last-Modified header is, I think, too hard for forum HTML pages, and so is If-Modified-Since support, but supporting the other conditional request, If-None-Match, would be very good.
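
A minimal sketch of what If-None-Match support means, written in Python rather than as Apache config: the server computes an ETag for the page, and if the browser sends back the same value it answers 304 Not Modified with no body. The `make_etag` and `respond` names are made up for illustration, and the choice of hash is an assumption; any fast hash would do.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Strong ETag: a quoted, short hex digest of the whole response body."""
    # blake2b with an 8-byte digest is an arbitrary pick for "any fast hash".
    return '"' + hashlib.blake2b(body, digest_size=8).hexdigest() + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> tuple:
    """Return (status, headers, payload) for a GET, honouring If-None-Match.

    Simplified: assumes the browser echoes back the single ETag it was given.
    """
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None and if_none_match.strip() == etag:
        # The client's cached copy is current: send headers only.
        return 304, headers, b""
    return 200, headers, body


if __name__ == "__main__":
    page = b"<html><body>An old forum post</body></html>"
    status, headers, _ = respond(page, None)
    print(status, headers["ETag"])            # first visit: 200 plus an ETag
    print(respond(page, headers["ETag"])[0])  # revisit: 304
```

The saving comes from the 304 path: an unchanged page costs a few header bytes instead of the full body.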

Saving you a huge amount of traffic only works if annoying little dynamic content changes are not getting in the way, so the iframe tweaks will fix things provided nothing is missed. We will know, because the ETags will stay the same if they are derived by hashing the content. (You can derive ETags in any way you like: from last-changed dates, if you have any, from content hashes, or from both. Dates are cheap on the server because you avoid reading the whole file and the CPU time of hashing it all.) I used dates because I had them available, but I think you would have to use hashing to make ETags here. You just use any fast hash on the whole file content.
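
To illustrate the point about dynamic bits, here is a rough Python sketch comparing a content-hash ETag with a date-based one. The sample page and the function names are hypothetical, and the date-based version (modification time plus size) is one common recipe, not necessarily what any particular server does.

```python
import hashlib
import os


def hash_etag(content: bytes) -> str:
    # Must read and hash every byte, but only changes when the content changes.
    return '"' + hashlib.sha1(content).hexdigest()[:16] + '"'


def date_etag(path: str) -> str:
    # Needs only a stat() call: last-modified time and size, both in hex.
    st = os.stat(path)
    return '"%x-%x"' % (int(st.st_mtime), st.st_size)


def render_post(users_online: int, inline: bool) -> bytes:
    """Hypothetical forum page: the live counter is either inline or in an iframe."""
    if inline:
        counter = b"Currently logged-in users: %d" % users_online
        return b"<p>Old post</p><div>" + counter + b"</div>"
    return b'<p>Old post</p><iframe src="/online.html"></iframe>'


if __name__ == "__main__":
    # Counter inline: the ETag changes whenever the counter does.
    print(hash_etag(render_post(12, True)) == hash_etag(render_post(13, True)))    # False
    # Counter in an iframe: the main page, and so its ETag, stays the same.
    print(hash_etag(render_post(12, False)) == hash_etag(render_post(13, False)))  # True
```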

There will be plenty of people around who know all about how to help with this, or perhaps your hosting company would just do it all for you. Just add ETags to the headers and make sure the server's Cache-Control settings are generally cache-friendly.

All images should be fully cacheable, with ETags and preferably Last-Modified dates, so that browsers can revalidate them with If-None-Match or, even better, If-Modified-Since requests, and with a long cache lifetime because images do not change. In Apache you can set that up by file type, so that it knows images do not change and will not keep sending them to the user again and again.

Articles may end up being fully cacheable anyway, because they do not change very often, and as long as they are not pulling in small bits of calculated content they will be cache-friendly.

If forum pages turn out to be changing for no good reason, we will see it, because their ETags will keep changing if the ETags are derived by hashing. Then we will know that we are not saving as much traffic as we could, and perhaps the iframe trick would help. Another thing that would really help save traffic is having no inline JS or CSS and putting it all into external files, so that they are never refetched, because they do not change.

Those external files could also have a long lifetime plus date-related cache-control info, like images, again treated separately by file type. That is what I did: images, txt files, fonts, PDFs, JS, CSS, favicons and a dozen more types were all listed in Apache with their own long cache lifetimes.
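
The by-file-type idea can be sketched in Python as a lookup from extension to lifetime, producing the same Cache-Control/Expires pair seen elsewhere in this thread. In Apache itself this is usually done with mod_expires; the Python below only models the policy, and the lifetimes in it are made-up example values, since the post does not list the real ones.

```python
import os
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

DAY = 86400

# Example lifetimes only (hypothetical values) for static types that rarely change.
LIFETIMES = {
    ".png": 365 * DAY, ".jpg": 365 * DAY, ".gif": 365 * DAY, ".ico": 365 * DAY,
    ".woff2": 365 * DAY, ".pdf": 30 * DAY, ".txt": 7 * DAY,
    ".css": 30 * DAY, ".js": 30 * DAY,
}
DEFAULT_LIFETIME = 0  # HTML and anything unlisted: revalidate every time


def cache_headers(path: str, now: datetime) -> dict:
    """Cache-Control and Expires headers for a file, chosen by its extension."""
    ext = os.path.splitext(path)[1].lower()
    max_age = LIFETIMES.get(ext, DEFAULT_LIFETIME)
    return {
        "Cache-Control": f"max-age={max_age}",
        # Expires is the older, absolute-date form of the same lifetime.
        "Expires": format_datetime(now + timedelta(seconds=max_age), usegmt=True),
    }


if __name__ == "__main__":
    now = datetime(2018, 6, 14, 9, 28, 31, tzinfo=timezone.utc)
    for name in ("logo.PNG", "style.css", "index.html"):
        print(name, cache_headers(name, now))
```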

Whether this is worth doing depends on how much money there is to be saved. The work involves someone making some changes to the web server config file; in Apache it would mean adding a few lines. An optional stage 2 would make it much more effective, with things like the iframe tweak or simply moving JS and CSS out into external files. Much current performance advice is to include JS and CSS inline, because this removes the latency of a TCP connection startup to fetch each external file. But that works against saving money on traffic, and the latency argument hardly matters if the JS and CSS are cached anyway because cache-friendliness has been set up properly.
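
A back-of-envelope Python model of the traffic side of that trade-off. It assumes one visitor, a fixed HTML size, and CSS/JS that are either inlined in every page or fetched once and then served from cache. It ignores headers and revalidation requests, and the byte counts in the example are invented.

```python
def bytes_sent(page_views: int, html_bytes: int, asset_bytes: int, inline: bool) -> int:
    """Total bytes one visitor downloads over `page_views` page loads."""
    if inline:
        # CSS/JS travel inside every page.
        return page_views * (html_bytes + asset_bytes)
    # External CSS/JS: downloaded once, then reused from the browser cache.
    return page_views * html_bytes + asset_bytes


if __name__ == "__main__":
    views, html, assets = 100, 20_000, 50_000  # hypothetical sizes in bytes
    print(bytes_sent(views, html, assets, inline=True))   # 7000000
    print(bytes_sent(views, html, assets, inline=False))  # 2050000
```

The two only break even on a single page view; every further view of a cached page widens the gap.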

There are bound to be people around who know a hundred times more about this than me. I wrote some code to plug into a web server to do all of this automatically many years ago; that's how I know a little about it.

There used to be a tool that would assess a website for cache-friendliness and give an excellent report and checklist.

Just a thought. God, I am so sorry this post has grown to be so long. Sincere apologies  :-[ if you knew all of this stuff already.
« Last Edit: June 14, 2018, 05:46:05 AM by Weaver »

kitz

  • Administrator
  • Senior Kitizen
  • *
  • Posts: 31649
  • Trinity: Most guys do.
    • http://www.kitz.co.uk
Re: Caching friendliness - reducing traffic + load on Kitz's servers
« Reply #1 on: June 14, 2018, 10:59:01 AM »

Hi Weaver.   

Thanks for the suggestions.   Most of the bandwidth is on the main site, where I already use Expires headers.   Funnily enough, I was looking at them last night in relation to a query where someone was seeing an old copy of something, despite the fact I knew I'd uploaded a new file a few hours previously.   
I also use gzip compression on the server, and 304 Not Modified responses are in use.     All images on the main site are optimised to keep bandwidth to a minimum.

eg

Code:
Request Method: GET
Status Code: 304 Not Modified

Cache-Control: max-age=172800
Connection: Keep-Alive
Date: Thu, 14 Jun 2018 09:28:31 GMT
Expires: Sat, 16 Jun 2018 09:28:31 GMT
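
The two lifetime headers above agree with each other: max-age=172800 is 48 hours, and Expires is exactly two days after Date. A small Python check using only the standard library (the function name is made up, and the Cache-Control parsing is deliberately simplified):

```python
from datetime import timedelta
from email.utils import format_datetime, parsedate_to_datetime


def expires_from(date_header: str, cache_control: str) -> str:
    """Compute the Expires value implied by a Date header plus max-age."""
    max_age = None
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            max_age = int(value)
    if max_age is None:
        raise ValueError("no max-age directive")
    sent = parsedate_to_datetime(date_header)  # timezone-aware, UTC
    return format_datetime(sent + timedelta(seconds=max_age), usegmt=True)


if __name__ == "__main__":
    print(172800 / 3600)  # 48.0 hours
    print(expires_from("Thu, 14 Jun 2018 09:28:31 GMT", "max-age=172800"))
    # Sat, 16 Jun 2018 09:28:31 GMT, matching the Expires header above
```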

On occasion it does cause problems; the most usual one is when I change the site logo to the xmas banner and back again.   Over the years I've lost count of how many people, due to caching, still think I have Christmas decorations up weeks after they were taken down, but on the whole it seems to work OK.

------

As regards the forum: I use SMF software, so I have very little control over what goes on in the core and what can be edited.

Think of the SMF software as a framework where I can edit CSS stuff to make it look pretty and add my own logo etc,  but there isn't much I can do with the CORE functions that actually make it work.    By nature forums are dynamic, so each page refresh will query the database for new content.

Since I started using SSL I also had to add a proxy cache to stop image content from other websites causing errors.
Please do not PM me with queries for broadband help as I may not be able to respond.
-----
How to get your router line stats :: ADSL Exchange Checker

Chrysalis

  • Content Team
  • Addicted Kitizen
  • *
  • Posts: 5441
Re: Caching friendliness - reducing traffic + load on Kitz's servers
« Reply #2 on: June 14, 2018, 03:27:18 PM »

Nowadays I wouldn't expect people to be bandwidth-limited with hosting; limits are not what they used to be, and even cheap VPS packages costing a couple of £ a month tend to have limits in the terabyte range.  Still, caching helps with performance and CPU/RAM load, of course.
« Last Edit: June 14, 2018, 03:56:41 PM by Chrysalis »
Sky Fiber Pro - Billion 8800NL bridge & PFSense BOX running PFSense 2.4 - ECI Cab - LINE STATISTICS CLICK HERE

Weaver

  • Addicted Kitizen
  • *****
  • Posts: 6376
  • Retd sw dev; A&A; 4 × 7km ADSL2; IPv6; Firebrick
Re: Caching friendliness - reducing traffic + load on Kitz's servers
« Reply #3 on: June 14, 2018, 03:55:21 PM »

Is it on top of Apache?

I understand, so you cannot go around hacking up the HTML and so on, and my points about iframes and changing the way JS + CSS is delivered are useless. I am forgetting that it is an engine. Duh.

However, your hosting company can help you tweak your Apache config files should you ever need to do anything further with regard to cache-friendliness. You pointed out that you are already giving out 304s. You did not give me an ETag when I probed the server, but I don't see how you could change that unless there is some setting somewhere, because you won't be in control of that. I realise that now you have woken me up. Should have put my brain in gear first, what little is left of it.

kitz

  • Administrator
  • Senior Kitizen
  • *
  • Posts: 31649
  • Trinity: Most guys do.
    • http://www.kitz.co.uk
Re: Caching friendliness - reducing traffic + load on Kitz's servers
« Reply #4 on: June 14, 2018, 06:16:55 PM »

Is it on top of Apache?

Yep, it runs on Apache.  The software gets installed on the server, just as you'd install, say, 'Word' on a PC.    It also requires PHP and a MySQL database, which is where post content is stored.     The software pulls all the data from the database and presents it within an HTML frame environment.

Quote
I understand, so you cannot go around hacking up the HTML and so on, and my points about iframes and changing the way JS + CSS is delivered are useless. I am forgetting that it is an engine. Duh.

You could if you really wanted, because it's open source, but as far as I'm concerned, no, you can't.  The CORE is the main framework, which could be amended if you wanted to make your own version.   I don't like touching anything in the Core, though, as that's when applying security patches and updates becomes more difficult.   I have absolutely no inclination to touch anything in the Core and make things more difficult for myself.

Quote
However, your hosting company can help you tweak your Apache config files should you ever need to do anything further with regard to cache-friendliness. You pointed out that you are already giving out 304s. You did not give me an ETag when I probed the server, but I don't see how you could change that unless there is some setting somewhere, because you won't be in control of that. I realise that now you have woken me up. Should have put my brain in gear first, what little is left of it.

Unless I misunderstand what you mean (probably), the expiry tags are in the http headers
Code:
Expires: Sat, 16 Jun 2018 09:28:31 GMT
From memory I think images have a longer expiry than html pages.

kitz

  • Administrator
  • Senior Kitizen
  • *
  • Posts: 31649
  • Trinity: Most guys do.
    • http://www.kitz.co.uk
Re: Caching friendliness - reducing traffic + load on Kitz's servers
« Reply #5 on: June 14, 2018, 06:40:37 PM »

Nowadays I wouldn't expect people to be bandwidth-limited with hosting; limits are not what they used to be, and even cheap VPS packages costing a couple of £ a month tend to have limits in the terabyte range.  Still, caching helps with performance and CPU/RAM load, of course.


Yup, the prices of VPSs have dropped, there are some cheap deals around, and some of them do indeed offer unlimited bandwidth.
However, when you look into it, they often have restrictions on other things such as databases & email accounts, and don't offer C-Panel. 

----
Rest of reply not specifically aimed at Chrys

Because it's a UK-targeted site, I want UK hosting with managed services, which is always going to cost more than, say, US hosting. 
There's also lots of other things to be taken into account such as number of databases and for me a biggie is CPanel/WHM which is when the prices start increasing.   

I've had a look around, and £20 pm is about the average bolt-on starting cost for a CPanel VPS.  Restrictions of 1-3 databases? No way!  I want the flexibility to manage my own mail.    At the end of the day, I personally have always been kitz.co.uk; it's just that 'the site' took over.   I can add whatever software I like, be that SMF or any other forum software that I choose.  I can add MediaWiki.   I can add various CMS systems.  I could set up a blog/WP.  I can install Joomla and play with it before deciding the site won't convert easily to that format.  I'm free to install most packages and set up god knows how many databases (I currently have about 10 active).   Heck, I could even resell any unused bandwidth.


As I said I'm not a server person but I do want the likes of C-Panel or WHM to be able to do what I need and without any database restrictions.   I have quite a few db's  including the forum, the wiki, the isp prices, isp info, the rate myISP, my own custom backend management for the main site yada yada yada.     

Cheap hosting doesn't cut it, I'm afraid, because it just won't do what I need. I spend more than enough time messing with admin, and the existing way of doing things takes a lot of stress away from me.  Hosting restrictions are one of the reasons why Tony spent ~2k building his own and then had to buy a connection on top of that.   As I said elsewhere, a lot of people vastly underestimate some of the site facilities.
The site has been running for >15yrs and during that time it has grown a lot and as it grew I went through various different types of hosting accounts.  I tried cloud services and that didn't work out either, so I'm happy with the current package which does exactly what I need.  I may have to add another £10pm for additional storage for the forum at some point soon, but atm I'm managing. 

There's also one other thing that I won't go into in public, other than to say that several years ago Vidahost pulled out all the stops to assist with something that I doubt any other hosting provider would have done.   For me it was a major thing, and I can say right here and now that if they hadn't come up with a solution, I would have immediately pulled the site.  To this day that beyond-normal-service arrangement Vidahost made is still in place.  The moderators and a couple of helpfuls know what happened, know that this was far from a trivial matter, and know why I am actually indebted to Seb, director @ Vidahost, for even offering to do what they did at no cost.     

So as I mentioned elsewhere.   Whilst I appreciate people are trying to help, the subject of swapping hosts/account type is just not up for debate.

Weaver

  • Addicted Kitizen
  • *****
  • Posts: 6376
  • Retd sw dev; A&A; 4 × 7km ADSL2; IPv6; Firebrick
Re: Caching friendliness - reducing traffic + load on Kitz's servers
« Reply #6 on: June 14, 2018, 06:45:09 PM »

An ETag ("ETag:" header) is an opaque string, not (necessarily) a date; it can be anything at all and is just compared for equality. It could contain a date in seconds since some epoch, a hash computed on the file contents, or both concatenated. What you put in it is up to you, and no one else interprets it. If the ETag matches the value the browser remembered from the headers the last time the page was delivered, then the page is the same and does not need to be sent again: the browser sends that value back in an If-None-Match request, and the server answers 304 Not Modified with no body. An ETag does not replace a cache lifetime, but once the lifetime runs out it makes revalidation cheap, because an unchanged page costs only a few headers. A lot of people use an MD5 hash of the page content. I did not like that because of the time taken to read the whole file plus the CPU time, which could be large for a big file.
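
Since an ETag is only ever compared for equality, the server-side check is small. This Python sketch follows the If-None-Match rules of RFC 7232: the header may hold a comma-separated list of tags or `*`, and the comparison is the "weak" one, which ignores a `W/` prefix. The function names are illustrative, and `*` is treated as a hit on the assumption that the page exists.

```python
import re

# One entity-tag: optional W/ weakness prefix, then an opaque string in double quotes.
_ENTITY_TAG = re.compile(r'(?:W/)?"([^"]*)"')


def opaque(etag: str) -> str:
    """Strip the quotes and any W/ prefix, leaving the opaque part."""
    m = _ENTITY_TAG.fullmatch(etag.strip())
    if m is None:
        raise ValueError(f"not an entity-tag: {etag!r}")
    return m.group(1)


def not_modified(if_none_match: str, current_etag: str) -> bool:
    """True if the server may answer 304 Not Modified instead of resending the page."""
    if if_none_match.strip() == "*":
        return True  # "any version at all", and the page exists
    # Weak comparison: only the opaque strings have to be equal.
    return opaque(current_etag) in _ENTITY_TAG.findall(if_none_match)


if __name__ == "__main__":
    print(not_modified('"5b2235bf-1"', '"5b2235bf-1"'))  # True  -> 304
    print(not_modified('W/"abc", "def"', '"abc"'))        # True  -> 304
    print(not_modified('"abc"', '"abd"'))                 # False -> 200 with body
```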

When I wrote some code to calculate ETags and put them into headers on an Apache server, I took the date of a file in seconds since 1970, subtracted the year 200x (when v1.00 of the code was first released), wrote the result out in hex and appended an 'etag format-version' number plus a hyphen separator. That way, if I ever changed the method of calculation or the syntax of ETags, the format-version number would go up by one, so the ETag string would never match anything issued before. Subtracting a start date was just done to make the numbers as small as possible, so that the hex strings would be as short as possible, saving some bytes in the header and making comparisons as quick as possible. If I had been more with-it, I would have written the number out backwards, least significant byte first, and better still would have used base64 instead of hex.
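
A rough Python reconstruction of the scheme described above. The post leaves the start year as "200x", so `EPOCH_START` below is a hypothetical placeholder, and putting the format version after the hyphen is an assumption about the layout. The second function shows the least-significant-byte-first base64 variant mentioned at the end; it uses standard base64 rather than the URL-safe alphabet, so that a `-` can only be the separator.

```python
import base64
import calendar

FORMAT_VERSION = 1  # bump whenever the calculation or syntax changes
# Hypothetical start date: the post only says "the year 200x".
EPOCH_START = calendar.timegm((2005, 1, 1, 0, 0, 0))


def hex_etag(mtime: int) -> str:
    """Seconds since EPOCH_START in hex, then a hyphen and the format version."""
    delta = mtime - EPOCH_START  # assumes the file is newer than EPOCH_START
    return f'"{delta:x}-{FORMAT_VERSION}"'


def base64_etag(mtime: int) -> str:
    """Same number, least significant byte first, in base64 with padding dropped."""
    delta = mtime - EPOCH_START
    raw = delta.to_bytes(max(1, (delta.bit_length() + 7) // 8), "little")
    text = base64.b64encode(raw).rstrip(b"=").decode("ascii")
    return f'"{text}-{FORMAT_VERSION}"'


if __name__ == "__main__":
    mtime = calendar.timegm((2018, 6, 14, 9, 28, 31, 0, 0, 0))
    print(hex_etag(mtime), base64_etag(mtime))  # base64 form is two characters shorter
```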

Chrysalis

  • Content Team
  • Addicted Kitizen
  • *
  • Posts: 5441
Re: Caching friendliness - reducing traffic + load on Kitz's servers
« Reply #7 on: June 15, 2018, 07:32:59 AM »

Yes, the approach hosting companies take now is to offer a bandwidth limit high enough not to be the bottleneck, but they bottleneck the product in another way. For example, it's harder to use a lot of bandwidth when you only have limited storage, and storage capacity is often a prime driver of pricing on rented servers.  If you rent out hosting with, say, only 10 gig of space, there is not much content that fits on it, so bandwidth usage is unlikely to be high.  Occasionally someone may buck the trend, but as with selling broadband it's all about average usage, and there are often fair usage policies as well: on my soyoustart server I have a gigabit port, but I have to keep my average usage below 250 Mbit/sec.  However, 250 Mbit/sec of average usage is way higher than what we had, say, 10 years ago at that price point, which might have been a 100 Mbit port with 100 gig of bandwidth.
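
To put the fair-usage figure in perspective, here is a quick Python calculation of how much data a sustained average rate adds up to over a month. It assumes a 30-day month and decimal units (1 TB = 10^12 bytes, 1 gig = 10^9 bytes).

```python
SECONDS_PER_DAY = 86400


def monthly_transfer_tb(avg_mbit_per_s: float, days: int = 30) -> float:
    """Terabytes moved in `days` at a constant average rate given in Mbit/s."""
    bits = avg_mbit_per_s * 1e6 * days * SECONDS_PER_DAY
    return bits / 8 / 1e12


if __name__ == "__main__":
    tb = monthly_transfer_tb(250)
    print(tb)               # 81.0 TB per month
    print(tb * 1000 / 100)  # 810.0 times an old 100 gig allowance
```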

So the prime limiting factors these days with hosting tends to be storage, cpu and ram.  Not bandwidth.

For reference, kitz, I wasn't suggesting you move your website to a £2 VPS ;)  Just giving an example that even on bottom-of-the-barrel services, bandwidth is given out like candy; it's a commodity available in abundance.

However, you would be surprised how cheaply you can get dedicated server-class hardware these days, like a 16-core Xeon, ECC RAM and enterprise storage. Another factor driving pricing is support packages. You can get server-class hardware for under £30 per month, but at that price the level of support will be minimal: it is unmanaged pricing, so obviously no managed support, and support response times at places like soyoustart (the budget brand of the large OVH company) will be poor. Leaseweb is another example: they have tiered packages, and their base packages are fairly cheap but come with things like no weekend response and 5-day response times on tickets. You can pay for response-time SLAs of, say, 1 hour including weekends, but the price is then 10x as much. It's about making things affordable for hobbyists while also offering enterprise-class services under the same roof.

Another advantage of these low-priced modern offerings is that the companies have had to use technology to keep their costs down, and this benefits the consumer. For example, 10 years ago if you needed a remote reboot you would have to raise a support ticket, wait for a human response, and wait for a human to walk to your server to power-cycle it.  Nowadays you have a web interface with a remote reboot button, and it's done instantly.  KVM consoles used to be a premium extra and are now often included in packages as well.

Not having cPanel included isn't a problem, as you can simply buy your own cPanel licence and install it yourself. This brings me to my next point: if you know how to use Linux/BSD, or have a friend who can do it for you, then unmanaged services are not an issue, but if you want or expect a service that does it for you, you pay a premium for it and have a much more restricted market to choose from.
« Last Edit: June 15, 2018, 07:46:47 AM by Chrysalis »

jelv

  • Helpful
  • Kitizen
  • *
  • Posts: 1154
Re: Caching friendliness - reducing traffic + load on Kitz's servers
« Reply #8 on: June 15, 2018, 12:28:06 PM »

@Chrysalis

Do the packages you are suggesting come with the same level of support that Kitz is currently getting at no extra cost?
Line rental: Pulse8, Broadband: AAISP Home::1 FTTC 80/20, Mobile: id Mobile

Chrysalis

  • Content Team
  • Addicted Kitizen
  • *
  • Posts: 5441
Re: Caching friendliness - reducing traffic + load on Kitz's servers
« Reply #9 on: June 15, 2018, 02:03:18 PM »

As I said, probably not, as kitz has a managed service; at those price points you can only expect unmanaged services. A sysadmin could be hired, or a friend could do the management part for free; I would certainly do it for free as a contribution to the cause.  You would get a Linux server and a cPanel licence (one might still be supplied at these price points, though usually as an add-on); the sysadmin would then set it all up, and you would still have the same web GUI with cPanel and WHM.

kitz

  • Administrator
  • Senior Kitizen
  • *
  • Posts: 31649
  • Trinity: Most guys do.
    • http://www.kitz.co.uk
Re: Caching friendliness - reducing traffic + load on Kitz's servers
« Reply #10 on: June 15, 2018, 09:21:01 PM »

Same as with DSL - I guess you get what you pay for and they make restrictions on one thing if not the other.

As far as managed was concerned, it was a no brainer to let them be in charge of everything as I have enough to do without worrying about security patches etc.     

CPanel licenses aren't cheap, starting at ~£20 per month upwards.    I'd certainly miss things like phpMyAdmin for setting up databases.    I have unlimited databases, and although I don't have hundreds, for me it's nice not to worry about such things, to be able to have separate databases for things like ratemyISP... and just to be able to set up any software requiring databases that I want.   

TBH I don't use a lot of the facilities that I do have, eg SSH, WHM etc, but they are there if I do want to delve in or make changes myself rather than have to ask them. 

So whilst there is cheap out there, by the time you add on all the nuts and bolts that I need and use, there won't be much savings to be made.   

Weaver

  • Addicted Kitizen
  • *****
  • Posts: 6376
  • Retd sw dev; A&A; 4 × 7km ADSL2; IPv6; Firebrick
Re: Caching friendliness - reducing traffic + load on Kitz's servers
« Reply #11 on: June 16, 2018, 12:01:59 AM »

Quite right about cheapening it.

Chrysalis

  • Content Team
  • Addicted Kitizen
  • *
  • Posts: 5441
Re: Caching friendliness - reducing traffic + load on Kitz's servers
« Reply #12 on: June 16, 2018, 06:21:31 AM »

I am sidetracking the topic, so will leave what I said at that.