
Caching friendliness - reducing traffic + load on Kitz's servers


Weaver:
I am hoping that well-tuned, cache-friendly HTTP headers might reduce the traffic on Kitz's servers a lot, saving money and helping keep things snappy.

There are limits, though: information that changes dynamically all the time, even as part of an old page. When an old, supposedly unchanged page is viewed again, its total content may have changed because of such auxiliary bits of included information, and that spoils cacheability. I am thinking of "Total time logged in", "Recent topics" and "Currently logged-in users", which are shown on some pages. Those destroy cacheability because they are calculated dynamically. That could be fixed, though, by putting them into an external file and pulling that in with an iframe, so that the main page would not change and would then be cache-friendly. Also, on old pages, the blurb about particular users could have changed since the page itself was last really altered.

Adding calculated ETags to the headers is easy and does not require any brain power - I did it once, so that proves it :-). Adding a Last-Modified date header is, I think, too hard for forum HTML pages, and so is If-Modified-Since support, but supporting the other conditional request header, If-None-Match, would be very good.
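A minimal Python sketch of the ETag / If-None-Match idea, assuming a handler that has the whole response body in memory: the ETag is a hash of the body, and a request whose If-None-Match matches it gets a 304 with no body. The function names are made up for illustration; this is not how Apache or SMF implement it.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # A strong ETag: quoted hex digest of the complete response body.
    # SHA-1 is just a convenient fast hash here; any stable hash would do.
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def _opaque(tag: str) -> str:
    # If-None-Match uses weak comparison, so a W/ prefix is ignored.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload), answering 304 when the client's copy matches."""
    etag = make_etag(body)
    headers = {"ETag": etag}  # sent on both 200 and 304 responses
    if if_none_match is not None:
        # "*" or a comma-separated list of tags (naive split, fine for the hex tags made here).
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # not modified: no body is sent
    return 200, headers, body
```

On a first visit `conditional_get(page, None)` returns 200 with the ETag; when the browser comes back sending that ETag in If-None-Match, the same call returns 304 and an empty payload, which is where the traffic saving comes from.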

Saving you a huge amount of traffic can only be done if annoying little dynamic content changes are not getting in the way, so the iframe tweaks will fix things if nothing is missed. We will know, because the ETags will stay the same if they are derived by hashing the content. (You can derive ETags any way you like: from last-changed dates if you have any, from content hashes, or both. Dates are easy on the server because you avoid the cost of reading the whole file and the CPU time of hashing it all.) I used dates because I had them available, but I think you would have to use hashing to make ETags here. You just use any fast hash over the whole file content.
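The two ways of deriving an ETag mentioned above can be sketched in Python like this. Both function names are hypothetical; the date-based one uses only `os.stat()` metadata (size and modification time) and never opens the file, while the content-based one reads and hashes the whole file, here with BLAKE2b from `hashlib` as an arbitrary fast hash.

```python
import hashlib
import os


def etag_from_stat(path: str) -> str:
    # Cheap: built from filesystem metadata only (size + mtime in nanoseconds),
    # so the file is never read. Changes whenever the file is re-saved.
    st = os.stat(path)
    return '"%x-%x"' % (st.st_size, st.st_mtime_ns)


def etag_from_content(path: str, chunk_size: int = 65536) -> str:
    # Costlier: reads and hashes every byte, but the tag only changes
    # when the content itself changes.
    h = hashlib.blake2b(digest_size=16)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return '"' + h.hexdigest() + '"'
```

Re-saving (or just touching) an unchanged file changes the date-based tag but not the hashed one; the price of the hashed tag is a full read of the file on every request unless the hash is stored somewhere.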

There will be a lot of people around who know all about how to help with this, or perhaps your hosting company would just do it all for you. Just add ETags to the headers and make sure the server is generally cache-control friendly.

For all images there should be full cacheability, with ETags and preferably Last-Modified dates, answering If-None-Match requests and even better If-Modified-Since requests, with a long cache lifetime because images do not change. In Apache you can set that up by file type, so that it knows images do not change and will not keep sending them to the user again and again.
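For static files such as images, answering If-Modified-Since only needs the file's modification time. A minimal Python sketch with hypothetical function names, using the standard library's `email.utils` helpers to format and parse HTTP dates (which have one-second resolution, hence the `int()`):

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def last_modified_header(mtime: float) -> str:
    # HTTP dates are always GMT, whole seconds.
    dt = datetime.fromtimestamp(int(mtime), tz=timezone.utc)
    return format_datetime(dt, usegmt=True)


def not_modified_since(mtime: float, if_modified_since: Optional[str]) -> bool:
    """True if a 304 can be sent: the file is no newer than the client's copy."""
    if not if_modified_since:
        return False
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date: ignore the header and send the file
    if client_time.tzinfo is None:
        client_time = client_time.replace(tzinfo=timezone.utc)
    return int(mtime) <= client_time.timestamp()
```

The browser simply echoes back the Last-Modified value it was given, so comparing whole seconds is enough.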

Articles may end up being fully cacheable anyway, because they do not change very often; as long as they are not pulling in minor calculated content, they will be cache-friendly.

If we find that forum pages are actually changing for no good reason, we will see it, because their ETags will keep changing if the ETags are derived by hashing. Then we will know that we are not saving as much traffic as we potentially could, and perhaps the iframe trick would help. Another thing that will really help save some traffic is not having any inline JS or CSS, but putting it all into external files so that they are never refetched, because they do not change.
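A small Python sketch of that check, using a made-up `render_page()` that stands in for a forum page with a "Currently logged-in users" line. Hashing two views of the same old post gives different ETags while the counter is inline, and identical ones once the counter is moved out into an iframe (the `/widgets/online.html` path is hypothetical):

```python
import hashlib


def body_etag(html: str) -> str:
    # Content-hash ETag: changes whenever any byte of the page changes.
    return '"' + hashlib.sha256(html.encode("utf-8")).hexdigest()[:32] + '"'


def render_page(post_html: str, users_online: int, inline_widget: bool) -> str:
    # Hypothetical renderer: an unchanging old post plus a "users online" widget.
    if inline_widget:
        # The count is baked into this page's bytes, so every view differs.
        widget = "<p>Currently logged-in users: %d</p>" % users_online
    else:
        # The count lives in a separate document; this page's bytes stay fixed.
        widget = '<iframe src="/widgets/online.html"></iframe>'
    return "<html><body>%s%s</body></html>" % (post_html, widget)


post = "<p>An old post nobody has edited.</p>"
for inline in (True, False):
    first = body_etag(render_page(post, users_online=12, inline_widget=inline))
    second = body_etag(render_page(post, users_online=13, inline_widget=inline))
    print("inline" if inline else "iframe", "ETag stable:", first == second)
# inline ETag stable: False
# iframe ETag stable: True
```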

Those external files could have a long lifetime plus date-related cache-control info, like images, and be treated separately by file type again. That is what I did: images, txt files, fonts, PDFs, JS, CSS, favicons and a dozen more things were all listed with different long cache lifetimes in Apache.
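In Apache itself this is normally done with mod_expires (`ExpiresByType` lines, one per content type). The Python below only illustrates the shape of such a per-type policy; the lifetimes are hypothetical examples rather than recommendations, and the type is guessed from the file extension with the standard `mimetypes` module.

```python
import mimetypes

DAY = 86400

# Hypothetical lifetimes in seconds. Keys ending in "/" match a whole family
# (e.g. every image/* type); other keys must match the type exactly.
LIFETIMES = {
    "image/": 365 * DAY,
    "font/": 365 * DAY,
    "text/css": 30 * DAY,
    "text/javascript": 30 * DAY,
    "application/javascript": 30 * DAY,
    "application/pdf": 30 * DAY,
    "text/plain": 7 * DAY,
    "text/html": 0,
}


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value from the file's guessed content type."""
    ctype, _encoding = mimetypes.guess_type(path)
    if ctype is None:
        return "no-cache"  # unknown type: make the browser revalidate every time
    for key, seconds in LIFETIMES.items():
        matches = ctype.startswith(key) if key.endswith("/") else ctype == key
        if matches:
            if seconds == 0:
                return "max-age=0, must-revalidate"
            return "public, max-age=%d" % seconds
    return "no-cache"
```

HTML gets a zero lifetime with `must-revalidate`, so a changed page is picked up straight away; that is the case where ETags and 304s do the work instead.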

Whether this is worth doing or not depends on how much money there is to be saved. The work involves someone making some changes to the web server config file; in Apache it would mean adding a few lines. Then an optional stage 2 would make it much more effective, with things like the iframe tweak or simply moving the JS and CSS out into external files. A lot of current advice is to include JS and CSS inline for high performance, because this removes the latency of setting up a TCP connection to fetch each external file, but that is the opposite of saving money on traffic, and it is irrelevant if the JS and CSS are cached anyway because cache-friendliness has been set up properly.
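A back-of-envelope sketch of the traffic side of that trade-off, assuming hypothetical sizes (a 20 KB page and 50 KB of JS + CSS) and that the external files stay fresh in the browser cache for the whole visit, so they are fetched exactly once:

```python
def kb_sent(pages_viewed: int, html_kb: int, assets_kb: int, inline: bool) -> int:
    """Kilobytes sent to one visitor, ignoring headers, gzip and revalidation."""
    if inline:
        # JS + CSS are embedded, so they travel again with every page.
        return pages_viewed * (html_kb + assets_kb)
    # External files: fetched once, then served from the browser cache.
    return pages_viewed * html_kb + assets_kb


inline_kb = kb_sent(10, html_kb=20, assets_kb=50, inline=True)     # 700
external_kb = kb_sent(10, html_kb=20, assets_kb=50, inline=False)  # 250
```

On the first page view both cost the same bytes, and the external version also costs extra requests, which is the latency argument for inlining; the saving comes from every later page view.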

There are bound to be people around who know a hundred times more about this than I do. Many years ago I wrote some code to plug into a web server and do all of this automatically; that's how I know a little about it.

There used to be a tool that would assess a website for cache-friendliness and give an excellent report and checklist.

Just a thought. God, I am so sorry this post has grown to be so long. Sincere apologies  :-[ if you knew all of this stuff already.

kitz:
Hi Weaver.   

Thanks for the suggestions. Most of the bandwidth is on the main site, where I already use Expires headers. Funnily enough, I was looking at them last night in relation to a query from someone who was seeing an old copy of something, despite the fact that I knew I'd uploaded a new file a few hours previously.
I also use gzip compression on the server, and 304 Not Modified responses are in use. All images on the main site are optimised to keep bandwidth to a minimum.

e.g.


--- Code: ---Request Method: GET
Status Code: 304 Not Modified

Cache-Control: max-age=172800
Connection: Keep-Alive
Date: Thu, 14 Jun 2018 09:28:31 GMT
Expires: Sat, 16 Jun 2018 09:28:31 GMT

--- End code ---
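Those headers agree with each other: 172800 seconds is 48 hours, and Expires is exactly Date plus two days. A small Python check of that relationship, using only the standard library's `email.utils` date helpers (the function names are hypothetical):

```python
from datetime import timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    # Pull the max-age directive out of a Cache-Control value, if present.
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def expected_expires(date_header: str, cache_control: str) -> Optional[str]:
    """The Expires value that matches Date + max-age, in HTTP date format."""
    seconds = max_age_seconds(cache_control)
    if seconds is None:
        return None
    date = parsedate_to_datetime(date_header).astimezone(timezone.utc)
    return format_datetime(date + timedelta(seconds=seconds), usegmt=True)


print(expected_expires("Thu, 14 Jun 2018 09:28:31 GMT", "max-age=172800"))
# Sat, 16 Jun 2018 09:28:31 GMT
```

Where both are present, HTTP/1.1 caches go by `max-age` and ignore Expires, so the Expires line mainly matters for very old clients.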

On occasion it does cause problems - the most usual one is when I change the site logo to the Xmas banner and back again. Over the years I've lost count of how many people, because of caching, still think I have Christmas decorations up weeks after they were taken down, but on the whole it seems to work OK.

------

As regards the forum, I use SMF software, so I have very little control over what goes on in the core and what can be edited.

Think of the SMF software as a framework where I can edit the CSS to make it look pretty, add my own logo, etc., but there isn't much I can do with the CORE functions that actually make it work. By nature forums are dynamic, so each page refresh will query the database for new content.

Since I started using SSL I also had to add a proxy cache to stop image content from other websites causing errors.

Chrysalis:
Nowadays I wouldn't expect people to be bandwidth-limited with hosting; limits are not what they used to be, and even cheap VPS packages costing a couple of £ a month tend to have limits in the terabyte range. Still, caching helps with performance and CPU/RAM load, of course.

Weaver:
Is it on top of Apache?

I understand, so you cannot go around hacking up the HTML and so on, which makes my points about iframes and changing the way JS + CSS is delivered useless. I am forgetting that it is an engine. Duh.

However, your hosting company can help you tweak your Apache config files should you ever need to do anything further with regard to cache-friendliness. You pointed out that you are already giving out 304s. You did not give me an ETag when I probed the server, but I don't see how you could change that unless there is a setting somewhere, because you won't be in control of that. I realise that now you have woken me up. Should have put my brain in gear first, what little is left of it.

kitz:

--- Quote from: Weaver on June 14, 2018, 03:55:21 PM ---Is it on top of Apache?

--- End quote ---

Yep, it runs on Apache. The software gets installed on the server, just like how you'd, say, install 'Word' on a PC. It also requires PHP and a MySQL database, which is where post content is stored. The software pulls all the data from the db and presents it within an HTML frame environment.


--- Quote ---I understand, so you cannot go around hacking up the HTML and so on, which makes my points about iframes and changing the way JS + CSS is delivered useless. I am forgetting that it is an engine. Duh.

--- End quote ---

You could if you really wanted, because it's open source, but as far as I'm concerned, no, you can't. The CORE is the main framework, which could be amended if you wanted to make your own version. I don't like touching anything in the Core, though, as that's when applying security patches and updates becomes more difficult. I have absolutely no inclination to touch anything in the Core and make things more difficult for myself.


--- Quote ---However, your hosting company can help you tweak your Apache config files should you ever need to do anything further with regard to cache-friendliness. You pointed out that you are already giving out 304s. You did not give me an ETag when I probed the server, but I don't see how you could change that unless there is a setting somewhere, because you won't be in control of that. I realise that now you have woken me up. Should have put my brain in gear first, what little is left of it.

--- End quote ---

Unless I misunderstand what you mean (probably), the expiry tags are in the HTTP headers:

--- Code: ---Expires: Sat, 16 Jun 2018 09:28:31 GMT
--- End code ---

From memory, I think images have a longer expiry than HTML pages.
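Two different jobs are being discussed here. Expires and Cache-Control `max-age` set a freshness lifetime, during which the browser reuses its copy without asking at all; ETag and Last-Modified are validators, which the browser sends back as If-None-Match / If-Modified-Since once its copy is stale, so the server can answer 304. A hypothetical Python helper that sorts a pasted header block into those two groups:

```python
from typing import Dict, List

FRESHNESS = {"cache-control", "expires"}  # how long a copy may be reused unasked
VALIDATORS = {"etag", "last-modified"}    # what a stale copy is checked against


def classify_cache_headers(raw: str) -> Dict[str, List[str]]:
    """Split 'Name: value' lines into freshness headers and validators."""
    found: Dict[str, List[str]] = {"freshness": [], "validators": []}
    for line in raw.splitlines():
        name, sep, _value = line.partition(":")
        if not sep:
            continue  # not a header line
        key = name.strip().lower()  # header names are case-insensitive
        if key in FRESHNESS:
            found["freshness"].append(name.strip())
        elif key in VALIDATORS:
            found["validators"].append(name.strip())
    return found


posted = """Cache-Control: max-age=172800
Connection: Keep-Alive
Date: Thu, 14 Jun 2018 09:28:31 GMT
Expires: Sat, 16 Jun 2018 09:28:31 GMT"""
print(classify_cache_headers(posted))
# {'freshness': ['Cache-Control', 'Expires'], 'validators': []}
```

Run on the headers posted earlier, it finds only freshness headers and no validator. Since a 304 is only ever sent in reply to a conditional request, the server must be validating something (most likely Last-Modified), so the pasted excerpt probably just omits it.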
