Revision as of 11:41, 7 July 2021

Cache Busting On Demand

Author: Robert Wenzel, Engineer, Potix Corporation
Date: July, 2021
Version: any

Introduction

Browser caching is an important feature to speed up the loading time of static resources affecting your overall web application performance. Enabling the browser cache is done by setting the "right" caching headers (not the topic here). In many cases this requires finding the "right" balance between long caching times and up-to-date resources.

As a rule of thumb:

  • Dynamic resources (zul pages, ajax requests) are never cached
→ content always changes
  • Static resources (js/css/images/font resources) are cached as long as possible (possibly forever)
→ treat updated file contents as a new resource itself with a unique URL

This gets problematic as soon as one or more static resources need to change and you don't want to change the filename. When something goes wrong in that process you end up instructing users to clear their browser cache (press CTRL-F5) or, worse, to disable the cache altogether.

ZK already takes care of this for its own static resources - by setting the appropriate caching headers and by ensuring changing URLs for static resources between ZK versions.

This article will show how to leverage and customize this built-in mechanism, with the goal to get unique URLs for unique resources.

In short, how to generate URLs like this?

  • http://localhost:8080/zkau/web/[UNIQUE_HASH_FOR_SOMELOGO]/img/somelogo.png

  • http://localhost:8080/zkau/web/[UNIQUE_HASH_FOR_SOMESCRIPT]/js/somescript.js

  • http://localhost:8080/zkau/web/[UNIQUE_HASH_FOR_SOMESTYLES]/css/somestyles.css

...

Existing Methods/Features

Before customizing we have to understand the current mechanism.

ZK encodes internal resource URLs using the URL prefix "~./". These resources, located on the classpath below the package web, are called Class Web Resources (CWR for short). This implies they can be packaged inside jar files or placed in the war file's WEB-INF/classes/web/ folder.

For example, an image file in src/main/resources/web/img/somelogo.png or src/main/webapp/WEB-INF/classes/web/img/somelogo.png (assuming a standard Maven project structure) can be referenced from a zul file, e.g. <image src="~./img/somelogo.png"/>.

This will produce a URL similar to http://localhost:8080[/myapp]/zkau/web/xyz1234/img/somelogo.png

The key part is the example token xyz1234. It may appear random, but it is a hash representing the current ZK version and the static JS resource packaging configuration. It changes with each ZK release, ensuring that ZK's resources get a new URL for each release and preventing the browser cache from loading (or mixing in) old resources.

If one of your application's static resources (using this "~./..." mechanism) changes without an update of the ZK release, the URL will not change and the browser may continue displaying a cached old version of the same resource.

There are several ways to avoid that. They are only listed here; please read the related documentation for additional details.

  1. Configure org.zkoss.zk.ui.versionInfo.enabled with an arbitrary string (except `true/false`) to salt the ZK version token
  2. For custom components you can configure a <javascript-module> version in your lang-addon.xml
  3. A manual approach is to add/increment a query parameter to your resource URLs e.g. ~./img/mylogo.png?v=1

The first approach - while reliable and simple - is a bit overkill since all CWR resources (including ZK's own resources) will be 'cache-busted' (not ideal for regular application deployments with small changes).

The second is limited to JS files. ZK uses it for add-on components/widgets such as CkEditor or Gmaps, which have a release cycle independent of ZK.

The last approach may be good enough for a small number of resources used in a central page of your application but becomes tedious and error-prone quickly as your application grows.

Next: A more selective and maintainable alternative.

Dynamic Resource URL "Revving"

Instead of a manually maintained URL parameter it would be better to have a unique resource path or filename that changes automatically if and only if necessary. (URL parameters may also have issues with certain webserver/proxy configurations.)

Since ZK already has a place to add a build stamp path element, we'll leverage the same mechanism for our purposes. As we've seen earlier, ZK recognizes the token xyz1234 and ignores it on the server side when resolving a resource URL to the actual file on the classpath. In addition to the ZK release build stamp, ZK also recognizes (and ignores) any other path element starting with "_zv" (short for "zk version").
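To make the idea of an "ignored path element" concrete, here is a minimal illustration in plain Java. It is not ZK's actual resolution code: the class VersionTokenSketch, the method resourcePath and the buildStamp parameter are hypothetical names, and the logic only mirrors the behavior described above.

```java
final class VersionTokenSketch {
    // Returns the classpath-relative resource path, dropping the first path
    // element if it is the build stamp or starts with "_zv".
    // Illustration only; ZK's real implementation is not shown in this article.
    static String resourcePath(String path, String buildStamp) {
        int next = path.indexOf('/', 1);
        if (!path.startsWith("/") || next < 0) {
            return path; // nothing that could be a version element
        }
        String first = path.substring(1, next);
        boolean ignorable = first.equals(buildStamp) || first.startsWith("_zv");
        return ignorable ? path.substring(next) : path;
    }
}
```

With this sketch, a path carrying the build stamp and a path carrying any "_zv..." element both reduce to the same resource path.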

This means both URLs below will locate and serve the same resource from the classpath:

  • http://localhost:8080[/myapp]/zkau/web/xyz1234/img/mylogo.png
  • http://localhost:8080[/myapp]/zkau/web/_zv[ANYTHING_HERE]/img/mylogo.png

The only difference is the URL, which the browser can now cache independently of the ZK build stamp. For our customization this token should be unique for each distinct file content. Useful unique identifiers could be some kind of resource version, the file's modification timestamp or a content hash. While a resource version or modification timestamp sounds preferable, both assume this information is actually available as some kind of metadata (which can itself be outdated, or expensive to determine).

For this example I'll use a content hash (MD5) instead, because it can always be computed from the resource content itself without additional information. Different contents will naturally produce a different URL and, just as important, unchanged contents will result in identical URLs, allowing the browser to safely reuse the cached version.

(We are not looking for a secure hash function, just one that's cheap to compute and produces changing results with high probability. Resource integrity is handled by HTTPS already. Still, feel free to pick something else: be it SHA1, SHA256 or even MD4.)
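The code in the following sections calls a computeResourceHash method that is not listed in this article. As a rough sketch of the hashing step only, the following uses the JDK's MessageDigest to turn a stream of bytes into a hex string. The class ContentHashSketch and the method hashHex are hypothetical names, and how the resource stream is obtained from the classpath is deliberately left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ContentHashSketch {
    // Reads the stream to its end and returns the digest as lowercase hex.
    // The algorithm name is passed in, so "MD5" can be swapped for "SHA-256".
    static String hashHex(InputStream content, String algorithm) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown hash algorithm: " + algorithm, e);
        }
        byte[] buffer = new byte[8192];
        try (InputStream in = new DigestInputStream(content, digest)) {
            while (in.read(buffer) != -1) {
                // reading through the DigestInputStream feeds the digest
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Identical bytes always yield the same string, and any change in the content yields a different one with very high probability, which is exactly the property needed for the URL token.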

Implementation using ExtendletContext

The ZK interface to implement is ExtendletContext. Among other things, implementations deal with the URL encoding of all internal URLs starting with "~" (distinguished by prefix). The default prefix "~./" triggers the implementation for ZK's Class Web Resources. We can register our own prefix (I chose the string ".[md5]/") to encode URLs in the way we need.

MD5HashInit.java

    private static final String CWR_EXT_CTX = ".";      // the default CWR URL prefix
    private static final String MD5_EXT_CTX = ".[md5]"; // our custom resource URL prefix

    private ServletContext servletContext;
    private ClassWebResource cwr;
    private ExtendletContext cwrExtendletContext;       // the default extendlet

    @Override
    public void init(WebApp wapp) {
        servletContext = wapp.getServletContext();
        cwr = WebManager.getWebManager(wapp).getClassWebResource();
        cwrExtendletContext = Servlets.getExtendletContext(servletContext, CWR_EXT_CTX);

        Servlets.addExtendletContext(servletContext, MD5_EXT_CTX, (ResourceUrlEncoder) this::encodeURL);
    }

(ResourceUrlEncoder is a functional interface extending ExtendletContext with empty default method bodies for unused methods so we can provide the encodeURL method as a method reference, just for readability).
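The pattern behind ResourceUrlEncoder can be sketched without any ZK classes. In the following illustration, ResourceContext is a hypothetical stand-in for a larger interface such as ExtendletContext (its method list is invented for brevity), and UrlEncoderOnly plays the role of ResourceUrlEncoder: it leaves exactly one method abstract, so a method reference can implement it.

```java
import java.io.InputStream;

// Hypothetical stand-in for a larger interface with several methods.
interface ResourceContext {
    String encodeURL(String uri);
    InputStream getResourceAsStream(String uri);
    boolean shallCompress(String ext);
}

// Only encodeURL stays abstract; the unused methods get default bodies,
// which makes this a functional interface.
@FunctionalInterface
interface UrlEncoderOnly extends ResourceContext {
    @Override
    default InputStream getResourceAsStream(String uri) {
        return null; // not needed for URL encoding
    }

    @Override
    default boolean shallCompress(String ext) {
        return false; // not needed for URL encoding
    }
}

final class FunctionalContextSketch {
    static String encode(String uri) {
        return "/encoded" + uri;
    }

    // The cast tells the compiler which functional interface to target.
    static ResourceContext create() {
        return (UrlEncoderOnly) FunctionalContextSketch::encode;
    }
}
```

The cast in create() corresponds to the (ResourceUrlEncoder) cast in the init method above.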

Now all URLs prefixed with "~.[md5]/" will be handled by our encodeURL method:

1     private String encodeURL(ServletRequest request, ServletResponse response, String uri)
2             throws ServletException, IOException {
3         String locatedUri = Servlets.locate(servletContext, request, uri, cwrExtendletContext.getLocator());
4         String defaultCwrUrl = cwrExtendletContext.encodeURL(request, response, locatedUri);
5         String hash = this.computeResourceHash(locatedUri);
6         return defaultCwrUrl.replace(cwr.getEncodeURLPrefix(), "/_zv_md5_" + hash); // prefix "/_zv" would be sufficient
7     }

The uri parameter (Line 1) contains our resource path without the prefix.

We reuse the default location (Line 3) and encoding (Line 4) strategy for the resource uri (incl. wildcard handling for localization and browser-specific resources).

Next we compute the hash for the actual resource (Line 5).

Line 6 replaces the default version token (cwr.getEncodeURLPrefix()) with the MD5 hash, prefixed by "_zv_md5_" ("_zv" for ZK to ignore it, and "_md5_" as an optional visual hint for us).
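The replacement step can be tried in isolation with plain strings. This sketch assumes that cwr.getEncodeURLPrefix() returns the token with a leading slash (such as "/xyz1234"), which is inferred from the replacement string starting with "/". The class RevvedUrlSketch, the method revUrl and the sample values are hypothetical.

```java
final class RevvedUrlSketch {
    // Swaps the default version path element for a "_zv"-prefixed content hash.
    static String revUrl(String defaultCwrUrl, String versionPrefix, String hash) {
        return defaultCwrUrl.replace(versionPrefix, "/_zv_md5_" + hash);
    }

    public static void main(String[] args) {
        // Sample values only; a real hash would be 32 hex characters for MD5.
        System.out.println(revUrl("/myapp/zkau/web/xyz1234/img/mylogo.png", "/xyz1234", "0123abcd"));
    }
}
```

Everything before and after the token is left untouched, so the rest of the default CWR encoding stays in effect.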

Checking the Results

As a small test index.zul adds a few static resources using the default CWR (Line 5) and the new MD5 hash prefix (Lines 1, 2, 8).

 1 <?style src="~.[md5]/css/mystyle.css"?>
 2 <?script src="~.[md5]/js/myscript.js"?>
 3 <zk>
 4 	<window border="normal">
 5 		<image src="~./img/zk-logo.svg" width="60px" height="40px"/>
 6 		Image loaded via "~./img/zk-logo.svg"
 7 		<separator/>
 8 		<image src="~.[md5]/img/zk-logo.svg" width="60px" height="40px"/>
 9 		Image loaded via "~.[md5]/img/zk-logo.svg"
10 		<separator/>
11 		<button label="Default Button Style"/>
12 		<separator/>
13 		<button label="Important Button Style" sclass="important"/>
14 	</window>
15 </zk>


The results can be verified in the browser's developer tools.

Improvement: Cache the Hash Results

Hashing the file content every time a resource is accessed would be inefficient. That's why the actual example code also contains a way to cache the MD5 results, plus a configuration property to disable the cache during development, so we can change a file at runtime and see the MD5 hash change. In a production environment the 'static' resources don't change without a redeployment or restart anyway, so we can just keep the MD5 results in the hash map forever. (If needed, you can add more complexity to clear the hash map.)

    private Map<String, String> hashByUri = new ConcurrentHashMap<>();
    private boolean cacheEnabled;

    private String encodeURL(ServletRequest request, ServletResponse response, String uri) throws ServletException, IOException {
        String locatedUri = Servlets.locate(servletContext, request, uri, cwrExtendletContext.getLocator());
        String defaultCwrUrl = cwrExtendletContext.encodeURL(request, response, locatedUri);
        String hash = cacheEnabled
                ? hashByUri.computeIfAbsent(locatedUri, this::computeResourceHash)
                : this.computeResourceHash(locatedUri);
        return defaultCwrUrl.replace(cwr.getEncodeURLPrefix(), "/_zv_md5_" + hash); // prefix "/_zv" would be sufficient
    }

It's important to note that the cache key has to be the actually located resource URI (locatedUri), because locating a resource can resolve wildcards into browser-specific and locale-specific paths.

The resulting one-time overhead (once per application startup) of hashing the resources should be negligible for smaller resources (a few milliseconds). This strategy targets static resources anyway, and a web application usually doesn't have that many of them. The computation also happens on demand, when a resource is first referenced from the code. (It's trivial to provide a list of resources to be pre-hashed eagerly during application startup and add the results to the cache hashByUri.)

For huge resources a different strategy would be preferable, e.g. pre-calculate the hashes at build time or give those files unique names in the first place.
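Both the on-demand caching and the eager pre-hashing mentioned above can be sketched in a few lines. This is not the example project's code: HashCacheSketch, hashFor and preHash are hypothetical names, and the hashing function is passed in as a parameter so the sketch runs without ZK.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class HashCacheSketch {
    private final Map<String, String> hashByUri = new ConcurrentHashMap<>();
    private final Function<String, String> hasher;

    // hasher computes the hash for a located URI; here it is simply injected.
    HashCacheSketch(Function<String, String> hasher) {
        this.hasher = hasher;
    }

    // Computes the hash on first access and serves it from the map afterwards.
    String hashFor(String locatedUri) {
        return hashByUri.computeIfAbsent(locatedUri, hasher);
    }

    // Optional warm-up, e.g. called once during application startup.
    void preHash(List<String> locatedUris) {
        locatedUris.forEach(this::hashFor);
    }
}
```

After preHash has run for a list of URIs, later hashFor calls for those URIs return the stored value without invoking the hasher again.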

Not every resource has to use the same algorithm: you can configure multiple ExtendletContext implementations for different prefixes.

Summary

I think the mechanism described above is a good trade-off between complexity/additional overhead and the benefits gained, not to forget the convenience factor.

Pros

  • simple to implement (and disable)
  • effective/robust
    • unique URLs can remain cached "infinitely"
    • updated resources are reloaded
    • no hassle with additional cache headers (lifetimes, etags, browser differences ...)
    • reduced network requests (no additional 304 responses)
  • leverage ZK's built-in resource mechanism
    • resources from classpath allow (pluggable) jar modules
    • l10n/browser-specific resources

Cons

  • runtime hash computation overhead (can be reduced/avoided by adding more complexity)
  • non standard/"weird" looking resource URLs (at least indicating their purpose with a prefix)

Source Code

The example project source code is available on GitHub in /zkoss-demo/zk-cache-busting-on-demand.

Please check the README.md for run instructions.





Copyright © Potix Corporation. This article is licensed under GNU Free Documentation License.