30. March 2023

3 simple ways to get started with caching in Varnish Cache

2 factors to consider and 3 ways to choose from to get started with caching in Varnish

Do you have complete control over your Varnish configuration AND/OR do you have complete control over your application?
Is your website (mostly) static or dynamic? Does it have pages behind a user login? Or is it using sessions even for guests?

Depending on the answers, there are 3 simple ways to get started with caching in Varnish.

1. Caching using only headers from backend server and application

If you have full control over only your application, then this could be a starting point.

Usually Varnish Cache receives cache header information from the backend application and uses it to decide how to cache. An in-depth tutorial on the Varnish Developer Portal about HTTP caching basics explains how Varnish interprets those caching headers.

In Varnish you keep your configuration very simple and allow the application to tell Varnish what to cache and how to cache.
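As a minimal sketch of what "very simple" can mean here: a VCL file that only defines the backend and leaves everything else to Varnish's built-in VCL. The host and port are assumptions (they match the backend used later in this article), so adjust them to your setup.

```vcl
vcl 4.1;

# Assumed backend address; point this at your own content server.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

# No vcl_recv or vcl_backend_response is defined, so the built-in VCL applies:
# - the TTL is derived from Cache-Control (s-maxage, then max-age) or Expires
# - responses with Set-Cookie or "no-cache", "no-store" or "private" are not cached
# - requests carrying a Cookie or Authorization header are passed to the backend
```

Keep in mind that with the built-in VCL any request that carries a cookie bypasses the cache, which is why the application must stop using sessions on the routes you want cached.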

In your application you need to send caching headers with the response. In Laravel, with a default configuration, sessions are used and therefore every response contains explicit no-cache headers: Cache-Control: "no-cache, private". To send cache headers you have to remove your session middleware and add the core cache header middleware to those routes you want to cache:

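<?php

// routes/web.php

// VerifyCsrfToken lives in App\Http\Middleware in the default Laravel 10 skeleton; adjust if yours differs
use App\Http\Middleware\VerifyCsrfToken;
use Illuminate\Session\Middleware\StartSession;
use Illuminate\Support\Facades\Route;
use Illuminate\View\Middleware\ShareErrorsFromSession;

// Drop the session-related middleware and send "Cache-Control: public, s-maxage=3600" instead.
// Options are separated by ";" and values are assigned with "=".
Route::withoutMiddleware([StartSession::class, ShareErrorsFromSession::class, VerifyCsrfToken::class])
     ->middleware('cache.headers:public;s_maxage=3600')
     ->group(function () {
         Route::view('/', 'welcome');
     });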

Here you instruct Varnish to cache the root route for 1h.

For more information on this approach, you can check out the slides of the Laracon EU 2022 presentation by Thijs Feryn.

But if you need sessions for logged-in users, you need another approach:

2. Caching using PARTLY headers from backend server and application, and PARTLY Varnish logic

If you have some sort of user login, you first need to implement the business logic described in my article on how to avoid showing user-specific information through wrongful caching.

For this to work you need some control/influence over your Varnish configuration (just a little) and be able to implement some changes on your application to make it work.

Then, for a simple start, you can use Spatie's Varnish package to add caching headers without having to remove any middleware as in the previous approach. This could look like this:

// routes/web.php

// Will be cached by Varnish for a default time of 24h
Route::group(['middleware' => 'cacheable'], function() {
    Route::get('/', 'HomeController@index');
    Route::get('/contact', 'ContactPageController@index');
});

// Varnish will cache the responses of the routes inside the group for 15 minutes
Route::group(['middleware' => 'cacheable:15'], function() {
   ...
});

// won't be cached by Varnish at all since specific cache headers are missing
Route::get('do-not-cache', 'AnotherController@index');

Your final simple Varnish configuration then would look like this:

vcl 4.1;
 
# Default backend definition. Set this to point to your content server, for example Nginx, which retrieves static files, passes data through to PHP, etc...
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

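sub vcl_recv {
  # logged-in users (identified by the "loggedin" cookie) always bypass the cache
  if (req.http.Cookie ~ "(?i)(loggedin)") {
    return (pass);
  }

  # everything else is looked up in the cache
  return (hash);
}

sub vcl_backend_response {
  # responses the application marked as cacheable must not carry a cookie into the cache
  if (beresp.http.X-Cacheable ~ "1") {
    unset beresp.http.set-cookie;
  }
}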

3. Caching using ONLY Varnish logic

If you know for sure that certain pages must never be cached while others should (always) be cached, you can set up Varnish in such a way that you don't need to modify your application at all. This blog uses this method since it's very predictable what should be cached and what shouldn't.

The way to go is defining how requests will be processed by categorizing them right when they come in (setting "X-Cache-Mode" in vcl_recv) and how they will be stored (vcl_backend_response).

For better understanding you can proceed by reading this Varnish configuration and the comments in it:

sub vcl_recv {
  # you start by defining that the default case is always uncachable to make sure you don't get any false positives
  set req.http.X-Cache-Mode = "uncachable";

  # urls with certain file extensions or from certain folders will be marked as static
  if (req.url ~ "^[^?]*\.(css|map|gif|ico|jpeg|jpg|js|png|svg|txt|webm|webp|woff|woff2|xml)(\?.*)?$" || req.url ~ "^/(storage|vendor|build|css|js|images)/") {
    unset req.http.cookie;
    set req.url = regsuball(req.url, "\?.*$", "");

    set req.http.X-Cache-Mode = "static-file";

    return (hash);
  }

  # don't cache big static files, just stream them
  if (req.url ~ "^[^?]*\.(7z|avi|bz2|flac|flv|gz|mka|mkv|mov|mp3|mp4|mpeg|mpg|ogg|ogm|opus|rar|tar|tgz|tbz|txz|wav|webm|xz|zip|pdf)(\?.*)?$") {
    unset req.http.Cookie;
    set req.http.X-Cache-Mode = "static-file-stream";

    return (pass);
  }

  # bypass admin pages, preview pages, auth pages, etc...
  if (req.url ~ "/admin"
    || req.url ~ "/preview"
    || req.url ~ "/auth"
  )
  {
    set req.http.X-Cache-Mode = "bypass";

    return (pass);
  }

  # avoid delivering any content from cache if logged in. but only at this point, to make sure that static files are delivered from cache
  if (req.http.Cookie ~ "(?i)(loggedin)") {
    set req.http.X-Cache-Mode = "bypass";
    return (pass);
  }

  # if this point has been reached then you are in (mostly) static territory, so remove cookies to make sure that this request gets cached
  unset req.http.cookie;

  # here you have only 3 types of pages: home, articles and the rss feed. everything else does not need to even be passed over to the application since it will always result in a 404.
  # helps a lot with bad bots and script kiddies
  if (req.url != "/" && req.url !~ "^/articles/" && req.url != "/feed") {
    return (synth(404, "Not found"));
  }

  # at this point you can assume that you are dealing with static content only
  set req.http.X-Cache-Mode = "static-content";

  return (hash);
}

# here you process all the responses coming back from your backend, be it static files or content from your application
sub vcl_backend_response {
  # to make sure we can see in response headers what mode has been set, for debugging purposes only
  set beresp.http.X-Cache-Mode = bereq.http.X-Cache-Mode;

  # Never cache 50x responses
  if (beresp.status == 500 || beresp.status == 502 || beresp.status == 503 || beresp.status == 504) {
    return (abandon);
  }

  # if it's content that should not be cached, return right away without any further processing
  if (bereq.http.X-Cache-Mode == "uncachable" || bereq.http.X-Cache-Mode == "bypass")
  {
    return (deliver);
  }

  # Large static files should be streamed and never cached
  if (bereq.http.X-Cache-Mode == "static-file-stream") {
    unset beresp.http.set-cookie;
    set beresp.do_stream = true;

    return (deliver);
  }

  # at this point we should have only x-cache-mode == "static-content" and "static-file" stuff, so cache them for 7 days
  if (bereq.http.X-Cache-Mode == "static-file" || bereq.http.X-Cache-Mode == "static-content") {
    set beresp.ttl = 7d;

    # important: make sure that the browser does not cache content as well, otherwise the browser cache would take precedence
    set beresp.http.Cache-Control = "no-cache, private";

    unset beresp.http.set-cookie;

    return (deliver);
  }

  return (deliver);
}

Debugging tip

If you want to see how Varnish interprets these headers, you can set some additional debugging headers. First you want to know how Varnish interprets those caching headers it receives from its backend:

sub vcl_backend_response {
  set beresp.http.X-Original-TTL = beresp.ttl;
}

This way you know what time-to-live (TTL) Varnish initially derived from the headers. But that is only for the initial backend fetch. Afterwards, once Varnish has stored the object, you can see additional information about its current state upon delivery: whether it is being delivered from cache (HIT) or from the backend (MISS), how many requests have already been served from cache (X-Cache-Hits), and the remaining TTL until the object expires (X-Cache-Remaining-TTL).

sub vcl_deliver {
  if (obj.hits > 0) { # Add debug header to see if it's a HIT/MISS and the number of hits, disable when not needed
    set resp.http.X-Cache = "HIT";
    set resp.http.X-Cache-Hits = obj.hits;
    set resp.http.X-Cache-Remaining-TTL = obj.ttl;
  } else {
    set resp.http.X-Cache = "MISS";
  }
}

Cache invalidation

Caching is actually the easy part. Cache invalidation is the tricky part. The easiest way is to purge everything on any content change and start all over (maybe even with a warmup script). The next simplest way is to purge by URL. And in some cases you may even need to purge by tags. But that's for another article.
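To give a rough idea of purging by URL, here is a minimal sketch and not a complete setup. It assumes that purge requests use the HTTP method PURGE and only come from the machine Varnish runs on. The ACL name "purgers" and the allowed addresses are assumptions chosen for illustration.

```vcl
# Hypothetical ACL: only these clients may purge. Adjust to your infrastructure.
acl purgers {
    "127.0.0.1";
    "::1";
}

# Place this before your existing vcl_recv. VCL concatenates subroutines
# with the same name in order, so this check runs first.
sub vcl_recv {
    if (req.method == "PURGE") {
        if (client.ip !~ purgers) {
            return (synth(405, "Not allowed"));
        }

        # removes the cached object (and its variants) for this URL and Host
        return (purge);
    }
}
```

A purge is then just a request with the PURGE method to the URL you want to remove, for example with curl -X PURGE against that URL. The request has to use the same Host header and URL as the cached object, since both are part of the cache key by default.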