Moving all non-essential config from Nginx to Varnish Cache

That way we define a single source of truth for all business logic related to incoming requests, caching, header transformation, and so on.

To bring down load on our server even further, I want to introduce Varnish Cache into the tech stack. It will cache quite a lot of responses and deliver them blazingly fast out of memory.

But here's the kicker: Varnish can not only handle caching. Thanks to its super simple yet powerful VCL (Varnish Configuration Language), it can handle all kinds of other things too: 404 handling, blocking of bad bots, redirects, authentication, rate limiting, header transformations in general, and more, all in one place. The community edition is open source and powerful enough to handle most non-super-enterprise cases out of the box.
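To give a feeling for two items from that list, here is a minimal sketch of basic authentication and header clean-up in plain VCL. Everything specific in it is an assumption made up for illustration: the /admin path, the placeholder credentials user:pass and the X-Cache header name.

```vcl
vcl 4.1;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Hypothetical example: protect /admin with HTTP basic auth.
    # "dXNlcjpwYXNz" is base64 for the placeholder credentials "user:pass".
    if (req.url ~ "^/admin"
     && (!req.http.Authorization || req.http.Authorization != "Basic dXNlcjpwYXNz")) {
        return (synth(401, "Authentication required"));
    }
}

sub vcl_synth {
    # Ask the client for credentials when vcl_recv answered with a 401.
    if (resp.status == 401) {
        set resp.http.WWW-Authenticate = {"Basic realm="Restricted""};
    }
}

sub vcl_deliver {
    # Header transformation: hide implementation details from clients.
    unset resp.http.X-Powered-By;
    unset resp.http.Server;

    # Expose whether the response came from cache (obj.hits counts cache hits).
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
```

Hard-coding credentials like this is only reasonable for a quick sketch. Note that Varnish's built-in logic does not cache requests carrying an Authorization header, so the protected area bypasses the cache.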

The easiest approach I've found to start with is to first move all non-essential business logic from my server, be it Apache or Nginx, to Varnish. I like to have everything in one place, a single source of truth.

To get started with installation and basic configuration, you can follow this nice tutorial by the makers themselves. There are more tutorials out there that cover the basic setup together with your web server, for example Nginx for SSL termination, since Varnish Cache Community Edition only speaks plain HTTP (typically on port 80) and does not terminate TLS itself. Just search for "Varnish Cache Nginx Setup" and you will find plenty of basic tutorials.

So, everything covered so far in the previous articles can be moved to Varnish, which would look something like this:

vcl 4.1;

# Default backend definition. Set this to point to your content server, for example Nginx, which retrieves static files, passes data through to PHP, etc...
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Happens before we check if we have this in cache already.
    #
    # Typically you clean up the request here, removing cookies you don't need,
    # rewriting the request, etc.

    call custom_access_check;
    call custom_old_files_handling;
    call custom_redirects;
}

sub vcl_backend_response {
    # Happens after we have read the response headers from the backend.
    #
    # Here you clean the response headers, removing silly Set-Cookie headers
    # and other mistakes your backend does.
}

sub vcl_deliver {
    # Happens when we have all the pieces we need, and are about to send the
    # response to the client.
    #
    # You can do accounting or modifying the final object here.
}

# handles blacklist and whitelist access checks 
sub custom_access_check {
  # block some known bad user agents
  if (req.http.User-Agent ~ "(?i)(MSIE [2-9]|Firefox/[2-5]|rv:[2-5]|Mozilla/[1-4])"
   || req.http.User-Agent ~ "(?i)(curl|ruby|python|go-|java|mozlila|wordpress)"
  ) {
    return (synth(403, "Forbidden"));
  }

  # block some extensions and some general urls, like wordpress folder /wp/ or anything wp-* related
  if (req.url ~ "(?i)(/\.|\.php|\.asp|\.cgi|\.jsp|\.json|\.yaml|\.yml|\.zip|\.md|\/wp\/|\/wp-)") {
    return (synth(403, "Forbidden"));
  }
}

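# handles all kinds of redirects
sub custom_redirects {
  # redirect full url
  if (req.http.host == "mydomain.com") {
    return (synth(301, "https://www.mydomain.com" + req.url));
  }

  # redirect only to new domain without full url
  if (req.http.host == "alternative-domain.com") {
    return (synth(301, "https://www.newdomain.com"));
  }
}

# synth() only sets status and reason phrase, so the target url passed as
# reason above has to be turned into a Location header here
sub vcl_synth {
  if (resp.status == 301) {
    set resp.http.Location = resp.reason;
    set resp.reason = "Moved Permanently";
    return (deliver);
  }
}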

# requests to old urls due to migrations should not be considered anymore
sub custom_old_files_handling {
  if (req.url ~ "^/(?i)(media|static|some-other-folder)") {
    return (synth(404, "Not found"));
  }
}

You may have noticed that old files (404s) are handled before redirects. That is debatable. My take is this: some bots follow redirects, which would mean two requests (30x and 404) instead of just one (404), so this order reduces the number of requests in such cases. It could irritate some bots, because it looks like an old domain redirects part of its traffic somewhere else while another part only returns 404s. But who cares about bots anyway?!

Bear in mind that a return with an action, such as return (synth(...)), inside a custom subroutine does not just jump back to the calling point in vcl_recv, like a return from a function in C or PHP would. It ends vcl_recv right there, so none of the subroutines called after it run for that request.
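The return behaviour can be made visible with a small sketch. The subroutine names first_check and second_check, the /blocked path and the X-Second-Check header are hypothetical and exist only for this illustration.

```vcl
vcl 4.1;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    call first_check;
    # Only reached if first_check did not return an action.
    call second_check;
}

sub first_check {
    if (req.url ~ "^/blocked") {
        # Ends vcl_recv right here: second_check is never called
        # for this request and the client receives the 403.
        return (synth(403, "Forbidden"));
    }
    # No action returned: control goes back to vcl_recv.
}

sub second_check {
    # Marks requests that made it past first_check.
    set req.http.X-Second-Check = "ran";
}
```

A request to /blocked/anything never gets the X-Second-Check header, while every other request does. This is why the order of the call statements in vcl_recv matters.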

Cleanup

Afterwards you can clean up your server config (be it Nginx or Apache) and leave only the essential parts in it. My config looks like this in most cases:

server {
    listen 8080;
    listen [::]:8080;
    server_name .mydomain.com;
    server_tokens off;
    root /var/www/mydomain.com/public;

    # only trust X-Forwarded-For when the request comes from Varnish on this host;
    # 0.0.0.0/0 would let any client spoof its IP
    set_real_ip_from 127.0.0.1;
    real_ip_header X-Forwarded-For;
    real_ip_recursive on;

    charset utf-8;
    index index.php;

    location ~* ^.+\.(jpg|jpeg|gif|png|bmp|webp|ico|svg|tiff|css|js|txt|json|map|mp3|wma|rar|zip|flv|mp4|mpeg).*$ {
        access_log off;
        log_not_found off;
        # this is the important part: either this file exists or it does not, no fallback
        try_files $uri =404;

        # expires already sends "Cache-Control: max-age" for 30 days,
        # so only "public" is added here to avoid two conflicting max-age values
        expires 30d;
        add_header Pragma public;
        add_header Cache-Control "public";
    }

    location ~* /(storage|css|js|build|vendor|images)/ {
        access_log off;
        log_not_found off;
        # this is the important part: either this file exists or it does not, no fallback
        try_files $uri =404;
        autoindex off;

        expires 30d;
        add_header Pragma public;
        add_header Cache-Control "public";
    }

    location / {
        # for everything else we can try for static file access, otherwise fall back to the application
        try_files $uri $uri/ /index.php?$query_string;
    }

    access_log /var/log/nginx/mydomain-access.log;
    error_log  /var/log/nginx/mydomain-error.log error;

    error_page 403 /403.html;
    error_page 404 /404.html;
    error_page 503 /503.html;

    location ~ \.php$ {
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        fastcgi_pass unix:/var/run/php/php-fpm.sock;
        fastcgi_index index.php;
        include fastcgi_params;
        # the stock fastcgi_params file does not set SCRIPT_FILENAME; drop this line if yours does
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    }
}
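With Nginx answering static files with either the file or a plain 404, the still-empty vcl_backend_response from the Varnish config could take over caching them. The following is a minimal sketch under assumptions, not a drop-in config: the extension list is a shortened, made-up selection, and the 30-day TTL is simply chosen to mirror expires 30d from the Nginx config.

```vcl
vcl 4.1;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Static files do not need cookies. Dropping them keeps Varnish's
    # built-in logic from bypassing the cache for these requests.
    if (req.url ~ "(?i)\.(jpg|jpeg|gif|png|webp|ico|svg|css|js)(\?.*)?$") {
        unset req.http.Cookie;
    }
}

sub vcl_backend_response {
    # Cache only successfully delivered static files, so a 404 for a
    # missing file is not kept for 30 days.
    if (beresp.status == 200
     && bereq.url ~ "(?i)\.(jpg|jpeg|gif|png|webp|ico|svg|css|js)(\?.*)?$") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 30d;
    }
}
```

Without the explicit beresp.ttl, Varnish would derive the TTL from the Cache-Control and Expires headers Nginx sends, which works as well. Setting it in VCL just keeps the decision in the same single place as the rest of the logic.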