Just block bad URL requests in general

If you have a Laravel, Symfony, Ruby, Python, or Next.js application, or some fancy flavor of a CMS, then why would you allow incoming WordPress requests at all?

Yeah, good question. That's what I think too every time I survey access logs to find ways to further reduce traffic and load. And that's only the tip of the iceberg.

In the last two articles, about better handling of 404s and better handling of bots, I wrote about evaluating requests based on user agents. This article is about evaluating requests based on request URLs.

Do you need this?

If your application definitely doesn't need requests that call .php files directly, then block them. If you don't have any *.json, *.yaml, or *.zip files or URLs, then block those too. And if you don't run WordPress, then block all WordPress-related request URLs. Recently I saw a Next.js application that accepted plain wp-login.php requests, ran them through all the language redirects and domain redirects, and finally ended in a 404, all served by the application.

But we can do better. Here is a very simple nginx example:

server {
    # ...

    # Block direct requests for script, data, and archive file types
    location ~* ^.+\.(php|phtm|jsp|cgi|asp|json|yaml|yml|zip|bak|gz|tgz|tar).*$ {
        return 403;
    }

    # Block well-known WordPress paths
    location ~* ^.*(wp-includes|wp-login|wp-cron|wp-content|wp-admin|wordpress).*$ {
        return 403;
    }

    # or a general block by short identification which is not used anywhere in your app
    location ~* ^.*(wp-|wordpress).*$ {
        return 403;
    }

    # ...
}
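
Before rolling out patterns like these, it can help to check them against paths your application really serves. The following is a minimal Python sketch, not part of the nginx setup itself: `BLOCK_PATTERNS`, `is_blocked`, and the sample paths are made up for illustration. It assumes that Python's `re` module with `re.IGNORECASE` behaves like nginx's case-insensitive `~*` matching for patterns this simple, and that you test the path without the query string, which is what nginx location matching looks at.

```python
import re

# The patterns from the server block above, copied as-is (hypothetical helper names).
BLOCK_PATTERNS = [
    re.compile(
        r"^.+\.(php|phtm|jsp|cgi|asp|json|yaml|yml|zip|bak|gz|tgz|tar).*$",
        re.IGNORECASE,
    ),
    re.compile(r"^.*(wp-|wordpress).*$", re.IGNORECASE),
]


def is_blocked(path):
    """Return True if any block pattern matches the given request path."""
    return any(pattern.search(path) for pattern in BLOCK_PATTERNS)


# Illustrative paths only; replace them with URLs taken from your own application.
SAMPLE_PATHS = [
    "/wp-login.php",
    "/about",
    "/assets/app.js",
    "/_next/data/build123/index.json",
    "/files/report.target.pdf",
]

for sample in SAMPLE_PATHS:
    print("BLOCK" if is_blocked(sample) else "allow", sample)
```

Two things stand out when running this. A Next.js application using the Pages Router fetches page data from `/_next/data/.../*.json`, so the `json` entry would block it. And because of the trailing `.*`, an extension only has to be a prefix of what follows the dot, so `.tar` also catches a name like `report.target.pdf`. Whether that is acceptable depends on your application.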

Fine-tuning

Of course you need to fine-tune this, since every application is different. My go-to steps for improving this part are:

  • First, analyze which URLs the application really needs and which it does not
  • Scan access logs for most common 404 requests and extract which requests could be summarized by file extension or other common patterns
  • Implement block rules
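
The log-scanning step above can be sketched in a few lines of Python. This is a rough sketch under assumptions, not a finished tool: it assumes the access log uses the common "combined" format, where the request line is quoted and followed by the status code, and `LINE_RE` and `summarize_404s` are names invented for this example.

```python
import posixpath
import re
from collections import Counter

# Picks the request URL and status code out of a combined-format log line, e.g.
# 1.2.3.4 - - [10/Oct/2024:13:55:36 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "curl/8.0"
LINE_RE = re.compile(r'"[A-Z]+ (?P<url>\S+) [^"]*" (?P<status>\d{3}) ')


def summarize_404s(lines):
    """Count 404 responses per file extension and per path."""
    by_ext = Counter()
    by_path = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None or match.group("status") != "404":
            continue
        # Drop the query string; only the path matters for block rules.
        path = match.group("url").split("?", 1)[0]
        # Paths without an extension are grouped under "(none)".
        ext = posixpath.splitext(path)[1].lower() or "(none)"
        by_ext[ext] += 1
        by_path[path] += 1
    return by_ext, by_path
```

To use it, pass an open log file (any iterable of lines works) to `summarize_404s` and look at `by_ext.most_common(20)` and `by_path.most_common(20)`. Extensions and path fragments that show up often but never belong to your application are candidates for a block rule.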

This way we avoid processing these requests in the application and instead handle them well before the application gets hit. That spares us ugly peaks in load and traffic, especially when some script kiddie is scanning the site for any available scripts.