Blocking malicious domains using Squid

Squid has to be my all-time favourite open source project. I’ve used it extensively in my own projects, and Squid formed a key part of my lecturing to finalists in my spell as an academic and consultant. Every student that finished my final year networks course would have encountered my Squid build worksheet!

Among the nice things about Squid are the extensibility of the platform, the quality of the product, and the clarity of its configuration files. Combining Squid with some scripts can lead to some interesting security solutions. In this blog post I’m going to talk about how Squid can be used to block malicious domains, using dynamic data downloaded from the Internet and some very simple scripting.

The concept and design

So what is the goal here? The outcome should be a dynamically-updated Squid that takes in a near-real-time feed of domains suspected of dishing out malware and spam, and then ensures no user can access them via the proxy. Fileless attacks? You might get fewer of them or avoid them entirely. Malicious URL clicked on some website advert? That might get blocked too.

This diagram sums up the overall design:

Design for using Squid to block malicious hosts and domains

There are many products out there that sell this kind of capability, but in most cases you can easily roll your own and achieve a level of capability that you did not have before. Even without using domain lists from the Internet, you can use this kind of technique to blacklist domains you don’t want your users to be able to access: webmail, particular TLDs, competitor careers sites; the sky is the limit.

Blacklisting and whitelisting using proxies is also best practice, and you’ll find it buried in CESG Architectural Patterns and many standard security controls such as the IS1 BCS. You could, for instance, use malware domain filtering for the general case, but require whitelisting for highly-secure systems such as database servers.
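
As a rough sketch of what that whitelist-only case might look like in squid.conf (the subnet and the reuse of the wldomains file here are my own illustrative assumptions, not part of the build described below):

# Hosts in this subnet may only reach explicitly whitelisted domains
acl secure_hosts src 10.0.5.0/24
acl allowed_domains dstdomain "/etc/squid/wldomains"
http_access deny secure_hosts !allowed_domains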

The backbone of this solution is the ability to retrieve “bad domain lists” from information security sites.

There are numerous sources of malicious domains on the Internet, and these can be readily found with a bit of googling. I’m not going to recommend any particular sources in this post, and have removed the sources I use. However, using the scripts below, it should simply be a matter of plugging in the required URLs and you’ll be up and running.

Malicious domain sources

So what kinds of domains are we getting from these sources? Lots of different domains, some no doubt including hostnames. The primary challenge in processing real-time bad-domain files is that they vary in format: you won’t get a nice-to-use SLD+TLD listing, and entries may be hostnames or bare domains. All of this means the file has to be tidied up to remain effective.

SLD generalisation

One of the decisions I’ve made in the scripts below is to generalise each entry to a Second Level Domain (SLD). What do I mean by that? Well, if I have an entry of host0293.webhost92.com, I’m taking the approach that I’ll block everything under webhost92.com and discard the entry for “host0293”.
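
In shell terms, the reduction looks something like this (the same grep expression appears in the script further down):

# Reduce an FQDN to its last two labels (SLD + TLD)
echo "host0293.webhost92.com" | grep -o '[^.]*\.[^.]*$'
# prints: webhost92.com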

The astute among you will realise this is not a foolproof strategy: some TLDs use SLDs as the point for further domain registration. The classic case is the .uk ccTLD, which only allowed registration under SLDs until 2014.

So my filters below could, in theory, generalise malware1234.co.uk to co.uk and prevent access to all co.uk sites. This is where my whitelist comes into play, and I’m updating it incrementally as I need to. In essence this is a balancing act between generalising to catch more malware and staying specific enough that legitimate domains aren’t swept up. The generalisation of domain names could in fact be removed entirely.

Adding a blocking capability to Squid

The first step is to configure Squid to start filtering on the basis of a “domainblock” file. Adding the following to the top of /etc/squid/squid.conf is what is needed (it must sit above any http_access allow rules, since Squid evaluates http_access lines in order):

# BLOCK bad domains list
acl bad_url dstdomain "/etc/squid/domainblock"
http_access deny bad_url
deny_info ERR_MALICIOUS_ACCESS_DENIED bad_url

As the above suggests, we will soon have a list of domains to block in the domainblock file under /etc/squid and we will want Squid to block using them.
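
Once the file is in place you can sanity-check the configuration and pick up changes without a full restart. A quick sketch; I touch an empty domainblock here just so the parse succeeds before the real list is generated:

touch /etc/squid/domainblock      # empty placeholder until the list is generated
squid -k parse                    # check squid.conf for syntax errors
squid -k reconfigure              # tell the running Squid to re-read its configuration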

Create the real-time block file

The next step is to create the domain block file. A Bash script is the best option for this, and mine looks like the following:

#!/bin/bash
# Do all of the work in the temp folder
cd /tmp
# Retrieve URL source #1 (ascii text file of variable length domains)
wget -O 1.txt <URL1>
# Same as above but for URL source #2
wget -O 2.txt <URL2>
# Tidy up both files: strip comment lines, then prefix each entry with a leading period
sed '/^#/ d' < 1.txt > 1b.txt
sed '/^#/ d' < 2.txt > 2b.txt
sed -e 's/^/./' 2b.txt > db.txt
sed -e 's/^/./' 1b.txt >> db.txt
# Remove the first line of the input file
sed -i '1d' db.txt
# Post-process the domain files to get to a suitable format for Squid
grep -o '[^.]*\.[^.]*$' db.txt > db_new.txt
sort db_new.txt | uniq > db_new2.txt
sed -e 's/^/./' db_new2.txt > dbaa.txt
# Remove any domains that have been whitelisted in the configuration
grep -v -x -F -f /etc/squid/wldomains dbaa.txt > db3.txt
# The finished file for Squid
cp -f db3.txt /etc/squid/domainblock
service squid restart

Apologies for the inefficiency of the script above; it needs to be tidied up and I need to use “sed -e” properly! The result of the above code is a series of domains, potentially FQDNs with hostnames.
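
For what it’s worth, roughly the same work can be expressed as a single pipeline. This is only a sketch: <URL1> and <URL2> remain placeholders for your chosen feeds, /etc/squid/wldomains is assumed to exist, and I’ve dropped the first-line removal step:

#!/bin/bash
set -e
cd /tmp
wget -qO 1.txt <URL1>
wget -qO 2.txt <URL2>
# Strip comment lines, reduce each entry to SLD+TLD, dedupe,
# add the leading period Squid expects, then drop whitelisted domains
cat 1.txt 2.txt \
  | sed '/^#/d' \
  | grep -o '[^.]*\.[^.]*$' \
  | sort -u \
  | sed 's/^/./' \
  | grep -v -x -F -f /etc/squid/wldomains \
  > /etc/squid/domainblock
squid -k reconfigure   # or "service squid restart" as above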

Generalising the block list to SLDs

As I explained earlier, I want to generalise to the SLD level, because I want to treat any indicator under an SLD as reason enough to block the entire SLD.

A small piece of Perl will help in this case:

#!/usr/bin/perl -w
use strict;
my $filename = "db.txt";
open (my $fh, '<', $filename)
   or die "Could not open file '$filename': $!";

while (my $row = <$fh>) {
   chomp $row;
   # Keep only the last two labels (SLD + TLD) of each entry
   my @labels = split /\./, $row;
   next if @labels < 2;
   printf "%s\n", join('.', @labels[-2, -1]);
}
close $fh;
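
If you save that as, say, generalise.pl (a name I’ve made up) next to db.txt, it can stand in for the grep one-liner in the Bash script:

perl generalise.pl > db_new.txt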

Whitelisting (trusted domains and sites)

Let’s add in some whitelisted domains. We can solve some of our SLD worries by adding the following to /etc/squid/wldomains:

.org.uk
.co.uk
.gov.uk
.ltd.uk
.ac.uk
.nhs.uk
.police.uk
.plc.uk
.net.uk
.sch.uk
.yahoo.com
.yahoo.co.uk

The last two are particularly significant entries, due to the furore over Yahoo’s search engine allegedly acting as a host of cached malware. If you don’t know much about that, it’s worth reading up on. A calculated risk perhaps!

Automate it all!

And finally we need to automate all of this to get the ‘real-time’ benefit. It’s quite easy to do that using cron.

Now at this point a word of caution is necessary: hammering a source server for their block lists is not a good idea and will probably cause the developer to reconsider making it available in the first place. You might get the inglorious privilege of your own entry in the file you’re downloading.

I can’t imagine refresh intervals of less than a day would be considered reasonable, so select daily or longer.

It may be better to mirror the sources using wget -N, which uses the server’s last-modified data (if available) to avoid re-downloading an unchanged file, and for good measure ensure your requests go via your proxy!
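
With that caveat in mind, a daily refresh via cron might look like this (assuming the Bash script above is saved as /usr/local/bin/update-domainblock.sh, a path of my own choosing):

# /etc/cron.d/domainblock
30 3 * * * root /usr/local/bin/update-domainblock.sh > /var/log/domainblock.log 2>&1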

User experience – the “access denied” notification

In the Squid configuration insert, I created the following line:

deny_info ERR_MALICIOUS_ACCESS_DENIED bad_url

This causes Squid to serve up the error page of that name from /usr/share/squid-langpack/templates.

Have a look in this folder and copy, say, the ERR_DNS_FAIL page and update it to your desired message.
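
Something along these lines does the trick (the templates path can vary between distributions):

cd /usr/share/squid-langpack/templates
cp ERR_DNS_FAIL ERR_MALICIOUS_ACCESS_DENIED
# now edit ERR_MALICIOUS_ACCESS_DENIED to carry your own message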

And that’s more or less it. The result of accessing a blocked domain? The request never reaches the domain, and Squid displays an error message to the user.

Conclusions

To get the most benefit from using Squid, particularly as a security control, you’ll need to combine it with a stateful firewall that denies outbound web access except for requests originating from your proxy. (Or you could use two NICs.)
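
As a rough illustration with iptables on the gateway, assuming the proxy sits at 10.0.0.2 (an address I’ve invented for the example):

# Allow web traffic out only when it originates from the proxy
iptables -A FORWARD -p tcp -s 10.0.0.2 -m multiport --dports 80,443 -j ACCEPT
iptables -A FORWARD -p tcp -m multiport --dports 80,443 -j DROP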

So how many domains am I blocking? At the moment, it’s 14,000 and I’m pretty happy with that result. It’s a good security measure for zero cost apart from development time.