It occurred to me that I had posted my e-mail address to this website without first protecting it. Believing, as most programmers do, that I could solve this problem on my own, I set out to find out which methods are effective against spambots, also known as e-mail address harvesters. What we are fighting here is a program that loads a web page and scans it for character sequences that look like e-mail addresses. It is looking for ‘text@text.text’, where text is any legal sequence of URL characters. It would be easy to spend an afternoon and whip up a reasonably proficient harvester in Perl.
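To make the threat concrete, here is a minimal sketch of the scanning step, written in JavaScript rather than Perl to match the rest of this post. The pattern and the names ADDRESS_PATTERN and harvestAddresses are my own illustration, not taken from any real harvester, and the pattern is deliberately loose.

```javascript
// Hypothetical harvester core: find anything shaped like text@text.text.
// The character classes are an assumption; real harvesters will differ.
const ADDRESS_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/g;

// Returns every address-like string found in the page source.
function harvestAddresses(pageSource) {
  return pageSource.match(ADDRESS_PATTERN) || [];
}
```

Any address that appears literally in the page source, whether in the link text or in a mailto: href, would be picked up by a scan like this.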
Searching Google, I found two main methods for hiding e-mail addresses on web pages. The first converts the e-mail address into a string of HTML Unicode character references (for example, &#64; in place of @). The second uses the document.write() function in JavaScript to output the e-mail address.
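As a rough illustration of the first method, the sketch below turns every character of an address into a decimal character reference. The function name toCharacterReferences is hypothetical; the pages I found may well encode only some characters or use hexadecimal references instead.

```javascript
// Encode each character as a decimal HTML character reference, e.g. "@" -> "&#64;".
function toCharacterReferences(text) {
  return Array.from(text, (ch) => "&#" + ch.codePointAt(0) + ";").join("");
}
```

A browser renders the result as the original address, while the page source contains only the numeric codes.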
Of the two methods, the JavaScript version strikes me as the more robust. It would be easy to decode the Unicode strings using a lookup table in Perl before scanning the document for addresses. Perhaps no spambot is that smart at the moment, but if the technique becomes more widespread, spammers will catch on. By using JavaScript we force the spambot to work harder: it must interpret the script to harvest the address. The odds of a spammer going to such lengths are low at the moment, because writing a full JavaScript interpreter is too costly. It is more likely that they will look for the ‘@’ symbol in the JavaScript and attempt to concatenate the strings around that symbol into an address. JavaScript is a powerful enough language that we can foil this attack with some acrobatic string manipulation.
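Both halves of that argument can be sketched in a few lines. The first function shows how little work it takes to undo decimal character references before scanning; the second shows one possible piece of string acrobatics, building the separator from its character code so that no literal at-sign sits between the string pieces in the script source. Both function names are hypothetical, and the second is only one of many tricks that could be used.

```javascript
// Undo decimal character references such as "&#64;" before scanning a page.
function decodeCharacterReferences(html) {
  return html.replace(/&#(\d+);/g, (match, code) => String.fromCodePoint(Number(code)));
}

// Assemble an address without writing the separator literally in the source.
function assembleAddress(mailbox, domain) {
  const separator = String.fromCharCode(64); // 64 is the character code of the at-sign
  return [mailbox, domain].join(separator);
}
```

A harvester that only searches the script text for the at-sign and glues together its neighbours would find nothing to anchor on in assembleAddress.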
I also wanted the solution to be easy to use for every e-mail address on the site.
Usage
Inserting this JavaScript anywhere on the page:
<script type="text/javascript">mailto("user");</script>
Will produce the following HTML fragment:
<a href="mailto:user@defaultdomain.com">user@defaultdomain.com</a>
Supplying 2 arguments allows you to specify the mailbox and the domain:
JS: <script type="text/javascript">mailto("user", "customdomain.com");</script>
HTML: <a href="mailto:user@customdomain.com">user@customdomain.com</a>
Supplying 3 arguments allows you to specify the link text, mailbox and domain:
JS: <script type="text/javascript">mailto("send mail here", "user", "customdomain.com");</script>
HTML: <a href="mailto:user@customdomain.com">send mail here</a>
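The three call forms above could be handled along the following lines. This is a minimal sketch and not the downloadable script itself: the DEFAULT_DOMAIN constant and the buildMailLink helper are assumptions made for illustration, and only the mailto name and its argument order come from the usage shown above.

```javascript
// Assumed default; the examples above call it defaultdomain.com.
const DEFAULT_DOMAIN = "defaultdomain.com";

// Returns the anchor markup for 1, 2, or 3 arguments:
//   (mailbox), (mailbox, domain), or (text, mailbox, domain).
function buildMailLink(...args) {
  let text, mailbox, domain;
  if (args.length >= 3) {
    [text, mailbox, domain] = args;
  } else {
    [mailbox, domain = DEFAULT_DOMAIN] = args;
  }
  // Build the separator from its character code rather than writing it literally.
  const address = mailbox + String.fromCharCode(64) + domain;
  if (text === undefined) {
    text = address; // no link text given, so show the address itself
  }
  return '<a href="mailto:' + address + '">' + text + "</a>";
}

// Page-facing wrapper; document.write only exists in a browser.
function mailto(...args) {
  document.write(buildMailLink(...args));
}
```

Keeping the string building in buildMailLink and the output in mailto means the markup can be checked without a browser.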
Further Obfuscation
To make things even harder for the spambot, we won't embed the script on each page. Besides, code should be written down in one place, not copied all over your website. You can include the script in your web page by adding the following line to the header section of your page template:
<script src="http://yourwebsite.com/HideMail.js" type="text/javascript"></script>
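We can also bring in the Unicode technique and write the @ symbol as a character reference, so that the literal symbol never appears in the script:
function buildMailAddress(mailbox, domain) { return mailbox + "&#64;" + domain; } // "&#64;" is the HTML character reference for "@"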
You can download a copy of the whole script here. Feel free to use and adapt it as you need.
Further Development
It would be interesting to see a filter for Apache Tomcat that scans outgoing pages for mailto: links and replaces them with the appropriate JavaScript block calling HideMail. This solution would be useful to large enterprises that are trying to cut down on spam but want to make their employees' e-mail addresses available on the Internet. It wouldn't be necessary to alter any web pages or put any procedures in place. This will have to wait for another day.
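In the meantime, the rewriting step such a filter would perform can be sketched in JavaScript, to stay in the same language as the rest of this post. A real Tomcat filter would be written in Java against the servlet API, which I have not attempted here. The names MAILTO_LINK and hideMailLinks are hypothetical, and the pattern only handles the simple anchor form used in the examples above (a single href attribute, double quotes).

```javascript
// Matches <a href="mailto:mailbox@domain">text</a> in its simplest form.
const MAILTO_LINK = /<a\s+href="mailto:([^"@]+)@([^"]+)"\s*>([\s\S]*?)<\/a>/gi;

// Replace each plain mailto: link with a script block that calls mailto().
function hideMailLinks(html) {
  return html.replace(MAILTO_LINK, (match, mailbox, domain, text) => {
    const address = mailbox + "@" + domain;
    // When the link text is the address itself, pass only mailbox and domain
    // so the full address never appears in the rewritten page.
    const args = text === address ? [mailbox, domain] : [text, mailbox, domain];
    const call = args.map((arg) => JSON.stringify(arg)).join(", ");
    return '<script type="text/javascript">mailto(' + call + ");</script>";
  });
}
```

This is only a sketch: link text that itself contains markup or a closing script tag would need more careful escaping than JSON.stringify provides.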