Archive for June 2003


Coldplay – Clocks

Listening to “Coldplay – Clocks”. I heard it on the radio on the way home from the airport and thought it was either U2 or Sting. I found it through Google by searching for U2 OR Sting “nothing else compares” lyrics. Regardless, it’s awesome, so it’s today’s continuous loop.

In general, that’s my trick for finding songs. Put a bunch of lyrics in quotes, add the word lyrics and search on Google, like this. You can find the lyrics for “Clocks” here:


It occurred to me that I had posted my e-mail address to this website without first protecting it. Believing, as most programmers do, that I could solve this problem on my own, I set out to find which methods would be effective against spambots, or e-mail address harvesters. What we are fighting here is a program that loads a web page and scans it for character sequences that look like e-mail addresses. It is looking for ‘text@text.text’, where text is any legal set of URL characters. It would be easy to spend an afternoon and whip up a reasonably proficient harvester in Perl.
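To make the threat concrete, here is a rough sketch of the scan such a program performs, written in JavaScript rather than Perl. The function name `findAddresses` and the pattern are illustrative assumptions, not taken from any real harvester, and the pattern is only a loose approximation of ‘text@text.text’.

```javascript
// Hypothetical sketch of a harvester's scan; the name and pattern are assumptions.
// The pattern loosely approximates "text@text.text" and is not a full address grammar.
function findAddresses(pageSource) {
  var pattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/g;
  // match() returns null when nothing is found, so fall back to an empty list.
  return pageSource.match(pattern) || [];
}

var page = '<p>Contact: <a href="mailto:user@example.com">write to me</a></p>';
console.log(findAddresses(page));
```

Anything that keeps the address from appearing in the page source in this literal form defeats a scan of this kind.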

Searching Google, I found two main methods for hiding e-mail addresses on web pages. The first involves converting the e-mail address to an HTML Unicode string (numeric character references such as &#64;). The second involves using the document.write() function in JavaScript to safely output the e-mail address.

Of the two methods, the JavaScript version strikes me as the more robust. It would be easy to decode the Unicode strings using a lookup table in Perl before scanning the document for addresses. Perhaps no spambot is that smart at the moment, but if the technique becomes more widespread, spammers will catch on. By using JavaScript we force the spambot to work harder: it must interpret the script to harvest the address. The odds of a spammer going to such lengths are low at the moment, since writing a full JavaScript interpreter is too costly. It is more likely that they would look for the ‘@’ symbol in the JavaScript and attempt to concatenate the strings around that symbol into an address. JavaScript is a powerful enough language that we can foil this attack by doing some acrobatic string manipulation.
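The weakness of the Unicode method is easy to demonstrate. The sketch below encodes an address as decimal character references and then reverses the encoding with a single regular expression instead of a lookup table. The function names `encodeAsEntities` and `decodeEntities` are made up for this illustration, and it is written in JavaScript rather than Perl.

```javascript
// Illustrative only: both function names are assumptions made for this example.

// Encode every character as a decimal HTML character reference, e.g. "@" becomes "&#64;".
function encodeAsEntities(address) {
  var out = "";
  for (var i = 0; i < address.length; i++) {
    out += "&#" + address.charCodeAt(i) + ";";
  }
  return out;
}

// Reverse the encoding. This one replace call is all a harvester would need.
function decodeEntities(text) {
  return text.replace(/&#(\d+);/g, function (match, code) {
    return String.fromCharCode(parseInt(code, 10));
  });
}

var encoded = encodeAsEntities("user@example.com");
console.log(encoded);
console.log(decodeEntities(encoded));
```

Decoding takes a few lines, so this method only holds up while harvesters do not bother to run such a pass before scanning.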

I also wanted the solution to be easy to use for all the e-mail addresses.


This JavaScript, inserted anywhere on the page:

<script type="text/javascript">mailto("user");</script>

will produce the following HTML fragment:

<a href=""></a>

Supplying 2 arguments allows you to specify the mailbox and the domain:

JS:   <script type="text/javascript">mailto("user", "");</script>
HTML: <a href=""></a>

Supplying 3 arguments allows you to specify the link text, mailbox and domain: 

JS:   <script type="text/javascript">mailto("send mail here", "user", "");</script>
HTML: <a href="">send mail here</a>
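A function with this calling convention might look roughly like the sketch below. It is not the HideMail source. The default domain `example.com`, the helper names `joinAddress` and `buildMailtoLink`, and the choice to use the address as the link text when none is given are all assumptions made for illustration.

```javascript
// Sketch only, not the original HideMail script.
// Hypothetical default: the real script's default domain is not shown in the post.
var DEFAULT_DOMAIN = "example.com";

// Join mailbox and domain without a literal at-sign appearing in the script source.
function joinAddress(mailbox, domain) {
  return mailbox + String.fromCharCode(64) + domain;
}

// Build the anchor markup as a string. Accepted forms:
//   buildMailtoLink(mailbox)
//   buildMailtoLink(mailbox, domain)
//   buildMailtoLink(text, mailbox, domain)
function buildMailtoLink(a, b, c) {
  var text, mailbox, domain;
  if (arguments.length >= 3) {
    text = a;
    mailbox = b;
    domain = c;
  } else {
    mailbox = a;
    domain = arguments.length === 2 ? b : DEFAULT_DOMAIN;
  }
  var address = joinAddress(mailbox, domain);
  if (text === undefined) {
    text = address; // assumption: the link text defaults to the address itself
  }
  return '<a href="mailto:' + address + '">' + text + "</a>";
}

// Browser entry point matching the calls shown above; it writes the link into the page.
function mailto() {
  document.write(buildMailtoLink.apply(null, arguments));
}
```

Keeping the string building in its own function, separate from the document.write() call, means the markup can be checked outside a browser.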

Further Obfuscation

To make things even harder for the spambot, we won’t embed the script on each page. Besides, code should be written down in one place, not copied all over your website. You can include a script in your web page by adding the following line to the header section of your page template:

<script src="" type="text/javascript"></script>

We can also bring in the Unicode technique and write the @ symbol as a Unicode character reference:

function buildMailAddress(mailbox, domain) { return mailbox + "&#64;" + domain; }

You can download a copy of the whole script here. Feel free to use and adapt it as you need.

Further Development

It would be interesting to see a filter for Apache Tomcat that scans outgoing pages for mailto: links and replaces them with the appropriate JavaScript block that calls HideMail. This solution would be useful to large enterprises that are trying to cut down on spam but want to make their employees’ e-mail addresses available on the Internet. It wouldn’t be necessary to alter any web pages or put any procedures in place. This will have to wait for another day.
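The rewriting step of such a filter can be sketched as follows. A real Tomcat filter would be a Java servlet filter running on the server. This sketch is in JavaScript, to match the rest of the code here, and shows only the text transformation. The function name `hideMailtoLinks` is hypothetical, and the pattern handles only the simplest form of mailto: anchor, with a single href attribute and plain link text.

```javascript
// Sketch of the transformation only; hideMailtoLinks is a hypothetical name.
// Handles anchors of the form <a href="mailto:mailbox@domain">text</a> and nothing fancier.
function hideMailtoLinks(html) {
  var anchor = /<a\s+href="mailto:([^"@]+)@([^"]+)"\s*>([^<]*)<\/a>/gi;
  return html.replace(anchor, function (match, mailbox, domain, text) {
    // JSON.stringify quotes and escapes each value as a JavaScript string literal.
    // The closing tag is split so this code can itself be inlined in a page.
    return '<script type="text/javascript">mailto(' +
      JSON.stringify(text) + ", " +
      JSON.stringify(mailbox) + ", " +
      JSON.stringify(domain) + ");</" + "script>";
  });
}

var before = '<p><a href="mailto:user@example.com">send mail here</a></p>';
console.log(hideMailtoLinks(before));
```

Links that are not mailto: links pass through unchanged, so the filter could be applied to every outgoing page.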

Spam Spam Spam

It occurred to me that spambots or e-mail harvesters might vacuum these pages for my e-mail address. Now, nobody wants to get spam, least of all me. A quick look around the web reveals two common strategies for avoiding the spambots.

One is to convert your e-mail address to an HTML Unicode sequence. This is cumbersome and annoying unless you are doing it for a single address. The second method uses JavaScript’s document.write() function to output your e-mail addresses from code. I found several script solutions online, but none were general enough for me. I wrote my own, HideMail, and you can see the source here.

Obviously, if you don’t have JavaScript enabled in your browser you won’t see the mailto: links. Too bad: you must not want to badly enough, or you’re a spambot. Either way, I’m not too concerned.

While I was writing HideMail and testing it on the links page, I couldn’t understand why the links.html file was not changing when I rebuilt the weblog. In desperation I wiped the template source, hoping MT would pick it up from the file again. Oops: I didn’t completely wipe the source from the input box, and the file was emptied. Then I noticed why it wasn’t updating in the first place: I hadn’t copied it locally and made the necessary changes.

So someone please fix MT so that in ‘link template to file’ mode you can’t edit the template. Period. Always use the file, don’t even show the text box. Simple.

The Matrix Reloaded

I saw Reloaded last night, as it has finally wended its way to the islands this week. It was a very good show in my opinion. I discussed it with a friend who suggested that the action sequences were too long and became boring.

The action sequences are in no way integral to the plot. You could cut them out of the movie and lose nothing essential. The action just appears to be filler. Very nicely choreographed, kung fu, gun-slinging, katana-wielding, CGI-assisted filler. In the first movie the action sequences were molded into the plot. Neo was discovering his powers, and each new action sequence showed some new ability or new side of his character. Ignoring the action, I think it was a great movie, with lots of deep stuff to ponder and enough evidence to support just about any possible outcome in the final installment.

In other news, the website tweaking continues. The two styles were merged without incident. CSS and XHTML will validate perfectly once I get the character encoding into the templates; then the W3C buttons will go up. The links section is up; once again Movable Type is proving its flexibility.