Archive for January 2005


Unique Unique IDs

Mindshare, being the unique thing that it is, needs a unique ID format. Each group member generates their own member ID. As usual, these IDs need to be unique. Because the number of IDs created is small and scoped within a group, the IDs do not need to be that large. The 128-bit UUID format is probably overkill, and Java can't generate true UUIDs anyway because the JVM lacks access to several components of that format (the MAC address and a high-resolution timer). My initial response to this requirement was to Base64 encode some random bits.

The other requirements are more subtle. Mindshare saves everyone's files to disk and cares about who owns those files, something no other P2P program cares about. The user ID may be called upon to disambiguate files when two files have the same name, or to store each user's files in a separate directory. This puts severe restrictions on the characters that can be used in IDs. For starters, raw Base64 can't be used for file or directory names because it includes the ‘/’ character, which is either illegal or delineates directories on Unix platforms and in URIs.

Then there are the case-preserving, case-insensitive filesystems of the two popular desktop OSes. Base64 is case sensitive, so raw Base64 can't be used reliably: two different Base64 strings might map to the same file name.

Finally, Base64 uses the ‘+’ and ‘=’ characters. The ‘=’ is easy to avoid by choosing an input length that is a multiple of three bytes, so that no padding is needed. The ‘+’ character falls into the punctuation group and is thus not suitable for use in the authority section of a URI.

So for now Mindshare IDs will be 15 bytes of secure random data, encoded with Base64, resulting in a 20-character ID. All capital letters will be replaced with their lowercase equivalents (id.toLowerCase()). The characters ‘+’ and ‘/’ will be replaced with the more URI-friendly ‘-’ and ‘_’ respectively. There is still the possibility of collision in this scheme by virtue of the loss of capitalisation. I consider this to be unlikely, or at least as nebulous as the possibility of generating identical random bits because Java doesn’t have access to accurate spatial or chronological information. At some future time an encoder can be built that encodes raw bytes directly in the alphabet [a-z][0-9][-_] without changes to the protocol.
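The scheme described above can be sketched in a few lines of Java. This is only an illustration, not the actual Mindshare code: the class and method names (MindshareId, encode, generate) are made up for the example, and it assumes java.util.Base64, which only exists in Java 8 and later.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Locale;

// Hypothetical sketch of the ID scheme; names are illustrative only.
public class MindshareId {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Base64-encodes raw bytes, lowercases the result, and maps '+' to '-' and '/' to '_'. */
    static String encode(byte[] raw) {
        String base64 = Base64.getEncoder().encodeToString(raw);
        return base64.toLowerCase(Locale.ROOT).replace('+', '-').replace('/', '_');
    }

    /** Generates a new ID from 15 secure random bytes. */
    static String generate() {
        // 15 bytes = 120 bits = exactly 20 Base64 characters, so no '=' padding appears.
        byte[] raw = new byte[15];
        RANDOM.nextBytes(raw);
        return encode(raw);
    }

    public static void main(String[] args) {
        System.out.println(generate());
    }
}
```

On the cost of lowercasing: each Base64 character normally carries 6 bits, but folding the 26 upper/lower letter pairs together leaves a little over 5 bits per character. A 20-character ID therefore keeps roughly 100 of its 120 random bits, which is why a collision caused by the loss of capitalisation stays unlikely for groups of any realistic size.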

Basically this is a post to say I have thought about this issue but I am too busy/lazy to do the right thing at this point in time. This gets put off for 0.2 when we do crypto & security.

Current Work

My thesis is slated to be handed in sometime in mid-February. This puts a closed time-frame on the work remaining before the alpha release, when real people (users) will get to squeeze the software in their greedy little hands :-).

Given the time constraints, the main priority is to get tree and file swapping implemented. Events have, however, conspired to make this not so easy. I lost some important code (formatted it into oblivion) that was not too trivial to write. To save time I am cutting XML as the back-end data format, even though much of the core already works on XML. The alternate encoding will be BEncoding, which is much less stressful to work with. It solves many of the problems associated with sending binary hashes inside XML, which can't contain arbitrary binary characters. Right now I want to strip out all the XML code and replace it with BEncoded messages instead, but the work needed to do this may be too much to be worthwhile for the alpha.
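To show why BEncoding is friendlier to binary hashes, here is a minimal sketch of an encoder for a flat dictionary. It is not the Mindshare message format: the class name BencodeSketch and the field names "hash" and "size" are invented for the example. It relies only on the standard BEncoding rules, where a byte string is written as its length, a colon, and the raw bytes, an integer as i...e, and a dictionary as d...e with sorted keys.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: encodes a flat dictionary of byte strings and integers.
public class BencodeSketch {
    /** Writes a byte string as "<length>:<raw bytes>"; the bytes need no escaping. */
    static void writeBytes(ByteArrayOutputStream out, byte[] value) {
        byte[] prefix = (value.length + ":").getBytes(StandardCharsets.US_ASCII);
        out.write(prefix, 0, prefix.length);
        out.write(value, 0, value.length);
    }

    /** Writes an integer as "i<decimal>e". */
    static void writeInt(ByteArrayOutputStream out, long value) {
        byte[] text = ("i" + value + "e").getBytes(StandardCharsets.US_ASCII);
        out.write(text, 0, text.length);
    }

    /** Encodes a dictionary; TreeMap keeps keys sorted (matches byte order for ASCII keys). */
    static byte[] encodeDict(TreeMap<String, Object> dict) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('d');
        for (Map.Entry<String, Object> entry : dict.entrySet()) {
            writeBytes(out, entry.getKey().getBytes(StandardCharsets.UTF_8));
            Object value = entry.getValue();
            if (value instanceof byte[]) {
                writeBytes(out, (byte[]) value);
            } else if (value instanceof Long) {
                writeInt(out, (Long) value);
            } else {
                throw new IllegalArgumentException("unsupported value type: " + value.getClass());
            }
        }
        out.write('e');
        return out.toByteArray();
    }
}
```

A raw hash goes into the message as it is, with no Base64 or hex step, because the length prefix tells the reader exactly how many bytes to consume.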

Presence support was rewritten to be more general. Integrating this back into the Mindshare client is a bit of a pain but the result should be a more responsive presence service.

All point-to-point links will use BEEP for transport. Before the code loss I had BEEP working over a JxtaSocket quite well. This uncovered a bug with closing JxtaSockets and BidiPipes that is now fixed. JxtaSocket still does not support keepalives, so the protocol will include a keepalive packet. In the future a second BEEP channel could be used for one-to-one chat.

Heading up to the release I will try to post more often so check back.

Spam Warz

I recently became the victim of comment spam. The Mindshare website in particular was hit hard by a single spammer running a botnet and posting spam from multiple IP addresses. The attack package they are using has an easily identifiable signature: the e-mail field always starts with two numbers. This website has been indexed by Google, which I am very happy about, but this is also how the spammers find their targets. For the time being, all comments require registration. I hope to lift that restriction soon, once I get another nifty tool installed on the server.

On the positive side, my e-mail address has not been indexed by Google, which suggests that the Hide Mail script is indeed effective at keeping your e-mail address away from automated address harvesters. I now get less spam at that address than I do in misdirected spam aimed at the entire * domain.

And at the end of the first quarter it's: Spammers: 1, Gareth: 1