Open Source: tOKo, Sigmund, BlogTrace
I cannot remember having mentioned it before, but "one of the things" I'm currently working on is an Open Source version of the tools I have been blogging about. My experience with Open Source as a developer is a bit half-baked: I'm listed as the second author of the immensely popular SWI-Prolog environment. This is factually correct, but although I designed and wrote the first version of the GUI environment, it is Jan Wielemaker who maintains it, and he sets a high standard in doing so. Obviously, all the tools I develop are written in SWI-Prolog, with some C code where high performance is required.
My first step towards Open Source was integrating the stable parts of tOKo, Sigmund and parts of BlogTrace into a single project (the whole is called tOKo). I more or less finished doing that last week. Most Open Source projects, contrary to what the name suggests, appear to rely on a few developers and (many) users who contribute by commenting, finding bugs, etc. tOKo is currently pre-alpha, meaning that its circulation is restricted to those from whom I can expect feedback. One striking case is a student at Océ who is using the Sigmund part to extract significant terms from Océ's huge document archive. This has resulted in a more or less endless stream of bug reports, mostly related to the size of the data set and to Sigmund not being particularly fluent in Dutch. The conclusion must be that one actually needs users and their data to iron out (some of) the bugs.
What should be the next step? I received a hint during a workshop on "Digital Traces" last Friday. My contribution was a demo of the integrated version of tOKo, particularly the parts that analyse weblog communities. I have been told there are some pictures of that demonstration on Flickr. The hint was something along the lines of "I would like to use this software" (apparently even Microsoft Research is interested :-)). Fine, but this is Open Source, so you'll have to contribute yourself as well.
Broadly, there are two application areas for tOKo: ontology development and "viewing" weblogs from a different angle. The first application area is currently well covered by projects; the second suffers a bit from the fact that it is awfully difficult to acquire clean weblog data. The interesting thing is that ideas resulting from weblog analysis contribute towards ontology development (and vice versa).
For the time being, the next step is therefore to contact bloggers and ask them to participate. If either Lilia or I contact you, please consider that you will be a certified contributor to an Open Source project (as well as a guinea pig)!
I'm not sure what my picture as a "blog researcher" looks like (I don't use categories). Below is Lilia's picture, based on the posts she assigned to "blogging as research" compared to her entire blog (image courtesy of tOKo, which includes Sigmund and parts of BlogTrace, and of Lilia, who provided the data :-)).

Hi Anjo,
the pics on Flickr are in my photostream here
Posted by: Ton Zijlstra | March 29, 2006 at 12:51 PM
Hi Anjo.
I am working on my PhD project. The research concentrates on "Text Mining and Social Aspects on Bloger's and Virtual Community". Do you know how I can get the dataset for the research?
Thk.
Shlomi
Posted by: shlomi sela | December 26, 2007 at 11:32 AM