
Sigmund goes Public

We are happy to announce that the tool associated with the Shared Conceptualizations in Weblogs paper at BlogTalk 2.0 will be made public. The slides of the presentation, with some screenshots of the tool, were made available earlier by Lilia Efimova.

One of the main motivations for making the tool public is that quite a few people asked for it (how many would that be?). There also seems to be genuine interest in doing text analysis on weblogs rather than "just" counting links and such. The tool, now named Sigmund, still needs some work before it can be of general use. Below are some of the issues under consideration, partly based on the feedback we got.

Format. Sigmund has now been rewritten to work with the RSS 2.0 specification. Although RSS 2.0 feeds are normally used to provide just the latest posts of a weblog (typically around 15), the format can store an entire weblog (which is what Sigmund likes!). We would be more than happy to discuss RSS 2.0 extensions for the purpose of textual analysis of weblogs. As it stands, everything Sigmund really needs can be represented in RSS 2.0.
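To make the format point concrete, here is a minimal sketch of reading every post from an RSS 2.0 file that holds a whole weblog. It is not Sigmund's code: the sample feed, the `read_posts` function and the choice of Python are our assumptions for illustration, and only the standard RSS 2.0 elements `item`, `title`, `pubDate` and `description` are used.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: an RSS 2.0 document may carry any number of <item>
# elements, so the full archive of a weblog fits in one file.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example weblog</title>
    <link>http://example.org/</link>
    <description>Hypothetical feed holding the full archive</description>
    <item>
      <title>First post</title>
      <pubDate>Mon, 05 Jul 2004 10:00:00 GMT</pubDate>
      <description>Text analysis of weblogs.</description>
    </item>
    <item>
      <title>Second post</title>
      <pubDate>Tue, 06 Jul 2004 10:00:00 GMT</pubDate>
      <description>Counting links is not enough.</description>
    </item>
  </channel>
</rss>"""


def read_posts(rss_text):
    """Return one dict per <item>, with its title, date and body text."""
    root = ET.fromstring(rss_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "date": item.findtext("pubDate", default=""),
            "text": item.findtext("description", default=""),
        })
    return posts


if __name__ == "__main__":
    for post in read_posts(SAMPLE_FEED):
        print(post["date"], "|", post["title"], "|", post["text"])
```

The text in each `description` is what a text analysis tool would work on; anything beyond these elements would have to come from the RSS 2.0 extensions mentioned above.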

Dictionaries. This was our biggest concern about making Sigmund public. We use CELEX, a highly sophisticated dictionary for English, German and Dutch. Unfortunately, CELEX is licensed material (why are we not allowed the rights to the languages we create?). We have worked out a scheme that lets us use parts of the CELEX material without violating the uniqueness of the CELEX project, while still giving Sigmund enough to work with. This scheme still has to be implemented.

Multi-lingual versions. Several people who commented on Sigmund would like to use it for "non-CELEX" languages. The scheme hinted at above allows that, although it is, of course, still necessary to obtain dictionaries for these languages. We are not expecting any serious problems here.

All in all, we are very pleased with the comments we got on the BlogTalk presentation and hope that the release of Sigmund will provide valuable input to our core business: text analysis of unstructured data.

An initial beta-release of Sigmund is planned for the middle of September (fingers crossed).

