We have embarked on a possibly challenging next step in our weblog conversation research. So far most of the results are based on terms; the next step is to see whether such conversations are about (shared) topics. The most sensible way to start seemed to be to look at a blog whose posts have been "carefully" tagged with topic metadata: Lilia's Mathemagenic blog was selected initially. (Contact me or Lilia if you would like to volunteer your own blog, carefully tagged by topic :-).)
The initial step was to run Sigmund to extract the terms, and then to determine for each topic which terms are most significant for it. This, of course, does not yield a generally agreed set of terms that best describes a particular topic, as it is heavily biased by the blog author. Some examples from Lilia's blog (topic and post count first, most significant terms follow):
blog reading (39 posts): reading, blog, feed, reader, really simple syndication, piece, post, comment, subscription, blogger, really simple syndication reader, conversation, attention, news aggregator, really simple syndication feed, wonder, browser, audience, writer, frame, case, stuff, blog research.
e-learning (34 posts): learner, support, course, instructor, product, education, training, tool, learning management system, knowledge management, knowledge, community, online, article, classroom, craft work, internet, report, resource, strategy, link, jay cross, workplace, topic, worker.
life (58 posts): family, travel, friend, fun, feeling, home, city, story, birthday, colour, love, vacation, light, detail, russian, night, house, difference, hour, culture, hope, phone, email, country, window.
methodology (32 posts): researcher, ethnography, research, study, artifact, phd, paper, analysis, interpretation, participation, datum, understanding, observation, choice, note, reflection, personal experience, reading, blog research, blogger, experience, interview, practice, blog, theory.
The results seem reasonable, though I wonder how much they are biased by Lilia's personal preferences. As an aside, would it not be nice to have an option to search for "all posts on my view on e-learning"? Perhaps the list above also shows that the definition of the word topic itself is not entirely clear. For some reason or another I have strong associations with the notion of viscosity when thinking about topic.
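The step of finding the most significant terms per topic can be sketched roughly as follows. This is not how Sigmund does it (the scoring it uses is not described here); the sketch assumes a simple ratio: the share of a topic's posts that contain a term, divided by the share of all posts that contain it. The function name, the `min_posts` cutoff and the toy posts are all made up for illustration.

```python
from collections import Counter, defaultdict

def significant_terms(posts, top_n=5, min_posts=2):
    """posts: list of (set of topics, set of terms) pairs, one per post.

    Returns {topic: [terms]}, ranked by an ASSUMED significance score:
    (share of the topic's posts containing the term) /
    (share of all posts containing the term).
    """
    total_posts = len(posts)
    term_df = Counter()                    # posts containing each term
    topic_posts = Counter()                # posts per topic
    topic_term_df = defaultdict(Counter)   # per topic: posts containing each term

    for topics, terms in posts:
        term_df.update(terms)
        for topic in topics:
            topic_posts[topic] += 1
            topic_term_df[topic].update(terms)

    result = {}
    for topic, counts in topic_term_df.items():
        scored = []
        for term, n in counts.items():
            if n < min_posts:              # ignore terms seen in too few posts
                continue
            in_topic = n / topic_posts[topic]
            overall = term_df[term] / total_posts
            scored.append((in_topic / overall, n, term))
        # Highest score first; ties broken by post count, then alphabetically.
        scored.sort(key=lambda s: (-s[0], -s[1], s[2]))
        result[topic] = [term for _, _, term in scored[:top_n]]
    return result

# Toy posts (hypothetical, not taken from Mathemagenic).
toy_posts = [
    ({"blog reading"}, {"feed", "reader", "blog"}),
    ({"blog reading"}, {"feed", "blog", "subscription"}),
    ({"e-learning"}, {"course", "learner", "blog"}),
    ({"e-learning"}, {"course", "learner", "tool"}),
]
print(significant_terms(toy_posts, top_n=2))
```

Because the score is relative to the whole blog, a term the author uses everywhere (such as "blog" in the toy data) ranks below a term that is specific to one topic. It also makes the author bias explicit: the ranking only reflects how this one author tags and writes.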
The next step was to see whether the topics themselves are related. Lilia often uses multiple topics for a post, which appears sensible; this also implies that topics themselves are related (in particular through the notions of broader and narrower topics). The figure below shows related topics according to Lilia's blog (note that the nodes are topics and not terms!). The similarity metric used was the overlap of the most common terms that describe each topic.
Again, the picture seems to make some sense. It is at least nice to see that the "personal" topics are clustered, and that the "PhD" topic is central and heavily linked.
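The topic similarity based on term overlap can be sketched as below. The exact overlap measure behind the figure is not specified above, so this sketch assumes Jaccard overlap (shared terms divided by all distinct terms) and a hypothetical threshold for drawing a link. The term lists are shortened versions of the ones listed earlier.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two term lists: |shared terms| / |all distinct terms|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def related_topics(topic_terms, threshold=0.1):
    """Return (topic, topic, score) links for topic pairs whose term
    overlap reaches the (assumed) threshold, strongest link first."""
    edges = []
    for (t1, terms1), (t2, terms2) in combinations(sorted(topic_terms.items()), 2):
        score = jaccard(terms1, terms2)
        if score >= threshold:
            edges.append((t1, t2, score))
    return sorted(edges, key=lambda e: -e[2])

# Shortened term lists, taken from the examples earlier in the post.
topic_terms = {
    "blog reading": ["reading", "blog", "feed", "reader", "blogger", "blog research"],
    "methodology": ["researcher", "ethnography", "reading", "blog research", "blogger", "blog"],
    "life": ["family", "travel", "friend", "fun", "home", "city"],
}
for t1, t2, score in related_topics(topic_terms):
    print(t1, "<->", t2, round(score, 2))
```

Each returned triple corresponds to one link between two topic nodes. Raising the threshold gives a sparser picture, lowering it a denser one.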

Hi Anjo,
I would guess that my blog is pretty well indexed by topic:
http://matt.blogs.it/topics/
and I would be interested. I'm not 100% sure that all the topic pages are up to date right now (since I moved to my new blogging tool) so please let me know if you want to run Sigmund over my blog and I will ensure everything is properly linked and up to date.
Regards,
Matt
Posted by: Matt Mower | March 05, 2006 at 12:39 PM
Hi Matt,
It would be easiest for me to have a full-text RSS feed of your blog. If the feed contains the topics, that is sufficient. For Mathemagenic I used an OPML file to relate posts to topics, but basically any well-structured index is ok.
My email is "anjo science uva nl".
Anjo.
Posted by: Anjo | March 05, 2006 at 01:10 PM