From Weblogs to Ontologies

I have started a small experiment: is it possible to extract an ontology from a weblog? Selecting an appropriate weblog was not very difficult: Chocolate and Zucchini, a wonderful cooking blog. This also means that the ontology will be about cooking, which creates a slight problem: I don't cook. Another complication is that much of the Chocolate and Zucchini blog (CZ) uses French terms. Both considerations pose challenges for ontology engineering tools. Commonly, an ontology is constructed by a knowledge engineer (who knows about ontologies in general) in cooperation with a domain expert (who has knowledge about the domain at hand). A knowledge engineer who is not a domain expert in cooking is therefore an extremely unfortunate case.

The experiment has several objectives:

  • First, I would like to find out whether the tool I'm using, called tOKo, is really suitable for developing ontologies from documents (a weblog, in this case). Several people, including knowledge engineers and domain experts, have already used tOKo. Rather than waiting for bug reports and usability requests from professional users, it seemed a good idea to use it myself (and, given that I'm using CZ, to have some fun in the process).

  • Second, I would like to get a feeling for the extent to which a set of documents (the CZ blog) is sufficient to develop an ontology. This relates both to the browsing functionality of tOKo (i.e. can one find the meaning of a concept inside the weblog?) and to the potential benefits of integrating external sources.

  • The main reason for the underlying research is to find a way to structure a domain (cooking, in this case) so that it supports end-user activities. The most obvious end-user activity is searching: the idea is that searches based on an ontology might produce better results than searches based on words (as offered by general-purpose search engines). A related activity is finding similar documents; for example, given a blog post, finding similar posts, again based on the ontology.

  • A final motivation is that, so far, tOKo has been used on confidential data and domains, which makes it difficult to write documentation for it or to blog about it.

There is also a potential relation to social tagging tools. Most of these tools, such as delicious, and the practice of assigning categories to weblog posts are very simple to use. Constructing an ontology that also makes sense to others is more difficult, and many people are therefore reluctant to use ontologies, not least because of their formal origins. Hopefully a tool like tOKo, which aims to make developing ontologies as easy as possible, can point bloggers toward structures that carry meaning and can be frivolous and social as well.

PS. I hope Clotilde Dusoulier, the blogger behind CZ, reads this post, as I will be using her intellectual work as an example while running the experiment. If not, I hope that by the time Clotilde finds out, she hasn't baked a cake with the intention of throwing it at me :-).

TrackBack

Listed below are links to weblogs that reference From Weblogs to Ontologies:

» "Reading" without reading from Collin vs. Blog
I asked Lotaria if she has already read some books of mine that I lent her. She said no, because here she doesn't have a computer at her disposal. She explained to me that a suitably programmed computer can read a novel in a few minutes and record the...

Comments

Looking forward to seeing the results of this, Anjo.