I have started a small experiment: is it possible to extract an ontology from a weblog? Selecting an appropriate weblog was not very difficult: Chocolate and Zucchini, a wonderful cooking blog. This also means that the ontology will be about cooking, and this creates a slight problem: I don't cook. Another complication is that a lot of the Chocolate and Zucchini blog (CZ) uses French terms. All of these considerations pose challenges for ontology engineering tools. An ontology is commonly constructed by a knowledge engineer (who knows about ontologies in general) in cooperation with a domain expert (who has knowledge about the domain at hand). A knowledge engineer working alone, without any expertise in cooking, is therefore an extremely unfortunate case.
The objectives of the experiment are manifold:
- First of all, I would like to find out whether the tool I'm using is really suitable for ontology development from documents (a weblog in this case). Several people, including knowledge engineers and domain experts, have used the tool in question (called tOKo). Rather than waiting for bug reports and requests for usability enhancements from professional users, it might be an idea to use it myself (and, given that I'm using CZ, have some fun in the process).
- Secondly, I would like to get a feeling for the extent to which a set of documents (the CZ blog) is sufficient to develop an ontology. This relates both to the browsing functionality of tOKo (i.e. can one find the meaning of a concept inside the weblog) and to the potential benefits of integrating external sources.
- The main reason for the underlying research is to find a way to structure a domain (cooking in this case) such that it supports end-user activities. One very obvious end-user activity is searching, and the idea is that searches based on an ontology might produce better results than searches based on words (as provided by general-purpose search engines). A related activity is finding similar documents: for example, given a blog post, find similar posts, again based on the ontology.
- A final motivation is that so far tOKo has been used on data and domains that are confidential. This means that it is difficult to write documentation and also to blog about it.
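To make the idea of ontology-based search and similarity a little more concrete, here is a minimal sketch in Python. It is not how tOKo works; the tiny ontology, the three posts and all function names are invented purely for illustration. It assumes an ontology can be approximated as a mapping from concepts to the (English and French) terms that express them, and it uses Jaccard overlap of concepts as one simple, assumed notion of "similar".

```python
import re

# Hypothetical toy ontology: each concept maps to the terms that express it.
ONTOLOGY = {
    "zucchini": {"zucchini", "courgette"},
    "chocolate": {"chocolate", "chocolat"},
    "cake": {"cake", "gateau"},
}

# Hypothetical posts, invented for this sketch (not taken from CZ).
POSTS = {
    "post-1": "A moist chocolate cake with grated zucchini",
    "post-2": "Gateau au chocolat et courgette",
    "post-3": "Roasted courgette with herbs",
}


def tokens(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def concepts(text):
    """Return the ontology concepts for which at least one term occurs in the text."""
    words = tokens(text)
    return {concept for concept, terms in ONTOLOGY.items() if words & terms}


def word_search(word):
    """Plain word search: posts that contain the literal word."""
    return sorted(post for post, text in POSTS.items() if word in tokens(text))


def concept_search(concept):
    """Ontology-based search: posts that mention any term of the concept."""
    return sorted(post for post, text in POSTS.items() if concept in concepts(text))


def similarity(post_a, post_b):
    """Jaccard overlap between the concept sets of two posts."""
    a, b = concepts(POSTS[post_a]), concepts(POSTS[post_b])
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    print(word_search("zucchini"))
    print(concept_search("zucchini"))
    print(similarity("post-1", "post-2"))
```

In this toy setting the word search for "zucchini" only finds the English post, while the concept search also picks up the posts that say "courgette", which is exactly the kind of benefit one would hope for on a partly French cooking blog.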
There is also a potential relation to social tagging tools. Most of these, such as delicious or the categories one assigns to weblog posts, are very simple to use. Constructing an ontology that also makes sense to others is more difficult, and many people are therefore reluctant to use ontologies, not least because of their formal origin. Hopefully a tool like tOKo, which is intended to make developing ontologies as easy as possible, can point bloggers in the direction of structures that have meaning and can be frivolous and social as well.
PS. I hope Clotilde Dusoulier, the blogger behind CZ, reads this post, as I will be using her intellectual work as an example while running the experiment. If not, I hope that by the time Clotilde finds out she hasn't baked a cake with the intention of throwing it at me :-).
Looking forward to seeing the results of this, Anjo.
Posted by: peter caputa | August 28, 2005 at 10:23 PM