Last Monday I showed BlogTrace 0.1 alpha to Lilia, Rogier and Robert de Hoog (he discussed the analysis of one of my chess games over email; that is old technology, Robert). BlogTrace is intended to become an infrastructure to support weblog research. It currently consists of a blog spider, Sigmund and a special feature that helps locate what we call "knowledge flows" in weblogs (more about that in later posts).
The basic architecture of BlogTrace itself is simple. The spider turns the archives of a weblog into an RSS 1.0 feed (which is then in RDF) and also generates a link structure in an RDF ontology we defined. The RSS feed is subsequently passed to Sigmund, so we also know what the blogger is talking about.
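To give an idea of what "generating a link structure" amounts to, here is a minimal sketch in Python. It is not the BlogTrace spider: the class HrefCollector, the function link_triples and the blank-node names are made up for illustration, and only the class and property names are borrowed from the link ontology shown further down. It assumes each href in a page becomes one link node with a source and a target document.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def link_triples(source_url, html):
    """Return (subject, predicate, object) triples, one link node per href.

    The "_:linkN" identifiers are illustrative blank-node names, not
    something the real spider is known to produce.
    """
    collector = HrefCollector()
    collector.feed(html)
    triples = []
    for i, href in enumerate(collector.hrefs):
        link_id = f"_:link{i}"
        # Relative hrefs are resolved against the page that contains them.
        target_url = urljoin(source_url, href)
        triples.append((link_id, "rdf:type", "link:SimpleLink"))
        triples.append((link_id, "link:sourceDocument", source_url))
        triples.append((link_id, "link:targetDocument", target_url))
    return triples
```

Calling link_triples("http://example.org/blog/post1.html", page_html) would give three triples per hyperlink found in page_html; anchors without an href are skipped.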
A nice characteristic of this architecture is that we can take advantage of Semantic Web technology that is becoming available. I am using the Semantic Web library part of SWI-Prolog and today discovered that I misread the documentation. It says:
The current SemWeb library distributed with SWI-Prolog does not yet contain an OWL module. A module owl.pl is part of the Triple20 triple browser and editor provides limited support for OWL reasoning.
We have an undocumented feature here. After looking at the source code in owl.pl it quickly became clear that a lot of OWL support was already implemented. Knowing Jan Wielemaker, the creator of SWI-Prolog, it probably works as well.
The BlogTrace link ontology defines a simple link (i.e. the result of an HTML anchor such as <a href="...">instance of a simple link</a>) as follows in human readable RDF / N3 format (details omitted to save space):
link:SimpleLink a rdfs:Class;
    rdfs:comment "A simple link between documents.";
    rdfs:label "SimpleLink";
    rdfs:subClassOf link:Link.

link:sourceDocument a rdf:Property;
    rdfs:comment "Document that contains the link.";
    rdfs:label "sourceDocument";
    rdfs:range foaf:Document.

link:targetDocument a rdf:Property;
    rdfs:comment "Document that contains the link target.";
    rdfs:label "targetDocument";
    rdfs:range foaf:Document.
Suppose we want to find out whether a SimpleLink is not just a simple href but a link between two weblog posts. Given the above and the fact that weblog posts are represented as an rss:item in BlogTrace, we can define the notion of a WeblogPostLink in OWL as follows:
link:WeblogPostLink rdfs:subClassOf link:SimpleLink;
    rdfs:comment "A WeblogPostLink is a SimpleLink if and only if both the source and the target documents are weblog posts (RSS items)";
    rdfs:label "WeblogPostLink";
    owl:intersectionOf (
        link:SimpleLink
        [ a owl:Restriction;
          owl:onProperty link:sourceDocument;
          owl:someValuesFrom rss:item ]
        [ a owl:Restriction;
          owl:onProperty link:targetDocument;
          owl:someValuesFrom rss:item ]
    ).
And it works!
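For readers who do not speak OWL, the membership test that the definition asks the reasoner to perform can be spelled out by hand. The sketch below is plain Python over a list of (subject, predicate, object) tuples, not the owl.pl code; the helper names are invented for illustration. It assumes all types are asserted explicitly, so it does no subclass or range inference and treats missing triples as false, which a real OWL reasoner would not do.

```python
def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the triple list."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def has_type(triples, node, cls):
    """True if the node is explicitly typed as cls (no subclass reasoning)."""
    return cls in objects(triples, node, "rdf:type")


def is_weblog_post_link(triples, link):
    """Check the three members of the owl:intersectionOf list by hand.

    owl:someValuesFrom means "at least one value of that class",
    hence any() rather than all().
    """
    if not has_type(triples, link, "link:SimpleLink"):
        return False
    sources = objects(triples, link, "link:sourceDocument")
    targets = objects(triples, link, "link:targetDocument")
    return (any(has_type(triples, doc, "rss:item") for doc in sources)
            and any(has_type(triples, doc, "rss:item") for doc in targets))


# Made-up example data: one link between two posts, one link to a non-post page.
example = [
    ("post1", "rdf:type", "rss:item"),
    ("post2", "rdf:type", "rss:item"),
    ("linkA", "rdf:type", "link:SimpleLink"),
    ("linkA", "link:sourceDocument", "post1"),
    ("linkA", "link:targetDocument", "post2"),
    ("linkB", "rdf:type", "link:SimpleLink"),
    ("linkB", "link:sourceDocument", "post1"),
    ("linkB", "link:targetDocument", "homepage"),
]
```

With this data, is_weblog_post_link(example, "linkA") holds and is_weblog_post_link(example, "linkB") does not, because "homepage" is not an rss:item. The nice part of the OWL version is that nobody has to write this function: the class definition is the program.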
Weblogs were the first to take advantage of Semantic Web technology on a large scale, and now weblog researchers can do the same.
I am very, very happy.
Me too :)
Posted by: Lilia | December 22, 2004 at 11:54 PM
I'm not normally dumbfounded when technology that I somewhat understand actually works. This is going to be wonderful.
Posted by: Anjo | December 23, 2004 at 12:07 AM