It has been the week of noticing that Lilia is back in person, and of several meetings to plan research. Having finished all the paperwork that is inevitably associated with the latter, it is time to return to the real stuff.
Andy Boyd came up with a wonderful new term, "ontological fingerprinting", and to illustrate how imaginative he is: zero hits on Google! Suppose one has an ontology (lexicon, thesaurus) and some software that can determine whether the terms in the ontology are present in a document. Applying the software to a given document, one gets a "fingerprint" of the ontology's concepts in that document. The assumption is that comparing the fingerprints of different documents provides a better metric of their similarity than comparing plain words. Ideas like this simply have to be tested in practice. Fortunately, Andy is making a lot of real data available to try it on.
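To make the idea concrete, here is a minimal sketch in Python. It is not Andy's software, just one way the idea could be tried. The toy ontology, the function names `fingerprint` and `cosine`, and the three example sentences are all made up for illustration. The fingerprint counts how often the terms of each concept occur in a text, and two fingerprints are compared by the cosine of the angle between them.

```python
import math
import re
from collections import Counter

# Hypothetical toy ontology (an assumption for illustration):
# each concept maps to the surface terms that signal it.
ONTOLOGY = {
    "vehicle": ["car", "truck", "automobile"],
    "fuel": ["petrol", "gasoline", "diesel"],
    "animal": ["dog", "cat", "horse"],
}


def fingerprint(text, ontology=ONTOLOGY):
    """Count, per concept, how often any of its terms occurs in the text."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        concept: sum(words[term] for term in terms)
        for concept, terms in ontology.items()
    }


def cosine(a, b):
    """Cosine similarity of two fingerprints built from the same ontology."""
    dot = sum(a[c] * b[c] for c in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a document without any ontology term matches nothing
    return dot / (norm_a * norm_b)


doc1 = "The car runs on petrol."
doc2 = "An automobile needs gasoline."
doc3 = "The dog chased the cat."

fp1, fp2, fp3 = fingerprint(doc1), fingerprint(doc2), fingerprint(doc3)
print(cosine(fp1, fp2))  # same concepts, different words
print(cosine(fp1, fp3))  # no concepts in common
```

The first two sentences do not share a single word, yet their fingerprints are identical, so the cosine is (up to rounding) 1. The third sentence touches none of the same concepts, so its similarity to the first is 0. Whether this holds up on real documents is exactly what needs testing.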
Gee, zero hits on Google, never thought of that as being something to sing about before. Mind you, it was more collective thinking that went into this idea (I thought you were involved in that conversation?) rather than just me. But maybe we could copyright/trademark Ontological Fingerprinting, or OF for short. We could even register www.ontologicalfingerprinting.org/com/nl; it's so cheap to do these days!
Mind you, we have to see if we can pull it off (or should I type pull it OF!) first.
A
Posted by: Andy | January 28, 2005 at 07:28 PM
You are a funny guy! The term has now been claimed and we are going to be famous... (oops, you were already famous :-)).
Posted by: Anjo | January 28, 2005 at 10:15 PM
Anjo,
I had a master's student experimenting with this very idea some years ago. We compared TFIDF scores with similar scores based on ontologies, using the super/subclass hierarchies. We also experimented with including sibling nodes in the hierarchy. The ontology-based methods were far superior to the bag-of-words-based approaches. The cosines in the TFIDF vectors are usually not very small; in the ontology-based metrics the angles sometimes are close to zero. Several documents were detected in this way which were very similar and which turned out to be various versions of a paper by the same author about the same research.
Real ontology fingerprinting, I would say.
Bob
Posted by: Bob Wielinga | February 01, 2005 at 09:45 PM