Ontological Fingerprinting

This has been a week of noticing that Lilia is back in person, and of several meetings to plan research. With the paperwork that inevitably comes with the latter finished, it is time to return to the real stuff.

Andy Boyd came up with a wonderful new term, "ontological fingerprinting", and to illustrate how imaginative he is: zero hits on Google! Suppose one has an ontology (lexicon, thesaurus) and some software that can determine whether the terms in the ontology are present in a document. Applying the software to a given document yields a "fingerprint" of the ontology's concepts in that document. The assumption is that comparing the fingerprints of different documents provides a better metric of their similarity than comparing plain words. Ideas like this simply have to be tested in practice. Fortunately, Andy is making a lot of real data available to try it.
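To make the idea concrete, here is a minimal sketch in Python. It is not the software mentioned above: the toy ontology, the function names (`fingerprint`, `word_vector`, `cosine`) and the choice of cosine similarity as the comparison are all assumptions made for illustration. Each concept is mapped to a few terms, a document's fingerprint is the count of matching terms per concept, and two fingerprints are compared the same way two plain word-count vectors would be.

```python
import math
import re
from collections import Counter

# Hypothetical toy ontology: each concept lists the terms that signal it.
ONTOLOGY = {
    "ontology": {"ontology", "thesaurus", "lexicon"},
    "document": {"document", "paper", "text"},
    "similarity": {"similarity", "metric", "distance"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_vector(text):
    """Plain bag-of-words counts, used as the baseline."""
    return Counter(tokenize(text))


def fingerprint(text, ontology=ONTOLOGY):
    """Count, per concept, how many tokens match one of its terms."""
    tokens = tokenize(text)
    return Counter({
        concept: sum(1 for token in tokens if token in terms)
        for concept, terms in ontology.items()
    })


def cosine(a, b):
    """Cosine similarity of two count vectors; 0.0 if either is empty."""
    dot = sum(a[key] * b[key] for key in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = "A thesaurus helps compare each paper"
doc2 = "An ontology helps compare every document"

word_similarity = cosine(word_vector(doc1), word_vector(doc2))
concept_similarity = cosine(fingerprint(doc1), fingerprint(doc2))
print(word_similarity, concept_similarity)
```

In this made-up pair the two sentences share few literal words but touch the same concepts, so the fingerprint comparison scores them as more similar than the word comparison does. Whether that holds up on real documents is exactly what has to be tested.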

Comments

Gee, zero hits on Google, never thought of that as being something to sing about before. Mind you, it was more collective thinking that went into this idea (I thought you were involved in that conversation?) rather than just me. But maybe we could copyright/trademark Ontological Fingerprinting, or OF for short. We could even register www.ontologicalfingerprinting.org/com/nl, it's so cheap to do these days!

Mind you, we have to see if we can pull it off (or should I type pull it OF!) first.

A

You are a funny guy! The term has now been claimed and we are going to be famous... (oops, you were already famous :-)).

Anjo,

I had a master's student experimenting with this very idea some years ago. We compared TFIDF scores with similar scores based on ontologies, using the super/subclass hierarchies. We also experimented with including sibling nodes in the hierarchy. The ontology-based methods were far superior to the bag-of-words approaches. The angles between the TFIDF vectors are usually not very small, whereas in the ontology-based metrics the angles are sometimes close to zero. In this way we detected several documents that were very similar and that turned out to be various versions of a paper by the same author about the same research.

Real ontology fingerprinting I would say.

Bob
