Don't worry, be happy

Lilia appears worried. In Facts on archiving, search and retrieval she quotes Piers Young, who in turn quotes: JAA Sillince, 1992, Literature searching with unclear objectives: a new approach using argumentation. On-line Review, 16 (6), 391-409. The message is that "If two groups of people construct thesauri in a particular subject area, the overlap of index terms will only be 60%."
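To make the 60% figure concrete, here is a minimal sketch of one way to measure the overlap between two thesauri. The quote does not say how overlap was computed, so the Jaccard measure used here (shared terms divided by all distinct terms) is an assumption, and the function name and the example terms are made up for illustration.

```python
def term_overlap(thesaurus_a, thesaurus_b):
    """Jaccard overlap between two sets of index terms.

    Assumption: overlap = shared terms / all distinct terms.
    The quoted paper may define overlap differently.
    """
    a, b = set(thesaurus_a), set(thesaurus_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical index terms chosen by two groups for the same subject area.
group_1 = {"gorilla", "primate", "habitat", "diet"}
group_2 = {"gorilla", "primate", "habitat", "behaviour"}

# Three shared terms out of five distinct terms.
print(term_overlap(group_1, group_2))
```

With these made-up terms the two groups share three of five distinct terms, which is the kind of result the quote describes.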

What is the problem? I would be very happy if Google and I overlapped 60% of the time. One of my colleagues, Suzanne Kabel, has compared keyword and ontology-based search and retrieval extensively. A recent paper is: Suzanne Kabel, Robert de Hoog, Bob J. Wielinga, and Anjo Anjewierden, "The Added Value of Task and Ontology-Based Markup for Information Retrieval", Journal of the American Society for Information Science and Technology, 55:4, pp. 348-362, February 2004. The paper does not conclude with a quote-bite like "the overlap of index terms will only be 60%". Suzanne, modestly, concludes that most of the results are not statistically significant, but that ontology-based indexing appears to produce better retrieval results than keyword-based indexing.

Although I still rely on Google for most of my queries, the irritation is growing. The technology they use seems to have reached its limits. I'm happy new roads are being explored and, of course, the ride will be bumpy before it becomes usable.


Comments

Hi Anjo,
As you say, this may not be worth panicking about, but one problem with this lack of overlap is that it makes it harder for less knowledgeable people to search effectively for concepts used by domain experts (and so learn?).

The Sillince article, rightly I think, suggests that while the general keyword-thesaurus-taxonomy-ontology approach is helpful for automated, computer agent information retrieval, it may be less useful for "human agent" information retrieval (with all its vagaries). He suggests that in a number of cases rhetorical search may be more effective. Though to my knowledge this hasn't been developed yet (!)

Piers,

In a paper to be published, Suzanne addresses the indexing problem using non-experts (psychology students) on the same domain (articles about gorillas). Ontology-based indexing is an emerging technology, and I would be very surprised if it were still less useful than traditional IR in a couple of decades.

Perhaps the most interesting thing about using ontologies is that there is no reason that they need to be about the "domain" (roughly, the topics in the document). A simple example is taking the length of the document into account, such that it becomes possible to query for "long documents about gorillas". Two indexers, or an indexer and a retriever, may then not agree on what a "long document" is, but the likelihood that a relevant document is retrieved still increases. The cited paper contains various examples of taking advantage of ontologies this way, and the Google image search facility is an example as well.
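A minimal sketch of the "long documents about gorillas" idea follows. Everything in it is hypothetical (the Document fields, the search function, the word counts and the thresholds); it is not taken from the cited paper, and only illustrates how a non-domain attribute such as length can be combined with a topic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    topics: frozenset   # domain markup: what the document is about
    word_count: int     # non-domain markup: how long it is


def search(docs, topic, long_threshold=None):
    """Return titles of documents about `topic`.

    If `long_threshold` is given, keep only documents with at least
    that many words (one person's idea of a "long document").
    """
    hits = [d for d in docs if topic in d.topics]
    if long_threshold is not None:
        hits = [d for d in hits if d.word_count >= long_threshold]
    return [d.title for d in hits]


docs = [
    Document("Field notes", frozenset({"gorilla"}), 800),
    Document("Gorilla social behaviour survey", frozenset({"gorilla"}), 6000),
    Document("Mountain gorilla habitat review", frozenset({"gorilla"}), 4000),
    Document("Chimpanzee tool use", frozenset({"chimpanzee"}), 7000),
]

# Two people with different ideas of "long" (3000 vs. 5000 words).
print(search(docs, "gorilla"))
print(search(docs, "gorilla", long_threshold=3000))
print(search(docs, "gorilla", long_threshold=5000))
```

The two thresholds give different result lists, but both drop the short field notes and both keep the survey, so the disagreement about "long" does not stop the length attribute from narrowing the search.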

Anjo.

Anjo,
is the paper you refer to about task ontologies?

As I tried to say a couple of days ago: I expect more overlap (e.g. during classification or retrieval) in the case of task ontologies (e.g. ones that describe a workflow) than in the case of domain ontologies.

Anyway, I'm very much biased towards "all things distributed" - simply because mental maps of people are different. Of course, this suggests only that a centralised solution (e.g. ontology) should work if you can find a case where people are likely to have similar mental maps (I can't help thinking about workflows :)))

Lilia,
No, I was not referring to task ontologies as such. The idea, as far as I understand it, is that you first take a point of view and then define the ontology corresponding to that point of view. The domain and task-within-domain points of view appear to be the most relevant. Read Suzanne's paper for alternative viewpoints (I mailed it to you).

The "mental maps" notion confuses me; see also a comment on Andy's blog. Could you clarify?


Anjo,
there is quite a good description at http://www.boxesandarrows.com/archives/whats_your_idea_of_a_mental_model.php

What I mean by "mental models": our internal representations (~structures in a brain) of something "out there" (e.g. objects, concepts, words). I also associate it with another concept referring to structures that guide our perception, "schemata" (sometimes "schema"), which I can't explain well enough (I use it as it's used in cognitive psychology, not in "XML circles" :).

Lilia and Anjo.

I really have to agree with Anjo. If experts have only 60% of the terms in common you may also conclude that

1. not all the terms are equally relevant.
2. discrete words alone are not enough to describe things properly. A child starts by saying mama and bear, but before it is 2 years old it starts to make sentences. The sentences are a way to express the relationships between the words.
3. there is more than one way to classify and structure a subject.


IMO the mistake is to think that the words we use are the same as the concepts we think in, and that meaning can be had from enumerating the words, because we "know what they mean". I believe in fact that the concepts and the web of relations between them are more robust and closer to our thinking (if you're older than 2 years old at least :-) ) than the words or the discrete descriptions.

Rogier

Rogier (and Anjo and Lilia),

I had the same feeling: we can call things by the same word, but mean something (slightly) different. As long as these meanings are not much in conflict, we don't have a problem.

With Google I think the meaning of the words matters. For example, I was looking for information on sales communities. Google wasn't very helpful in this case. Community often has a different meaning. I had to look for other types of combinations.
