Lilia appears worried. In Facts on archiving, search and retrieval she quotes Piers Young, who in turn quotes: JAA Sillince, 1992, Literature searching with unclear objectives: a new approach using argumentation. On-line Review, 16 (6), 391-409. The message is that "If two groups of people construct thesauri in a particular subject area, the overlap of index terms will only be 60%."
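To make the quoted figure concrete, here is a minimal sketch of how the overlap between two groups' index terms could be measured. The quote does not say how Sillince defines "overlap", so both measures below are assumptions, and the term lists are invented for illustration.

```python
def normalise(terms):
    """Lower-case and strip terms so trivial spelling variants match."""
    return {t.strip().lower() for t in terms}


def share_of_first(terms_a, terms_b):
    """Fraction of group A's terms that group B also chose (assumed measure)."""
    a, b = normalise(terms_a), normalise(terms_b)
    return len(a & b) / len(a) if a else 0.0


def jaccard(terms_a, terms_b):
    """Shared terms divided by all distinct terms (a stricter assumed measure)."""
    a, b = normalise(terms_a), normalise(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical thesauri built by two groups for the same subject area.
group_1 = ["Gorilla", "primate", "habitat", "diet", "behaviour"]
group_2 = ["gorilla", "primate", "habitat", "ecology", "conservation"]

overlap_share = share_of_first(group_1, group_2)   # 3 of 5 terms shared
overlap_jaccard = jaccard(group_1, group_2)        # 3 shared of 7 distinct
```

With these invented lists the first measure gives 0.6 and the second roughly 0.43, so the same pair of thesauri can look more or less alarming depending on the definition used.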
What is the problem? I would be very happy if Google and I overlapped 60% of the time. One of my colleagues, Suzanne Kabel, has compared keyword and ontology-based search and retrieval extensively. A recent paper is: Suzanne Kabel, Robert de Hoog, Bob J. Wielinga, and Anjo Anjewierden, "The Added Value of Task and Ontology-Based Markup for Information Retrieval", Journal of the American Society for Information Science and Technology, 55:4, pp. 348-362, February 2004. The paper does not conclude with a quote-bite like "the overlap of index terms will only be 60%". Suzanne, modestly, concludes that most of the results are not statistically significant, but that ontology-based indexing appears to produce better retrieval results than keyword-based indexing.
Although I still rely on Google for most of my queries, my irritation is growing. The technology they use seems to have reached its limits. I'm happy that new roads are being explored and, of course, the ride will be bumpy before they become usable.
Hi Anjo,
As you say, this may not be worth panicking about, but one problem with this lack of overlap is that it makes it harder for less knowledgeable people to search effectively for concepts used by domain experts (and so learn?).
The Sillince article, rightly I think, suggests that while the general keyword-thesaurus-taxonomy-ontology approach is helpful for automated, computer agent information retrieval, it may be less useful for "human agent" information retrieval (with all its vagaries). He suggests that in a number of cases rhetorical search may be more effective. Though to my knowledge this hasn't been developed yet (!)
Posted by: Piers Young | March 10, 2004 at 05:16 AM
Piers,
In a paper to be published, Suzanne addresses the indexing problem using non-experts (psychology students) on the same domain (articles about gorillas). Ontology-based indexing is an emerging technology, and I would be very surprised if it were less useful than traditional IR in a couple of decades.
Perhaps the most interesting thing about using ontologies is that there is no reason that they need to be about the "domain" (roughly, the topics in the document). A simple example is taking the length of the document into account, so that it becomes possible to query for "long documents about gorillas". Two indexers, or an indexer and a retriever, may then not agree on what a "long document" is, but the likelihood that a relevant document is retrieved still increases. The cited paper contains various examples of taking advantage of ontologies this way, and the Google image search facility is an example as well.
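A minimal sketch of the "long documents about gorillas" idea, assuming documents carry topic annotations plus a word count. The Document class, the search function, the thresholds and the sample data are all hypothetical and are not taken from the cited paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    topics: frozenset   # domain annotations (what the document is about)
    word_count: int     # non-domain property of the document itself


def length_class(doc, long_threshold):
    """Classify a document as 'long' or 'short' for a given threshold."""
    return "long" if doc.word_count >= long_threshold else "short"


def search(docs, topic, length=None, long_threshold=3000):
    """Return titles matching a topic and, optionally, a length class."""
    hits = []
    for doc in docs:
        if topic not in doc.topics:
            continue
        if length is not None and length_class(doc, long_threshold) != length:
            continue
        hits.append(doc.title)
    return hits


# Invented collection for illustration only.
docs = [
    Document("Gorilla field study", frozenset({"gorilla", "habitat"}), 8000),
    Document("Gorilla fact sheet", frozenset({"gorilla"}), 400),
    Document("Chimpanzee tool use", frozenset({"chimpanzee"}), 9000),
    Document("Gorilla diet survey", frozenset({"gorilla", "diet"}), 2500),
]

# Two people with different ideas of what "long" means.
indexer_view = search(docs, "gorilla", length="long", long_threshold=3000)
retriever_view = search(docs, "gorilla", length="long", long_threshold=2000)
```

The two thresholds give different result lists, yet both drop the short fact sheet and both keep the field study, which is the sense in which disagreement about "long" still narrows the search usefully.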
Anjo.
Posted by: Anjo | March 10, 2004 at 08:20 AM
Anjo,
Is the paper you refer to about task ontologies?
As I tried to say a couple of days ago: I expect more overlap (e.g. during classification or retrieval) in the case of task ontologies (e.g. ones that describe a workflow) than in the case of domain ontologies.
Anyway, I'm very much biased towards "all things distributed", simply because people's mental maps differ. Of course, this only suggests that a centralised solution (e.g. an ontology) should work if you can find a case where people are likely to have similar mental maps (I can't help thinking about workflows :)))
Posted by: Lilia | March 10, 2004 at 11:28 PM
Lilia,
No, I was not referring to task ontologies as such. The idea, as far as I understand it, is that you first take a point of view and then define the ontology corresponding to that point of view. The domain and task-within-domain points of view appear to be the most relevant. Read Suzanne's paper for alternative viewpoints (I mailed it to you).
The "mental maps" notion confuses me; see also a comment on Andy's blog. Could you clarify?
Posted by: Anjo | March 11, 2004 at 01:39 AM
Anjo,
There is quite a good description at http://www.boxesandarrows.com/archives/whats_your_idea_of_a_mental_model.php
What I mean by "mental models" is our internal representations (roughly, structures in the brain) of something "out there" (e.g. objects, concepts, words). I also associate it with another concept referring to structures that guide our perception, "schemata" (sometimes "schema"), which I can't explain well enough (I use it as it's used in cognitive psychology, not in "XML circles" :).
Posted by: Lilia | March 11, 2004 at 09:49 AM
Lilia and Anjo,
I really have to agree with Anjo. If experts have only 60% of the terms in common, you may also conclude that:
1. Not all the terms are equally relevant.
2. Discrete words alone are not enough to describe things properly. A child starts by saying mama and bear, but before it is 2 years old it starts to make sentences. The sentences are a way to express the relationships between the words.
3. There is more than one way to classify and structure a subject.
IMO the mistake is to think that the words we use are the same as the concepts we think in, and that meaning can be had from enumerating the words, because we "know what they mean". I believe in fact that the concepts and the web of relations between them are more robust and closer to our thinking (if you're older than 2 years old at least :-) ) than the words or the discrete descriptions.
Rogier
Posted by: Rogier Brussee | March 11, 2004 at 10:57 AM
Rogier (and Anjo and Lilia),
I had the same feeling: we can call things by the same word, but mean something (slightly) different by it. As long as these meanings are not much in conflict, we don't have a problem.
With Google I think the meaning of the words matters. For example, I was looking for information on sales communities. Google wasn't very helpful in this case, because "community" often has a different meaning. I had to try other combinations of words.
Posted by: Carla V. | March 11, 2004 at 12:06 PM