Andy Boyd in Google within Companies writes:
I think Google works well on the HTML-based Internet. In companies, yes, we have an HTML-based intranet, but most of our data sits in databases (EDMS), CoPs, and ERP systems. When Google crawled these in the trial I saw, it was less than impressive.
Amazing as it may seem, the Google algorithm mainly relies on links, not words (contrary to most traditional IR algorithms). A basic idea of Google is to rate each site based on the number of links to it and the rating of the linking sites (recursively). The result of a query to Google is therefore determined by a combination of the words in the query and the rating of the site.
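To make the recursive rating idea concrete, here is a minimal sketch of a link-based rating computed by repeated passes over a tiny link graph. It is a simplified, PageRank-style illustration, not Google's actual implementation: the function name `link_ratings`, the damping value of 0.85, the fixed iteration count, and the four-page example graph are all assumptions made for the example.

```python
def link_ratings(links, damping=0.85, iterations=50):
    """Rate pages by incoming links, weighted by the rating of the linking pages.

    `links` maps each page to the list of pages it links to.
    All names and parameter values here are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with an equal rating for every page.
    rating = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small base rating, independent of links.
        new_rating = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rating on, split evenly over its links.
                share = damping * rating[page] / len(targets)
                for target in targets:
                    new_rating[target] += share
            else:
                # A page without outgoing links spreads its rating over all pages.
                share = damping * rating[page] / n
                for target in pages:
                    new_rating[target] += share
        rating = new_rating
    return rating


# Hypothetical four-page web: three pages link to "c", and "c" links to "a".
web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ratings = link_ratings(web)
for page in sorted(ratings, key=ratings.get, reverse=True):
    print(page, round(ratings[page], 3))
```

In this toy graph, "c" ends up with the highest rating because three pages link to it, and "a" comes second because its single incoming link is from the highly rated "c". Pages "b" and "d", which nobody links to, keep only the base rating.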
This is a neat idea, because it assumes that useful and authoritative sites are "automatically" detected by web users and that users create links to them in appreciation. For the web as a whole, this more or less works (although Google seems to have serious problems filtering duplicate and similar pages).
For a particular company intranet, the Google approach probably does not work. Within a company there is little point in adding explicit links, as these links exist implicitly in shared work and in formal and informal meetings.
The most important difference between the web and an intranet is therefore the user base. Google works on the entire Internet, and its purpose is to help unrelated users find interesting sites given a query. On an intranet, the links are in all probability not explicit, so algorithms based on an analysis of "words only" probably produce better results.
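As a rough illustration of what a "words only" approach looks like, the sketch below ranks documents purely on the query terms they contain, using a simple TF-IDF score and no link information at all. The function names, the particular smoothed IDF formula, and the three example documents are assumptions chosen for the example; real IR systems use more refined tokenization and weighting.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()


def rank_by_words(query, documents):
    """Return document names ordered by a simple TF-IDF score for the query.

    `documents` maps a document name to its text. No links are used.
    """
    counts_by_doc = {name: Counter(tokenize(text)) for name, text in documents.items()}
    n_docs = len(documents)
    scores = {}
    for name, counts in counts_by_doc.items():
        total_terms = sum(counts.values()) or 1
        score = 0.0
        for term in tokenize(query):
            # Number of documents that contain the term at least once.
            doc_freq = sum(1 for c in counts_by_doc.values() if term in c)
            if doc_freq == 0:
                continue
            # Smoothed inverse document frequency: rarer terms weigh more.
            idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
            # Term frequency, normalized by document length.
            score += (counts[term] / total_terms) * idf
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical intranet documents with no links between them.
intranet = {
    "edms-manual": "how to check documents into the edms database",
    "erp-faq": "erp invoice questions and answers",
    "canteen-menu": "lunch menu for this week",
}
ranking = rank_by_words("edms documents", intranet)
print(ranking[0])
```

Here the only document that mentions the query terms comes out on top, even though nothing links to it.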
In short, I am not surprised that Google produces poor results on Andy's intranet. A typical reviewer question on the projects I work on is: "Do the results scale up?". It appears the Google algorithm does not scale down!
I agree with everything written here, Anjo. I've been looking at companies that have introduced the Autonomy search tool and will report back when all the data has been collected.
Posted by: Andy Boyd | March 15, 2004 at 12:26 PM