The visit to the Community Informatics workshop resulted, a week later, in a paper with Aldo de Moor. Because the paper was submitted to a conference with a double-blind review process, I cannot go into too many details. If the paper had a subtitle, it would be something like "How bloggers save the world". Since I cannot link to the paper, here are some thoughts on its practical implications.
For the paper we needed a corpus of documents about a realistic case, for example a blog corpus about global warming (a topic of serious interest to those living in the Netherlands). I first looked at Google Blog Search. The results were less than mediocre, and the conclusion must be that Google indexes pages with RSS feeds rather than blog posts. As a consequence, one gets a lot of boilerplate text from news agencies. My next try was BlogPulse, and the results were pretty good. BlogPulse only indexes blogs, which is precisely what we needed for the paper.
After spidering a few hundred posts returned by BlogPulse, I used tOKo to analyse the results. Above is a Sigmund network based on the global warming corpus. The network makes a lot of sense, which I had not expected, because BlogPulse links to the HTML pages containing the posts rather than to the posts themselves. Apparently, Sigmund has succeeded in filtering out side matter such as blogrolls, tag clouds and so forth. For those unfamiliar with global warming: hockey stick refers to the graph that depicts the rapid increase in temperature, and drinking advantage refers to the melting ice caps in Greenland, whose ice is being used to brew beer. You can find this out by browsing the corresponding Sigmund networks.
After finishing the paper, I thought it would be a good idea to add a "Create corpus from BlogPulse" option to tOKo. Enter a topic, like Dutch elections, cycling trips or personal knowledge management, and a few minutes later a corpus is ready to browse. This would also partly solve another problem: I don't have a public corpus that can be used for illustration purposes in the tOKo documentation.
However, the BlogPulse terms of use contain the following clause:
5. Meta-Searching and Automated Queries Prohibited
You may not "meta-search" BlogPulse or send automated queries of any sort to BlogPulse, unless Nielsen BuzzMetrics's prior written consent is obtained. Sending "automated queries" includes, among other things, using any software which sends queries to BlogPulse (for example, to determine how a blog "ranks" for various queries) or performing "offline" searches on BlogPulse). Please Contact Us for more information.
Google and other search engines have a similar clause on "meta-searching" (note the quotes). And, of course, it is a fully understandable restriction.
My first thought was: bad luck. But perhaps not. What BlogPulse is asking for is some active involvement of the user. In an HTML browser, this involvement consists of clicking the link to the next page of results. The same can be done in tOKo. The BlogPulse search is initiated by entering the desired topic into a special search box (like the search engine bar in Firefox), tOKo renders the results, and the user then has to click a button for more results, just as in any HTML browser. This is clearly not meta-searching according to the Wikipedia definition, and neither is it an automated query, as the initiative lies with the user (again, just as in any HTML browser).
One of the messages of this post, and also of the paper, is that there are a lot of thoughtful posts on the web written by honest bloggers. Connecting these posts, and the people behind them, is well worth pursuing as a source of solutions to difficult and interesting problems. A tool like BlogPulse can contribute significantly to this, and I hope that no one objects to adding the "Create corpus from BlogPulse" option to tOKo.
Anjo,
I love what you're doing with BlogPulse search results. However, by our understanding this does count as meta-search/automated querying. E-mail me -- I think we'll have a solution that you can use.
Posted by: Natalie Glance | November 27, 2006 at 02:58 PM