Pulling the plug: saving energy

In the previous year (until October 2007) my electricity consumption was 3,062 kWh. Since then I have replaced all lamps with low-energy variants, and I pull the plug on any electrical device I am not using. The projected consumption for this year (since October 2007) currently stands at 1,928 kWh. About 37% less!
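A quick check of the arithmetic on the two yearly figures, as a small Python sketch. Only the two kWh numbers come from the post; the function name is just for illustration.

```python
# Figures quoted in the post (kWh per year).
previous_kwh = 3062   # year until October 2007
projected_kwh = 1928  # projection for the year since October 2007

def percent_reduction(before: float, after: float) -> float:
    """Return the relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100

saving = percent_reduction(previous_kwh, projected_kwh)
print(f"{previous_kwh - projected_kwh} kWh saved, {saving:.1f}% less")
# prints: 1134 kWh saved, 37.0% less
```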

Saving a lot of energy appears to be rather simple: pull the plug. In my case: the microwave, digital television, radio, computer (the network card keeps running while the computer is turned off) and cable modem. All these devices consume energy even when "turned off".
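To get a feeling for why standby consumption adds up, a rough sketch: a device that idles at a given wattage all year uses watts × 8,760 hours / 1,000 kWh. The wattages below are hypothetical placeholders, not measurements of my devices.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years

def standby_kwh_per_year(watts: float, hours_plugged_in: float = HOURS_PER_YEAR) -> float:
    """Energy (kWh) drawn by a device idling at `watts` for the given number of hours."""
    return watts * hours_plugged_in / 1000

# Hypothetical standby draws in watts; illustrative values only, not measurements.
devices = {"microwave": 2.0, "digital television": 5.0, "cable modem": 6.0}

total = sum(standby_kwh_per_year(w) for w in devices.values())
for name, w in devices.items():
    print(f"{name}: {standby_kwh_per_year(w):.1f} kWh/year")
print(f"total: {total:.1f} kWh/year")
```

Even a few watts of standby power, left on around the clock, comes to tens of kWh per device per year.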

Pulling the plug: it really is that simple!

Weblog conversations: the big one

The above picture depicts a "weblog conversation" in a certain community (see Lilia's overview for more details and background).

To appreciate the picture (called the "big one" by Lilia) at least somewhat, if that is possible at all, the following information might be useful.

  • Each grey box is a weblog conversation; the bigger the box, the more posts are part of the conversation. The conversation at the upper left has been opened up to illustrate that the grey boxes are complex in themselves.
  • Coloured boxes are posts that link to a conversation but are not part of it because the link is a self-link. Conversations, by definition, are not with oneself :-). Each blogger has her own colour.
  • Coloured links between conversations indicate that one conversation is linked to another through a self-link.

For an interpretation we have to wait until Lilia's thesis is finished. I'm particularly interested in the cover art :-).

That's not machine learning

Yesterday I gave a talk on educational data mining at the University of Amsterdam (my previous employer). The talk was about how I approach the very difficult issue of trying to understand what learners are doing in learning environments. During the inevitable drinks afterwards, someone exclaimed: "that's not machine learning". He was right. I had actually tried the standard data mining techniques, and they did not seem to produce any useful results. The key, I think, lies in understanding what you want to discover, not in whether you are using the "correct" algorithms. These days educational data mining, and data mining in general, is biased by the "Microsoft/Google" of data mining packages (it is called WEKA and I refuse to provide a link). Researchers compress their data so that WEKA can handle it, and then, presto, one of the algorithms produces some results.

Today I gave a lecture on educational data mining to third-year psychology students, explaining what the issues are and showing the results produced by the new methods we have developed. The students were very responsive, asked good questions, and agreed with the idea that in educational data mining the purpose is to understand the behaviour of the learner, something the standard data mining techniques hardly provide an opportunity for.

Last week Lilia and I had a discussion about a chapter of her thesis. Lilia's toddler Alexander (1 year and 3 months) was present. He liked emptying my trash bin, putting the trash back into the bin, emptying it again, and so on indefinitely. Perhaps researchers, once they have reached a certain level of maturity, become toddlers again.

Workshop: Knowledge acquisition from the social web

Some readers might be interested in a workshop on Knowledge acquisition from the social web. The workshop is part of Triple-I and takes place in Graz (Austria) in early September. Graz is a rather nice city; I visited it once ten years ago and also climbed the peculiar hill right in the middle. The objectives of the workshop are below.




This workshop aims to develop and bring together a community of researchers interested in discussing the manifold challenges and potentials of knowledge acquisition from the social web.

With the advent of the “Social Web”, a new breed of web applications has enriched the social dimension of the web. On the social web, actors can be understood as social agents - technological or human entities - that collaborate, pursue goals, are autonomous, and are capable of exhibiting flexible problem solving and social behavior. By participating in the social web, both technological and human agents leave complex traces of social interactions and their motivations behind, which can be studied, analyzed and utilized for a range of different purposes. The broad availability and open accessibility of these traces in social web corpora, such as in del.icio.us, Wikipedia, weblogs and others, provides researchers with opportunities for, for example, novel knowledge acquisition techniques and strategies, as well as large scale, empirically coupled “in the field” studies of social processes and structures.

This workshop aims to develop and bring together a diverse community of researchers interested in the social web by seeking submissions that are focusing on understanding and evaluating the role of agents, goals, structures, concepts, context, knowledge and social interactions in a broad range of social web applications. Examples for such applications include, but are not limited to social authoring (e.g. wikis, weblogs), social sharing (e.g. del.icio.us, flickr), social networking (Facebook, LinkedIn) and social searching (e.g. wikia, eurekster, mahalo) applications.


Conferences and their location

A little food for thought. Suppose you are a scientist and have written a conference paper. Two conferences might accept the paper. They take place around the same time, one at an exotic location you have always wanted to visit and one at a dull location you have, unfortunately, visited many times before. To which conference do you submit the paper? For those not familiar with scientific practice: it is not done to submit the same paper to two conferences at the same time, and publishing papers is core business for scientists.

After about a year in my new job on educational data mining, I invited all five collaborating colleagues to write a paper on what we had discovered so far. They all took up the challenge, and only after the paper was finished did two of them ask: where is the conference? Oh well, it is within cycling distance. I checked that before inviting them.

ADML workshop on Educational Data Mining

Last week I was in Crete, Sissi to be precise, to present our work on chat analysis at the ADML workshop (Applying Data Mining Techniques to e-Learning), part of EC-TEL (European Conference on Technology Enhanced Learning).

Crete: here we come

I went to Crete with Wilco Bonestroo, who had a paper at another workshop. When we arrived at the hotel around 21:00 and showed the voucher that was supposed to guarantee seven days of Cretan hospitality, we were bitterly disappointed. Think of any Monty Python sketch:

  • We: We have a booking for this hotel.
  • Hotel manager: I have not received your booking, and your voucher is not from a travel company I know (Neckermann, TUI, etc.).
  • We: But the travel agency acknowledged the reservation.
  • Hotel manager: You don't look like a bus load of tourists to me.
  • We: Do you have free rooms?
  • Hotel manager: No, and if we had we would not give them to you.

What to do? With very little choice, we walked down the road and asked at other hotels whether they had rooms available. After about five attempts, getting more desperate as time was running out, we finally got a room at a cheap place (€29 a night, including a leaking tap, breakfast and a view of the beach). The next morning we walked in the direction of the conference and found an apartment for €40 a day for the rest of the week. After some searching we also found a place that serves a good (English) breakfast: the Pyramid bar on the beachfront in Sissi.





ADML workshop

Wining and dining with workshop participants

About half a year ago I started working on Educational Data Mining (EDM). The general idea of EDM is to apply data mining techniques to educational data (e.g. log files of learning environments). The ADML workshop (Applying Data Mining Techniques to e-Learning) seemed an excellent opportunity to get in touch with the EDM community. Until then, I had followed EDM research through the portal educationaldatamining.org.

Seven papers were presented at the workshop, followed by a general discussion. My overall conclusion is that EDM still has to figure out precisely what the main issues are. Several papers presented work on (very small) case studies that are difficult to generalise. Agathe Merceron's paper was probably the most general: it applied association rule mining to find combinations of student errors in a learning environment. The potential is that course material can be improved when it turns out that student errors are caused by poor organisation of the learning environment itself.
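The following is a minimal sketch of what association rule mining on student errors can look like. It is not Merceron's implementation. The error labels, the data and the thresholds are all hypothetical. For every pair of errors it computes support (the fraction of students making both errors) and confidence (of the students making error A, the fraction who also make error B).

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: the set of error types each student made in an exercise.
students = [
    {"sign_error", "unit_error"},
    {"sign_error", "unit_error", "rounding"},
    {"unit_error"},
    {"sign_error", "unit_error"},
    {"rounding"},
]

def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return sorted rules (lhs, rhs, support, confidence) for pairs of items."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

for lhs, rhs, sup, conf in pair_rules(students):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
# sign_error -> unit_error: support=0.60, confidence=1.00
# unit_error -> sign_error: support=0.60, confidence=0.75
```

In practice one would use a full Apriori-style algorithm for larger item sets. Pairs are enough to show the idea.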

Agathe Merceron (left) and Anjo Anjewierden (right)

In the seven papers, no fewer than six different data mining techniques were proposed. This is another indication that the field is searching for direction, or perhaps even for interesting problems to attack. At the conference there were also a number of papers hinting at the use of data mining, and one shocking example was a presentation in which the data had been fed into Weka (a data mining toolkit) and the algorithm that produced the best results was presented. So far, so good. Unfortunately, the person in question had no idea what the algorithms did and, much worse, no idea how to interpret the results.

The photos, kindly provided by Galit Ben Zadok, were taken during the informal dinner after the workshop. A research community in the making, as evidenced by the fact that the restaurant ran out of traditional Greek beer after midnight ... (No, it was not only me!)



Chat analysis

For my own 15-minute presentation I decided to present the background on slides and give an interactive demo of the results. A screendump of the chat analysis tool used for the demo is shown below. The idea of the avatars is described in Avatars in learning environments. And, contrary to the reviewers :-), the workshop participants liked it.





EC-TEL conference

After a couple of days José Kooken joined us. She had a paper and presentation at the EC-TEL conference and, given that this was her first conference presentation, she was a little nervous (that is what she told us anyway :-)). For me this was a little peek into the past; I still remember being very nervous before my first presentation a long time ago. I suppose one thing every researcher faces sooner or later is that what appears self-evident (or even trivial) to you may be very interesting to others. José's presentation went very well and she also got a lot of questions from the audience.

I still have to find my way in the educational research world (what's hot and what's not), so I tried to select carefully which conference sessions to attend. A session of particular interest was called "Ontologies / Knowledge Management", topics I have been attached to in the past. Amal Zouaq presented a tool called Knowledge Puzzle, which "automatically" extracts domain concepts and relations from text (paper title "Building Domain Ontologies from Text for Educational Purposes"). From my perspective, it is at least interesting to see this kind of language / semantic web technology being introduced to the educational world.

On vacation ...

I actually did a little bit of cycling, although the mountains, the sun and the poor quality of the rental bicycles made this a demanding exploit. Crete is a nice place for a stress-free short holiday near the beach. I did not have time to visit the historical sites. A look at the impressive mountains, the lack of proper roads, and my not having a driving licence might explain this.

Weblog data as art

There are days when one feels like one has won the lottery. I discovered Lattice Uncertainty Visualization. This may not sound too appealing to many, but it contains some very neat ideas about the visualisation of uncertainty (probabilities) in large data sets, particularly the use of various visualisation techniques for emphasis.

Something else I was searching for is a "reasonable" visualisation of events that occur over time. Events happen over time all the time, but how to visualise them is not always trivial. An example is the rate of blogging. Plotting a simple histogram of the number of posts per day is rather boring and difficult to interpret. And, for most of us, it would produce a rather silly picture of (in)frequent blogging.

One idea is to draw a dotted line for each post. Because dotted lines contain white space, this also accommodates multiple posts a day (for the heavy bloggers among us) by filling in some of the white dots. An example is given in the background of the image below. The foreground (see also a previous post) depicts self-linking (top) and linking to others (bottom).
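The following text-only sketch shows one possible reading of the dotted-line idea. It is an interpretation, not the code behind the image. Each day gets a small cell of white dots (o), and every post on that day fills one dot (*), so several posts on one day simply fill more of the cell. The cell width `dots_per_day` and the dates are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

def dotted_timeline(post_dates, start, end, dots_per_day=3):
    """Render one cell of `dots_per_day` dots per day, from start to end inclusive.

    An empty dot (o) is white space; each post on a day fills one dot (*),
    capped at the cell width.
    """
    per_day = Counter(post_dates)
    cells = []
    day = start
    while day <= end:
        filled = min(per_day[day], dots_per_day)
        cells.append("*" * filled + "o" * (dots_per_day - filled))
        day += timedelta(days=1)
    return " ".join(cells)

# Hypothetical posts: one on 1 March, two on 3 March.
posts = [date(2008, 3, 1), date(2008, 3, 3), date(2008, 3, 3)]
print(dotted_timeline(posts, date(2008, 3, 1), date(2008, 3, 4)))
# prints: *oo ooo **o ooo
```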

Weblog conversations and self-linking visualised

Lilia is taking another look at weblog conversations (and so am I). Her posts so far:

A weblog conversation is a set of linked posts. The most natural counterparts are threads in email or newsgroups. From the beginning (see e.g. weblog conversations; time-based view), we have defined a weblog conversation as excluding self-links.
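The definition can be turned into a small graph computation. The following is a minimal sketch, not our actual tool. Posts are nodes, links between posts by different authors are edges (treated as undirected here, which is an assumption), self-links are skipped, and every connected component is a conversation. The post ids and authors are hypothetical.

```python
from collections import defaultdict

def conversations(posts, links):
    """Group posts into conversations: connected components of non-self links.

    posts: dict mapping post id -> author
    links: iterable of (source_post, target_post) pairs
    A link between two posts by the same author is a self-link and is ignored,
    so posts that only self-link never end up in a conversation.
    """
    graph = defaultdict(set)
    for src, dst in links:
        if posts[src] != posts[dst]:  # skip self-links
            graph[src].add(dst)
            graph[dst].add(src)       # assumption: direction does not matter
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:                  # depth-first search over linked posts
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(component)
    return result

# Hypothetical example: ann, bob and cat talk; a2 and c2 are self-links.
posts = {"a1": "ann", "a2": "ann", "b1": "bob", "c1": "cat", "c2": "cat"}
links = [("b1", "a1"), ("a2", "a1"), ("c1", "b1"), ("c2", "c1")]
print(conversations(posts, links))
```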


Despite this definition, Lilia wants to study the relation between conversations and personal blogging practices, with a particular emphasis on self-linking. The combination of conversations (linked posts) and self-linking (which results in conversations being linked to each other) produces an amorphous and incomprehensible blob when visualised. After some experimentation (fortunately I'm not a mathematician: "if it is linked, we call it a graph"), and following the general principle of keeping it simple, the relation between conversations and self-linking is perhaps best studied by visualising the following:

  1. Black: a conversation. All details about the conversation are hidden (in an interactive environment zooming in is always possible).
  2. Yellow: a boundary link, i.e. a self-link into a conversation. The blogger already has a post inside the conversation and links to it personally, but no one else links to the boundary post.
  3. Pink: a secondary link. A self-link to a boundary link or another secondary link.

The boundary links are very close to the conversations (one self-link apart); the secondary links are further away (at least two self-links from a conversation). Surprising as it may seem, secondary links are very rare in the data I looked at (it might be interesting for social scientists to try to explain that!).
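The colour coding follows from the distance, counted in self-links, between a post and the nearest conversation. The following sketch computes that distance with a breadth-first search. It treats self-links as undirected, which is an assumption, and all ids are hypothetical. Distance 1 gives a boundary (yellow) post and distance 2 or more gives a secondary (pink) post.

```python
from collections import deque

def classify_self_linked(conversation_posts, self_links):
    """Label posts reachable from a conversation through self-links only.

    conversation_posts: set of post ids that belong to conversations (black)
    self_links: iterable of (post, post) pairs by the same author
    Returns {post: "boundary" | "secondary"}.
    """
    neighbours = {}
    for a, b in self_links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    distance = {p: 0 for p in conversation_posts}
    queue = deque(conversation_posts)
    while queue:  # breadth-first: the first time we reach a post is the shortest path
        post = queue.popleft()
        for nxt in neighbours.get(post, ()):
            if nxt not in distance:
                distance[nxt] = distance[post] + 1
                queue.append(nxt)
    return {
        p: "boundary" if d == 1 else "secondary"
        for p, d in distance.items()
        if d > 0
    }

# Hypothetical example: a2 and c2 link into the conversation, a3 links to a2.
labels = classify_self_linked({"a1", "b1", "c1"}, [("a2", "a1"), ("a3", "a2"), ("c2", "c1")])
print(labels)
```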



Four examples. (1) The small one at the top right contains only black squares: the person is involved in conversations but does not self-link to any of his/her posts inside them. (2) The example above shows a more mature pattern of involvement in conversations and following them up. The fascinating thing is not only that there are self-links to conversations, but also that some conversations get connected by these self-links, which suggests that, according to this blogger, the conversations may be related.

These two examples are typical of the data used. The last two examples below show bloggers who engage in conversations as well as adding to them personally on a large scale. And finally, we see some pink posts too! The last picture gives an idea of how rare these pink posts really are.

In A model (framework) for weblog research it was suggested that one should look at five dimensions to study weblogs. This post shows that one can obtain a fascinating peek into the blogosphere by looking at just two dimensions (links, persons). Perhaps it would be a good idea to add time as well, so that we can see whether the yellow and pink posts occur before (this is possible), during or after the conversation.

Avatars in learning environments

A few months ago I posted about the use of avatars in learning environments.

Since then we have worked out the idea and written a paper on it. Mauro Cherubini neatly summarised the paper on his weblog:

The premise of this paper is that learners cannot be expected to oversee the whole of their communication and also that chat communication tends to be less structured than face-to-face communication (Stromso et al., 2007). Therefore they aim to build a real-time feedback system that can regulate the collaborative interactions. This workshop paper presents a nice approach to use a part-of-speech tagger and a Bayesian classifier to categorize chat messages into 4 functional categories: regulatory, domain specific, social and technical messages. The authors used manual coders to assign each message to a category. Then they used this corpus to train the Bayesian classifier, showing high accuracy results.
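The following minimal multinomial naive Bayes classifier over words (with Laplace smoothing) illustrates the kind of Bayesian classification described in this summary. It is not the implementation used in the paper. The training messages are hypothetical English stand-ins for the manually coded Dutch chats. The four category names come from the summary.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (message, category). Returns a model for classify()."""
    word_counts = defaultdict(Counter)  # category -> word frequencies
    category_counts = Counter()
    vocabulary = set()
    for message, category in examples:
        words = message.lower().split()
        word_counts[category].update(words)
        category_counts[category] += 1
        vocabulary.update(words)
    return word_counts, category_counts, vocabulary

def classify(model, message):
    """Return the category with the highest log posterior (Laplace smoothing)."""
    word_counts, category_counts, vocabulary = model
    total = sum(category_counts.values())
    best, best_score = None, -math.inf
    for category, n in category_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[category].values()) + len(vocabulary)
        for word in message.lower().split():
            score += math.log((word_counts[category][word] + 1) / denom)
        if score > best_score:
            best, best_score = category, score
    return best

# Hypothetical, hand-coded training messages (English stand-ins).
examples = [
    ("the speed increases", "domain"),
    ("momentum stays the same", "domain"),
    ("i agree", "regulatory"),
    ("shall we try the next one", "regulatory"),
    ("haha nice", "social"),
    ("my screen froze", "technical"),
]
model = train(examples)
print(classify(model, "the momentum increases"))  # prints: domain
```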

The body parts of the avatars corresponding to the two learners communicating with each other grow or shrink when a new chat message arrives. For instance, when a domain-oriented message ("the speed increases") is typed, the head becomes a little bigger and the other body parts become a little smaller; when a regulatory message ("I agree, the answer is 4") is typed, the body becomes a little bigger. Watching the shape of the avatars change as new messages come in is great fun, possibly even for the learners themselves. The reviewers did not think the avatars were an appropriate visualisation of learner behaviour. They suggested using histograms :(. I'm not entirely sure, but when learning the laws of momentum, the last thing a learner may want to look at is a histogram.
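A simple way to get "one part grows, the others shrink" is to keep the part sizes as proportions, grow the part belonging to the message category, and renormalise. The following sketch assumes this update rule and a step size of 0.1, both hypothetical. The head (domain) and body (regulatory) mapping comes from the text. The arms and legs for social and technical messages are made up for the example.

```python
CATEGORY_TO_PART = {
    "domain": "head",       # from the post
    "regulatory": "body",   # from the post
    "social": "arms",       # hypothetical mapping
    "technical": "legs",    # hypothetical mapping
}

def update_avatar(sizes, category, step=0.1):
    """Grow the part for `category`, then renormalise so all parts sum to 1.

    Renormalising makes every other body part a little smaller.
    """
    part = CATEGORY_TO_PART[category]
    grown = dict(sizes)
    grown[part] += step
    total = sum(grown.values())
    return {name: size / total for name, size in grown.items()}

avatar = {part: 0.25 for part in CATEGORY_TO_PART.values()}
avatar = update_avatar(avatar, "domain")  # e.g. "the speed increases"
print({k: round(v, 3) for k, v in avatar.items()})
```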

To make the avatars change shape, we used two methods to analyse the chats: looking at the words, and looking at the grammatical structure (part-of-speech or POS tagging) of a chat message. Both methods classify the chats well; looking at the words produces slightly better results, possibly because the vocabulary of the learning environment is very small and all domain-oriented words (speed, momentum, increases, etc.) get assigned to the domain class (the head of the avatar). One of the strongest grammatical structures the automatic analysis picks up for regulatory messages is a verb followed by a personal pronoun. Funnily enough, this structure does not exist in English ("think I" is not grammatical). In our Dutch chats, grammatical structure is more or less sufficient to select the regulatory messages, and the underlying algorithms can discover this automatically. The paper contains all technical details.
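The following is a small illustration of the verb-followed-by-personal-pronoun feature. It assumes the messages have already been tagged. No real tagger is called, and the tagset (V, PRON, DET, ...) is a hypothetical simplification. The idea is to turn each tagged message into adjacent tag pairs and check for the pair (V, PRON).

```python
def tag_bigrams(tagged):
    """Return the sequence of adjacent (tag, tag) pairs in a tagged message."""
    tags = [tag for _, tag in tagged]
    return list(zip(tags, tags[1:]))

def has_verb_pronoun(tagged):
    """True if a verb is immediately followed by a personal pronoun."""
    return ("V", "PRON") in tag_bigrams(tagged)

# Hand-tagged Dutch chat messages; the tags are a hypothetical simplified tagset.
regulatory = [("dat", "DEM"), ("denk", "V"), ("ik", "PRON"), ("ook", "ADV")]  # "I think so too"
domain = [("de", "DET"), ("snelheid", "N"), ("neemt", "V"), ("toe", "PART")]  # "the speed increases"

print(has_verb_pronoun(regulatory), has_verb_pronoun(domain))  # prints: True False
```

In a real classifier such tag bigrams would be features next to the words, and the learning algorithm would decide how much weight the (V, PRON) pair gets.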

This is a small contribution to the emerging field of Educational Data Mining. Personally, I find it stimulating that the application of automated analysis techniques has the potential both to improve our understanding of learning environments and to make them nicer to use (especially when we find an even better visualisation of the avatars).

Reference: Anjo Anjewierden, Bas Kollöffel, and Casper Hulshof. Towards educational data mining: Using data mining methods for automated chat analysis to understand and support inquiry learning processes. In Proceedings of International Workshop on Applying Data Mining in e-Learning (ADML 2007) as part of the 2nd European Conference on Technology Enhanced Learning (EC-TEL 2007), Crete, Greece, 2007 (September).

Weblog self-linking visualised

A short post about the visualisation of self-linking in weblogs. The idea for having a look at self-linking behaviour is due to Lilia.

An interesting property of self-linking is that there is only one dimension: time. And, unfortunately, visualisations of a single dimension look rather flat. One solution is to temporarily escape the single dimension by using arcs connecting the posts, an idea that seems to have originated in Thread Arcs.

The above image is an example of a variant of the Thread Arcs idea. Time runs from left to right, and the arc that links two connected posts is filled with a colour: the darker the colour, the shorter the time span between the linked posts.
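The geometry and colour of one arc can be computed as follows. This is a minimal sketch, not the code behind the image. It assumes x positions in days, semicircular arcs, and a linear mapping from time span to grey level. The post only says that shorter spans are darker.

```python
from datetime import date

def arc_for_link(source, target, first_day, max_span_days):
    """Return (centre_x, radius, grey) for a semicircular arc between two posts.

    x positions are days since `first_day`, so time runs left to right.
    grey is 0.0 (black) for same-day links and reaches 1.0 (white) at
    `max_span_days`: the shorter the span, the darker the arc.
    """
    x1 = (source - first_day).days
    x2 = (target - first_day).days
    span = abs(x2 - x1)
    centre_x = (x1 + x2) / 2
    radius = span / 2
    grey = min(span / max_span_days, 1.0)  # linear mapping: an assumption
    return centre_x, radius, grey

start = date(2007, 1, 1)
print(arc_for_link(date(2007, 1, 11), date(2007, 1, 1), start, max_span_days=100))
# prints: (5.0, 5.0, 0.1)
```

A logarithmic mapping may be a better choice when most self-links span a few days and only some span years.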

Another example. Visualisations like this can, at the very least, differentiate between bloggers who use their weblog to create an intricate structure of linked posts over a long period of time and bloggers who hardly refer to their own posts.

The final example depicts Lilia's self-linking practices. I see waves, woods ...