Passion and Profession

Today Lilia is presenting a joint paper with Stephanie and me at AOIR in Chicago. It also happens to be Lilia's birthday today, and thinking about a present resulted in this post. Happy birthday Lilia, and I hope you like it :-).

The paper is about mapping a weblog community; the principal device for identifying members of the community is reciprocal linking. Apart from linking, there are obviously various other indicators of a community. For a community of "professional" bloggers around KM (e-learning, internet research), the terms they use are probably also a good indicator, and this post investigates that.

The data is the community as identified in the paper (all 2004 posts by all members). A few members were excluded because their blogs are mainly in German. The full data set for the text analysis consists of 59 blogs, 17,784 posts and 32 MB of text.

First, Sigmund was used to find the terms that the community as a whole used at least 10 times. There are 12,882 such terms; for example, the term birthday was used 102 times by the community. Next, some statistics were applied to the data so that the following query could be made:

  t(Term, Focus, Background) = Weight

Here Term is a term (e.g. birthday), Focus is a subset of the data (e.g. all posts by Lilia) and Background is another subset of the data (e.g. all posts not by Lilia). Weight is a number between 0.0 and 1.0 that indicates how strongly the term is associated with the Focus set as compared to the Background set. (Technical note: the weight is not the same as a probability, but this is not important for the purposes of this post.) For example:

  t(birthday, LiliaEfimova, all) = 0.63

where all is shorthand for all other blogs. Another example:

  t(boyfriend, female, all) = 0.85

where female is the collection of all blogs authored by women and all is the collection of all blogs not authored by women. We can now define a passion as follows:

  t(Term, Focus, Background) > 0.9

A couple of the passions in the community are:

  t(anke, AndyBoyd, all) = 1.0
  t(digital video, AdrianMiles, all) = 0.95
  t(cycle, AnjoAnjewierden, all) = 0.97
  t(travel plan, LiliaEfimova, all) = 0.98
  t(fingers crossed, LiliaEfimova, all) = 0.92
  t(alarm clock, LiliaEfimova, all) = 0.90
  t(chocolate, NancyWhite, all) = 0.94
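The post does not describe how Sigmund extracts terms or how the weight t(Term, Focus, Background) is computed. The sketch below is a hypothetical stand-in, not Sigmund's method: terms are naive word n-grams kept if they occur at least 10 times, and the weight is the share of a term's relative frequency that falls in the Focus set. That measure has the properties described above (range 0.0 to 1.0, high when a term is relatively more frequent in Focus), and a passion is then any term with a weight above 0.9. The function names and the toy data are made up for illustration.

```python
import re
from collections import Counter

MIN_FREQ = 10            # minimum frequency used in the post
PASSION_THRESHOLD = 0.9  # passion: t(Term, Focus, Background) > 0.9


def term_counts(posts: list[str], max_len: int = 2) -> Counter:
    """Count word n-grams of 1..max_len words over a set of posts.
    Naive stand-in for Sigmund's term extraction (no stop words, no stemming)."""
    counts = Counter()
    for post in posts:
        tokens = re.findall(r"[a-z]+", post.lower())
        for n in range(1, max_len + 1):
            counts.update(" ".join(tokens[i:i + n])
                          for i in range(len(tokens) - n + 1))
    return counts


def frequent_terms(counts: Counter, min_freq: int = MIN_FREQ) -> set[str]:
    """Terms that occur at least min_freq times."""
    return {term for term, c in counts.items() if c >= min_freq}


def weight(term: str, focus: Counter, background: Counter) -> float:
    """Hypothetical t(Term, Focus, Background): the share of the term's relative
    frequency that falls in Focus. 1.0 = only in Focus, 0.5 = equally common."""
    rf_focus = focus[term] / max(sum(focus.values()), 1)
    rf_bg = background[term] / max(sum(background.values()), 1)
    if rf_focus + rf_bg == 0:
        return 0.5  # term absent from both sets: no evidence either way
    return rf_focus / (rf_focus + rf_bg)


def passions(focus: Counter, background: Counter, terms: set[str],
             threshold: float = PASSION_THRESHOLD) -> list[str]:
    """Terms whose weight for Focus exceeds the threshold, strongest first."""
    scored = {t: weight(t, focus, background) for t in terms}
    return sorted((t for t, w in scored.items() if w > threshold),
                  key=scored.get, reverse=True)


# Toy data (made up): one "Focus" blogger versus everyone else
lilia = term_counts(["birthday travel plan"] * 10)
others = term_counts(["birthday digital video"] * 10)
print(sorted(passions(lilia, others, frequent_terms(lilia + others))))
# ['birthday travel', 'plan', 'travel', 'travel plan']
```

In the toy data, birthday is equally common on both sides and gets a weight of 0.5, so it is not a passion. A real term extractor would also drop stop words and phrases that straddle unrelated words, such as birthday travel in the output above.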

With all the technical machinery in place, we can now investigate whether the community also exists because its members blog about similar things (besides their personal passions). For this, we define the community blog as the collection of all posts that contain a link to another member of the community, and compare it with all posts that contain no link within the community. For normalisation, we subtract one constant from all community weights so that they sum to zero; n(...) denotes such a normalised weight. This results in, for example:

  n(collaborative note taking, linked, notlinked) = 0.43
  n(security, linked, notlinked) = -0.35

suggesting that collaborative note taking is used more frequently when linking within the community, whereas security is used comparatively rarely in linked posts.
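Subtracting a single constant from every weight so that the weights sum to zero amounts to subtracting their mean. A minimal sketch, with a hypothetical function name and made-up raw weights for three placeholder terms:

```python
def normalise(weights: dict[str, float]) -> dict[str, float]:
    """Subtract one constant from every weight so that the weights sum to zero.
    The only constant that achieves this is the mean weight."""
    if not weights:
        return {}
    mean = sum(weights.values()) / len(weights)
    return {term: w - mean for term, w in weights.items()}


# Made-up raw weights t(Term, linked, notlinked), for illustration only
raw = {"term a": 0.75, "term b": 0.5, "term c": 0.25}
norm = normalise(raw)
print(norm)  # {'term a': 0.25, 'term b': 0.0, 'term c': -0.25}
```

Terms with a raw weight above the mean end up positive (more typical of linked posts) and those below it end up negative.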

For each individual post, we can now determine whether its terminology fits the community by adding up the normalised weights of the terms occurring in it. If the sum is positive, the post mainly uses community terminology; if it is negative, it mainly uses non-community terminology. The following table lists the results:

            Linked   Not-linked
Positive      1189          823
Negative      3141        12631
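The post-level classification behind this table can be sketched as follows. This assumes each post is given as the set of terms it contains (so a repeated term counts once) and that terms without a normalised weight contribute nothing; both are assumptions, as are the function names post_score and tally. The two weights are the examples quoted earlier in the post; the three posts are made up.

```python
from collections import Counter


def post_score(terms: set[str], norm_weights: dict[str, float]) -> float:
    """Add up the normalised weights n(Term, linked, notlinked) of the terms
    in one post. Terms without a weight contribute 0 (an assumption)."""
    return sum(norm_weights.get(t, 0.0) for t in terms)


def tally(posts, norm_weights) -> Counter:
    """posts: iterable of (terms, is_linked) pairs. Returns a Counter keyed by
    (score sign, link status), i.e. the cells of the table above."""
    table = Counter()
    for terms, linked in posts:
        # A score of exactly zero is counted as Negative here (an arbitrary choice).
        sign = "Positive" if post_score(terms, norm_weights) > 0 else "Negative"
        table[(sign, "Linked" if linked else "Not-linked")] += 1
    return table


n = {"collaborative note taking": 0.43, "security": -0.35}  # values from the post
posts = [  # made-up posts
    ({"collaborative note taking"}, True),
    ({"security"}, False),
    ({"collaborative note taking", "security"}, True),
]
print(tally(posts, n))
```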

Posts classified as Positive-Linked or Negative-Not-linked count as correct if we assume that linking and the use of shared terminology are related. Over all posts, about 78% (13,820 of 17,784) are classified correctly. Accuracy rises quickly to over 90% if the number of terms in a post is also taken into account. The preliminary conclusion is therefore that both linking and the use of shared terminology are strong indicators of belonging to a weblog community.
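As a check on the figures, the share of correctly classified posts can be computed directly from the table, counting the Positive-Linked and Negative-Not-linked cells as correct:

```python
# Counts from the table above: (score sign, link status) -> number of posts
table = {
    ("Positive", "Linked"): 1189,
    ("Positive", "Not-linked"): 823,
    ("Negative", "Linked"): 3141,
    ("Negative", "Not-linked"): 12631,
}
total = sum(table.values())  # every post in the data set
correct = table[("Positive", "Linked")] + table[("Negative", "Not-linked")]
print(f"{correct}/{total} = {correct / total:.1%}")  # 13820/17784 = 77.7%
```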

Finally, below is a list that ranks the community blogs by their use of common terms. The icing on the cake is that Lilia comes first. Happy birthday, once again.

http://blog.mathemagenic.com/
http://b2ob.blogspot.com/
http://www.zylstra.org/blog/
http://anjo.blogs.com/metis/
http://coniecto.blogspot.com/
http://blog.humlab.umu.se/therese/
http://www.readwriteweb.com/
http://carlav.blogs.com/km/
http://denham.typepad.com/km/
http://seblogging.cognitivearchitects.com/
http://elmine.wijnia.com/
http://www.scalefree.info/
http://headshift.com/
http://paolo.evectors.it/
http://radio.weblogs.com/0110772/
http://kaye.trammell.com/blog/
http://chocnvodka.blogware.com/blog/
http://croeso.typepad.com/croeso/
http://blog.jackvinson.com/
http://blog.mopsos.com/
http://www.fullcirc.com/weblog/
http://radio.weblogs.com/0121664/
http://blog.monkeymagic.net/
http://growingpains.blogs.com/home/
http://climbtothestars.org/
http://www.elearnspace.org/blog/
http://partnerships.typepad.com/civic/
http://www.wingedpig.com/
http://jaarons.typepad.com/dubbings/
http://worcester.typepad.com/pc4media/
http://www.steptwo.com.au/columntwo/
http://www.henshall.com/blog/
http://ross.typepad.com/blog/
http://marc.blogs.it/
http://www.sumofmyparts.com/blog/
http://dijest.com/aka/
http://hypertext.rmit.edu.au/vlog/
http://www.zephoria.org/thoughts/
http://weblog.infoworld.com/udell/
http://matt.blogs.it/
http://www.klastrup.dk/
http://curtrosengren.typepad.com/occupationaladventure/
http://www.professional-lurker.com/
http://www.myelin.co.nz/post/
http://blogs.msdn.com/heatherleigh/
http://jilltxt.net/
http://alex.halavais.net/news/
http://joi.ito.com/
http://www.meskill.net/weblogs/
http://www.plasticbag.org/
http://jeremy.zawodny.com/blog/
http://mamamusings.net/
http://jade.mcli.dist.maricopa.edu/cdb/
http://torillsin.blogspot.com/
http://www.hyperorg.com/blogger/
http://overstated.net/
http://www.newmediamusings.com/
http://www.buzzmachine.com/
http://blogs.salon.com/0002007/
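The post does not spell out how the ranking above was computed. Assuming each blog is ranked by the mean community score of its posts (the per-post sum of normalised term weights), a sketch could look like this; the criterion, the function name rank_blogs, and the blog names and scores in the example are all hypothetical.

```python
from collections import defaultdict


def rank_blogs(scored_posts) -> list[str]:
    """scored_posts: iterable of (blog, post_score) pairs, where post_score is the
    sum of normalised community weights of the post's terms. Returns blogs ordered
    by mean post score, most community-like first (hypothetical criterion)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for blog, score in scored_posts:
        totals[blog] += score
        counts[blog] += 1
    return sorted(totals, key=lambda b: totals[b] / counts[b], reverse=True)


# Made-up scores for three placeholder blogs
scores = [("blog-a", 0.5), ("blog-b", -0.2), ("blog-a", 0.3), ("blog-c", 0.1)]
print(rank_blogs(scores))  # ['blog-a', 'blog-c', 'blog-b']
```

Using the mean rather than the total keeps prolific blogs from ranking high merely because they post more often.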

TrackBack


Listed below are links to weblogs that reference Passion and Profession:

» KM bloggers community (2) from Mathemagenic

» On "Finding 'the life between buildings': An approach for defining a weblog community" (Efimova/Hendrick/Anjewierden) from Das E-Business Weblog
Finding "the life between buildings": An approach for defining a weblog community is an interesting new paper by Lilia Efimova, Stephanie Hendrick and Anjo Anjewierden that they, for the "Internet Research" conference...

» Passion and Profession from Ton's Interdependent Thoughts
Anjo Anjewierden comes with three postings aptly titled Passion and Profession Passion and Profession I Passion and Profession II Passion and Profession III (which is a nice encore about language being a barrier or not.) He and Lilia and Stephanie...

» You say profession, I say passion from Monkeymagic
Missed this while on my travels. Anjo, Lilia and Stephanie have been doing some interesting things analysing blog communities through terminology rather than links. [Anjo writes it up here, here and here.] One of the upshots of it all is...

Comments

Great!

A few things:
- I believe that there are clusters based on terminology. Is there a way to discover them (probably Rogier knows; he talked about something related for another project)? If not, it would be nice to see whether language use correlates with the "group" codes I have in the data (I'd look at "KM" and "I" since they are the largest groups).

- What comes up if you look at all posts, rather than only the linked ones? I guess shared interests should be much more visible then (next to the fact that ideas are sometimes picked up by reading others, and not necessarily directly in linked conversations).

And - nice birthday present :)

Dear Anjo

First, I am horrified I missed Lilia's birthday. Uh oh. Well, I hear that the 30th can last for a month, so I have a few more days! :-)

Second, I am pleased that you are surfacing data to support my belief that chocolate is indeed a community indicator, especially in the passion department!

BIG SMILE!

Nancy

Nancy,

Don't worry, haven't seen any pictures yet of Lilia cutting the birthday cake, which tasted rather nice (and she cooked it herself!). No chocolate on it though :-).
