Today Lilia is presenting a joint paper with Stephanie and me at AOIR in Chicago. It also happens that it's Lilia's birthday today, and thinking about a present resulted in this post. Happy birthday Lilia, and I hope you like it :-).
The paper is about mapping a weblog community; the principal device for finding members of the community is reciprocal linking. Apart from linking, there are obviously various other indicators of a community. For a community of "professional" bloggers around KM (e-learning, internet research), the terms used are probably also a good indicator, and this post investigates that.
The data is the community as identified in the paper (all posts in 2004 from all members). A few members were deleted because their blogs are mainly in German. The full data-set for the text analysis consists of 59 blogs, 17,784 posts and 32Mb of text.
First, Sigmund was used to find the terms used by the community as a whole with a frequency of at least 10. There are 12,882 such terms. For example, the term birthday was used 102 times by the community. Next, some statistics were applied to the data so that the following query could be made:
t(Term, Focus, Background) = Weight
Here Term is a term (e.g. birthday), Focus a sub-set of the data (e.g. all posts by Lilia) and Background another sub-set of the data (e.g. all posts not by Lilia). Weight is a number (0.0 ... 1.0) which states how characteristic the term is of the Focus set as compared to the Background set. (Technical note: the weight is not the same as a probability, but this is not important for the purposes of this post.) For example:
t(birthday, LiliaEfimova, all) = 0.63
where all is shorthand for all other blogs. Another example:
t(boyfriend, female, all) = 0.85
where female is the collection of all blogs authored by women and all is all non-female blogs. We can now define a passion as follows:
t(Term, Focus, Background) > 0.9
A couple of the passions in the community are:
t(anke, AndyBoyd, all) = 1.0
t(digital video, AdrianMiles, all) = 0.95
t(cycle, AnjoAnjewierden, all) = 0.97
t(travel plan, LiliaEfimova, all) = 0.98
t(fingers crossed, LiliaEfimova, all) = 0.92
t(alarm clock, LiliaEfimova, all) = 0.90
t(chocolate, NancyWhite, all) = 0.94
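The post does not give the formula behind t, so the following is only a minimal sketch under an assumed weighting: the share of a term's relative frequency that falls in the focus set. The names `term_counts`, `weights` and `passions`, the whitespace tokenizer and the toy posts are all hypothetical; Sigmund's real term extraction (which also finds multi-word terms such as "travel plan") and statistics are certainly more refined.

```python
from collections import Counter

def term_counts(posts):
    """Count single-word terms over a list of posts (toy whitespace tokenizer)."""
    return Counter(word for post in posts for word in post.lower().split())

def weights(focus, background, min_count=10):
    """Assumed stand-in for t(Term, Focus, Background), a value in 0.0 .. 1.0.

    The weight is the term's relative frequency in the focus set divided by
    the sum of its relative frequencies in both sets, so 0.5 means the term
    is equally typical of focus and background. This is NOT the post's formula.
    """
    f, b = term_counts(focus), term_counts(background)
    f_total, b_total = sum(f.values()), sum(b.values())
    result = {}
    for term in f:
        if f[term] + b[term] < min_count:
            continue  # skip rare terms (the post uses a minimum frequency of 10)
        f_rate = f[term] / f_total
        b_rate = b[term] / b_total if b_total else 0.0
        result[term] = f_rate / (f_rate + b_rate)
    return result

def passions(term_weights, threshold=0.9):
    """Terms whose weight exceeds the threshold (0.9 in the post)."""
    return sorted(t for t, w in term_weights.items() if w > threshold)

# Toy data, invented purely for illustration.
focus = ["chocolate cake and chocolate", "more chocolate today"]
background = ["knowledge management today", "cake and knowledge"]
toy_weights = weights(focus, background, min_count=2)
toy_passions = passions(toy_weights)
```

In this toy run only "chocolate" clears the threshold; "more" is dropped by the minimum-frequency filter, which is why such a filter matters when a term occurs only once.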
With all the technical machinery in place, we can now try to find whether the community exists because its members blog about similar things (as well as personal passions). For this, we define the community blog as the collection of all posts that contain a link to another member of the community, and compare these to all posts in which there is no link within the community. For normalisation purposes, we subtract a number from the community weights such that they all sum to zero. This results in, for example:
n(collaborative note taking, linked, notlinked) = 0.43
n(security, linked, notlinked) = -0.35
suggesting that collaborative note taking is a term used more frequently when linking within the community, whereas security is not used much in linked posts.
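One way to read "subtract a number such that they all sum to zero" is mean-centring: subtract the average weight from every weight. The post does not say which number was actually subtracted, so the sketch below rests on that assumption; `normalise` and the example weights are hypothetical.

```python
def normalise(term_weights):
    """Shift the weights so that they sum to zero by subtracting their mean.

    Assumption: the 'number' subtracted in the post is the mean weight.
    """
    if not term_weights:
        return {}
    mean = sum(term_weights.values()) / len(term_weights)
    return {term: w - mean for term, w in term_weights.items()}

# Invented raw weights, for illustration only.
raw = {"collaborative note taking": 0.9, "weblog": 0.5, "security": 0.1}
centred = normalise(raw)
```

After centring, terms typical of linked posts end up positive and terms typical of non-linked posts end up negative, which is what makes the later sign test possible.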
For all individual posts we can now determine whether they, according to the terms used, fit in the community by adding the normalised weights of the terms occurring in a post. If the sum is positive, community terminology is used; if it is negative, non-community terminology is used. The following table lists the results:
            Linked   Not-linked
Positive      1189          823
Negative      3141        12631
The posts classified as Positive-Linked and Negative-Not-Linked are correct if we assume linking and use of shared terminology are related. For all posts, 73% are correctly classified. Correctness rapidly increases to > 90% if the number of terms in a post is also taken into account. The preliminary conclusion, therefore, has to be that both linking and use of shared terminology are strong indicators of belonging to a weblog community.
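The scoring rule and the table can be sketched as follows. This is an illustration only: the function names are hypothetical, each distinct term is assumed to count once per post, and the only weights used are the two n(...) values quoted earlier. Read directly from the table as printed, the share of agreeing posts comes to about 77.7%, which does not quite match the 73% quoted; the post does not say how that figure was obtained, so the arithmetic below is just a reading of the table.

```python
def post_score(post_terms, n_weights):
    """Sum the normalised weights of the distinct known terms in a post."""
    return sum(n_weights.get(term, 0.0) for term in set(post_terms))

def uses_community_terms(post_terms, n_weights):
    """A positive sum means community terminology, per the rule in the post."""
    return post_score(post_terms, n_weights) > 0

# The two normalised weights quoted in the post.
n_weights = {"collaborative note taking": 0.43, "security": -0.35}

# Counts copied from the table: (sign of score, link status) -> number of posts.
table = {
    ("positive", "linked"): 1189,
    ("positive", "notlinked"): 823,
    ("negative", "linked"): 3141,
    ("negative", "notlinked"): 12631,
}

# Posts where terminology and linking agree, as a share of all posts.
agree = table[("positive", "linked")] + table[("negative", "notlinked")]
total = sum(table.values())
share = agree / total
```

The total of the four cells is 17,784, the same as the number of posts in the data-set, so the table covers every post exactly once.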
Finally, below is a table that ranks the community blogs by their use of common terms. The icing on the cake is that Lilia comes first. Happy birthday, once again.
http://blog.mathemagenic.com/
http://b2ob.blogspot.com/
http://www.zylstra.org/blog/
http://anjo.blogs.com/metis/
http://coniecto.blogspot.com/
http://blog.humlab.umu.se/therese/
http://www.readwriteweb.com/
http://carlav.blogs.com/km/
http://denham.typepad.com/km/
http://seblogging.cognitivearchitects.com/
http://elmine.wijnia.com/
http://www.scalefree.info/
http://headshift.com/
http://paolo.evectors.it/
http://radio.weblogs.com/0110772/
http://kaye.trammell.com/blog/
http://chocnvodka.blogware.com/blog/
http://croeso.typepad.com/croeso/
http://blog.jackvinson.com/
http://blog.mopsos.com/
http://www.fullcirc.com/weblog/
http://radio.weblogs.com/0121664/
http://blog.monkeymagic.net/
http://growingpains.blogs.com/home/
http://climbtothestars.org/
http://www.elearnspace.org/blog/
http://partnerships.typepad.com/civic/
http://www.wingedpig.com/
http://jaarons.typepad.com/dubbings/
http://worcester.typepad.com/pc4media/
http://www.steptwo.com.au/columntwo/
http://www.henshall.com/blog/
http://ross.typepad.com/blog/
http://marc.blogs.it/
http://www.sumofmyparts.com/blog/
http://dijest.com/aka/
http://hypertext.rmit.edu.au/vlog/
http://www.zephoria.org/thoughts/
http://weblog.infoworld.com/udell/
http://matt.blogs.it/
http://www.klastrup.dk/
http://curtrosengren.typepad.com/occupationaladventure/
http://www.professional-lurker.com/
http://www.myelin.co.nz/post/
http://blogs.msdn.com/heatherleigh/
http://jilltxt.net/
http://alex.halavais.net/news/
http://joi.ito.com/
http://www.meskill.net/weblogs/
http://www.plasticbag.org/
http://jeremy.zawodny.com/blog/
http://mamamusings.net/
http://jade.mcli.dist.maricopa.edu/cdb/
http://torillsin.blogspot.com/
http://www.hyperorg.com/blogger/
http://overstated.net/
http://www.newmediamusings.com/
http://www.buzzmachine.com/
http://blogs.salon.com/0002007/
Great!
A few things:
- I believe that there are clusters based on terminology - is there a way to discover them (probably Rogier knows - he talked about something related for another project)? If not, it would be nice to look at whether language use is correlated with the "group" codes I have in the data (I'd look for "KM" and "I" since they are the largest groups).
- What comes up if you look at all posts, rather than only those linked? I guess shared interests should be much more visible then (next to the fact that sometimes ideas are picked up by reading others and not necessarily directly in the linked conversations).
And - nice birthday present :)
Posted by: Lilia | October 08, 2005 at 03:11 PM
Dear Anjo
First, I am horrified I missed Lilia's birthday. Uh oh. Well, I hear that the 30th can last for a month, so I have a few more days! :-)
Second, I am pleased that you are surfacing data to support my belief that chocolate is indeed a community indicator, especially in the passion department!
BIG SMILE!
Nancy
Posted by: Nancy White | October 18, 2005 at 05:52 AM
Nancy,
Don't worry, haven't seen any pictures yet of Lilia cutting the birthday cake, which tasted rather nice (and she cooked it herself!). No chocolate on it though :-).
Posted by: Anjo | October 19, 2005 at 10:17 PM