
Making a Difference

While searching for a reasonable approach to creating Visual Settlements based on language rather than linkage, I considered the following idea. In Information Retrieval, inverse document frequency (the IDF part of the TF/IDF algorithm) is often used to find what makes a document "unique" within a collection. In a (virtual) community there is a shared interest, but there will also be "personal" differences, and inverse document frequency might come in handy to identify the uniqueness of a single blog within the community.

Partial motivation came from a paper on identifying virtual communities using linking structures. One of the results reported in the paper was that more than 5000 sites had a common link, namely a link to LaTeX to HTML (i.e. the software that had generated the site). It goes to show that automated analysis has its flaws. But as long as these flaws generate funny results, who cares?

Below are the results of the exercise. The procedure was as follows:

  • Sigmund was run on the blogs involved to generate the terms used.
  • From the Sigmund terms generated, all terms that occurred infrequently or very frequently were removed. Infrequently was defined as in less than sqrt(NoPosts)/2 and very frequently as in more than sqrt(NoPosts)*2.
  • Finally, all remaining terms that appeared in at least two blogs in the community were removed.
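
The three steps above can be sketched roughly as follows. This is a minimal illustration and not the actual Sigmund code. The input format, the function names and the toy data are all assumptions made for the example. In particular, the post does not say whether a term's frequency is its number of occurrences or the number of posts it appears in, so the sketch simply takes one count per term.

```python
from collections import Counter
from math import sqrt


def frequency_band(term_counts, no_posts):
    """Step 2: drop terms that are too rare or too common for this blog.

    A term is kept when its count lies between sqrt(NoPosts)/2 and
    sqrt(NoPosts)*2 (bounds included, since only "less than" and
    "more than" are removed).
    """
    low = sqrt(no_posts) / 2
    high = sqrt(no_posts) * 2
    return {term for term, count in term_counts.items() if low <= count <= high}


def unique_terms(blogs):
    """Step 3: keep only terms that survive in exactly one blog.

    `blogs` maps a blog name to (no_posts, {term: count}); this layout
    is hypothetical and stands in for the output of step 1.
    """
    kept = {name: frequency_band(counts, n) for name, (n, counts) in blogs.items()}
    # Count in how many blogs each surviving term appears.
    spread = Counter(term for terms in kept.values() for term in terms)
    return {
        name: sorted(term for term in terms if spread[term] == 1)
        for name, terms in kept.items()
    }


# Toy data, invented purely for illustration.
blogs = {
    "blog_a": (16, {"chess": 4, "blog": 15, "ontology": 3, "typo": 1}),
    "blog_b": (25, {"weather": 5, "blog": 12, "ontology": 4}),
}
result = unique_terms(blogs)
print(result)
```

In the toy data "blog" is too frequent in both blogs, "typo" is too rare, and "ontology" survives in both blogs and is therefore dropped as shared vocabulary, which leaves one unique term per blog.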

The result should be a list of terms, per blog, which defines that blog's uniqueness within the community. Several disclaimers can be made. Some blogs are mainly in languages other than English; I have manually deleted the non-English terms that crept in (but of course the results will still be skewed because of the procedure above).

No conclusions, as usual.


Alex Halavais: agent, assignment, association, authority, campus, candidate, citizen, communication technology, grad student, grade, graduate student, guest, journalism, nation, peer, period, porn, slashdot, terrorist, undergrad, undergraduate, venue, web page, wikipedia.

Anjo Anjewierden: blog spider, chess, ontology, sentence, sigmund, text analysis, vocabulary.

Lilia Efimova: aggregation, artifact, awareness, blog community, blog conversation, blog network, blog reading, blog research, bookmark, contribution, discovery, energy, integration, mode, moscow, news aggregator, overlap, overview, personal information management, personal knowledge management, phd research, proceedings, progress, relation, rhythm, setting, skype, slide, theme, training, visualization.

Piers Young: ancient, british, cup, fame, flag, rhetoric, scientist, tea, treat.

Ton Zijlstra: addendum, barrier, blogwalk meeting, collective, corner, dialogue, enschede, neighbour, open space, parallel, poster, presenter, schedule, small scale, vienna, wiki page.

Carla Verwijs: marathon, submission, virtual community.

Andy Boyd: community of practice, furniture make, km europe, km work, shell, transfer, wale.

Elmine Wijnia: communicative action, habermas theory, ideal speech situation, masterthesis, passage, teacher.

Jill Walker: electronic literature, grant, hypertext, new media, wireless.

Jeremy Aarons: australia, brief, bureau, epistemology, flaw, forecast, forecasters, justification, keynote, knowledge management research, melbourne, meteorology, methodology, monash university, positivism, positivist, relevance, specification, task based knowledge management, weather forecast, weather.

Janine Swaak: knowledge animal, selfishness, territory.

Marc Canter: bet, buddy, cable, calendar, credit, dan, digital lifestyle aggregation, digital lifestyle aggregator, director, dude, editor, enterprise, fee, foafnet, founder, fuck, hire, hook, macromedia, manager, marqui, module, open standard, paris, plug, rap, really simple syndication feed, ship, shot, stream, suck, trieste, vancouver, video clip, wall.

Matt Mower: firefox, iraq, java, mozilla, salon, terrorism.

Paolo Valdemarin: folder.

Sebastien Pacquet.

Thomas Burg.

Torill Mortensen: computer game, dark, gamers, husband, lunch, padding, solid.

Peter Caputa: advertiser, beat, blogdex, boston, buck, cheap, competitor, directory, ebay, eurekster, feedster, gmail account, hotmail, marketer, minute abs, overture, permission, promotion, purchase, search engine, search result, social software blog, statement, syndicate, toolbar, waypath, weblogsinc.

Riccardo Cambiassi: dive, italy, layer, natural interaction, plugin, procedure, proof, smartmobs, speed, toy.

Suw Charman: bolt, business blogging, geek, irc, laptop, marketing blog, subethaedit, vodka, water.

Nancy White: blend, coach, distribute community, grey, on line community, on line facilitation, on line group, on line interaction, peek, principles, sector, telephone, thread, web based.

Jim McGee: book challenge, chicago, congratulations, pointer, productivity, trick.

Stephanie Hendrick: humlab, jokkmokk blog, master thesis, mental space, mylookingglass, proposal, research blog, social network analysis, sweden, warm.

Danah Boyd: abuse, adult, battle, blog entry, blow, california, critique, dance, drink, earth, everyday life, fight, frame, frustration, gay, gender, hair, intention, joy, land, laugh, lesson, mail list, nytimes, pain, parent, phenomenon, population, pressure, privilege, production, refuse, responsibility, scream, smile, sms, status, stranger, super, tear, technologist, tendency, upset, violence, wake, yasns.

TrackBack

Listed below are links to weblogs that reference Making a Difference:

» Visualising Communities and Characters from Monkeymagic
Ancient British Scientist finds Anjo's work on communities or "settlements" a real treat. [Read More]

» What do bloggers blog about, actually? from Emerging Communications
As you’ve probably seen elsewhere, Anjo Anjewierden has run a script to see what words are most often (uniquely) used in different blogs, and has come up with some quite fascinating results. It’s fun to see how the bloggers themselves... [Read More]

» What topics do I cover? from pc4media
I could tag my posts... or we could just Ask Anjo.... If you want to stay abreast of the following topics, I am a good bet: Peter Caputa: advertiser, beat, blogdex, boston, buck, cheap, competitor, directory, ebay, eurekster, feedster, gmail [Read More]

Comments

This and your earlier post on visual settlements is really intriguing. I'm not sure I've wrapped my head around it, but am looking forward to following your work now that I've found you!

Hi Anjo,

I arrived here via a link from Alex Halavais' site. I think this is an absolutely fascinating little project you've done here. With Six Apart's recent acquisition of LiveJournal http://www.sixapart.com/pronet/2005/01/professional_ne.html , I think of how cool it would be to apply this type of analysis to LJ communities (assuming it hasn't already been done). Six Apart might be game for something like this, who knows!

Very interesting! Are you making Sigmund available so we can run this on our own blogs as well?

Elin

This is fantastic! Please keep me posted on where you go with this!

It is my intention to keep track of my working life on this blog. Does not always work out that way!

Contact Lilia and/or myself if there is anything specific. (Also applies to others, of course).

anjo science uva nl (insert @ and some dots).

So I only use either very original words or very common words?

Seb, Had a quick look under the hood and it appears you are using the same words as others semi-frequently. So, blend well with this community ...

I'm still working at getting chocolate to show up on my list! ;-)
