Sunday, July 22, 2007

bloggarum thesaurus

According to this blog entry over on The Guardian, the Oxford University Press is conducting a study on the vocabulary of the blogosphere. Ipse dixit John Moore. Who? He’s a guest journalist for The Guardian who used to be a drummer for the band The Jesus and Mary Chain.

Next time you convey your velocipede along Walton Street in Oxford, spare a thought for the poor souls suffering behind its elegant facades. I am not referring to the mortal coil shufflers at the John Radcliffe, but to the researchers at the Oxford University Press, charged with the life-sapping task of monitoring the use of English in weblogs.

Secundum Moore, the OUP has determined that the top fifteen words used in the blogosphere are: “blogger, blog, stupid, me, myself, my, oh, yeah, ok, post, stuff, lovely, update, nice, shit.” Quite a list. I sat, and I pondered. I googled around for a news story on the OUP and its study. I found something on The Chronicle (by Jennifer Howard) which linked to a Telegraph article (by Mark Sanderson). It was the same old kernel of a story (except for a quaint em-dash in medias dirty word, no doubt because of The Telegraph’s style guide), with no link to an OUP press release.

Then I surfed on over to MySpace, took the first entry on Moore’s blog there, and ran the text through a word count program. Here are Moore’s top fifteen words: “the, to, I, and, of, a, was, he, it, in, my, his, with, that, shop”. The first thing I noticed was that little function words like “the” and “and” weren’t on the OUP’s list; then again, “I” was absent too, though “me”, “myself”, and “my” weren’t. I realized that the boys chez OUP probably know a thing or two about counting words and what constitutes one, too: lemmatization and all that.
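For the curious, a word count of this sort takes only a few lines of Python. What follows is a minimal sketch, not the program I used and certainly not whatever the OUP runs: the names `tokenize` and `top_words` are my own, and the stop list and lemma table are tiny, hypothetical stand-ins for the real linguistic resources a lexicographer would bring to the job.

```python
import re
from collections import Counter

# Assumption: a deliberately tiny stop list and lemma table, for illustration only.
FUNCTION_WORDS = {"the", "to", "and", "of", "a", "in", "with", "that", "it"}
LEMMAS = {"was": "be", "is": "be", "were": "be", "blogs": "blog"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters,
    keeping internal apostrophes (straight or curly) inside a word."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def top_words(text, n=15, drop_function_words=False, lemmatize=False):
    """Return the n most frequent words, most frequent first."""
    words = tokenize(text)
    if lemmatize:
        # Map each inflected form to its headword where the toy table knows one.
        words = [LEMMAS.get(w, w) for w in words]
    if drop_function_words:
        words = [w for w in words if w not in FUNCTION_WORDS]
    return [word for word, _count in Counter(words).most_common(n)]


if __name__ == "__main__":
    sample = "The shop was in the street and the shop is the old shop"
    print(top_words(sample, n=5))
    print(top_words(sample, n=5, drop_function_words=True, lemmatize=True))
```

Run raw, a count like this is dominated by function words, which is what Moore’s list looks like. Switch on the two options and the list drifts toward content words and headwords, which is at least consistent with what the OUP’s list looks like.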

[Via Taccuino di traduzione.]

