Phil Gyford


Thursday 20 February 2003

I'll give them a word burst or two...

According to the New Scientist, it might be possible to track societal change by monitoring the frequency of phrases over time.

Kleinberg suggests that the method could be applied to weblogs to track new social trends. For example, identifying word bursts in the hundreds of thousands of personal diaries now on the web could help advertisers quickly spot an emerging craze.

Well, gee whiz, who’d a thunk it?
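The New Scientist piece doesn't say how a "burst" is actually detected. As a rough illustration only, here is a simplified two-state version of the batched burst model described in Kleinberg's paper, as we understand it. Each time period (a day, a week) is a batch of documents. A term is treated as being either at its baseline rate or at an elevated "burst" rate, and moving into the burst state carries a penalty so that a single noisy day doesn't count. The function name, the default values for `s` (how much higher the burst rate is) and `gamma` (how costly it is to enter a burst), and the input format are assumptions made for this sketch, not details taken from the paper.

```python
import math

def burst_states(relevant, totals, s=2.0, gamma=1.0):
    """Label each time batch 0 (baseline) or 1 (burst) for one term.

    relevant[t]: documents in batch t that contain the term
    totals[t]:   all documents in batch t (both lists must be non-empty)
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)  # baseline rate over the whole period
    if p0 == 0 or p0 == 1:
        return [0] * n  # term never or always present: nothing can stand out
    p1 = min(s * p0, 0.9999)  # elevated "burst" rate, kept below 1
    rates = (p0, p1)

    def cost(state, r, d):
        # Negative log-likelihood of r hits among d documents at this state's rate
        p = rates[state]
        return -(math.log(math.comb(d, r)) + r * math.log(p) + (d - r) * math.log(1 - p))

    up = gamma * math.log(n)  # penalty for entering the burst state; leaving is free

    # Cheapest total cost of ending batch 0 in each state (we start at baseline)
    best = [cost(0, relevant[0], totals[0]), up + cost(1, relevant[0], totals[0])]
    back = []  # back[t-1] = (best predecessor of state 0, of state 1) at batch t
    for t in range(1, n):
        r, d = relevant[t], totals[t]
        prev0 = 0 if best[0] <= best[1] else 1      # either move into 0 is free
        prev1 = 0 if best[0] + up < best[1] else 1  # moving 0 -> 1 pays `up`
        from1 = best[0] + up if prev1 == 0 else best[1]
        best = [best[prev0] + cost(0, r, d), from1 + cost(1, r, d)]
        back.append((prev0, prev1))

    # Walk the back-pointers from the cheapest final state
    state = 0 if best[0] <= best[1] else 1
    path = [state]
    for prev0, prev1 in reversed(back):
        state = prev0 if state == 0 else prev1
        path.append(state)
    return path[::-1]
```

For example, `burst_states([2, 3, 2, 20, 25, 3, 2], [100] * 7)` marks only the fourth and fifth periods as a burst, while a term that turns up at a steady rate is never flagged. The penalty `gamma * log(n)` is what stops a single slightly busy period from being reported as a craze.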


I think there are things, big things, going on that I don't know about. In an age of information overload and the Internet, it seems strange that I feel ignorant and out of touch.

What's really going on? It's a question I ask myself all the time. How do I find out what's really going on?

Someone once said that if you arrive in a town or city, you should ask a taxi driver what's going on. They will invariably know the inside story and give you a better picture than you would get from any media. Or listen to people talking in a pub after they have had a few.

In the pub last night I heard some fairly outrageous stories about Saudi Arabia from someone who worked there, and I thought: why have I never heard this before? I spent 10 minutes with Google trying to verify them but found nothing. They were the kind of stories that you should be able to find fairly easily, because they involve unusual combinations of words.

Posted by Richard Hyett on 20 February 2003, 9:45 pm | Link

Word burst stuff is interesting, but one thing it fails to capture is multiword terms, which are equally important (for example, Great Society or New Deal). I don't know enough about the maths he was using (I tried to read the paper), but I think this is an omission. I hope in future work he looks at multiword terms, because they will tell us some interesting things too!

Posted by azeem on 21 February 2003, 3:15 pm | Link
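Multiword terms can at least be handled at the counting stage: instead of counting single words in each time period, count runs of n consecutive words. A minimal sketch, assuming each period's text is available as one string; the `phrase_counts` name and the crude lowercase regex tokenizer are choices made for this illustration, and the sketch ignores sentence boundaries, so it will also count pairs that straddle a full stop.

```python
import re
from collections import Counter

def phrase_counts(texts_by_period, n=2):
    """Count n-word phrases (two-word phrases by default) in each period's text."""
    counts = {}
    for period, text in texts_by_period.items():
        words = re.findall(r"[a-z']+", text.lower())
        # zip over n staggered copies of the word list yields consecutive n-grams
        grams = zip(*(words[i:] for i in range(n)))
        counts[period] = Counter(" ".join(g) for g in grams)
    return counts
```

With `{"1933": "Congress debated the New Deal. The New Deal passed."}`, the count for `"new deal"` in 1933 comes out as 2. Per-period phrase counts like these could then go into a burst detector in place of single-word counts.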

Hi. I've made a word burst software application that lets you track a number of words at a time. It works with text files and URLs. You feed it a text file of keywords and a text file of URLs, and it'll spider the web and write the results into another text file. Then it'll put the words in context, offer a graph of the words in a way that makes intuitive sense, and allow chronological graphs, so that the multiple graphs of keyword returns cascade. Version 2.0 will include a URL stripper and some other tools that will make it easier to use.

Posted by reid harward on 19 July 2003, 4:00 am | Link
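The workflow described above (a list of keywords, a set of sources, hit counts and the words in context) can be roughly sketched as follows. This is not reid's application, whose code isn't shown; it's a minimal illustration that assumes the pages have already been fetched into a dictionary mapping each source (for instance a URL) to its text, so the spidering, the graphs and the output file are all left out. `keyword_report` and its `width` parameter are hypothetical names.

```python
import re

def keyword_report(keywords, documents, width=30):
    """For each keyword, count whole-word hits per document and keep context snippets.

    keywords:  words to track (e.g. one per line from a keyword file)
    documents: mapping of source name (e.g. a URL) to its already-fetched text
    width:     characters of context to keep on each side of a hit
    """
    report = {}
    for kw in keywords:
        # \b ... \b matches the keyword as a whole word, ignoring case
        pattern = re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        per_doc = {}
        for name, text in documents.items():
            hits = list(pattern.finditer(text))
            snippets = [text[max(0, m.start() - width):m.end() + width] for m in hits]
            per_doc[name] = (len(hits), snippets)
        report[kw] = per_doc
    return report
```

Because of the word-boundary match, tracking `"blog"` in `"Bloggers love blogs. A blog is a diary."` counts one hit, not three; whether "blogs" should count is a design decision a real tool would need to make (for instance with stemming).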

Hi Reid, I'm thinking of applying the word burst algorithm to other languages. Could you please share some of the core code with me? Or is a license available?

Posted by Isaac on 16 November 2003, 4:20 am | Link

The Newsjuicer works with the newsmuse link rhizome which can be found here:

Posted by reid harward on 10 March 2004, 1:03 am | Link

It's been a hectic year, but there have been some interesting developments on the text analysis front. One of these is a 20-line C module that rips sources into directories of word lists. Fast searches can be run on these word lists. This little module should work on any system, no matter how tiny. When I created this program, I adhered to a strict design standard because I was interested in portability. The result is a program that will run on anything. In this respect my tiny module is like a Turing machine.

The applications for making sense of what the engines render continue to be streamlined; they resemble special-function browsers. In addition, the analysis engines have been swapped out for newer, more efficient ones, which should improve the quality of the experience. For example, I suspect that the ability to quickly couple and decouple keyword lists and search neighborhoods will be felt mainly as a better user experience. Meaning becomes an important metric.

Hopefully this tool will provide new ways of understanding texts by providing meaningful representation of relational information.

Posted by reid harward on 3 May 2005, 4:43 pm | Link
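The C module mentioned in that comment isn't shown, so the following is only a rough Python analogue of the general idea: each source is turned into a sorted, de-duplicated word list stored as a file in a directory, and a lookup runs a binary search over each list. The `.words` file naming and both function names are hypothetical.

```python
import bisect
import os
import re

def rip_to_word_list(source_name, text, out_dir):
    """Write the sorted, de-duplicated words of one source to out_dir/<source_name>.words."""
    os.makedirs(out_dir, exist_ok=True)
    words = sorted(set(re.findall(r"[a-z']+", text.lower())))
    path = os.path.join(out_dir, source_name + ".words")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(words))
    return path

def sources_containing(word, out_dir):
    """Return the names of sources whose word list contains `word`."""
    word = word.lower()
    found = []
    for fname in sorted(os.listdir(out_dir)):
        if not fname.endswith(".words"):
            continue
        with open(os.path.join(out_dir, fname), encoding="utf-8") as f:
            words = f.read().split("\n")
        # The list was sorted when written, so a binary search is enough
        i = bisect.bisect_left(words, word)
        if i < len(words) and words[i] == word:
            found.append(fname[: -len(".words")])
    return found
```

Sorting each list once, at write time, is what lets every later lookup be a binary search rather than a scan of the original text.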


Some sites linking to this entry (Trackbacks)

Automated memetracking
The original concept for memewatch involved a lot more than me just blogging trends, fads, and popular terms of expression. I was hoping to do some analysis of the rise and fall of expressions in various net domains. (The example I always pitched peopl...
At 'Meme List' on Thursday 27 February 2003, 11:57 PM