Geeks With Blogs

John Conwell: aka Turbo Research in the visual exploration of data
I've been playing with a new side project: wine data analytics mined from Twitter and stored in MongoDB.

I took a first stab at writing a Python Celery service that searches for wine-related Twitter posts and dumps them into MongoDB. The first lesson learned: a ton of people post about the word 'Wine' on Sunday afternoons. I wrote the service to pull 100 posts, then search again using the lowest post id from that batch as the max id to return, walking backwards through time. After storing 11K posts in MongoDB I looked at the earliest date and realized that it was just a few hours' worth of data (vs. the several days' worth I had assumed). 'Wine' is just way too broad, so I changed my list of search terms to NOT include the single word 'Wine' on its own. Depending on how much data this produces and how much disk space it takes, I might have to go with more of a streaming analytics approach, where I just read the data, add it to the aggregations, and let it go. We'll see.
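
The paging loop and the "aggregate and let it go" idea can be sketched roughly as follows. This is only an illustration, not the service itself: `fetch_page` is a hypothetical stand-in for whatever Twitter search call is used, posts are assumed to be dicts with an integer `id` and an ISO-style `created_at` string, and `max_id` is assumed to be inclusive (which is why the boundary post is filtered out on the next page).

```python
from collections import Counter


def walk_back(fetch_page, query, page_size=100, max_pages=10):
    """Yield posts newest to oldest, one search page at a time.

    fetch_page(query, count, max_id) is a hypothetical callable that
    returns a list of post dicts; it is not a real library function.
    """
    max_id = None
    for _ in range(max_pages):
        page = fetch_page(query, page_size, max_id)
        # Assumption: max_id is inclusive, so the boundary post comes back
        # again and has to be skipped.
        fresh = [p for p in page if max_id is None or p["id"] < max_id]
        if not fresh:
            break  # nothing older was returned
        for post in fresh:
            yield post
        # The lowest id in this batch becomes the max id for the next search.
        max_id = min(p["id"] for p in fresh)


def count_by_hour(posts):
    """Streaming aggregation: count posts per hour, keeping no post around."""
    counts = Counter()
    for post in posts:
        # Assumption: created_at looks like "2011-05-01T15:04:05", so the
        # first 13 characters identify the hour.
        counts[post["created_at"][:13]] += 1
    return counts
```

Calling `count_by_hour(walk_back(fetch_page, "wine"))` consumes the posts lazily, so only the hourly counts stay in memory. That makes it quick to see how little time 11K posts actually cover before deciding what to store.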

During all this, it was good to play with MongoDB. I have an instance running on my MacBook. Managing it via the terminal was OK, but I found a good Mac-based UI tool called MongoHub that works pretty well. There is also a JNLP browser-based tool called MongoBrowser. It's OK for very simple things when you don't have a lot of data, but it's not very functional.

Posted on Monday, May 2, 2011 4:34 PM | Data Mining, MongoDb

Comments on this post: First experience with Twitter Search API and MongoDB

No comments posted yet.

Copyright © John Conwell