Data Mining

First experience with Twitter Search API and MongoDB

I've been playing with a new side project: wine data analytics mined from Twitter and stored in MongoDB. I took a first stab at writing a Python Celery service to search for wine-related Twitter posts and dump them into MongoDB. The first lesson learned is that a ton of people post about the word 'Wine' on Sunday afternoons. I wrote the service to pull 100 posts, then search again using the lowest post id as the max id to return. After storing 11K posts in MongoDB I looked at the earliest date and realized ......
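A minimal sketch of that paging loop, not the actual service code. The names `collect_posts`, `search_page`, and `fake_search` are hypothetical: `search_page` stands in for whatever Twitter client call the service makes, and `fake_search` is an in-memory stub so the example runs without network access or credentials. One assumption worth calling out: the Search API treats max id as inclusive, so the sketch passes the lowest id minus one to avoid fetching the same post twice.

```python
from typing import Any, Callable, Dict, List, Optional

Post = Dict[str, Any]
# Hypothetical signature: search_page(query, count, max_id) -> posts, newest first.
SearchPage = Callable[[str, int, Optional[int]], List[Post]]


def collect_posts(search_page: SearchPage, query: str,
                  page_size: int = 100, max_pages: int = 10) -> List[Post]:
    """Page backwards through search results, one batch at a time."""
    collected: List[Post] = []
    max_id: Optional[int] = None  # None means "start from the newest posts"
    for _ in range(max_pages):
        page = search_page(query, page_size, max_id)
        if not page:
            break  # nothing older is available
        collected.extend(page)
        # The next request should return only posts older than this batch.
        lowest_id = min(post["id"] for post in page)
        max_id = lowest_id - 1
    return collected


def fake_search(query: str, count: int, max_id: Optional[int]) -> List[Post]:
    """In-memory stand-in for a real search call: 250 posts with ids 250..1."""
    ids = [i for i in range(250, 0, -1) if max_id is None or i <= max_id]
    return [{"id": i, "text": f"{query} post {i}"} for i in ids[:count]]


posts = collect_posts(fake_search, "wine", page_size=100, max_pages=5)
```

In a real service each batch would be written to MongoDB inside the loop instead of being accumulated in a list.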

Seeking questions about creating Microsoft Live Labs Pivot collections

I've spent the past 3 weeks working a lot with Pivot from Microsoft Live Labs (http://getpivot.com/). Pivot is a tool that allows you to visually explore data. It's an interesting take on visual data mining. Anyway, I've been writing a lot of code that creates a hierarchy of Pivot collections, where one item in the collection drills down into an entirely new collection. The dev community around Pivot is still very young, so there isn't much tribal knowledge built up yet. I've spent a lot of time trying ......

Code Analysis Tools

I've started working on a collection of code analysis tools that are open source and available for anyone to use. I've got a descriptive article located at CodeProject.com (http://www.codeproject.com... which includes the source code and binaries. The three main tools that I have so far are the following: A “Not Used Finder”: searches through a list of assemblies and looks for any type, method or field that isn't ever used. Points out code that you should ......

Multithreaded K-Means Clustering

I've been playing around with K-Means clustering lately and found an example in C# here. I worked on some optimizations to this code and made the clustering work on multiple threads, which gave it a 40% to 60% decrease in execution time, depending on how many vectors are being clustered, how many clusters are generated, and how many dimensions there are per vector. Here is the source for the optimized K-Means algorithm ......
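For readers who want the general idea without the C# source, here is a small Python sketch of one way to spread K-Means across threads: the vectors are split into chunks and each worker assigns its chunk to the nearest centroid. This is an illustration under stated assumptions, not the optimized code from the post. The function names, the choice to seed centroids with the first k vectors, and the chunking scheme are all hypothetical. Note also that CPython threads do not run pure-Python arithmetic in parallel, so this shows the structure and should not be expected to reproduce the speedup measured in C#.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def nearest(vector: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to vector (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(vector, c)) for c in centroids]
    return distances.index(min(distances))


def assign_chunk(chunk: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Work item for one thread: label every vector in the chunk."""
    return [nearest(v, centroids) for v in chunk]


def kmeans(vectors: Sequence[Vector], k: int, workers: int = 4,
           max_iterations: int = 50) -> Tuple[List[int], List[Vector]]:
    centroids = list(vectors[:k])  # assumption: seed with the first k vectors
    chunk_size = max(1, (len(vectors) + workers - 1) // workers)
    chunks = [vectors[i:i + chunk_size] for i in range(0, len(vectors), chunk_size)]
    labels: List[int] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_iterations):
            # Assignment step: each chunk is labelled on its own thread.
            # pool.map returns results in chunk order, so labels stay aligned.
            parts = pool.map(assign_chunk, chunks, [centroids] * len(chunks))
            new_labels = [label for part in parts for label in part]
            if new_labels == labels:
                break  # no vector changed cluster, so we have converged
            labels = new_labels
            # Update step: move each centroid to the mean of its members.
            for c in range(k):
                members = [v for v, l in zip(vectors, labels) if l == c]
                if members:
                    centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
```

The assignment step is the natural place to parallelize because each vector's nearest centroid can be computed independently of every other vector.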