Geeks With Blogs
John Conwell: aka Turbo
Research in the visual exploration of data
NOTE: These patterns assume you have a working understanding of Akka and Akka's JavaTestKit. Testing Actors In Complete Isolation: Akka is all about building hierarchies of actors to represent a system. I've found this is a great way to decompose, design, and if needed distribute a system. This means your system becomes a hierarchy of actors, where parent ......

About a month or so ago I read some very sad news. Microsoft announced that they were killing off the DryadLinq (or LINQ-to-HPC) project in favor of Hadoop. I was one of the first users of DryadLinq outside of Microsoft, back when it was a pre-alpha project inside Microsoft Research. My company had a running HPC cluster and my boss convinced me to install ......

In my previous post I mentioned a possible bug with MongoDB's MapReduce. Well, I played with it a bit further. I went ahead and added a string field to each document to hold the numeric Twitter post id as a string value. Then I changed my map function to use this string value instead of the numeric id. This time the duplicate post query worked perfectly. ......

I wanted to run a simple sanity check to make sure I didn't have any duplicate Twitter posts in my database. I didn't think I had any, but you can never be too sure, so I whipped up a simple MapReduce query to check. Right now I'm storing Twitter posts in MongoDB using the document schema shown below: { category post { post_id created_date from_user ......
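
A minimal sketch of the duplicate check described above, written in plain Python rather than as a MongoDB MapReduce job. The `find_duplicate_ids` helper and the sample documents are hypothetical, and the nesting of `post_id` under `post` is assumed from the truncated schema.

```python
from collections import Counter


def find_duplicate_ids(documents):
    """Return {post_id: count} for every post_id seen more than once."""
    # "Map" step: emit one key per document.
    # Assumed nesting: doc["post"]["post_id"].
    keys = (doc["post"]["post_id"] for doc in documents)
    # "Reduce" step: sum the number of emissions per key.
    counts = Counter(keys)
    # Keep only the keys that occur more than once.
    return {post_id: n for post_id, n in counts.items() if n > 1}


# Hypothetical sample data, not taken from the original post.
sample = [
    {"category": "wine", "post": {"post_id": 101}},
    {"category": "wine", "post": {"post_id": 102}},
    {"category": "wine", "post": {"post_id": 101}},
]
duplicates = find_duplicate_ids(sample)
```

An empty result means no post id was stored twice.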

I'm still getting used to this whole schema-less, document oriented database thing. I've been trying to figure out what document structure to store my Twitter data in, and I keep flipping back and forth. I've been writing software in the land of relational databases for 15 years, and consider myself pretty good at designing database schemas for enterprise ......

When using the Twitter Search API, the returned posts contain a "created_at" date time stamp with the time set to GMT. This becomes an issue when using pymongo to store the datetime object in MongoDB. When a datetime object gets stored in MongoDB, mongo (or the pymongo library) updates the datetime object to reflect the current timezone. So in ......
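
One way to sidestep timezone ambiguity is to make the timezone explicit before the value goes anywhere near the database. The sketch below uses only the standard library; the `parse_created_at` helper is hypothetical, and the RFC 2822 style timestamp format is an assumption about what the Search API returned.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_created_at(created_at):
    """Parse an RFC 2822 style timestamp into a timezone-aware UTC datetime."""
    parsed = parsedate_to_datetime(created_at)
    if parsed.tzinfo is None:
        # No usable offset in the string: assume the value is already GMT/UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    # Normalize any other offset to UTC.
    return parsed.astimezone(timezone.utc)


# Hypothetical sample value in the assumed format.
stamp = parse_created_at("Sun, 05 Jun 2011 19:04:11 +0000")
```

A datetime that carries its own UTC offset leaves no room for a library to guess which timezone was meant.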

In my quest to dig into how people use social media to communicate about wine I ran into a snag. One of the things I really wanted to do was collect geographic data about each post. When you do a Twitter search, a post can contain latitude and longitude data if the source supports GPS, but so far almost none of the posts about wine actually have this ......

OK, lesson learned. I now know why Twitter data gives all ids in both numeric and string format. When I first saw this, I thought "what a waste of data, repeating all numeric fields as both strings and numerics. That's stupid". Then I started noticing something odd. In my Python script, my Twitter ids weren't matching up to what I had in MongoDB. Turns ......
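
A small sketch of one plausible cause of a mismatch like this, assuming the ids passed through a 64-bit floating point number somewhere along the way (as JavaScript numbers do). The `survives_double` helper and the sample values are hypothetical.

```python
def survives_double(value):
    """Return True if an integer is unchanged after a trip through a 64-bit float."""
    return int(float(value)) == value


# Every integer up to 2**53 is exactly representable as a double.
small_id = 2**53
# Hypothetical id just past that limit; it gets rounded when converted.
large_id = 2**53 + 1

small_ok = survives_double(small_id)
large_ok = survives_double(large_id)

# The same id kept as a string round-trips without loss.
string_ok = int(str(large_id)) == large_id
```

Under that assumption, comparing on the string form of the id avoids the rounding entirely.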

I've been playing with a new side project: wine data analytics mined from Twitter, and stored in MongoDB. I took a first stab at writing a Python celery service to search for wine related Twitter posts and dump them in MongoDB. First lesson learned is that a ton of people post about the word 'Wine' on Sunday afternoons. I wrote the service to pull 100 ......

In the past three weeks, I've noticed that my laptop (dual core 2.1GHz, 2GB RAM) has become amazingly sluggish. I only use it for communications and data lookup workflows, so the slowness was tolerable. But today I finally got fed up with the suckiness and decided to get to the root of the problem (I do have strong performance roots after all). It actually ......

Despite what execs at Microsoft think, Windows 7 is NOT a tablet OS. Just because you can install some software (or OS) on a device doesn't mean that device is meant to run that software. This seems to be the step that the non-engineer execs at Microsoft have not understood. In order to seamlessly work with a device, the software needs to be ......

I've been playing with Microsoft Live Labs Pivot to create a hierarchy of collections all linked together to allow someone to explore a hierarchy of data visually. The problem has been the generation time of the entire hierarchy. I end up creating 500 - 600 collections total and it takes hours and hours using the CollectionCreator class that comes with ......

Learning how to programmatically make collections for Microsoft Live Labs Pivot has been a pretty interesting ride. There are very few examples out there, and the folks at MS Live Labs are often slow on any feedback. But that is what Reflector is for, right? Well, I was creating these InfoCard images (similar to the Car images in the "New Cars" sample ......

I've spent the past 3 weeks working a lot with Pivot from Microsoft Live Labs (http://getpivot.com/). Pivot is a tool that allows you to visually explore data. It's an interesting take on visual data mining. Anyway, I've been writing a lot of code that creates a hierarchy of Pivot collections, where one item in the collection drills down into an entirely ......

This is something I've been wondering about for about a year now. Microsoft has a history of creating very useful products, with lots of useful features. But useful does not mean usable. A lot of stuff coming out of Redmond the past 10 years doesn't really seem to have been well thought out from a user design point of view. Lots of extra steps, lots of ......

I've spent the past 7 years focused primarily on code and database performance. It's an area that I have a passion for, as well as a propensity. But what I've found is that it's very hard to change the culture of a development environment. You can teach performance, you can encourage performance, you might see a slight shift in how devs think about performance. ......

Recently a performance bug came my way. A highly multithreaded application that can run for hours, depending on the amount of data it's processing, was observed having all its CPUs ramping up to 100% utilization while the amount of data processed per second dropped down to nothing. Ok, no big deal here. I've most likely got a state where all threads are ......

Although most of the topics I've written about are pretty random, I'll try to focus in on a much more narrow (yet incredibly broad) topic: multi core vs many core processing, parallel processing, and the paradigm shift that we software engineers are on the leading edge of having to face. In short, Intel, AMD, and other hardware manufacturers are ......

Generally I'm not one to write a post that does nothing but highlight someone else's blog post...BUT...this one was important enough (IMHO) that I decided to break my own rule. Are you building a Leatherman or a Samurai sword? (stupid linker isn't working) http://petewarden.typepad.c... As programmers we always ......

For grins I looked at my code that calls: T tmp = new T(); in Reflector, to see if it could shed any light on T instance creation badness. Well, it turns out that the C# compiler spits out code to call Activator.CreateInstance: T tmp = Activator.CreateInstance<... I kind of get why the C# compiler does this, because it doesn't know what T is ......

I recently needed to change how an array lookup worked to make it more efficient, and decided to use the List<T>.BinarySearch to do the lookup. The class that contained this lookup had a generic parameter, and was constrained like so: public class SortedNameList<T> where T : class, INameValueItem, new() {...} where the T of List<T> ......

I get a bit sick of checking for null on my IEnumerable objects before doing a foreach over them. In my opinion the CLR should check if the list is null, and if it is just exit out of the foreach iteration as if there were no items in it. Well, I was goofing around with Extension Methods a bit and figured out how to get this kind of functionality ......

I recently profiled a sproc that makes heavy use of the TSQL SUBSTRING function (hundreds of thousands of times) to see how it performs on a SQL 2005 database compared to a SQL 2000 database. Much to my surprise the SQL 2005 database performed worse...dramatically worse than SQL 2000. After much researching it turns out the problem is that the column ......

I was looking at Reflector add-ins the other day and ran across one that would be an amazing time saver. It's an add-in that generates the Reflection.Emit code! Anyone who has ever spent any time with the Reflection.Emit namespace should immediately realize how wonderful this tool has the potential to be (as long as the generated code is of good quality ......

I've spent a lot of time lately thinking about instrumentation and how to integrate it into software projects. As a performance engineer I tend to think about instrumentation from the point of view of someone who wants to record the details of what a system is doing, and then dig through the data and use it to figure out what is wrong. But I’ve been ......

Copyright © John Conwell | Powered by: GeeksWithBlogs.net