OK, lesson learned. I now know why Twitter data gives every id in both numeric and string format. When I first saw this, I thought "what a waste of data, repeating every numeric field as both a string and a number. That's stupid." Then I started noticing something odd: in my Python script, my Twitter ids weren't matching up with what I had in MongoDB.

Turns out Python (simplejson, to be exact) takes every numeric value in the Twitter JSON document and turns it into an int. But if the number is too big, it overflows back to a negative number, or worse, a smaller positive number. So, note to self: always take numeric ids from Twitter's string fields, and explicitly convert them to longs.
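Here is a minimal sketch of that note-to-self, using Python's standard json module rather than simplejson. The tweet fragment and the tweet_id helper are hypothetical. One caveat: Python's own int is arbitrary precision (in Python 3 there is no separate long type), so a plain json.loads keeps a 19-digit id exact. The damage typically happens when the number passes through a 64-bit double, which only holds 53 bits of mantissa, as in JavaScript or any float conversion along the way. The sketch simulates that with parse_int=float.

```python
import json

# Hypothetical tweet fragment: Twitter sends each id twice, as a JSON number
# ("id") and as a string ("id_str"). The value here is made up.
RAW_TWEET = '{"id": 1234567890123456789, "id_str": "1234567890123456789"}'


def tweet_id(doc):
    """Hypothetical helper: read the id from the string field and convert it
    explicitly, so no JSON number parsing is involved."""
    return int(doc["id_str"])


# Default parsing: Python ints are arbitrary precision, so "id" stays exact.
exact = json.loads(RAW_TWEET)

# Simulate a consumer that turns every JSON number into a 64-bit double
# (the way JavaScript does). Doubles carry only 53 bits of mantissa.
lossy = json.loads(RAW_TWEET, parse_int=float)

print(exact["id"] == tweet_id(exact))  # True
print(int(lossy["id"]))                # 1234567890123456768 (off by 21)
print(tweet_id(lossy))                 # 1234567890123456789
```

Reading from id_str sidesteps the problem entirely, because the digits never go through a numeric parser until you convert them yourself.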