Geeks With Blogs

Josh Reuben

I recently read the Big Data Glossary.

Big Data is essentially a MapReduce stack: compute jobs are scattered across many nodes, the partial results are gathered, and then aggregated, so that the work scales out rather than up.
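To make the scatter-gather-aggregate idea concrete, here is a minimal single-process sketch of the pattern as a word count. It is only an illustration, not how Hadoop implements it: the function names (map_phase, shuffle, reduce_phase, word_count) are my own, and the "nodes" are simulated by looping over chunks in one process.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Gather: group the emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Aggregate: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(chunks):
    # Scatter: on a real cluster each chunk would go to a different node;
    # here (an assumption for illustration) we simply loop over them.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))

counts = word_count(["big data big", "data grid"])
```

The point of the split is that map_phase needs only its own chunk, so chunks can be processed independently, and only the grouped intermediate pairs have to move between machines.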

The core tools are:

  • Apache Hadoop – a MapReduce scale-out infrastructure
  • Hive – SQL-like query language for Hadoop
  • Pig – procedural data-flow language for Hadoop
  • Cascading – orchestration of job workflows on Hadoop
  • Datameer – BI on Hadoop
  • Mahout – distributed machine learning library on Hadoop
  • ZooKeeper – distributed coordination service (work coordinator / monitor)

On top of these sit various tools and extensions, as well as ports to other platforms (e.g. HDInsight).

You also need to be aware of the elastic cloud platforms these stacks run on, and the various NoSQL DBs tend to be leveraged in this space as well.

Additionally, MapReduce is just an infrastructure pattern for distributing the processing of algorithms. You will not get much use out of it without knowing the appropriate algorithms to run on the nodes of your compute grid, and running those algorithms is the whole point of Big Data.
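As a small illustration of why the algorithm matters more than the plumbing, consider computing a mean across partitions. The sketch below is my own example, not something from the glossary: the names map_partial, combine and distributed_mean are hypothetical, and the partitions stand in for data held on separate nodes. Each node reduces its slice to a (sum, count) pair, which can be merged in any order.

```python
from functools import reduce

def map_partial(values):
    """Runs on one node: summarize the local slice as (sum, count)."""
    return (sum(values), len(values))

def combine(a, b):
    """Merge two partial results; associative and commutative,
    so partials can be combined in any order or grouping."""
    return (a[0] + b[0], a[1] + b[1])

def distributed_mean(partitions):
    # One partial per (hypothetical) node, then a single aggregate step.
    partials = [map_partial(p) for p in partitions]
    total, count = reduce(combine, partials, (0, 0))
    return total / count

partitions = [[1, 2, 3], [4, 5], [6]]
mean = distributed_mean(partitions)  # 21 / 6 = 3.5
```

Averaging the three per-node means directly would give roughly 4.17 instead of 3.5, because the partitions have different sizes. Knowing which intermediate value to carry between nodes is exactly the kind of algorithmic knowledge the infrastructure does not supply.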

Posted on Tuesday, December 25, 2012 11:39 AM

Comments on this post: Big Data–Where to Start


Copyright © JoshReuben