Geeks With Blogs

@JReuben1
  • JReuben1 AngularJS Directive templateUrl --> halfway to W3C WebComponents ! about 556 days ago
  • JReuben1 Yeoman AngularJS generator - generate controllers, views, routes, services - NICE! about 557 days ago
  • JReuben1 A comparison of HTML5 Canvas 2D JS libs http://t.co/fcB7jvnhqY KineticJS , EaselJS, fabric.js, Paper.js, processing.js seen as the leaders about 558 days ago

Josh Reuben

I recently read the Big Data Glossary - http://www.amazon.com/Big-Data-Glossary-Pete-Warden/dp/1449314597

 

big_data_glossary

Big Data is essentially a MapReduce stack for scatter-gather-aggregate scaleout of compute jobs.

The core tools are:

  • Apache Hadoop – a MapReduce scale-out infrastructure
  • Hive – SQL language for Hadoop
  • Pig – procedural language for Hadoop
  • Cascading – orchestration of jobs on Hadoop
  • Datameer – BI on Hadoop
  • Mahout – distributed machine learning library on Hadoop
  • ZooKeeper – work coordinator / monitor

On top of these are various tools & extensions, as well as ports (e.g. HDInsight )

You also need to be aware of elastic cloud platforms to run on, and the various NoSQL DBs tend to be leveraged in this space as well.

Additionally, MapReduce is just an infrastructure pattern for distributed processing of algorithms – you will not get much usage out of it without knowledge of the appropriate algorithms to leverage on the nodes in your compute grid – the whole point of Big Data.

Posted on Tuesday, December 25, 2012 11:39 AM | Back to top


Comments on this post: Big Data–Where to Start

No comments posted yet.
Your comment:
 (will show your gravatar)


Copyright © JoshReuben | Powered by: GeeksWithBlogs.net | Join free