I recently read the Big Data Glossary - http://www.amazon.com/Big-Data-Glossary-Pete-Warden/dp/1449314597
Big Data, as covered there, is essentially a MapReduce stack: a scatter-gather-aggregate approach to scaling compute jobs out across many machines.
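The scatter-gather-aggregate idea can be sketched in plain Python. This is a toy, single-process simulation of the map → shuffle → reduce phases of a word count job; the function names are mine, not Hadoop's API:

```python
from collections import defaultdict

def map_phase(docs):
    # "scatter": emit (key, value) pairs from each input record
    for doc in docs:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # "gather": group all values by key, as the framework
    # does between the map and reduce phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # "aggregate": fold each key's values into a single result
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big compute", "big grid"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'big': 3, 'data': 1, 'compute': 1, 'grid': 1}
```

In a real cluster the map and reduce calls run in parallel on different nodes, and the shuffle moves data over the network; the logical flow is the same.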
The core tools are:
- Apache Hadoop – a MapReduce scale-out infrastructure
- Hive – SQL-like query language (HiveQL) for Hadoop
- Pig – procedural language for Hadoop
- Cascading – orchestration of jobs on Hadoop
- Datameer – BI on Hadoop
- Mahout – distributed machine learning library on Hadoop
- ZooKeeper – distributed coordination service (configuration, naming, synchronization)
On top of these sit various tools and extensions, as well as ports to other platforms (e.g. HDInsight, Microsoft's Hadoop distribution for Azure).
You also need to be aware of the elastic cloud platforms these stacks run on, and of the various NoSQL databases that tend to be leveraged alongside them.
Additionally, MapReduce is just an infrastructure pattern for distributed processing. You will not get much value out of it without knowledge of the algorithms appropriate to run on the nodes of your compute grid – which is the whole point of Big Data.
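The algorithms that fit this pattern are the ones that decompose into per-node partial results plus an associative combine step. A minimal sketch (hypothetical names, single process standing in for a cluster): computing a global mean by reducing each data shard to a tiny (count, sum) summary and merging the summaries.

```python
def partial_stats(partition):
    # per-node step: each node reduces its local shard to a small summary
    return (len(partition), sum(partition))

def combine(a, b):
    # associative merge of two summaries; combination order does not matter,
    # which is what lets the framework merge results in any tree shape
    return (a[0] + b[0], a[1] + b[1])

# data already sharded across three hypothetical nodes
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

count, total = (0, 0.0)
for stats in map(partial_stats, partitions):
    count, total = combine((count, total), stats)

print(total / count)  # 21.0 / 6 = 3.5
```

Algorithms that can't be expressed this way (e.g. those needing all the data on one node at once) gain nothing from the MapReduce stack, which is why knowing your algorithms matters as much as knowing the tools.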