
· Introduction
· The Motivation for Hadoop
o Problems with traditional large-scale systems
o Requirements for a new approach
· Hadoop: Basic Concepts
o An Overview of Hadoop
o The Hadoop Distributed File System
o Hands-On Exercise
o How MapReduce Works
o Anatomy of a Hadoop Cluster
o Other Hadoop Ecosystem Components
· Writing a MapReduce Program
o The MapReduce Flow
o Examining a Sample MapReduce Program
o Basic MapReduce API Concepts
o The Driver Code
o The Mapper
o The Reducer
o Hadoop’s Streaming API
o Using Eclipse for Rapid Development
o Hands-On Exercise
o The New MapReduce API
· Integrating Hadoop Into The Workflow
o Relational Database Management Systems
o Storage Systems
o Importing Data from RDBMSs With Sqoop
o Hands-On Exercise
o Importing Real-Time Data with Flume
o Accessing HDFS Using FuseDFS and Hoop
· Delving Deeper Into The Hadoop API
o More About ToolRunner
o Testing with MRUnit
o Reducing Intermediate Data With Combiners
o The configure and close Methods for Map/Reduce Setup and Teardown
o Writing Partitioners for Better Load Balancing
o Hands-On Exercise
o Directly Accessing HDFS
o Using the Distributed Cache
· Common MapReduce Algorithms
o Sorting and Searching
o Indexing
o Machine Learning With Mahout
o Term Frequency – Inverse Document Frequency
o Word Co-Occurrence
o Hands-On Exercise
· Using Hive and Pig
o Hive Basics
o Pig Basics
o Hands-On Exercise
· Practical Development Tips and Techniques
o Debugging MapReduce Code
o Using LocalJobRunner Mode For Easier Debugging
o Retrieving Job Information with Counters
o Logging
o Splittable File Formats
o Determining the Optimal Number of Reducers
o Map-Only MapReduce Jobs
o Hands-On Exercise
· More Advanced MapReduce Programming
o Custom Writables and WritableComparables
o Saving Binary Data Using SequenceFiles and Avro Files
o Creating InputFormats and OutputFormats
o Hands-On Exercise
· Joining Data Sets in MapReduce
o Map-Side Joins
o The Secondary Sort
o Reduce-Side Joins
· Graph Manipulation in Hadoop
o Introduction to graph techniques
o Representing graphs in Hadoop
o Implementing a sample algorithm: Single Source Shortest Path
· Creating Workflows With Oozie
o The Motivation for Oozie
o Oozie’s Workflow Definition Format
o Hands-On Exercise