Course Duration: 3 Months

Course Details

·        Introduction

·        The Motivation for Hadoop

o   Problems with traditional large-scale systems

o   Requirements for a new approach


·        Hadoop: Basic Concepts

o   An Overview of Hadoop

o   The Hadoop Distributed File System

o   Hands-On Exercise

o   How MapReduce Works

o   Anatomy of a Hadoop Cluster

o   Other Hadoop Ecosystem Components


·        Writing a MapReduce Program

o   The MapReduce Flow

o   Examining a Sample MapReduce Program

o   Basic MapReduce API Concepts

o   The Driver Code

o   The Mapper

o   The Reducer

o   Hadoop’s Streaming API

o   Using Eclipse for Rapid Development

o   Hands-On Exercise

o   The New MapReduce API


·        Integrating Hadoop Into The Workflow

o   Relational Database Management Systems

o   Storage Systems

o   Importing Data from RDBMSs With Sqoop

o   Hands-On Exercise

o   Importing Real-Time Data with Flume

o   Accessing HDFS Using FuseDFS and Hoop


·        Delving Deeper Into The Hadoop API

o   More About ToolRunner

o   Testing with MRUnit

o   Reducing Intermediate Data With Combiners

o   The configure and close methods for Map/Reduce Setup and Teardown

o   Writing Partitioners for Better Load Balancing

o   Hands-On Exercise

o   Directly Accessing HDFS

o   Using the Distributed Cache


·        Common MapReduce Algorithms

o   Sorting and Searching

o   Indexing

o   Machine Learning With Mahout

o   Term Frequency – Inverse Document Frequency

o   Word Co-Occurrence

o   Hands-On Exercise


·        Using Hive and Pig

o   Hive Basics

o   Pig Basics

o   Hands-On Exercise


·        Practical Development Tips and Techniques

o   Debugging MapReduce Code

o   Using LocalJobRunner Mode For Easier Debugging

o   Retrieving Job Information with Counters

o   Logging

o   Splittable File Formats

o   Determining the Optimal Number of Reducers

o   Map-Only MapReduce Jobs

o   Hands-On Exercise




·        More Advanced MapReduce Programming

o   Custom Writables and WritableComparables

o   Saving Binary Data Using SequenceFiles and Avro Files

o   Creating InputFormats and OutputFormats

o   Hands-On Exercise


·        Joining Data Sets in MapReduce

o   Map-Side Joins

o   The Secondary Sort

o   Reduce-Side Joins


·        Graph Manipulation in Hadoop

o   Introduction to graph techniques

o   Representing graphs in Hadoop

o   Implementing a sample algorithm: Single Source Shortest Path


·        Creating Workflows With Oozie

o   The Motivation for Oozie

o   Oozie’s Workflow Definition Format

o   Hands-On Exercise
