Hadoop for Developers

Course Curriculum

Module 1:

1. Introduction to Big Data and Hadoop

2. Components of Hadoop and Hadoop Architecture

3. HDFS, MapReduce & YARN Deep Dive

4. Installation & Configuration of Hadoop in a VM (Single Node)

5. Multi-node Installation (3 Nodes)

a. On-premises on Local Machines

6. Performance Tuning, Advanced Administration, and Monitoring of the Hadoop Cluster

a. Hadoop Benchmarking (TeraGen & TeraSort on 10 GB of Data)

b. Monitoring via the Hadoop Web UI

c. Advanced Hadoop Administration Commands from the CLI

d. Tuning the Hadoop Cluster by Adjusting Performance Parameters for HDFS and the MapReduce Framework

e. Node Commissioning (Adding) and Decommissioning (Removing)

f. Running the Balancer to Redistribute Data across DataNodes

7. Writing MapReduce Programs in Java: WordCount

a. Web Server Log Analysis

b. Recommendation Engine (Product Recommendation Generator)

c. Sentiment Analysis

d. Custom Record Readers, Partitioners, and Combiners

e. Distributed Copy (DistCp)

8. Introduction to Pig and Pig Latin: Installation & WordCount

a. Web Server Log Analysis

b. Sentiment Analysis

c. Processing JSON Data in Pig Using the Elephant Bird Library

d. Advanced Pig Processing Using the Piggybank Library

e. Building Pig UDFs and Calling Them from Pig Scripts

9. Advanced Pig Concepts

a. Performance Tuning Parameters

b. Controlling Parallelism

c. Running Pig Scripts on Tez

10. Introduction to Hive: Installation & WordCount

a. Web Server Log Analysis

b. Product-Based Recommendation

c. Hive Performance Tuning Parameters

d. Loading CSV, JSON, and Other Data into Hive

e. Hive File Formats, Including Text, ORC, and Parquet

11. Introduction to Sqoop

a. Advanced Sqoop Import and Export Options Using Queries

b. Controlling Parallelism

12. Introduction to HBase: Installation and HBase Queries

13. ZooKeeper for Coordination; Multi-node HBase Installation with ZooKeeper

14. Cloudera and Hortonworks Distributions of Hadoop

15. Deploying a Multi-node Hadoop Cluster Using Ambari

16. Workflow Scheduling and Automation Using Oozie

Module 2:

  1. Other Components of the Hadoop Ecosystem
  2. Flume for Real-time Data Collection
  3. Kafka for Real-time Log Analysis: Log Filtering
  4. Spark for Real-time In-memory Analytics
  5. Advanced Spark Concepts, Spark Programming APIs, and Spark RDDs
  6. Spark: Controlling Parallelism, Partitions & Persistence
  7. Spark SQL
  8. Spark Streaming
  9. Scala Programming: Basics to Advanced
  10. Introduction to Python & Spark Programming Using PySpark
  11. Spark for Real-time Log Analysis: Analytics
  12. Creating and Deploying an End-to-End Web Log Analysis Solution
  13. Real-time Log Collection Using Flume
  14. Filtering the Logs in Kafka
  15. Real-time Threat Detection in Spark Using Logs from a Kafka Stream
  16. Clickstream Analysis Using Spark
  17. Hadoop MR2 (YARN) Deployment and Integration with Spark
  18. Spark Machine Learning Concepts and Lambda Architecture
  19. Machine Learning Using MLlib
  20. Customer Churn Modeling Using Spark MLlib
  21. Zeppelin for Data Visualization; Spark Programming in Zeppelin Using IPython Notebooks
  22. Case Studies & POC: Run Hadoop on a Medium-sized Dataset (~5 GB); the POC can be based on a real-time project from your company or a Duratech live project

Duratech Solutions

We offer web development, mobile applications, enterprise cloud deployment, and data analytics services under one roof. We are an end-to-end solutions provider across platforms and devices.

Web, mobile, cloud, and big data are our key focus areas. We build solutions on cutting-edge technologies that help transform your business to meet increasing customer demands and deliver customer delight.


  +91-89400 03640


  • Vishnav

    One of the best places to explore technology. I did an internship at DURATECH SOLUTIONS during my final semester, working on my project on a cloud computing platform. It helped me get placed at TCS. Guys, this is an awesome place for those who thirst for knowledge of computers and technology. You can immerse yourself here. Career growth is guaranteed.

