Call Us @ +91-89400 03640 | On Demand Course

Hadoop for Developers

Course Curriculum

Module 1:

1. Introduction to Big Data and Hadoop

2. Components of Hadoop and Hadoop Architecture

3. HDFS, MapReduce & YARN Deep Dive

4. Installation & Configuration of Hadoop in a VM (Single Node)

5. Multinode Installation (3 Nodes)

a. On-Premises on Local Machines

b. Cloud

6. Performance Tuning, Advanced Administration Activities, and Monitoring the Hadoop Cluster

a. Hadoop Benchmarking (TeraGen & TeraSort on 10 GB of Data)

b. Hadoop Web UI Monitoring

c. Advanced Hadoop Administration Commands from the CLI

d. Tuning the Hadoop Cluster via Performance Parameters for HDFS & the MapReduce Framework

e. Node Commissioning (Addition) and Decommissioning (Removal)

f. Running the Balancer to Redistribute Data in Hadoop

7. Writing MapReduce programs in Java: Wordcount

a. Webserver Log Analysis

b. Recommendation Engine (Product Recommendation Generator)

c. Sentiment Analysis

d. Custom Record Readers, Partitioners, Combiners

e. Distributed Copy

8. Introduction to Pig and Pig Latin: Installation & Wordcount

a. Webserver Log Analysis

b. Sentiment Analysis

c. Processing JSON Data in Pig Using the Elephant Bird Library

d. Advanced Pig Processing Using the Piggybank Library

e. Building Pig UDFs and Calling Them from Pig Scripts

9. Advanced Pig Concepts

a. Performance Tuning Parameters

b. Controlling Parallelism

c. Running Pig Scripts on Tez

10. Introduction to Hive: Installation & Wordcount

a. Webserver Log Analysis

b. Product-Based Recommendation

c. Hive Performance Tuning Parameters

d. Loading CSV Data, JSON Data, etc. in Hive

e. Hive File Formats Including Text, ORC, Parquet

11. Introduction to Sqoop

a. Advanced Sqoop Import/Export Options Using Queries

b. Controlling Parallelism

12. Introduction to HBase, Installation and HBase Queries

13. ZooKeeper for Coordination, HBase Multinode Installation with ZooKeeper

14. Cloudera and Hortonworks Distributions of Hadoop

15. Deploying a Multinode Hadoop Cluster using Ambari

16. Workflow Scheduling using Oozie for Automation

 

 

Module 2:

  1. Other Components of the Hadoop ecosystem
  2. Flume for Realtime data collection
  3. Kafka for Realtime Log analysis: Log Filtering
  4. Spark for Realtime In-memory Analytics
  5. Advanced Spark Concepts, Spark Programming APIs, Spark RDDs
  6. Spark Controlling Parallelism, Partitions & Persistence
  7. Spark SQL
  8. Spark Streaming
  9. Scala Programming Basics to Advanced
  10. Python Introduction & Python Spark programming using PySpark
  11. Spark for Realtime Log analysis: Analytics
  12. Creating and Deploying End-to-End Web Log Analysis Solution
  13. Realtime Log collection using Flume
  14. Filtering the Logs in Kafka
  15. Realtime Threat detection in Spark using Logs from Kafka Stream
  16. Click Stream analysis using Spark
  17. Hadoop MR2 (YARN) deployment and Integration with Spark
  18. Spark Machine Learning concepts and Lambda Architecture
  19. Machine Learning using MLlib
  20. Customer Churn Modeling using Spark MLlib
  21. Zeppelin for Data Visualization, Spark Programming in Zeppelin using iPython Notebooks
  22. Case studies & POC: run Hadoop on a medium-sized dataset (~5 GB of data); the POC can be based on a realtime project from your company or on one of Duratech's live projects

 

Testimonials

  • Sashi K | Cloud

    The content is deep and clear, covering everything from scratch through to industry needs, and I have the confidence to build my own applications.


Duratech Solutions

Duratech Solutions was incorporated in 2012 and has operated successfully in the global software development industry for 7 years.

We are the leaders in Coimbatore in Big Data and Data Science training, and the only training provider in Coimbatore offering Deep Learning, the highest level of Machine Learning & Artificial Intelligence technology. Our students have been placed at companies such as IBM, Sonata Software, and Deloitte.

Contact Us

  enquiry@duratechsolutions.in
  +0422-4200383
  +91-89400 03640
  Sai Baba Colony Branch & Peelamedu Branch, Coimbatore
