The Big Data and Hadoop Certification is an ideal course for individuals who aspire to become data analysts. With this course, you will understand how data is processed in large companies and how moving from Excel-based analytics to real-time analytics can improve an organization's productivity. Mansard offers the best Big Data and Hadoop training in Bangalore. Our training faculty, with a decade of experience delivering Big Data and Hadoop training to the corporate sector, will train and guide every learner to meet client expectations.

1. Big Data Overview

Big Data analytics offers huge benefits for Big Data & Hadoop professionals. This certification course is designed to provide you with rich hands-on training in Big Data and help you get an opportunity to work at a reputed company.

2. Course Features

  • 40–45 hours of instructor-led classes
  • Assignments after every class
  • A live project to get hands-on experience
  • Quizzes and a real-time project
  • Self-paced online learning
  • Certification after successful completion of the course

3. Who is Eligible to Take Up This Course?

The following professionals can take up the Big Data course offered by Mansard Software Solution.

    • Software Developers
    • Project Managers
    • Software Architects
    • ETL and Data Warehousing Professionals
    • Data Engineers
    • Data Analysts & Business Intelligence Professionals
    • DBAs and DB Professionals
    • Testing Professionals
    • Senior IT Professionals
    • Graduates looking to build a career in the field of Big Data

4. Prerequisites

You should have basic knowledge of Big Data, Apache Hadoop and Hadoop tools, along with basic knowledge of Core Java and SQL.

About the Course

The Big Data Training and Certification course gives you knowledge of the Hadoop ecosystem and best practices for HDFS, MapReduce, and many other concepts that are currently in demand at IT companies.

Understanding Big Data and Hadoop

Goal: In this module, you will understand what Big Data is, the limitations of traditional solutions to Big Data problems, how Hadoop solves those problems, the Hadoop Ecosystem, Hadoop Architecture, HDFS, the anatomy of file read and write, and how MapReduce works.

  • Introduction to Big Data & Big Data Challenges
  • Limitations & Solutions of Big Data Architecture
  • Hadoop & its Features
  • Hadoop Ecosystem
  • Hadoop 2.x Core Components
  • Hadoop Storage: HDFS (Hadoop Distributed File System)
  • Hadoop Processing: MapReduce Framework
  • Different Hadoop Distributions
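
To give a feel for the hands-on work in this module, here is a minimal sketch of the anatomy of an HDFS file write and read using Hadoop's Java FileSystem API. It assumes a Hadoop client configuration (core-site.xml) on the classpath; the /tmp/hello.txt path is purely illustrative.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up core-site.xml from the classpath
            FileSystem fs = FileSystem.get(conf);       // HDFS if fs.defaultFS points there, local FS otherwise

            Path file = new Path("/tmp/hello.txt");     // illustrative path
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.writeUTF("Hello, HDFS");            // the client streams data to DataNodes block by block
            }
            try (FSDataInputStream in = fs.open(file)) {
                System.out.println(in.readUTF());       // the client reads blocks directly from DataNodes
            }
        }
    }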

Hadoop Architecture and HDFS

Goal: In this module, you will learn Hadoop Cluster Architecture, important configuration files of Hadoop Cluster, Data Loading Techniques using Sqoop & Flume, and how to set up Single Node and Multi-Node Hadoop Cluster.

  • Hadoop 2.x Cluster Architecture
  • Federation and High Availability Architecture
  • Typical Production Hadoop Cluster
  • Hadoop Cluster Modes
  • Common Hadoop Shell Commands
  • Hadoop 2.x Configuration Files
  • Single Node Cluster & Multi-Node Cluster set up
  • Basic Hadoop Administration
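
As a quick illustration of how the configuration files listed above are consumed, the sketch below loads two of them through Hadoop's Configuration class and prints a couple of well-known properties. The /etc/hadoop/conf path is an assumption; it varies across distributions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class ShowConfig {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Explicitly load two of the Hadoop 2.x configuration files covered in this module
            conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
            conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
            System.out.println("fs.defaultFS    = " + conf.get("fs.defaultFS"));
            System.out.println("dfs.replication = " + conf.get("dfs.replication", "3 (default)"));
        }
    }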

Hadoop MapReduce Framework

Goal: In this module, you will gain a comprehensive understanding of the Hadoop MapReduce framework and of how MapReduce works on data stored in HDFS. You will also learn advanced MapReduce concepts such as Input Splits, Combiner & Partitioner.

  • Traditional way vs. MapReduce way
  • Why MapReduce
  • YARN Components
  • YARN Architecture
  • YARN MapReduce Application Execution Flow
  • YARN Workflow
  • Anatomy of MapReduce Program
  • Input Splits, Relation between Input Splits and HDFS Blocks
  • MapReduce: Combiner & Partitioner
  • Demo of Health Care Dataset
  • Demo of Weather Dataset
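
The classic word count program ties most of these pieces together: a mapper, a combiner doing local pre-aggregation, the shuffle driven by the partitioner, and a reducer. This is a minimal sketch against the Hadoop 2.x MapReduce API; input and output paths are taken from the command line.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);           // emit (word, 1) for every token
                }
            }
        }

        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));   // total count per word
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);      // combiner: local pre-aggregation on the map side
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }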

Advanced Hadoop MapReduce

Goal: In this module, you will learn advanced MapReduce concepts such as Counters, Distributed Cache, MRUnit, Reduce Join, Custom Input Format, Sequence Input Format and XML parsing.

  • Counters
  • Distributed Cache
  • MRUnit
  • Reduce Join
  • Custom Input Format
  • Sequence Input Format
  • XML file Parsing using MapReduce
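
To make one of these concrete, the mapper below uses a custom counter to track malformed records while filtering input; the five-field CSV layout is an assumption made for the sketch, and the counts appear in the job's counter output when it completes.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ValidatingMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

        // Counter names are arbitrary enums chosen by the job author
        enum RecordQuality { GOOD, MALFORMED }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length == 5) {                   // assumed record width for this sketch
                context.getCounter(RecordQuality.GOOD).increment(1);
                context.write(value, NullWritable.get());
            } else {
                context.getCounter(RecordQuality.MALFORMED).increment(1);
            }
        }
    }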

Apache Pig

Goal: In this module, you will learn Apache Pig, the types of use cases where Pig fits, how Pig couples tightly with MapReduce, Pig Latin scripting, Pig running modes, Pig UDFs, Pig Streaming & testing Pig scripts. You will also work on a healthcare dataset.

  • Introduction to Apache Pig
  • MapReduce vs. Pig
  • Pig Components & Pig Execution
  • Pig Data Types & Data Models in Pig
  • Pig Latin Programs
  • Shell and Utility Commands
  • Pig UDF & Pig Streaming
  • Testing Pig scripts with PigUnit
  • Aviation use case in Pig
  • Pig Demo of Healthcare Dataset
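
Pig Latin can also be run from Java through the PigServer API, which is handy when testing scripts. In this sketch, local mode keeps the example self-contained, and the patients.csv file and its schema are illustrative rather than the course dataset.

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class PigEmbedded {
        public static void main(String[] args) throws Exception {
            // Use ExecType.MAPREDUCE to run the same script on a cluster
            PigServer pig = new PigServer(ExecType.LOCAL);
            pig.registerQuery("patients = LOAD 'patients.csv' USING PigStorage(',') "
                    + "AS (id:int, state:chararray, age:int);");
            pig.registerQuery("adults = FILTER patients BY age >= 18;");
            pig.registerQuery("by_state = GROUP adults BY state;");
            pig.registerQuery("counts = FOREACH by_state GENERATE group, COUNT(adults);");
            pig.store("counts", "adults_by_state");   // writes the result to an output directory
        }
    }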

Apache Hive

Goal: This module will help you understand Hive concepts, Hive data types, loading and querying data in Hive, running Hive scripts and Hive UDFs.

  • Introduction to Apache Hive
  • Hive vs. Pig
  • Hive Architecture and Components
  • Hive Metastore
  • Limitations of Hive
  • Comparison with Traditional Database
  • Hive Data Types and Data Models
  • Hive Partition
  • Hive Bucketing
  • Hive Tables (Managed Tables and External Tables)
  • Importing Data
  • Querying Data & Managing Outputs
  • Hive Script & Hive UDF
  • Retail use case in Hive
  • Hive Demo on Healthcare Dataset
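
One common way to run Hive queries programmatically is the Hive JDBC driver; the minimal sketch below assumes a HiveServer2 instance at localhost:10000 and a sales table, both of which are illustrative.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuery {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "hive", "");
                 Statement stmt = con.createStatement()) {
                ResultSet rs = stmt.executeQuery(
                    "SELECT product, SUM(amount) AS total FROM sales GROUP BY product");
                while (rs.next()) {
                    System.out.println(rs.getString("product") + "\t" + rs.getLong("total"));
                }
            }
        }
    }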

Advanced Apache Hive and HBase

Goal: In this module, you will understand advanced Apache Hive concepts such as UDFs, dynamic partitioning, Hive indexes and views, and optimizations in Hive. You will also acquire in-depth knowledge of Apache HBase, HBase Architecture, HBase running modes and its components.

  • Hive QL: Joining Tables, Dynamic Partitioning
  • Custom MapReduce Scripts
  • Hive Indexes and views
  • Hive Query Optimizers
  • Hive Thrift Server
  • Hive UDF
  • Apache HBase: Introduction to NoSQL Databases and HBase
  • HBase vs. RDBMS
  • HBase Components
  • HBase Architecture
  • HBase Run Modes
  • HBase Configuration
  • HBase Cluster Deployment
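
To make dynamic partitioning concrete, here is a small sketch in the same Hive JDBC style as before, letting Hive derive each row's partition from the query result. The sales_staging source table is an assumption.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class DynamicPartitionDemo {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "hive", "");
                 Statement stmt = con.createStatement()) {
                // Allow Hive to create partitions from query results
                stmt.execute("SET hive.exec.dynamic.partition=true");
                stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");
                stmt.execute("CREATE TABLE IF NOT EXISTS sales_by_region "
                           + "(id INT, amount DOUBLE) PARTITIONED BY (region STRING)");
                // The last SELECT column supplies the partition value for each row
                stmt.execute("INSERT INTO TABLE sales_by_region PARTITION (region) "
                           + "SELECT id, amount, region FROM sales_staging");
            }
        }
    }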

Advanced Apache HBase

Goal: This module will cover advanced Apache HBase concepts. We will see demos on HBase Bulk Loading & HBase Filters. You will also learn what ZooKeeper is all about, how it helps in monitoring a cluster & why HBase uses ZooKeeper.

  • HBase Data Model
  • HBase Shell
  • HBase Client API
  • HBase Data Loading Techniques
  • Introduction to Apache ZooKeeper
  • ZooKeeper Data Model
  • ZooKeeper Service
  • HBase Bulk Loading
  • Getting and Inserting Data
  • HBase Filters
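
A minimal sketch of the HBase client API performing a single Put and Get. It assumes an hbase-site.xml on the classpath pointing at a running cluster (the client locates HBase through the ZooKeeper quorum) and a pre-created patients table with an info column family.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBasePutGet {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("patients"))) {  // assumed table, cf 'info'
                Put put = new Put(Bytes.toBytes("row-001"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Asha"));
                table.put(put);                                  // insert one cell

                Get get = new Get(Bytes.toBytes("row-001"));
                Result result = table.get(get);                  // fetch the row back
                byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
                System.out.println("name = " + Bytes.toString(name));
            }
        }
    }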

Processing Distributed Data with Apache Spark

Goal: In this module, you will learn what Apache Spark is, the SparkContext & the Spark Ecosystem. You will learn how to work with Resilient Distributed Datasets (RDDs) in Apache Spark, run an application on a Spark cluster, and compare the performance of MapReduce and Spark.

  • What is Spark
  • Spark Ecosystem
  • Spark Components
  • What is Scala
  • Why Scala
  • SparkContext
  • Spark RDD
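
As a first taste of the RDD API, the sketch below creates an RDD, applies a lazy map transformation, and triggers execution with a reduce action. Spark's local[*] master is used so it runs without a cluster.

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class RddSum {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("rdd-sum").setMaster("local[*]");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));
                JavaRDD<Integer> squares = numbers.map(n -> n * n);   // transformation: recorded lazily
                int sum = squares.reduce(Integer::sum);               // action: triggers the computation
                System.out.println("sum of squares = " + sum);
            }
        }
    }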

Oozie and Hadoop Project

Goal: In this module, you will understand how multiple Hadoop ecosystem components work together to solve Big Data problems. This module will also cover Flume & Sqoop demo, Apache Oozie Workflow Scheduler for Hadoop Jobs, and Hadoop Talend integration.

  • Oozie
  • Oozie Components
  • Oozie Workflow
  • Scheduling Jobs with Oozie Scheduler
  • Demo of Oozie Workflow
  • Oozie Coordinator
  • Oozie Commands
  • Oozie Web Console
  • Oozie for MapReduce
  • Combining flow of MapReduce Jobs
  • Hive in Oozie
  • Hadoop Project Demo
  • Hadoop Talend Integration
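
Workflows can also be submitted and monitored programmatically through the Oozie Java client API. In this sketch, the Oozie server URL, the NameNode and ResourceManager addresses, and the HDFS application path are all assumptions.

    import java.util.Properties;
    import org.apache.oozie.client.OozieClient;
    import org.apache.oozie.client.WorkflowJob;

    public class SubmitWorkflow {
        public static void main(String[] args) throws Exception {
            OozieClient oozie = new OozieClient("http://localhost:11000/oozie");
            Properties conf = oozie.createConfiguration();
            // workflow.xml is expected under this HDFS directory
            conf.setProperty(OozieClient.APP_PATH, "hdfs://localhost:8020/user/demo/wordcount-wf");
            conf.setProperty("nameNode", "hdfs://localhost:8020");
            conf.setProperty("jobTracker", "localhost:8032");   // the ResourceManager address in Hadoop 2.x
            String jobId = oozie.run(conf);                     // submits and starts the workflow
            while (oozie.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
                Thread.sleep(5000);                             // poll until the workflow finishes
            }
            System.out.println("Workflow " + jobId + " ended with status "
                    + oozie.getJobInfo(jobId).getStatus());
        }
    }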