Hadoop is an open-source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. This course introduces Hadoop, its key tools, and their applications.
Learning Objectives
Introduction to Hadoop
- start the course
- recognize what Big Data is, including its sources and data types, evolution, characteristics, and use cases
- identify Big Data infrastructure issues, and explain the benefits of Hadoop
- recognize the basics of Hadoop, including its history, milestones, and core components
- set up a virtual machine
- install Linux on a virtual machine
UNIX and Java Modeling
- recognize basic and most useful UNIX commands
Hadoop Data Internals and Interactions
- identify Hadoop components
- define HDFS components
- recognize how to read and write in HDFS
- use HDFS
MapReduce and YARN
- recognize basics of YARN
- define basics of MapReduce
- identify how MapReduce processes information
- use code that runs on Hadoop
Ecosystem and Data Type Handlings
- define Pig, Hive, and HBase
- define Sqoop, Flume, Mahout, and Oozie
- recognize storing and modeling data in Hadoop
- identify available commercial distributions for Hadoop
- recognize Spark and its benefits over traditional MapReduce
Practice: Practice Filtering in Hadoop
- filter information in Hadoop