• Online, Self-Paced
Course Description

Hadoop is an open-source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. This course introduces Hadoop and surveys its key tools and their applications.

Learning Objectives

Introduction to Hadoop

  • start the course
  • recognize what Big Data is, its sources and types, its evolution and characteristics, and common use cases
  • identify Big Data infrastructure issues and explain the benefits of Hadoop
  • recognize the basics of Hadoop, including its history, milestones, and core components
  • set up a virtual machine
  • install Linux on a virtual machine

UNIX and Java Modeling

  • recognize the basic and most useful UNIX commands

Hadoop Data Internals and Interactions

  • identify Hadoop components
  • define HDFS components
  • recognize how to read and write in HDFS
  • use HDFS
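
The read and write objectives above map directly onto the HDFS Java client. Below is a minimal sketch, assuming a reachable cluster whose fs.defaultFS is supplied by a core-site.xml on the classpath; the path /user/demo/hello.txt is a placeholder.

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/hello.txt"); // placeholder path

        // Write: create() returns a stream whose data lands in HDFS blocks
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello, HDFS".getBytes(StandardCharsets.UTF_8));
        }

        // Read: open() streams the blocks back from the DataNodes
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
    }
}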

MapReduce and YARN

  • recognize the basics of YARN
  • define the basics of MapReduce
  • identify how MapReduce processes information
  • use code that runs on Hadoop
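
To make the MapReduce objectives concrete, here is the canonical word-count job written against the org.apache.hadoop.mapreduce API; this is a sketch rather than the course's own material, and the class names are placeholders.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map phase: emit (word, 1) for every token in the input split
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts the shuffle grouped under each word
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

On a cluster, YARN's ResourceManager allocates containers for the job's map and reduce tasks, which is where the YARN objectives above come in.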

Ecosystem and Data Type Handling

  • define Pig, Hive, and HBase
  • define Sqoop, Flume, Mahout, and Oozie
  • recognize how to store and model data in Hadoop
  • identify available commercial distributions for Hadoop
  • recognize Spark and its benefits over traditional MapReduce
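
As a hedged illustration of the last objective, the same word count shrinks to a few transformations in Spark's Java API (assuming Spark 2.x or later, submitted via spark-submit); because intermediate results can stay in memory as RDDs, iterative workloads avoid the per-stage disk round-trips of traditional MapReduce.

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        // The master URL is supplied externally by spark-submit
        SparkConf conf = new SparkConf().setAppName("spark word count");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile(args[0]);
            JavaPairRDD<String, Integer> counts = lines
                    // Split each line into words
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    // Pair each word with a count of one
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    // Sum counts per word
                    .reduceByKey(Integer::sum);
            counts.saveAsTextFile(args[1]);
        }
    }
}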

Practice: Filtering in Hadoop

  • filter information in Hadoop
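
A plausible shape for this exercise (the course's actual starter code is not shown here) is a map-only MapReduce job: the mapper emits only matching records, and setting the reduce-task count to zero skips the shuffle entirely. The configuration key filter.keyword is a hypothetical name chosen for this sketch.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class GrepFilter {
    public static class FilterMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        private String keyword;

        @Override
        protected void setup(Context context) {
            // Hypothetical configuration key supplied at job submission
            keyword = context.getConfiguration().get("filter.keyword", "ERROR");
        }

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Emit only the lines that contain the keyword
            if (line.toString().contains(keyword)) {
                context.write(line, NullWritable.get());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("filter.keyword", args.length > 2 ? args[2] : "ERROR");
        Job job = Job.getInstance(conf, "filter");
        job.setJarByClass(GrepFilter.class);
        job.setMapperClass(FilterMapper.class);
        job.setNumReduceTasks(0); // map-only: no shuffle or reduce phase
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}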

Framework Connections

The materials within this course focus on the Knowledge, Skills, and Abilities (KSAs) identified within the Specialty Areas listed below. Specialty Area details are available in the interactive National Cybersecurity Workforce Framework.