• Online, Self-Paced
Course Description

Hadoop is an Apache Software Foundation project and an open-source software platform for scalable, distributed computing. Hadoop can provide fast and reliable analysis of both structured and unstructured data. In this course, you will learn about Hadoop's design principles and cluster architecture, considerations for servers and operating systems, and how to plan for a deployment. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Learning Objectives

Big Data Engineering

  • start the course
  • describe the principles of supercomputing
  • recall the roles and skills needed for the Hadoop engineering team
  • recall the advantages and shortcomings of using Hadoop as a supercomputing platform

 

Principles of Hadoop Clusters

  • describe the three axioms of supercomputing
  • describe the "dumb hardware, smart software" and "share nothing" design principles
  • describe the "move processing, not data," "embrace failure," and "build applications, not infrastructure" design principles

 

Architecture of a Hadoop Cluster

  • describe the different rack architectures for Hadoop
  • describe the best practices for scaling a Hadoop cluster

 

Network for the Hadoop Cluster

  • recall the best practices for different types of cluster networks

 

Hardware for the Hadoop Cluster

  • recall the primary responsibilities for the master, data, and edge servers
  • recall some of the recommendations for a master server and edge server
  • recall some of the recommendations for a data server

 

Operating Systems for the Hadoop Cluster

  • recall some of the recommendations for an operating system
  • recall some of the recommendations for hostnames and DNS entries (see the sketch after this list)
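
As a concrete illustration of the hostname and DNS objective, the following is a minimal Python sketch, not part of the course materials: it checks that forward and reverse DNS lookups agree for a node, a common recommendation for Hadoop clusters. The node names are hypothetical placeholders.

# Minimal sketch: verify that forward and reverse DNS lookups agree for a node.
# Consistent name resolution across all nodes is a common Hadoop recommendation.
import socket

def check_dns_consistency(hostname):
    """Return True if forward and reverse DNS lookups for hostname agree."""
    ip = socket.gethostbyname(hostname)            # forward lookup: name -> IP
    reverse_name, _, _ = socket.gethostbyaddr(ip)  # reverse lookup: IP -> name
    consistent = reverse_name.lower() == hostname.lower()
    print(f"{hostname}: forward={ip}, reverse={reverse_name}, consistent={consistent}")
    return consistent

if __name__ == "__main__":
    # Hypothetical node names; replace with your cluster's fully qualified hostnames.
    for node in ["master01.example.com", "data01.example.com"]:
        try:
            check_dns_consistency(node)
        except socket.error as err:
            print(f"{node}: lookup failed ({err})")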

 

Storage for the Hadoop Cluster

  • describe the recommendations for hard disk drives (HDDs)
  • calculate the correct number of disks required for a storage solution (a worked example follows this list)
  • compare the use of commodity hardware with enterprise disks
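
The disk-count objective above amounts to straightforward capacity arithmetic. The sketch below works one such calculation in Python; every figure (data volume, replication factor, overhead, disk size, node count) is an assumed placeholder rather than a value from the course.

# Minimal sketch of the disk-sizing arithmetic referenced above.
# All figures are assumed placeholders, not values from the course.
import math

raw_data_tb = 100          # assumed initial data set, in TB
replication_factor = 3     # HDFS default replication
overhead = 0.25            # assumed headroom for intermediate/temporary data
disk_size_tb = 4           # assumed capacity of a single data disk
data_nodes = 10            # assumed number of data servers

# Total raw capacity the cluster must provide.
required_tb = raw_data_tb * replication_factor * (1 + overhead)

# Disks needed across the cluster, then per data node.
total_disks = math.ceil(required_tb / disk_size_tb)
disks_per_node = math.ceil(total_disks / data_nodes)

print(f"Required raw capacity: {required_tb:.0f} TB")
print(f"Total disks: {total_disks}, per data node: {disks_per_node}")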

 

Deployment of an Admin Server

  • plan for the deployment of a Hadoop cluster
  • set up flash drives as boot media
  • set up a kickstart file as boot media (see the sketch after this list)
  • set up a network installer
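
To make the kickstart objective more concrete, the sketch below (not from the course) uses Python to write a bare-bones kickstart file that a network installer could serve for unattended node installs; every directive value is an assumed placeholder to be adapted to your environment.

# Minimal sketch: write a bare-bones kickstart file for unattended installs.
# Every value below is an assumed placeholder, not a course-provided setting.
KICKSTART = """\
text
# Assumed install source served by the network installer
url --url=http://installer.example.com/centos
lang en_US.UTF-8
keyboard us
timezone UTC
network --bootproto=dhcp
rootpw --plaintext changeme
bootloader --location=mbr
clearpart --all --initlabel
autopart
reboot

%packages
@core
%end
"""

with open("node.ks", "w") as ks:
    ks.write(KICKSTART)
print("Wrote node.ks")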

 

Practice: Design a Hadoop Cluster

  • identify the hardware and networking recommendations for a Hadoop cluster

 

Framework Connections

The materials within this course focus on the Knowledge, Skills, and Abilities (KSAs) identified within the Specialty Areas listed below. Click to view Specialty Area details within the interactive National Cybersecurity Workforce Framework.