Hadoop is an Apache Software Foundation project and open-source software platform for scalable, distributed computing. Hadoop can provide fast and reliable analysis of both structured and unstructured data. In this course, you will learn about the design principles, the cluster architecture, considerations for servers and operating systems, and how to plan for a deployment. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.
Learning Objectives
Big Data Engineering
- describe the principles of supercomputing
- recall the roles and skills needed for the Hadoop engineering team
- recall the advantages and shortcomings of using Hadoop as a supercomputing platform
Principles of Hadoop Clusters
- describe the three axioms of supercomputing
- describe the "dumb hardware, smart software" and "shared nothing" design principles
- describe the "move processing, not data", "embrace failure", and "build applications, not infrastructure" design principles
Architecture of a Hadoop Cluster
- describe the different rack architectures for Hadoop
- describe the best practices for scaling a Hadoop cluster
Network for the Hadoop Cluster
- recall the best practices for different types of network clusters
Hardware for the Hadoop Cluster
- recall the primary responsibilities for the master, data, and edge servers
- recall some of the recommendations for a master server and edge server
- recall some of the recommendations for a data server
Operating Systems for the Hadoop Cluster
- recall some of the recommendations for an operating system
- recall some of the recommendations for hostnames and DNS entries
Storage for the Hadoop Cluster
- describe the recommendations for hard disk drives (HDDs)
- calculate the correct number of disks required for a storage solution
- compare the use of commodity hardware with enterprise disks
Deployment of an Admin Server
- plan for the deployment of a Hadoop cluster
- set up flash drives as boot media
- set up a kickstart file as boot media
- set up a network installer
Practice: Design a Hadoop Cluster
- identify the hardware and networking recommendations for a Hadoop cluster