Hadoop is a Java framework for running applications on large clusters of commodity hardware. This course examines the HDFS administration and operational processes required to operate and maintain a Hadoop cluster, including balancing a cluster, managing jobs, and performing backup and recovery for HDFS. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.
Learning Objectives
Hadoop Operations
- start the course
- monitor and improve service levels
- deploy a Hadoop release
- describe the purpose of change management
Rack Awareness for Hadoop
- describe rack awareness
- write configuration files for rack awareness
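As a sketch of the configuration work in this section: HDFS learns rack locations from a user-supplied topology script, named by the net.topology.script.file.name property in core-site.xml. The NameNode invokes the script with DataNode IPs or hostnames as arguments and expects one rack path per argument. The subnets and rack names below are made-up examples:

```shell
#!/bin/bash
# resolve_rack: maps each DataNode address argument to a rack path,
# printing one rack per argument -- the contract HDFS expects from
# the script named by net.topology.script.file.name.
resolve_rack() {
  for host in "$@"; do
    case "$host" in
      10.1.1.*) echo "/rack1" ;;          # hypothetical rack 1 subnet
      10.1.2.*) echo "/rack2" ;;          # hypothetical rack 2 subnet
      *)        echo "/default-rack" ;;   # fallback for unknown hosts
    esac
  done
}
resolve_rack "$@"
```

The script must be executable by the NameNode user, and every possible DataNode address should resolve to some rack, hence the /default-rack fallback.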
File System Management for HDFS
- start and stop a Hadoop cluster
- write init scripts for Hadoop
- describe the tools fsck and dfsadmin
- use fsck to check the HDFS file system
- set quotas for the HDFS file system
- enable and configure the HDFS trash feature
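The file system management topics above are all driven from the hdfs command line. A sketch of the typical invocations, with illustrative paths and limits (trash itself is enabled by setting fs.trash.interval, in minutes, in core-site.xml):

```shell
# Check HDFS health; -files -blocks -locations adds per-file detail
hdfs fsck / -files -blocks -locations

# Cluster-wide capacity summary and per-DataNode report
hdfs dfsadmin -report

# Limit /user/alice to 100,000 names and 1 TB of raw space
# (directory and values are examples)
hdfs dfsadmin -setQuota 100000 /user/alice
hdfs dfsadmin -setSpaceQuota 1t /user/alice
```

Note that the space quota counts raw storage, so with the default replication factor of 3, a 1 TB quota holds roughly 333 GB of user data.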
DataNode Management for HDFS
- manage an HDFS DataNode
- use include and exclude files to replace a DataNode
- describe the operations for scaling a Hadoop cluster
- add a DataNode to a Hadoop cluster
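Replacing or retiring a DataNode with include/exclude files follows a decommissioning pattern. Assuming hdfs-site.xml names an exclude file via the dfs.hosts.exclude property (the file path and hostname below are illustrative):

```shell
# Add the retiring DataNode's hostname to the exclude file...
echo "dn3.example.com" >> /etc/hadoop/conf/dfs.exclude

# ...then have the NameNode re-read its host lists; the node enters
# the Decommissioning state while its blocks are re-replicated, and
# can be shut down safely once decommissioning completes.
hdfs dfsadmin -refreshNodes
```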
Balancing a Hadoop Cluster
- describe the process for balancing a Hadoop cluster
- balance a Hadoop cluster
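The balancing process itself is a single administrative command. A sketch, using the commonly cited default threshold:

```shell
# Move blocks between DataNodes until each node's disk utilization
# is within 10 percentage points of the cluster average; the
# threshold is tunable, and the balancer can be stopped at any time.
hdfs balancer -threshold 10
```

Balancing consumes network bandwidth, so it is usually run during off-peak hours and throttled via the dfs.datanode.balance.bandwidthPerSec property.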
Backup and Recovery for HDFS
- describe the operations involved in backing up data
- use distcp to copy data from one cluster to another
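distcp runs as a MapReduce job, which lets it copy large datasets between clusters in parallel. A sketch with hypothetical NameNode hostnames and paths:

```shell
# Copy /data from one cluster to another (hosts/ports illustrative)
hadoop distcp hdfs://nn1.example.com:8020/data \
              hdfs://nn2.example.com:8020/backup/data

# -update copies only files that differ from the target;
# -p preserves file attributes such as permissions and replication
hadoop distcp -update -p hdfs://nn1.example.com:8020/data \
                         hdfs://nn2.example.com:8020/backup/data
```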
Managing Jobs
- describe MapReduce job management on a Hadoop cluster
- perform MapReduce job management on a Hadoop cluster
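Job management on a Hadoop 2 cluster is typically done with the mapred command. A sketch of the common operations (the job ID shown is illustrative):

```shell
# List running MapReduce jobs, then inspect one in detail
mapred job -list
mapred job -status job_1400000000000_0001

# Kill a misbehaving job
mapred job -kill job_1400000000000_0001
```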
Upgrades for a Hadoop Cluster
- plan an upgrade of a Hadoop cluster
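For a non-rolling HDFS upgrade, a typical plan is: back up the NameNode metadata, stop the cluster, upgrade the binaries, then restart in upgrade mode and finalize only after validation. A sketch (daemon script names and paths vary by distribution):

```shell
# Start the NameNode with the -upgrade flag so it migrates the
# on-disk layout while keeping the previous version for rollback
hadoop-daemon.sh start namenode -upgrade

# After the upgraded cluster has been validated, make the upgrade
# permanent; until this step, a rollback is still possible
hdfs dfsadmin -finalizeUpgrade
```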
Practice: High Availability
- write and complete a plan to install HBase with high availability