The Apache Hadoop software library is a framework that allows for the distributed processing of large datasets across clusters of computers using a simple programming model. Hadoop can scale from a single server to thousands of machines, each offering local computation and storage. This course focuses on performance tuning of a Hadoop cluster. It examines best practices and recommendations for tuning the operating system, memory, HDFS, YARN, and MapReduce. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.
Learning Objectives
Performance Tuning Hadoop Clusters
- recall the three main functions of service capacity
- describe different strategies of performance tuning
Performance Tuning Networks
- list some of the best practices for network tuning
- install compression
Performance Tuning Servers
- describe the configuration files and parameters used in performance tuning of the operating system
- describe the purpose of Java tuning
- recall some of the rules for tuning the DataNode
Performance Tuning Memory
- describe the configuration files and parameters used in performance tuning of memory for daemons
- describe the purpose of memory tuning for YARN
- recall why the NodeManager kills containers
- performance tune memory for the Hadoop cluster
Performance Tuning HDFS
- describe the configuration files and parameters used in performance tuning of HDFS
- describe the sizing and balancing of the HDFS data blocks
- describe the use of TestDFSIO
- performance tune HDFS
Performance Tuning YARN
- describe the configuration files and parameters used in performance tuning of YARN
- configure speculative execution
- describe the configuration files and parameters used in performance tuning of MapReduce
- tune MapReduce for performance
- describe the practice of benchmarking on a Hadoop cluster
- describe the different tools used for benchmarking a cluster
- perform a benchmark of a Hadoop cluster
Modeling Applications
- describe the purpose of application modeling
Practice: Performance Tuning
- optimize memory and benchmark a Hadoop cluster