Course Description
Discover how to work with Spark and its in-memory data management capabilities. This course also covers how to manage and troubleshoot HDInsight clusters using Ambari and the Azure CLI.
Learning Objectives
Data Warehousing with Hadoop: Spark, HDInsight and Cluster Management
- specify the core capabilities of Spark and its essential architectural components
- list the data structures used in Spark, along with the RDD and lineage concepts
- set up Spark clusters using PowerShell and an Azure Resource Manager template
- describe the relationship between Spark SQL and Hive
- specify the essential concepts of Spark SQL and DataFrames
- demonstrate how to customize HDInsight clusters using bootstrap
- install Hadoop applications on Azure HDInsight
- illustrate the use of Ambari as a tool for managing clusters
- manage Hadoop clusters in HDInsight using the Azure CLI
- specify approaches for troubleshooting and tuning HDInsight clusters
- monitor Hadoop clusters in HDInsight to collect metrics for analysis
- set up Spark clusters and manage them using the Ambari GUI