• Online, Self-Paced
Course Description

In this course, you will learn how to perform data analysis using Spark SQL and Hive. It is one in a series of courses that prepare learners for exam 70-775: Perform Data Engineering on Microsoft Azure HDInsight.

Learning Objectives

Data Analysis using Spark SQL

  • start the course
  • describe Jupyter and Apache Zeppelin
  • merge DataFrames using Spark SQL
  • describe Apache Parquet
  • manage interactive Livy sessions

Data Analysis using Hive

  • describe what interactive querying is and how it's used with Hive
  • use Ambari Views
  • use HiveQL
  • describe how to parse files such as CSV files with Hive
  • use ORC for caching
  • use Hive tables
  • use Zeppelin to visualize data

Practice: Using Spark Data Analysis

  • use Spark SQL for data analysis

Framework Connections

The materials within this course focus on the NICE Framework Task, Knowledge, and Skill statements identified within the indicated NICE Framework component(s):

Specialty Areas

  • Data Administration