• Online, Self-Paced
Course Description

In this course, you will be introduced to Apache Spark SQL, Datasets, and DataFrames.

Learning Objectives

Apache Spark SQL Introduction

  • start the course
  • describe Apache Spark SQL
  • create a SparkSession
  • create DataFrames with Spark SQL
  • use aggregations with the built-in DataFrame functions
  • run SQL queries programmatically
  • create a global temporary view
  • create Datasets with Spark SQL
  • use JSON Datasets with Spark SQL
  • use the generic load/save functions
  • manually specify a data source
  • run SQL directly on files
  • use SaveMode to handle save operations
  • write Parquet files with Spark SQL
  • use Spark SQL to save a DataFrame as a persistent table
  • use partitioning when saving persistent tables

Practice: Using Spark SQL

  • use Spark SQL to create Datasets and DataFrames

Framework Connections

The materials within this course focus on the Knowledge, Skills, and Abilities (KSAs) identified within the Specialty Areas listed below, as defined in the National Cybersecurity Workforce Framework.