• Online, Self-Paced
Course Description

In this course, you will be introduced to Apache Spark SQL, Datasets, and DataFrames.

Learning Objectives

Apache Spark SQL Introduction

  • describe Apache Spark SQL
  • create a SparkSession
  • create DataFrames with Spark SQL
  • use aggregations with the built-in DataFrame functions
  • run SQL queries programmatically
  • create a global temporary view
  • create Datasets with Spark SQL
  • use JSON Datasets with Spark SQL
  • use Load/Save functions
  • manually specify a data source
  • run SQL directly on files
  • use SaveMode to handle save operations
  • write Parquet files with Spark SQL
  • use Spark SQL to save a DataFrame as a persistent table
  • use partitioning when saving persistent tables

Practice: Using Spark SQL

  • use Spark SQL to create Datasets and DataFrames

Framework Connections

The materials within this course focus on the NICE Framework Task, Knowledge, and Skill statements identified within the indicated NICE Framework component(s):

Specialty Areas

  • Data Administration

Feedback

If you would like to provide feedback for this course, please e-mail the NICCS SO at NICCS@hq.dhs.gov.