• Online, Self-Paced
Course Description

Apache Spark is an open-source cluster-computing framework widely used for data science, and it has become the de facto framework for big data. In this Skillsoft Aspire course, you will learn how to analyze a Spark DataFrame by treating it as though it were a relational database table. Discover how to create a view from a Spark DataFrame and run SQL queries against it, and how to define windows and use them to explore data.

Learning Objectives

Accessing Data with Spark: Data Analysis using Spark SQL

  • Course Overview
  • recall the different stages involved in optimizing any query or method call on the contents of a Spark DataFrame
  • create views out of a Spark DataFrame's contents and run queries against them
  • trim and clean a DataFrame before a view is created as a precursor to running SQL queries on it
  • perform an analysis of data by running different kinds of SQL queries, including grouping and aggregations
  • recognize how Spark DataFrames infer the schema of data loaded into them and configure a DataFrame with an explicitly defined schema
  • define what a window is in the context of Spark DataFrames and when windows can be used
  • create and analyze categories of data in a dataset using windows
  • analyze data using Spark SQL

Framework Connections

The materials within this course focus on the Knowledge, Skills, and Abilities (KSAs) identified within the Specialty Areas listed below. Click to view Specialty Area details within the interactive National Cybersecurity Workforce Framework.