Apache Spark is an open-source cluster-computing framework used for data science, and it has become the de facto big data framework. In this Skillsoft Aspire course, you will learn how to analyze a Spark DataFrame by treating it as though it were a relational database table. Discover how to create a view from a Spark DataFrame and run SQL queries against it, and how to define windows and use them to explore data.
Learning Objectives
Accessing Data with Spark: Data Analysis using Spark SQL
- Course Overview
- recall the different stages involved in optimizing any query or method call on the contents of a Spark DataFrame
- create views out of a Spark DataFrame's contents and run queries against them
- trim and clean a DataFrame before a view is created as a precursor to running SQL queries on it
- perform an analysis of data by running different kinds of SQL queries, including grouping and aggregations
- recognize how Spark DataFrames infer the schema of data loaded into them and configure a DataFrame with an explicitly defined schema
- define what a window is in the context of Spark DataFrames and when windows can be used
- create and analyze categories of data in a dataset using windows
- analyze data using Spark SQL