• Online, Self-Paced
Course Description

Explore the basics of Apache Spark, an analytics engine used for big data processing. It's an open-source, cluster computing framework built on top of Hadoop. Discover how it allows operations on data with both its own library methods and with SQL, while delivering great performance. Learn the characteristics, components, and functions of Spark, Hadoop, RDDs, the Spark session, and master and worker nodes. Install PySpark. Then initialize a SparkContext and create a Spark DataFrame from the contents of an RDD and from a DataFrame. Configure a DataFrame with a map function. Retrieve and transform data. Finally, convert Spark DataFrames to Pandas DataFrames and vice versa.
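
For orientation, here is a minimal PySpark sketch of the steps the course walks through; the application name, column names, and sample values are illustrative assumptions, not part of the course material.

    # Assumes PySpark has been installed, e.g. with: pip install pyspark
    from pyspark.sql import SparkSession

    # The SparkSession is the entry point for the DataFrame and SQL APIs;
    # the underlying SparkContext is exposed as spark.sparkContext.
    spark = SparkSession.builder.appName("SparkBasicsDemo").getOrCreate()
    sc = spark.sparkContext

    # Build an RDD from a local collection, then create a DataFrame from it.
    rdd = sc.parallelize([("alice", 34), ("bob", 45), ("carol", 29)])
    df = spark.createDataFrame(rdd, schema=["name", "age"])

    # Operate on the data with DataFrame methods and with SQL.
    df.filter(df.age > 30).show()
    df.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 30").show()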

Learning Objectives

  • Describe Apache Spark's role as an open-source, cluster computing framework for big data processing, and how it operates on data with both its own library methods and SQL
  • Describe the characteristics, components, and functions of Spark, Hadoop, RDDs, the Spark session, and master and worker nodes
  • Install PySpark and initialize a SparkContext
  • Create a Spark DataFrame from the contents of an RDD and from a DataFrame
  • Configure a DataFrame with a map function
  • Retrieve and transform data
  • Convert Spark DataFrames to Pandas DataFrames and vice versa
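
As a companion sketch for the transformation and conversion objectives above, the following assumes PySpark and Pandas are installed; the sample data and column names are hypothetical.

    from pyspark.sql import SparkSession

    # Reuses an active session if one exists, otherwise creates one.
    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    # Transform an RDD with a map function, then materialize it as a DataFrame.
    celsius = sc.parallelize([("mon", 21.0), ("tue", 25.5), ("wed", 19.2)])
    fahrenheit = celsius.map(lambda row: (row[0], row[1] * 9 / 5 + 32))
    temps_df = spark.createDataFrame(fahrenheit, schema=["day", "temp_f"])

    # Convert a Spark DataFrame to a Pandas DataFrame (collects rows to the driver)...
    pandas_df = temps_df.toPandas()

    # ...and build a Spark DataFrame back from a Pandas DataFrame.
    spark_df = spark.createDataFrame(pandas_df)
    spark_df.show()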

Framework Connections

The materials within this course focus on the NICE Framework Task, Knowledge, and Skill statements identified within the indicated NICE Framework component(s):

Specialty Areas

  • Data Administration

Specialty Areas have been removed from the NICE Framework. With the recent release of the new NICE Framework data, updates to courses are underway. Until this course can be updated, this historical information is provided to give better context as to how it can help you with your cybersecurity goals.

Feedback

If you would like to provide feedback for this course, please e-mail the NICCS SO at NICCS@hq.dhs.gov.