• Online, Self-Paced
Course Description

Traditional data warehousing is moving to the cloud, which makes cloud-based data storage a key skill for data science. In this course, you will discover how to build a data lake on the AWS cloud by storing data in S3 buckets and indexing that data with AWS Glue. You will also explore how to run crawlers that automatically scan data in S3 and generate metadata tables in Glue.
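
As a rough illustration of the crawler workflow described above, the following Python sketch builds the request for a Glue crawler that targets an S3 path. It is not taken from the course materials. It assumes the boto3 library and configured AWS credentials for the part that actually calls AWS, and the crawler name, role ARN, database name, and bucket path are all hypothetical placeholders.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Return keyword arguments for the Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,          # IAM role the crawler assumes
        "DatabaseName": database,  # Glue database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(request):
    """Create the crawler, then start it. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above runs without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical values, used only for illustration.
request = build_crawler_request(
    name="sales-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="example_lake_db",
    s3_path="s3://example-data-lake-bucket/sales/",
)
```

Calling create_and_start_crawler(request) would then ask Glue to crawl the path and write the detected schemas as metadata tables in the named database. The request builder is kept separate so the settings can be inspected without contacting AWS.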

Learning Objectives

Data Silos, Lakes, and Streams: Data Lakes on AWS

  • configure a custom role with specific permissions on AWS
  • create an S3 bucket and upload files
  • recognize the different operations that can be performed using the AWS Glue console
  • create metadata tables in Glue using the web console
  • perform queries on the Glue data catalog using Athena
  • perform data crawling on S3 to automatically detect schemas
  • execute queries on data in crawled tables
  • perform crawling operations with multiple files in the same path
  • merge data stored in multiple files in the same folder path
  • merge data when files have the exact same schema
  • recall the roles and features of the different AWS services used in the data lake architecture

Framework Connections

The materials within this course focus on the NICE Framework Task, Knowledge, and Skill statements identified within the indicated NICE Framework component(s):

Specialty Areas

  • Data Administration
  • Systems Architecture

Feedback

If you would like to provide feedback for this course, please e-mail the NICCS SO at NICCS@hq.dhs.gov.