Course Description
Discover how to implement data lakes for real-time data management. Explore data ingestion, data processing, and data life-cycle management using AWS and open-source big data ecosystem products.
Learning Objectives
Data Lake: Architectures & Data Management Principles
- implement Lambda and Kappa architectures to manage real-time big data
- identify the benefits of adopting Zaloni data lake reference architecture
- describe data ingestion approaches and compare Avro and Parquet file format benefits
- demonstrate how to ingest data using Sqoop
- describe the data processing strategies provided by MapReduce v2, Hive, Pig, and YARN for processing data in a data lake
- recognize how to derive value from data lakes and describe the critical roles involved in managing them
- describe the steps involved in the data life cycle and the significance of archival policies
- implement an archival policy that transitions data from S3 to Glacier according to the adopted life-cycle policies
- ingest data using Sqoop and implement an archival policy to transition data from S3 to Glacier
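The Sqoop ingestion objective above can be sketched as a single import command. This is a minimal, hedged example assuming a MySQL source; the host, database, table, user, and target directory names are placeholders, not values from the course.

```shell
# Sketch: import a relational table into HDFS with Sqoop, storing it
# as Parquet (one of the file formats compared in this course).
# All connection details below are illustrative placeholders.
sqoop import \
  --connect jdbc:mysql://db-host/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --as-parquetfile \
  --num-mappers 4
```

The `--as-parquetfile` flag writes columnar Parquet output; swapping it for `--as-avrodatafile` produces row-oriented Avro instead, which is the trade-off the Avro-versus-Parquet objective examines.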
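The S3-to-Glacier archival objective maps to an S3 lifecycle configuration. Below is a hedged sketch using the AWS CLI; the bucket name, prefix, and day counts are example values, not policies prescribed by the course.

```shell
# Sketch: lifecycle rule that transitions objects under the raw/ prefix
# to Glacier after 90 days and expires them after 5 years.
# Bucket name and timings are illustrative assumptions.
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-raw-data",
      "Status": "Enabled",
      "Filter": { "Prefix": "raw/" },
      "Transitions": [
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 1825 }
    }
  ]
}
EOF

# Apply the policy to the bucket (requires AWS credentials).
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-data-lake \
  --lifecycle-configuration file://lifecycle.json
```

Tuning the `Days` thresholds is how an organization encodes its adopted retention policy: hot data stays in S3 Standard, cold data moves to Glacier, and expired data is deleted automatically.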