Apache Beam, Cloud Dataflow, and Cloud Dataprep can be used to create data pipelines. In this course, you will learn how Apache Beam concepts, the Apache Beam SDKs, Cloud Dataflow, and Cloud Dataprep assist in pipeline management.
Learning Objectives
Expressing Data Pipelines
- define Apache Beam concepts and SDKs
- describe the Python SDK and how it is used to build data pipelines
- describe the Java SDK and how it is used to build data pipelines
- initialize Cloud Dataprep
- demonstrate how to ingest data into a pipeline
- create recipes in a Cloud Dataprep pipeline
- work with the import/export process and demonstrate how to run Dataflow jobs in Cloud Dataprep
Big Data Processing
- describe MapReduce and the benefits of Cloud Dataflow over MapReduce
- outline serverless architecture and some of the GCP products supporting data analytics
Practice: Create and Manage Pipelines
- describe the use of Apache Beam, Cloud Dataflow, and Cloud Dataprep in GCP to create and manage pipelines