Dataproc offers a variety of methods for executing big data workloads. This course continues the study of Dataproc implementations with Spark and Hadoop using Cloud Shell, and introduces BigQuery and the PySpark REPL package.
Learning Objectives
Implementation using Dataproc
- start the course
- describe the various Spark and Hadoop processes that can be performed with Dataproc
- recognize the benefits of separating storage and compute services using Cloud Dataproc
- recall the process of monitoring and logging Dataproc jobs
- demonstrate the process of using an SSH tunnel to connect to the master and worker nodes in a cluster
- define the Spark REPL package and describe how it is used in Linux
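One common way to reach a cluster's master node, as covered in the objectives above, is an SSH session or tunnel created with the gcloud CLI. The sketch below is illustrative only: the cluster name example-cluster, project example-project, and zone us-central1-a are all placeholder assumptions.

```shell
# Open an SSH session to the master node (name suffix -m) of a Dataproc
# cluster; example-cluster, example-project, and us-central1-a are placeholders.
gcloud compute ssh example-cluster-m \
    --project=example-project \
    --zone=us-central1-a

# Alternatively, create a SOCKS proxy on local port 1080 through the master
# node (-D 1080), without opening a shell (-N); a locally configured browser
# can then reach web UIs running on the cluster, such as YARN.
gcloud compute ssh example-cluster-m \
    --project=example-project \
    --zone=us-central1-a \
    -- -D 1080 -N
```

Worker nodes can be reached the same way by substituting a worker node name (typically suffixed -w-0, -w-1, and so on) for the master node name.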
Implementation using Cloud Shell
- describe compute and storage processes, the benefits of separating them, and the virtualized distribution of Hadoop
- define BigQuery and its benefits for large-scale analytics
- describe the MapReduce programming model
- demonstrate the process of submitting multiple jobs with Dataproc
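The MapReduce programming model mentioned above can be illustrated with a minimal word count in plain Python: the map step emits (word, 1) pairs, the shuffle step groups pairs by key, and the reduce step sums each group. This is a conceptual sketch of the model, not Dataproc or Hadoop API code.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    # Shuffle: group all intermediate values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine the grouped values for each key (here, by summing).
    return {key: sum(values) for key, values in groups.items()}

lines = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(shuffle_phase(pairs))
print(counts["the"])  # -> 3
print(counts["fox"])  # -> 2
```

In a real Hadoop or Spark deployment, the shuffle is performed by the framework across the cluster; only the map and reduce logic is supplied by the programmer.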
Practice: Dataproc Implementations
- recognize the various Dataproc and Cloud Shell job operations and implementations
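As a rough illustration of the job operations covered in the practice, the gcloud CLI exposes Dataproc jobs through submit, list, and describe subcommands. The bucket path, cluster name, region, and job ID below are placeholder assumptions.

```shell
# Submit a PySpark job to an existing cluster (all names are placeholders).
gcloud dataproc jobs submit pyspark gs://example-bucket/wordcount.py \
    --cluster=example-cluster \
    --region=us-central1

# List recent jobs in the region.
gcloud dataproc jobs list --region=us-central1

# Inspect the status and driver output location of a specific job.
gcloud dataproc jobs describe example-job-id --region=us-central1
```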