CPB101: Serverless Data Analysis with BigQuery and Cloud Dataflow
Through a combination of instructor-led presentations, demonstrations, and hands-on labs, students learn how to carry out no-ops data warehousing, analysis and pipeline processing.
This one-day, instructor-led course builds on CPB100, which is a prerequisite.
This class is intended for data analysts and data scientists responsible for analyzing and visualizing big data, implementing cloud-based big data solutions, deploying or migrating big data applications to the public cloud, implementing and maintaining large-scale data storage environments, and transforming and processing big data.
Prerequisites
Knowledge of Google Cloud Platform Big Data & Machine Learning Fundamentals, to the level of CPB100
Experience using a SQL-like query language to analyze data
Knowledge of either Python or Java
Duration: 1 day
Instructor-led or virtual class
Available in English and Spanish
Objectives
Build complex BigQuery queries using clauses, inner selects, built-in functions, and joins
Load data into and export data from BigQuery
Identify the need for nested and repeated fields and for user-defined functions
Understand pipeline-processing terms and concepts
Write pipelines in Java or Python and launch them locally or in the Cloud
Implement Map and Reduce transforms in Dataflow pipelines
Join datasets using side inputs
Use Dataflow, BigQuery, and Cloud Pub/Sub together for real-time streaming
The course consists of a 3-hour deep dive into BigQuery followed by a 3-hour deep dive into Cloud Dataflow.
Module 0: Welcome [⅓ hr]
We assume that attendees have taken CPB100.
Module 1: Serverless data analysis with BigQuery [3 hr]
A 3-hour deep dive into the details of BigQuery (1.5 hours of lecture and 1.5 hours of hands-on work).
What is BigQuery?
Queries and functions + lab
Load and export data + lab
Performance and pricing
Module 2: Serverless, autoscaling data pipelines with Dataflow [3 hr]
A 3-hour deep dive into the details of Cloud Dataflow (1.5 hours of lecture and 1.5 hours of hands-on work).
What is Dataflow?
Data pipeline + lab
MapReduce in Dataflow + lab
Side inputs + lab
Streaming + demo
Module 3: Summary [⅓ hr]
Where to go from here