CPB101: Serverless Data Analysis with BigQuery and Cloud Dataflow

Extrema Sistemas Created 2 years ago by extrema extrema

Course Description

Through a combination of instructor-led presentations, demonstrations, and hands-on labs, students learn how to carry out no-ops data warehousing, analysis and pipeline processing.

This 1-day, instructor-led course builds on CPB100, which is a prerequisite.


This class is intended for data analysts and data scientists responsible for: analyzing and visualizing big data, implementing cloud-based big data solutions, deploying or migrating big data applications to the public cloud, implementing and maintaining large-scale data storage environments, and transforming/processing big data.


Google Cloud Platform Big Data & Machine Learning Fundamentals to the level of CPB100

Experience using a SQL-like query language to analyze data

Knowledge of either Python or Java

Duration

1 day

Delivery Method

Instructor led or virtual class


Available in English and Spanish


Build up a complex BigQuery query using clauses, inner selects, built-in functions and joins

Load and export data to/from BigQuery

Identify need for nested, repeated fields and user-defined functions

Understand pipeline processing, terms and concepts

Write pipelines in Java or Python and launch them locally or in the Cloud

Implement Map and Reduce transforms in Dataflow pipelines

Join datasets as side inputs

Interoperate Dataflow, BigQuery and Cloud Pub/Sub for real-time streaming


The course consists of a 3-hour deep dive into the details of BigQuery followed by a 3-hour deep dive into the details of Cloud Dataflow.


Module 0: Welcome [⅓ hr]

We assume that attendees have attended CPB100.



Module 1: Serverless data analysis with BigQuery [3 hr]

A 3-hour (1.5 hours of lecture + 1.5 hours of hands-on labs) deep dive into the details of BigQuery.

What is BigQuery?

Queries and functions + lab

Load and export data + lab

Advanced Capabilities

Performance and pricing
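
As a rough illustration of the query topics listed for this module (inner selects, built-in functions and joins), the following is a minimal sketch only. It assumes Python's built-in sqlite3 module as a local stand-in for a SQL engine; BigQuery's own SQL dialect and client tooling differ, and the table names, column names and data are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a data warehouse.
# This is not BigQuery; it only shows the general shape of such a query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stations (station_id INTEGER, name TEXT);
    CREATE TABLE trips (station_id INTEGER, duration_min REAL);
    INSERT INTO stations VALUES (1, 'Central'), (2, 'Harbor');
    INSERT INTO trips VALUES (1, 10.0), (1, 20.0), (2, 5.0), (2, 7.0);
""")

query = """
    SELECT s.name, t.trip_count, ROUND(t.avg_duration, 1) AS avg_duration
    FROM (
        -- Inner select: aggregate trips per station with built-in functions.
        SELECT station_id,
               COUNT(*) AS trip_count,
               AVG(duration_min) AS avg_duration
        FROM trips
        GROUP BY station_id
    ) AS t
    -- Join the aggregated rows back to the station names.
    JOIN stations AS s ON s.station_id = t.station_id
    WHERE t.avg_duration > 8
    ORDER BY s.name
"""

rows = conn.execute(query).fetchall()
print(rows)
```

The inner select does the aggregation once, and the outer query then filters and joins on the much smaller aggregated result.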

Module 2: Serverless, autoscaling data pipelines with Dataflow [3 hr]

A 3-hour (1.5 hours of lecture + 1.5 hours of hands-on labs) deep dive into the details of Cloud Dataflow.

What is Dataflow?

Data pipeline + lab

MapReduce in Dataflow + lab

Side inputs + lab

Streaming + demo
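
The MapReduce and side-input topics above can be pictured with a small plain-Python sketch. This is not the Dataflow SDK or its API: the function names (map_phase, shuffle, reduce_phase, tag_with_side_input) and the sample data are hypothetical. It only shows the general idea of a map step that emits key/value pairs, a grouping step, a reduce step that combines values per key, and a small lookup table passed alongside the main data in the way a side input would be.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group values by key, as happens between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the grouped values for each key into one total."""
    return {key: sum(values) for key, values in groups.items()}


def tag_with_side_input(counts, categories):
    """Enrich each result using a small lookup table (the 'side input')."""
    return {
        word: (count, categories.get(word, "unknown"))
        for word, count in counts.items()
    }


# Hypothetical main input and side input.
lines = ["import java util", "import java io", "package demo"]
categories = {"import": "keyword", "package": "keyword"}

counts = reduce_phase(shuffle(map_phase(lines)))
tagged = tag_with_side_input(counts, categories)
print(tagged)
```

In this sketch the side input is a dictionary small enough to hand to every processing step in full, which is the situation where joining via a lookup is simpler than joining two large datasets.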

Module 3: Summary [⅓ hr]

Where to go from here