Leveraging Unstructured Data with Cloud Dataproc on Google Cloud Platform

Module 1: Introduction to Cloud Dataproc

  • Unstructured data and Google Street View

Lab: Creating Dataproc Clusters

In this lab, you will create, customize, and delete Dataproc clusters using the Web console and the command-line interface (CLI). You will also connect to the cluster over SSH, run a couple of simple jobs, and access the cluster's Hadoop and HDFS services from the browser.
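The CLI half of this lab can be sketched as a pair of helpers that assemble `gcloud dataproc clusters` commands. This is a minimal illustration, not the lab's own script: the helper names, the cluster name `demo-cluster`, and the region `us-central1` are placeholders, and the code only prints the commands rather than running them.

```python
import shlex


def create_cluster_cmd(name, region, num_workers=2):
    """Return the argv for creating a small Dataproc cluster."""
    return [
        "gcloud", "dataproc", "clusters", "create", name,
        f"--region={region}",
        f"--num-workers={num_workers}",
    ]


def delete_cluster_cmd(name, region):
    """Return the argv for deleting a cluster without a confirmation prompt."""
    return [
        "gcloud", "dataproc", "clusters", "delete", name,
        f"--region={region}",
        "--quiet",
    ]


# Placeholder values; substitute your own cluster name and region.
for argv in (
    create_cluster_cmd("demo-cluster", "us-central1"),
    delete_cluster_cmd("demo-cluster", "us-central1"),
):
    print(shlex.join(argv))
```

To actually run a command, the same list could be passed to `subprocess.run(argv, check=True)` on a machine where the Cloud SDK is installed and authenticated.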

What you learn In this lab, you:

Module 2: Running Jobs

Lab 2a: Running Pig and Spark programs

In this lab, you will run Pig and Spark programs on a Dataproc cluster.
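As a rough idea of what a Spark program in this lab might look like, here is a word-count sketch using the PySpark RDD API. It is not the lab's actual program: the function names are hypothetical, and the code assumes a `SparkContext` is passed in (for example, the `sc` object that the `pyspark` shell provides on a cluster node).

```python
import re
from operator import add


def tokenize(line):
    """Split a line into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z']+", line.lower())


def top_words(sc, path, n=10):
    """Count words in a text file with Spark and return the n most frequent.

    `sc` is an existing SparkContext; `path` can be any URI Spark can read,
    such as an HDFS or Cloud Storage path.
    """
    counts = (
        sc.textFile(path)                # one record per line
        .flatMap(tokenize)               # one record per word
        .map(lambda word: (word, 1))     # pair each word with a count of 1
        .reduceByKey(add)                # sum the counts per word
    )
    # Sort by descending count and bring only the top n back to the driver.
    return counts.takeOrdered(n, key=lambda kv: -kv[1])
```

Keeping `tokenize` as a plain function means the parsing logic can be checked locally without a cluster; only `top_words` needs Spark.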

What you learn In this lab, you:

Lab 2b: Running Dataproc jobs

In this lab, you will create a Dataproc cluster and submit jobs to it using the Web console and the CLI. You will also monitor job progress and view job details and results.
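The CLI side of job submission and monitoring might look like the sketch below, which builds `gcloud dataproc jobs` commands as argument lists. The helper names, script path, and job ID are hypothetical; nothing here is taken from the lab's own instructions.

```python
def submit_pyspark_cmd(script, cluster, region, job_args=()):
    """Return the argv for submitting a PySpark script to a cluster."""
    argv = [
        "gcloud", "dataproc", "jobs", "submit", "pyspark", script,
        f"--cluster={cluster}",
        f"--region={region}",
    ]
    if job_args:
        # Everything after "--" is passed through to the script itself.
        argv += ["--", *job_args]
    return argv


def describe_job_cmd(job_id, region):
    """Return the argv for viewing the details and status of one job."""
    return ["gcloud", "dataproc", "jobs", "describe", job_id, f"--region={region}"]


def list_jobs_cmd(cluster, region):
    """Return the argv for listing the jobs submitted to a cluster."""
    return [
        "gcloud", "dataproc", "jobs", "list",
        f"--cluster={cluster}",
        f"--region={region}",
    ]
```

For example, `submit_pyspark_cmd("wordcount.py", "demo-cluster", "us-central1", ["gs://my-bucket/input.txt"])` yields a command that runs the (placeholder) script with one argument.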

What you learn In this lab, you:

Module 3: Leveraging GCP

Lab Overview: Leveraging Google Cloud Platform Services

In this lab, you will create a Dataproc cluster that includes Datalab and the Google Python Client API. You will then create IPython notebooks that integrate with BigQuery and Cloud Storage and use Spark.
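A notebook cell that queries BigQuery could look roughly like the following. This sketch assumes the `google-cloud-bigquery` client library (with pandas installed) rather than the Datalab-specific helpers the lab may use, and it queries the public `bigquery-public-data.samples.shakespeare` table purely as an example; the function names are hypothetical.

```python
def build_word_query(min_count, limit=10):
    """Build a standard-SQL query for the most frequent words in the sample."""
    return (
        "SELECT word, SUM(word_count) AS total "
        "FROM `bigquery-public-data.samples.shakespeare` "
        "GROUP BY word "
        f"HAVING total >= {int(min_count)} "
        "ORDER BY total DESC "
        f"LIMIT {int(limit)}"
    )


def run_query(sql):
    """Run the query and return the result as a pandas DataFrame.

    Assumes google-cloud-bigquery and pandas are installed and that
    default credentials are available in the notebook environment.
    """
    from google.cloud import bigquery  # imported lazily; not in the stdlib

    client = bigquery.Client()
    return client.query(sql).to_dataframe()
```

In a notebook, `run_query(build_word_query(1000))` would return a DataFrame that can be inspected directly or handed to Spark for further processing.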

What you learn In this lab, you:

Module 4: Analyzing Unstructured Data

Lab: Adding Machine Learning to Big Data Processing

In this lab, you will integrate the machine learning APIs into your data analysis. You will write code that uses the Speech, Vision, Translate, and Natural Language APIs, and see how to call these APIs from your Spark clusters. You will also integrate these services with BigQuery and Cloud Storage.
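To give a sense of what calling these APIs involves, the sketch below builds JSON request bodies for two of them, Vision label detection and Natural Language sentiment analysis, using their public REST formats. It does not send any requests and is not the lab's own code: the helper names and the sample inputs are hypothetical, and authentication is left out entirely.

```python
import json

# Public REST endpoints for the two methods sketched here.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"
LANGUAGE_URL = "https://language.googleapis.com/v1/documents:analyzeSentiment"


def vision_label_request(image_uri, max_results=5):
    """Build the body for a Vision API label-detection request."""
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": [
                    {"type": "LABEL_DETECTION", "maxResults": max_results}
                ],
            }
        ]
    }


def sentiment_request(text):
    """Build the body for a Natural Language API sentiment request."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


# Placeholder inputs, shown only to illustrate the payload shape.
print(json.dumps(vision_label_request("gs://my-bucket/photo.jpg"), indent=2))
print(json.dumps(sentiment_request("The lab went well."), indent=2))
```

Inside a Spark job, a function like `sentiment_request` could be applied to each record before the bodies are posted to the API, keeping the payload construction separate from the network call.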

What you learn In this lab, you: ...