- Unstructured data & Google Street View
In this lab, you will create, customize and delete Dataproc clusters using the Web Console and the command-line interface (CLI). You will also connect to the cluster over SSH, run a couple of simple jobs, and access the cluster's Hadoop and HDFS services from the browser.
What you learn
In this lab, you:
- Create a Dataproc cluster from the Web Console
- SSH into the cluster and run PySpark jobs
- Add a firewall rule that allows access to your cluster from the browser
- Create, manage and delete Dataproc clusters from the CLI
Begin the lab https://codelabs.developers.google.com/codelabs/cpb102-creating-dataproc-clusters/
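As a rough illustration of the CLI part of this lab, the sketch below assembles the `gcloud dataproc clusters` commands for creating and deleting a cluster. It only builds and prints the commands. The helper names, the cluster name `demo-cluster`, the region `us-central1` and the worker count are placeholders chosen for illustration and do not come from the lab itself.

```python
import shlex


def create_cluster_cmd(name, region, num_workers=2):
    """Return the argv list that creates a Dataproc cluster."""
    return [
        "gcloud", "dataproc", "clusters", "create", name,
        f"--region={region}",
        f"--num-workers={num_workers}",
    ]


def delete_cluster_cmd(name, region):
    """Return the argv list that deletes a cluster without prompting."""
    return [
        "gcloud", "dataproc", "clusters", "delete", name,
        f"--region={region}",
        "--quiet",  # global gcloud flag: skip the interactive confirmation
    ]


if __name__ == "__main__":
    # Placeholder values (assumptions, not taken from the lab).
    for cmd in (create_cluster_cmd("demo-cluster", "us-central1"),
                delete_cluster_cmd("demo-cluster", "us-central1")):
        print(shlex.join(cmd))
```

To actually run a command, the returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where the Cloud SDK is installed and authenticated.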
In this lab, you will run Pig and Spark programs on a Dataproc cluster.
What you learn
In this lab, you:
- SSH into the cluster to run Pig and Spark jobs
- Create a Cloud Storage bucket to store job input files
- Work with HDFS
Begin the lab https://codelabs.developers.google.com/codelabs/cpb102-running-pig-spark/
In this lab, you will create a Dataproc cluster. You will then submit jobs to the cluster using the Web Console and the CLI. You will also monitor job progress, inspect job details and view job results.
What you learn
In this lab, you:
- Create a Cloud Storage bucket to store job input, output and application files
- Submit jobs using the Web Console
- Submit jobs using the CLI
- Monitor job progress and view results
Begin the lab https://codelabs.developers.google.com/codelabs/cpb102-running-dataproc-jobs/
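Submitting and listing jobs from the CLI might look roughly like the sketch below, which builds `gcloud dataproc jobs` commands without running them. The helper names and every value in the usage lines (bucket, script path, cluster, region) are hypothetical and not taken from the lab.

```python
import shlex


def submit_pyspark_cmd(script_uri, cluster, region, job_args=()):
    """Return the argv list that submits a PySpark script as a Dataproc job."""
    cmd = [
        "gcloud", "dataproc", "jobs", "submit", "pyspark", script_uri,
        f"--cluster={cluster}",
        f"--region={region}",
    ]
    if job_args:
        # Everything after "--" is passed through to the script itself.
        cmd += ["--", *job_args]
    return cmd


def list_jobs_cmd(cluster, region):
    """Return the argv list that lists the jobs submitted to one cluster."""
    return [
        "gcloud", "dataproc", "jobs", "list",
        f"--cluster={cluster}",
        f"--region={region}",
    ]


if __name__ == "__main__":
    # Placeholder bucket, script and cluster names (assumptions).
    print(shlex.join(submit_pyspark_cmd(
        "gs://my-bucket/jobs/wordcount.py", "demo-cluster", "us-central1",
        job_args=["gs://my-bucket/input/"])))
    print(shlex.join(list_jobs_cmd("demo-cluster", "us-central1")))
```

Keeping the script in a Cloud Storage bucket, as the lab's first objective suggests, means the same URI works whether the job is submitted from the Web Console or from the CLI.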
In this lab, you will create a Dataproc cluster that includes Datalab and the Google Python Client API. You will then create IPython notebooks that use Spark and integrate with BigQuery and Cloud Storage.
What you learn
In this lab, you:
- Create a Dataproc cluster with an Initialization Action that installs Google Cloud Datalab
- Run Jupyter Notebooks on the Dataproc cluster using Google Cloud Datalab
- Create Python and PySpark jobs that use Google Cloud Storage, BigQuery and Spark
Begin the lab https://codelabs.developers.google.com/codelabs/cpb102-dataproc-with-gcp/
In this lab, you will integrate the machine learning APIs into your data analysis. You will write code that calls the Speech, Vision, Translate and Natural Language APIs and run it on your Spark clusters. You will also integrate these services with BigQuery and Cloud Storage.
What you learn
In this lab, you: ...
- Enable the Google Cloud Platform machine learning APIs
- Find specific text in a corpus of scanned documents
- Translate a book from English to Spanish using the Translate API
- Perform sentiment analysis on text resulting from a BigQuery query
Begin the lab https://codelabs.developers.google.com/codelabs/cpb102-machine-learning-to-big-data-processing/
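For a sense of what a call to the Translate API involves, here is a minimal sketch that builds a request for the Cloud Translation v2 REST endpoint and reads the translated strings from a decoded response. It is not the lab's code. The function names are made up for illustration, the API key is a placeholder, and nothing is sent over the network.

```python
import json
import urllib.parse
import urllib.request

# Public REST endpoint of the Cloud Translation API, version 2.
TRANSLATE_URL = "https://translation.googleapis.com/language/translate/v2"


def build_translate_request(texts, api_key, source="en", target="es"):
    """Build (but do not send) a POST request that translates `texts`."""
    body = {
        "q": list(texts),    # one or more strings to translate
        "source": source,    # English by default, as in the lab objective
        "target": target,    # Spanish by default
        "format": "text",    # treat the input as plain text, not HTML
    }
    query = urllib.parse.urlencode({"key": api_key})
    return urllib.request.Request(
        url=f"{TRANSLATE_URL}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_translations(response_body):
    """Return the translated strings from a decoded v2 JSON response."""
    return [t["translatedText"] for t in response_body["data"]["translations"]]
```

With a real key, the request could be sent with `urllib.request.urlopen(request)` and the JSON reply passed to `extract_translations`. A whole book would likely need to be split into smaller pieces first, since the API limits the size of each request.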