DSCI-272: Predicting with Cloudera Machine Learning

28 hours

1840,00 €

Classroom or Live Virtual Class

Description
Addressed to
Objectives
Topics
Request Info

Request Info

Currently there are no classrooms scheduled for this course

Description

Enterprise data science teams need collaborative access to business data, tools, and computing resources required to develop and deploy machine learning workflows. Cloudera Machine Learning (CML), part of the Cloudera Data Platform (CDP), provides the solution, giving data science teams the required resources.

This course covers machine learning workflows and operations using CML. Participants explore, visualize, and analyze data. You will also train, evaluate, and deploy machine learning models.

The course walks through an end-to-end data science and machine learning workflow based on realistic scenarios and datasets from a fictitious technology company. The demonstrations and exercises are conducted in Python (with PySpark) using CML.

Audience and prerequisites

The course is designed for data scientists who need to understand how to utilize Cloudera Machine Learning and the Cloudera Data Platform to achieve faster model development and deliver production machine learning at scale. Data engineers, developers, and solution architects who collaborate with data scientists will also find this course valuable.

Participants should have a basic understanding of Python or R and some experience exploring and analyzing data and developing statistical or machine learning models. Knowledge of Spark, Hadoop, or the Cloudera platform is not required.

Objectives

Students who successfully complete this course will be able to:

Utilize Cloudera SDX and other components of the Cloudera Data Platform to locate data for machine learning experiments.
Use an Applied ML Prototype (AMP).
Manage machine learning experiments.
Connect to various data sources and explore data.
Utilize Apache Spark and Spark ML.
Deploy an ML model as a REST API.
Manage and monitor deployed ML models.

Topics

Introduction to CML

Overview.
CML Versus CDSW.
ML Workspaces.
Workspace Roles.
Projects and Teams.
Settings.
Runtimes/Legacy Engines.

Introduction to AMPs and the Workbench

Editors and IDE.
Git.
Embedded Web Applications.
AMPs.

Data Access and Lineage

SDX Overview.
Data Catalog.
Authorization.
Lineage.

Data Visualization in CMLe

Data Visualization Overview.
CDP Data Visualization Concepts.
Using Data Visualization in CML.

Experiments

Experiments in CML.

An Introduction to the CML Native Workbench

Entering Code.
Getting Help.
Accessing the Linux Command Line.
Working With Python Packages.
Formatting Session Output.

Spark Overview

How Spark Works.
The Spark Stack.
File Formats in Spark.
Spark Interface Languages.
Introduction to PySpark.
How DataFrame Operations Become Spark Jobs.
How Spark Executes a Job.

Running a Spark Application

Running a Spark Application.
Reading data into a Spark SQL DataFrame.
Examining the Schema of a DataFrame.
Computing the Number of Rows and Columns of a DataFrame.
Examining a Few Rows of a DataFrame.
Stopping a Spark Application.

Inspecting a Spark DataFrame

Inspecting a DataFrame.
Inspecting a DataFrame Column.

Transforming DataFrames

Spark SQL DataFrames.
Working with Columns.
Working with Rows.
Working with Missing Values.

Transforming DataFrame Columns

Spark SQL Data Types.
Working with Numerical Columns.
Working with String Columns.
Working with Date and Timestamp Columns.
Working with Boolean Columns.

Complex Types

Complex Collection Data Types.
Arrays.
Maps.
Structs.

User-Defined Functions

User-Defined Functions.
Example 1: Hour of Day.
Example 2: Great-Circle Distance.

Reading and Writing DataFrames

Working with Delimited Text Files.
Working with Text Files.
Working with Parquet Files.
Working with Hive Tables.
Working with Object Stores.
Working with Pandas DataFrames.

Combining and Splitting DataFrames

Combining and Splitting DataFrames.
Joining DataFrames.
Splitting a DataFrame.

Summarizing and Grouping DataFrames

Summarizing Data with Aggregate Functions.
Grouping Data.
Pivoting Data.

Window Functions

Window Functions.
Example: Cumulative Count and Sum.
Example: Compute Average Days Between Rides for Each Rider.

Machine Learning Overview

Introduction to Machine Learning.
Machine Learning Tools.

Apache Spark MLlib

Introduction to Apache Spark MLlib.

Exploring and Visualizing DataFrames

Possible Workflows for Big Data.
Exploring a Single variable.
Exploring a Pair of Variables.

Monitoring, Tuning, and Configuring Spark Applications

Monitoring Spark Applications.
Configuring the Spark Environment.

Fitting and Evaluating Regression Models

Assemble the Feature Vector.
Fit the Linear Regression Model.

Fitting and Evaluating Classification Models

Generate Label.
Fit the Logistic Regression Model.

Tuning Algorithm Hyperparameters Using Grid Search

Requirements for Hyperparameter Tuning.
Tune the Hyperparameters Using Holdout Cross- Validation.
Tune the Hyperparameters Using K-Fold Cross- Validation.

Fitting and Evaluating Clustering Models

Print and Plot the Home Coordinates.
Fit a Gaussian Mixture Model.
Explore the Cluster Profiles.

Processing Text: Fitting and Evaluating Topic Models

Fit a Topic Model Using Latent Dirichlet Allocation.

Fitting and Evaluating Recommender Models

Recommender Models.
Generate Recommendations.

Working with Machine Learning Pipelines

Fit the Pipeline Model.
Inspect the Pipeline Model.

Applying a Scikit-Learn Model to a Spark DataFrame

Build a Scikit-Learn Model.
Apply the Model Using a Spark UDF.

Deploying a Machine Learning Model as a REST API in CML

Load the Serialized Model.
Define a Wrapper Function to Generate a Prediction.
Test the Function.

Autoscaling, Performance, and GPU Settings

Autoscaling Workloads.
Working with GPUs.

Model Metrics and Monitoring

Why Monitor Models?.
Common Models Metrics.
Models Monitoring With Evidently.
Continuous Model Monitoring.

Appendix: Workspace Provisioning

Workspace and Environment.

Open calls

Currently there are no classrooms scheduled for this course