Description
This training teaches participants the basics of Apache Kudu, a data storage system for the Hadoop platform that is optimized for analytical queries. The course covers common Kudu use cases and Kudu architecture.
This course enables participants to create, manage, and query Kudu tables, and to develop Spark applications that use Kudu.
After taking this course, participants will be able to distinguish Kudu from other storage systems, identify the use cases best implemented with Kudu, design tables that store data for optimum performance, apply the principal data management techniques, and develop Apache Spark applications with Kudu.
PUE, a Cloudera Strategic Partner, is authorized by Cloudera to provide official training in Cloudera technologies.
PUE is also accredited and recognized to provide consulting and mentoring services for implementing Cloudera solutions in business environments, and it brings that practical, business-centred knowledge to the official courses.
Audience and prerequisites
This training is designed for people involved in software development or data analysis, such as software developers, data engineers, DBAs, data scientists, and data analysts.
- Students should know SQL.
- Familiarity with Impala is preferred but not required.
- Students should also know how to develop Apache Spark applications using either Python or Scala.
- Basic Linux experience is expected.
Objectives
Through instructor-led discussion, as well as hands-on exercises, participants will learn topics including:
- A high-level explanation of Kudu.
- How Kudu compares with other relevant storage systems, and which use cases are best implemented with Kudu.
- Kudu’s architecture, and how to design tables that store data for optimum performance.
- Data management techniques for inserting, updating, and deleting records in Kudu tables using Impala, as well as bulk loading methods.
- How to develop Apache Spark applications with Apache Kudu.
Topics
Introduction
Overview and Architecture
- What Is Kudu?
- Why Use Kudu?
- Kudu Use Cases
- Architecture Overview
- Kudu Tools
- Essential Points
Apache Kudu Tables
- Kudu Tables
- Data Storage Options
- Designing Schemas
- Partitioning Tables for Best Performance
- Using Kudu Tools with Tables
- Essential Points
Using Apache Kudu with Apache Impala
- Apache Impala Overview
- Creating and Querying Tables
- Deleting Tables
- Loading and Modifying Data in Kudu Tables
- Defining Partitioning Strategy
- Essential Points
Developing Apache Spark Applications with Apache Kudu
- Apache Spark and Apache Kudu
- Kudu, Spark SQL, and DataFrames
- Managing Kudu Table Data with Scala
- Creating Kudu Tables with Scala
- Essential Points
Conclusion