Cloudera

Cloudera Training for Apache Kafka - Virtual English

28 hours
2970,00 €
Live Virtual Class

Description

This four-day instructor-led course begins by introducing Apache Kafka, explaining its key concepts and architecture, and discussing several common use cases. Building on this foundation, you will learn how to plan a Kafka deployment, and then gain hands-on experience by installing and configuring your own cloud-based, multi-node cluster running Kafka on the Cloudera Data Platform (CDP).

You will then use this cluster in the more than 20 hands-on exercises that follow. These cover a range of essential skills, starting with how to create Kafka topics, producers, and consumers, and continuing through progressively more challenging aspects of Kafka operations and development, such as scalability, reliability, and performance. Throughout the course, you will learn and use Cloudera’s recommended tools for working with Kafka, including Cloudera Manager, Schema Registry, Streams Messaging Manager, and Cruise Control.

PUE, a Cloudera Strategic Partner, is authorized by Cloudera to deliver official training in Cloudera technologies.

PUE is also accredited and recognized to provide consulting and mentoring services for implementing Cloudera solutions in business settings. Its official courses carry the added value of this practical, business-oriented approach.

Audience and prerequisites

This course is designed for:

  • System administrators
  • Data engineers
  • Developers

Prerequisites

All students are expected to have basic Linux experience, and basic proficiency with the Java programming language is recommended. No prior experience with Apache Kafka is necessary.

Objectives

Students who successfully complete this course will be able to:

  • Plan, deploy, and operate Kafka clusters
  • Create and manage topics
  • Develop producers and consumers
  • Use replication to improve fault tolerance
  • Use partitioning to improve scalability
  • Troubleshoot common problems and performance issues

Topics

Module 1: Kafka Overview

  • High-Level Architecture
  • Common Use Cases
  • Cloudera's Distribution of Apache Kafka

Module 2: Deploying Apache Kafka

  • System Requirements and Dependencies
  • Service Roles
  • Planning Your Deployment
  • Deploying Kafka Services
  • Exercise: Preparing the Exercise Environment
  • Exercise: Installing the Kafka Service with Cloudera Manager
  • Exercise (optional): Create Metrics Dashboards
  • Exercise (optional): Using the CM API

Module 3: Kafka Command Line Basics

  • Create and Manage Topics
  • Running Producers and Consumers

Module 4: Using Streams Messaging Manager (SMM)

  • Streams Messaging Manager Overview
  • Producers, Topics, and Consumers
  • Data Explorer
  • Brokers
  • Topic Management
  • Exercise: Managing Topics using the CLI
  • Exercise: Connecting Producers and Consumers from the Command Line

Module 5: Kafka Java API Basics

  • Overview of Kafka's APIs
  • Topic Management from the Java API
  • Exercise (optional): Managing Kafka Topics Using the Java API
  • Using Producers and Consumers from the Java API
  • Exercise: Developing Producers and Consumers with the Java API

Module 6: Improving Availability through Replication

  • Replication
  • Exercise: Observing Downtime Due to Broker Failure
  • Considerations for the Replication Factor
  • Exercise: Adding Replicas to Improve Availability

Module 7: Improving Application Scalability

  • Partitioning
  • How Messages are Partitioned
  • Exercise: Observing How Partitioning Affects Performance
  • Consumer Groups
  • Exercise: Implementing Consumer Groups
  • Consumer Rebalancing
  • Exercise: Using a Key to Control Partition Assignment

Module 8: Improving Application Reliability

  • Delivery Semantics
  • Demonstration (optional): ISRs vs. ACKs
  • Producer Delivery
  • Exercise: Idempotent Producer
  • Transactions
  • Exercise: Transactional Producers and Consumers
  • Handling Consumer Failure
  • Offset Management
  • Exercise: Detecting and Suppressing Duplicate Messages
  • Exercise: Handling Invalid Records
  • Handling Producer Failure

Module 9: Analyzing Kafka Clusters with SMM

  • End-to-End Latency
  • Notifiers
  • Alert Policies
  • Use Cases

Module 10: Monitoring Kafka

  • Monitoring Overview
  • Monitoring using Cloudera Manager
  • Charts and Reports in CM
  • Monitoring Recommendations
  • Metrics for Troubleshooting
  • Diagnosing Service Failure
  • Exercise: Monitoring Kafka

Module 11: Managing Kafka

  • Managing Kafka Topic Storage
  • Demonstration (optional): Message Retention Period
  • Log Cleanup and Collection
  • Rebalancing Partitions
  • Cruise Control
  • Exercise: Installing Cruise Control
  • Exercise: Troubleshooting Kafka Topics
  • Unclean Leader Election
  • Exercise: Unclean Leader Election
  • Adding and Removing Brokers
  • Exercise: Adding and Removing Brokers
  • Best Practices

Module 12: Message Structure, Format, and Versioning

  • Message Structure
  • Schema Registry
  • Defining Schemas
  • Schema Evolution and Versioning
  • Schema Registry Client
  • Exercise: Using an Avro Schema

Module 13: Improving Application Performance

  • Message Size
  • Batching
  • Compression
  • Exercise: Observing How Compression Affects Performance

Module 14: Improving Kafka Service Performance

  • Performance Tuning Strategies for the Administrator
  • Cluster Sizing
  • Exercise: Planning Capacity Needed for a Use Case

Module 15: Securing the Kafka Cluster

  • Encryption
  • Authentication
  • Authorization
  • Auditing
