Description
This course teaches participants techniques for monitoring and improving infrastructure and application performance in Google Cloud.
Using a combination of presentations, demos, hands-on labs, and real-world case studies, attendees gain experience with full-stack monitoring, real-time log management and analysis, debugging code in production, tracing application performance bottlenecks, and profiling CPU and memory usage.
Audience and prerequisites
This class is intended for the following job roles:
- Cloud Architects, Administrators and SysOps personnel.
- Cloud Developers and DevOps personnel.
Prerequisites:
To get the most out of this course, participants should have:
- Completion of Google Cloud Platform Fundamentals: Core Infrastructure, or equivalent experience.
- Basic scripting or coding familiarity.
- Proficiency with command-line tools and Linux operating system environments.
Objectives
This course teaches participants the following skills:
- Explain the purpose and capabilities of Google Cloud’s operations suite.
- Implement monitoring for multiple cloud projects.
- Create alerting policies and uptime checks.
- Install and manage the Ops Agent to collect logs from Compute Engine instances.
- Explain Cloud Operations for GKE.
- Analyze VPC Flow Logs and firewall rules logs.
- Analyze and export Cloud Audit Logs.
- Profile and identify resource-intensive functions in an application.
- Analyze resource utilization costs for monitoring-related components within Google Cloud.
Topics
Module 1: Introduction to Google Cloud Operations Suite.
- Describe the purpose and capabilities of Google Cloud’s operations suite.
- Explain the purpose of the Cloud Monitoring tool.
- Explain the purpose of Cloud Logging and Error Reporting tools.
- Explain the purpose of Application Performance Management tools.
Module 2: Monitoring Critical Systems.
- Use Cloud Monitoring to view metrics for multiple cloud projects.
- Explain the different types of dashboards and charts that can be built.
- Create an uptime check.
- Explain the cloud operations architecture.
- Explain and demonstrate the purpose of using Monitoring Query Language (MQL) for monitoring.
Module 3: Alerting Policies.
- Explain alerting strategies.
- Explain alerting policies.
- Explain error budget.
- Explain why service-level indicators (SLIs), service-level objectives (SLOs), and service-level agreements (SLAs) are important.
- Identify types of alerts and common uses for each.
- Use Cloud Monitoring to manage services.
Module 4: Advanced Logging and Analysis.
- Use Logs Explorer features.
- Explain the features and benefits of logs-based metrics.
- Define log sinks (inclusion filters) and exclusion filters.
- Explain how BigQuery can be used to analyze logs.
- Export logs to BigQuery for analysis.
- Use Log Analytics on Google Cloud.
Module 5: Working with Audit Logs.
- Explain Cloud Audit Logs.
- List and explain the different audit logs.
- Explain the features and functionalities of the different audit logs.
- List the best practices to implement audit logs.
Module 6: Configuring Google Cloud Services for Observability.
- Use the Ops Agent with Compute Engine.
- Enable and use Kubernetes Monitoring.
- Explain the benefits of using Google Cloud Managed Service for Prometheus.
- Explain the usage of PromQL to query Cloud Monitoring metrics.
- Explain the uses of OpenTelemetry.
- Explain custom metrics.
Module 7: Monitoring Google Cloud Network and Data Access.
- Collect and analyze VPC Flow Logs and firewall rules logs.
- Enable and monitor Packet Mirroring.
- Explain the capabilities of the Network Intelligence Center.
Module 8: Investigating Application Performance Issues.
- Explain the features and benefits of Error Reporting, Cloud Trace and Cloud Profiler.
- Explain the functionality of Error Reporting, Cloud Trace and Cloud Profiler.
Module 9: Optimizing the Costs for Operations Suite.
- Analyze resource utilization costs for monitoring-related components within Google Cloud.
- Implement best practices for controlling the cost of monitoring within Google Cloud.