• Online, Self-Paced
Course Description

Site reliability engineers (SREs) are typically good problem solvers. They need to think logically to identify problems, correct them, and prevent them from happening again.

In this course, you'll explore several built-in and open-source troubleshooting tools SREs can use to resolve system issues. You'll start by examining logging, white-box monitoring, and black-box monitoring as techniques for tracking system events. You'll then work with the built-in Windows troubleshooting tools, namely Event Viewer, Resource Monitor, and System Information.

Next, you'll use Google Cloud Dataflow to process logs, before outlining the purpose and benefits of the StatsD standard and the /api/search endpoint. Lastly, you'll identify how Google's Dapper is used for troubleshooting distributed systems and how the open standards tool Prometheus is used for instrumenting software and exposing metrics.

Learning Objectives

• Discover the key concepts covered in this course

Framework Connections

The materials within this course focus on the Knowledge, Skills, and Abilities (KSAs) identified within the Specialty Areas listed below. Click to view Specialty Area details within the interactive National Cybersecurity Workforce Framework.