Intermediate to Advanced • 1 day

Observability at Scale

For teams that already have Grafana running and want to mature their practice: SLOs, stakeholder dashboards, infrastructure-as-code, and incident response that actually works.

Format: In-person or remote workshop

This page describes a typical starting point for teams looking to mature their observability practice. The actual content is shaped around your current setup, your biggest pain points, and what good looks like for your organisation. No two sessions are the same.

What this training covers

Getting Grafana running is the easy part. Getting it to actually serve your engineering organisation, your product teams, and your stakeholders is a different problem entirely. This training is for teams that have moved past the basics and are now dealing with the harder questions: how do you manage dashboards at scale without them becoming a mess? How do you define SLOs that mean something? How do you build alerting that people actually trust?

By the end of the day you’ll have a clearer picture of where your observability practice stands, what to fix first, and how to automate the parts that don’t need to be manual.

Agenda

Assessing observability maturity across teams
Establishing SLOs and golden signals
Dashboards for stakeholders: exec, product, and engineering views
Automating Grafana with Terraform and provisioning APIs
Alert routing, runbook automation, and incident reviews

Labs

Build team-level observability scorecards
Provision dashboards-as-code across environments
Simulate incident response with observability-driven decisions

Who this is for

Platform teams and SREs who own the observability stack and want to scale it properly. Engineering leads who need dashboards that work for audiences beyond their own team. Teams that have grown past ad-hoc alerting and need a more structured approach to incidents.

This training is built around your situation

Before the session, I want to understand where you are. How mature is your current observability practice? Where are the gaps? Which teams are struggling most? The agenda above is a guide, not a fixed programme. If Terraform provisioning is already sorted but incident response is a mess, we spend more time on incident response.

Get in touch at info@obserfana.com to talk through what makes sense for your team.

Who should attend

Platform teams, SREs, and engineering leads responsible for observability across multiple teams or services.

Prerequisites

Hands-on experience with Grafana and at least one data source. Familiarity with basic alerting concepts.

Key outcomes

Assess and improve observability maturity across your engineering organisation
Define meaningful SLOs and golden signals that map to real user experience
Manage Grafana at scale using Terraform and provisioning APIs
Build alert routing and runbook automation that speeds up incident response