Observability at Scale
For teams that already have Grafana running and want to mature their practice: SLOs, stakeholder dashboards, infrastructure-as-code, and incident response that actually works.
This page describes a typical starting point for teams looking to mature their observability practice. The actual content is shaped around your current setup, your biggest pain points, and what good looks like for your organisation. No two sessions are the same.
What this training covers
Getting Grafana running is the easy part. Getting it to actually serve your engineering organisation, your product teams, and your stakeholders is a different problem entirely. This training is for teams that have moved past the basics and are now dealing with the harder questions: how do you manage dashboards at scale without them becoming a mess? How do you define SLOs that mean something? How do you build alerting that people actually trust?
By the end of the day you’ll have a clearer picture of where your observability practice stands, what to fix first, and how to automate the parts that don’t need to be manual.
Agenda
- Assessing observability maturity across teams
- Establishing SLOs and golden signals
- Dashboards for stakeholders: exec, product, and engineering views
- Automating Grafana with Terraform and provisioning APIs
- Alert routing, runbook automation, and incident reviews
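To give a flavour of the SLO work, here is a minimal sketch of an error-budget calculation for a request-based SLO. The function name, the parameters, and the example numbers are all illustrative assumptions, not part of any Grafana or Prometheus API; in a real setup the request counts would come from your own metrics backend.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent in a window.

    1.0 means the budget is untouched, 0.0 means it is exactly used up,
    and a negative value means the SLO has been breached.
    All names here are hypothetical and chosen for illustration only.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1, e.g. 0.999")
    if total_requests <= 0:
        # No traffic in the window, so no budget has been spent.
        return 1.0
    # Number of failures the SLO tolerates over this window.
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Assumed example: 99.9% availability target, 1,000,000 requests, 250 failures.
    remaining = error_budget_remaining(0.999, 1_000_000, 250)
    print(f"Error budget remaining: {remaining:.0%}")
```

With the assumed numbers, a 99.9% target allows about 1,000 failures per million requests, so 250 failures leaves roughly three quarters of the budget. A figure like this is one way to give an SLO a concrete meaning on a stakeholder dashboard or in an alert rule.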
Labs
- Build team-level observability scorecards
- Provision dashboards-as-code across environments
- Simulate incident response with observability-driven decisions
Who this is for
Platform teams and SREs who own the observability stack and want to scale it properly. Engineering leads who need dashboards that work for audiences beyond their own team. Teams that have grown past ad-hoc alerting and need a more structured approach to incidents.
This training is built around your situation
Before the session, I want to understand where you are. How mature is your current observability practice? Where are the gaps? Which teams are struggling most? The agenda above is a guide, not a fixed programme. If Terraform provisioning is already sorted but incident response is a mess, we spend more time on incident response.
Get in touch at info@obserfana.com to talk through what makes sense for your team.