Background

Previously, our Grafana setup hinged on a single EC2 instance with a local SQLite db as persistent storage. To improve the reliability and availability of Grafana, we needed to change this.

Process

We’ve broken the work out into a few migration phases:

1. Migrate off the SQLite db to Postgres for a shareable storage backend
2. Move Grafana to a scalable workload orchestrator (ECS)
   - Lower the TTL on the associated DNS records so switches are more responsive when a Grafana task has to be cycled out (ECS Service Discovery defaults to 10s, which we can match)
3. Configure high availability (along with unified alerting)
   - Grafana evaluates alert rules on each instance, so to de-duplicate alert propagation we will need to configure peering between instances dynamically.
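As a rough illustration of the dynamic peering step, here is a minimal Python sketch that resolves the current Grafana tasks from a service discovery DNS name and builds the value for the `ha_peers` setting in Grafana's `[unified_alerting]` section. The hostname is hypothetical, the port assumes Grafana's default `ha_listen_address` port of 9094, and this is not necessarily how peering ended up being wired here.

```python
import socket

# Assumption: instances gossip on Grafana's default ha_listen_address port.
HA_PORT = 9094


def resolve_peer_ips(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses behind a DNS name."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def build_ha_peers(ips: list[str], port: int = HA_PORT) -> str:
    """Format peer addresses as the comma-separated host:port list ha_peers expects."""
    return ",".join(f"{ip}:{port}" for ip in sorted(set(ips)))


if __name__ == "__main__":
    # "localhost" stands in for a hypothetical service discovery name
    # such as grafana.internal.example.
    peers = build_ha_peers(resolve_peer_ips("localhost"))
    print(f"GF_UNIFIED_ALERTING_HA_PEERS={peers}")
```

The output follows Grafana's `GF_<SECTION>_<KEY>` environment variable convention, so it could be exported by a container entrypoint before Grafana starts. A low DNS TTL matters here for the same reason as above: stale records would hand new tasks a peer list containing tasks that have already been cycled out.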

Migrating backend to Postgres

Grafana was backed by a local SQLite db, which restricted our ability to scale the application. For a more reliable experience for users, we moved the backend to Postgres so that storage can be shared across instances.
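Pointing Grafana at Postgres comes down to the `[database]` settings, which can also be supplied as `GF_DATABASE_*` environment variables (convenient for a container task definition). The sketch below only assembles those variables; the function name, host, and credentials are placeholders and not values from our setup.

```python
def grafana_database_env(
    host: str,
    name: str,
    user: str,
    password: str,
    ssl_mode: str = "require",
) -> dict[str, str]:
    """Build the environment variables that switch Grafana's backend to Postgres.

    `host` uses Grafana's host:port form, e.g. "db.example.internal:5432".
    """
    allowed_ssl_modes = {"disable", "require", "verify-full"}
    if ssl_mode not in allowed_ssl_modes:
        raise ValueError(f"unsupported ssl_mode: {ssl_mode!r}")
    return {
        "GF_DATABASE_TYPE": "postgres",
        "GF_DATABASE_HOST": host,
        "GF_DATABASE_NAME": name,
        "GF_DATABASE_USER": user,
        "GF_DATABASE_PASSWORD": password,
        "GF_DATABASE_SSL_MODE": ssl_mode,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    env = grafana_database_env(
        "db.example.internal:5432", "grafana", "grafana", "change-me"
    )
    for key, value in env.items():
        # Avoid echoing the secret when printing the result.
        shown = "***" if key.endswith("PASSWORD") else value
        print(f"{key}={shown}")
```

In practice the password would come from a secrets store rather than being passed around in plain text. Note that these settings only change where Grafana reads and writes; the existing SQLite data still has to be moved across separately.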

Migrating Grafana to ECS

Upgrading to Grafana v12

While Grafana does support rolling updates, this was a major version with known breaking changes and we haven't enabled HA yet, so I figured it would be smoother to upgrade at this point and then enable HA.

Some notable changes for us in the new version:

Git Sync support (Experimental)
Dashboard schema changes (Experimental)
Drilldown

See the full breakdown of changes here.
