Post Mortems

Post Mortem - July 06, 2024

Incident: Main database server unavailable.

Affected services: All web services hosted on Vultr that require a database connection to function.

Incident start: 12:15 pm ART / 05:15 pm NZST
Incident end: 07:24 pm ART / 12:24 am NZST

Resolution steps:
15 minutes after the incident started, Vultr notified the team about the outage.
2 hours into the outage, the team opened a support ticket with Vultr (ID BJP-53CGO).
7 hours into the outage, the team observed the database with status "RUNNING" and configured the firewall and internal routing to get it working again.
At 19:24 ART, connectivity to all sites was restored.
27 hours after the incident started, Vultr replied to the ticket, stating: "The host node on which your instance was previously located failed, necessitating a manual recovery of the data with the assistance of our onsite engineer. Following the recovery, the instance was migrated to a healthy node. Unfortunately, this process took longer than expected."

Mitigation steps:
The team informed all clients about the issue.

Improvements and de-risking solutions:
The team configured a second database server within Vultr that replicates from the main database.
The team defined and consolidated SOPs for switching databases in case of a new outage.
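
As an illustration of the kind of check a database-switching SOP might start with, the following is a minimal sketch, not the team's actual procedure. It tries each database endpoint in priority order and returns the first one that accepts a TCP connection. The host names, the port, and the function names are hypothetical; the report does not state which database engine or addresses are in use.

```python
import socket
from typing import Optional, Sequence, Tuple

Endpoint = Tuple[str, int]

# Hypothetical endpoints in priority order: main database first, replica second.
# Neither the host names nor the port come from the incident report.
DATABASE_ENDPOINTS: Sequence[Endpoint] = (
    ("db-main.example.internal", 5432),
    ("db-replica.example.internal", 5432),
)


def is_reachable(endpoint: Endpoint, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def pick_database(
    endpoints: Sequence[Endpoint], timeout: float = 2.0
) -> Optional[Endpoint]:
    """Return the first reachable endpoint in priority order, or None if all fail."""
    for endpoint in endpoints:
        if is_reachable(endpoint, timeout):
            return endpoint
    return None
```

Calling `pick_database(DATABASE_ENDPOINTS)` would return the main database while it is reachable and fall back to the replica otherwise. A TCP check only confirms that the network path, including firewall and routing, is open; it does not verify replication lag or promote the replica to accept writes, so those steps would still need to be covered by the SOP itself.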