# Post Mortems

# Post Mortem - July 06, 2024

### Incident:

Main Database server unavailable.

### Affected services:

All web services hosted on [Vultr](https://vultr.com) that require a Database connection to function.

### Incident start:

12:15 pm ART - 03:15 am NZST (next day)

### Incident end:

07:24 pm ART - 10:24 am NZST (next day)

### Resolution steps:

1. 15 minutes after the incident started, the team was notified by [Vultr](https://vultr.com) about the outage.
2. 2 hours into the outage, the team opened a ticket with ID BJP-53CGO.
3. 7 hours into the outage, the team observed the Database with status "RUNNING" and proceeded to configure the Firewall and internal routing to get it working again.
4. At 19:24 ART, the connection to all sites was restored.
5. 27 hours after the incident started, the team received a reply from [Vultr](https://vultr.com) stating:  
    "The host node on which your instance was previously located failed, necessitating a manual recovery of the data with the assistance of our onsite engineer. Following the recovery, the instance was migrated to a healthy node. Unfortunately, this process took longer than expected."
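For reference, the elapsed times in the steps above can be converted into approximate wall-clock times. The following is a minimal sketch, assuming the incident date in the title (July 06, 2024) refers to ART and that ART is UTC-3; the milestone labels are illustrative, and because the source offsets are rounded, the resulting timestamps are approximate.

```python
from datetime import datetime, timedelta, timezone

# Assumption: ART (Argentina Time) is a fixed UTC-3 offset.
ART = timezone(timedelta(hours=-3), "ART")

# Start and end times as stated in this post mortem (date assumed to be in ART).
incident_start = datetime(2024, 7, 6, 12, 15, tzinfo=ART)
incident_end = datetime(2024, 7, 6, 19, 24, tzinfo=ART)

# Rounded offsets taken from the resolution steps; labels are illustrative.
milestones = {
    "Vultr outage notification": timedelta(minutes=15),
    "Ticket BJP-53CGO opened": timedelta(hours=2),
    "Database observed as RUNNING": timedelta(hours=7),
    "Vultr reply received": timedelta(hours=27),
}


def timeline(start, offsets):
    """Return an approximate wall-clock time for each milestone."""
    return {label: start + delta for label, delta in offsets.items()}


outage_duration = incident_end - incident_start

if __name__ == "__main__":
    print(f"Outage duration: {outage_duration}")
    for label, moment in timeline(incident_start, milestones).items():
        print(f"{moment:%Y-%m-%d %H:%M %Z}  {label}")
```

Under these assumptions the outage lasted 7 hours and 9 minutes, and the reply from Vultr arrived around 15:15 ART on the following day.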

### Mitigation steps:

1. The team informed all clients about the issue.

### Improvements and de-risking solutions:

1. The team configured a second Database server within [Vultr](https://vultr.com) with replication to the main Database.
2. The team defined and consolidated SOPs for switching Databases in case of another outage.
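The switching decision in such an SOP could be sketched roughly as follows. This is not the team's actual procedure: the hostnames, the port, and the function names are hypothetical placeholders, and the check only confirms that a server accepts TCP connections, not that the replica is fully caught up with the main Database.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Endpoint = Tuple[str, int]

# Hypothetical endpoints; the real hostnames and port are not in this document.
PRIMARY: Endpoint = ("db-main.example.internal", 5432)
REPLICA: Endpoint = ("db-replica.example.internal", 5432)


def tcp_reachable(endpoint: Endpoint, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def pick_database(
    candidates: Sequence[Endpoint],
    probe: Callable[[Endpoint], bool] = tcp_reachable,
) -> Optional[Endpoint]:
    """Return the first reachable endpoint in priority order, or None."""
    for endpoint in candidates:
        if probe(endpoint):
            return endpoint
    return None


if __name__ == "__main__":
    target = pick_database([PRIMARY, REPLICA])
    if target is None:
        print("No Database reachable; escalate according to the SOP.")
    else:
        print(f"Route services to {target[0]}:{target[1]}")
```

Keeping the probe injectable lets the selection logic be exercised without touching the network. A real switch would also need the steps the SOP defines around it, such as promoting the replica and updating the Firewall and internal routing.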