Executive summary
On 15 April, a network switch failure at our hosting provider’s data centre caused a disruption affecting all MatchMaker servers. Customers were unable to establish new VPN connections; existing VPN connections were not impacted and remained active throughout the incident.
The service degradation lasted approximately five hours, from 18:52 to 00:10 (UTC+3), after which the platform resumed normal operation.
The issue was detected via automated alerts shortly after onset. On‑call engineers responded immediately, escalated the incident to the hosting provider, and worked to stabilise the service. Customer updates were provided via the public status page and email communications.
The root cause was a network switch failure at the hosting provider’s data centre. The failure triggered a surge of client reconnection attempts that exceeded the remaining servers’ capacity for new connections. To prevent recurrence, we are eliminating single points of failure through duplicated connectivity, increasing system capacity and redundancy, and improving failover behaviour and escalation processes with the hosting provider.
Root Cause Analysis report
All systems were running normally until the incident.
Fault
At 18:52 (UTC+3), a network switch at the hosting data centre failed, isolating a critical MatchMaker server and blocking all traffic to and from it. Clients attempted to reconnect to other MatchMaker (MM) servers, triggering a sudden load spike. Those servers became unresponsive in sequence, resulting in all MM servers becoming unavailable.
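The cascade above is a classic reconnection storm: every client retries at once, so each server that hangs redirects the full load onto the next. A standard way to damp such storms is exponential backoff with jitter on the client side. The sketch below is illustrative only (it is not our production client; names and parameters are assumed for the example):

```python
import random

# Illustrative sketch (assumed parameters, not the real client):
# exponential backoff with "full jitter" between MatchMaker
# reconnection attempts. Spreading retries over a growing window
# prevents a synchronized reconnection spike from hitting the
# remaining MM servers all at once.

BASE_DELAY = 1.0    # seconds before the first retry (assumed)
MAX_DELAY = 120.0   # cap so clients still retry reasonably often (assumed)

def backoff_delay(attempt: int, rng: random.Random = random) -> float:
    """Delay in seconds before reconnection attempt `attempt` (0-based).

    Full jitter: draw uniformly from [0, min(MAX_DELAY, BASE_DELAY * 2**attempt)],
    so a fleet of clients disconnected at the same instant does not
    retry at the same instant.
    """
    ceiling = min(MAX_DELAY, BASE_DELAY * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

A client would sleep for `backoff_delay(n)` before its n-th reconnection attempt; the retry window doubles each time, so even a large disconnected fleet spreads its load over minutes rather than seconds.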
Impact
Following the switch failure, the remaining MM servers received a surge of reconnection attempts and progressively became unresponsive. The number of active MM connections dropped until recovery began about five hours later, once the isolated server was restored.
Customers could not start new VPN connections during the outage. Existing connections stayed up. Impact was greater for US-based users due to local business hours.
Timeline
All times are UTC+3.
2026-04-15
18:52 – Network switch failed; critical MatchMaker server isolated.
19:00 – Monitoring alert reached on‑call; investigation started.
19:27 – Public status page reported widespread MatchMaker unavailability.
22:43 – Customer notification sent via email.
23:08 – Network configuration corrected on the critical server.
23:56 – All servers accepting connections again.
2026-04-16
00:10 – Connections stabilised; public status page updated.
00:16 – Recovery notification emailed to customers.
08:45 – Public status page incident closed.
Follow-up
Capacity & resilience: Add MM server capacity and headroom; improve load balancing.
Failover: Improve network connectivity; validate automatic failover under load.
Process & comms: Speed up external comms and clarify roles and processes; ensure timely status page and email updates.
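One way the capacity and load-balancing items above could degrade gracefully under a future surge is server-side admission control: rejecting excess new connections quickly instead of queueing them until the server hangs. The token-bucket gate below is a minimal sketch of that idea, under assumed rates; it is not our current implementation:

```python
import time

# Illustrative sketch (assumed design, not the deployed system):
# a token-bucket admission gate in front of an MM server's
# connection handler. Under a reconnection surge, attempts beyond
# the sustained rate are rejected immediately, so clients can back
# off and retry instead of the server hanging.

class ConnectionGate:
    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate            # sustained new connections per second
        self.burst = burst          # short-term burst allowance
        self.tokens = float(burst)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_admit(self) -> bool:
        """Return True if a new connection may be accepted now."""
        now = self.clock()
        # Refill tokens for the time elapsed, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Rejected clients get an immediate, cheap "try again later" rather than a hung socket, which is what allows the rest of the fleet to stay responsive during a spike.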