Platform Slowness
Incident Report for Proto
Postmortem

Our platform experienced performance issues due to a sudden spike in database traffic. This led to slow response times, connection timeouts, and service disruptions for users.

The root cause was an excessive number of concurrent queries, which pushed database CPU usage to 99% and slowed all operations significantly. Because slow queries hold their connections for longer, the number of active connections exceeded the maximum limit. New requests were forced to queue, which added further load. We also observed QueuePool limit overflow errors, indicating that the application's connection pool was exhausted and could not absorb the surge in requests.
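
The pool exhaustion described above can be illustrated with a minimal sketch. This is not our production code: `BoundedPool`, `PoolExhausted`, `simulate`, and the sizes and timeout are hypothetical names and values chosen for illustration. It assumes a pool that allows a fixed number of checkouts and fails any request that cannot get a connection within a timeout.

```python
import threading


class PoolExhausted(Exception):
    """Raised when no connection becomes free within the timeout."""


class BoundedPool:
    """Toy pool: at most pool_size + max_overflow checkouts at once."""

    def __init__(self, pool_size, max_overflow, timeout):
        self._slots = threading.BoundedSemaphore(pool_size + max_overflow)
        self._timeout = timeout

    def checkout(self):
        # Wait up to `timeout` seconds for a free slot, then give up.
        if not self._slots.acquire(timeout=self._timeout):
            raise PoolExhausted("pool limit reached, checkout timed out")

    def checkin(self):
        # Return a slot so a waiting request can proceed.
        self._slots.release()


def simulate(requests, pool_size=5, max_overflow=2, timeout=0.01):
    """Check out connections that are never returned, as if queries were stuck."""
    pool = BoundedPool(pool_size, max_overflow, timeout)
    served = rejected = 0
    for _ in range(requests):
        try:
            pool.checkout()
            served += 1
        except PoolExhausted:
            rejected += 1
    return served, rejected
```

With these illustrative settings, once seven connections are held by slow queries, every further request waits for the timeout and then fails, which matches the pattern of queued and timed-out requests seen during the incident.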

To resolve the issue, we restarted the database to clear the backlog of connections and applied a fix to optimize connection handling. Additionally, we adjusted database settings and implemented temporary traffic throttling to stabilize performance.
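
As an illustration of traffic throttling, the sketch below uses a token bucket, one common technique for capping request rates. It is an assumption for the sake of example and not necessarily the mechanism we deployed. The `TokenBucket` class, its parameters, and the injectable `clock` are hypothetical.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it is throttled."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that has passed, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

A caller would check `allow()` before sending a query to the database and reject or delay the request when it returns `False`, so the database sees a bounded request rate even during a spike.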

Moving forward, we will optimize connection pooling, improve query execution, and enhance monitoring to detect early signs of overload. The platform is now stable, and we will continue monitoring to prevent similar incidents.
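
One possible shape for an early-warning check is sketched below. The function name `overload_warnings` and the thresholds (80% CPU, 80% of the connection limit) are illustrative assumptions, not our actual alerting configuration.

```python
def overload_warnings(cpu_percent, active_connections, max_connections,
                      cpu_warn=80.0, conn_warn=0.8):
    """Return warning messages when load approaches the database's limits."""
    warnings = []
    # Warn well before CPU saturates (the incident peaked at 99%).
    if cpu_percent >= cpu_warn:
        warnings.append(f"database CPU at {cpu_percent:.0f}%")
    # Warn when connections near the maximum, before requests start queueing.
    if active_connections >= conn_warn * max_connections:
        warnings.append(
            f"connections at {active_connections}/{max_connections}")
    return warnings
```

Running a check like this against periodic database metrics would surface rising load before connections reach the limit.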

Posted Feb 11, 2025 - 00:50 PST

Resolved
βœ”οΈ Our team identified the root cause and applied a fix.
πŸ“ˆ Performance has returned to normal levels.
πŸ” We have completed monitoring and confirmed system stability.
Posted Feb 11, 2025 - 00:35 PST
Monitoring
✅ Our team has identified the root cause and applied a fix.
🔍 We are actively monitoring system performance to ensure stability.
📊 Initial signs indicate improved response times, but we will continue to observe closely.
Posted Feb 11, 2025 - 00:13 PST
Identified
The issue has been identified, and our engineers are working on mitigating the impact. We are monitoring system performance closely and will provide updates as progress is made.
Posted Feb 10, 2025 - 23:59 PST
Update
The issue has been identified, and our engineers are working on mitigating the impact. We are monitoring system performance closely and will provide updates as progress is made.
Posted Feb 10, 2025 - 23:58 PST
Investigating
We are currently experiencing slowness on the platform, which may affect response times and overall performance. Some users may notice delays in loading pages, executing actions, or processing requests. Our team is actively investigating the root cause and working on a resolution.
Posted Feb 10, 2025 - 23:51 PST
This incident affected: Inbox, Chatbots, and Livechats.