Livechat partial outage and degraded performance
Incident Report for Proto
Postmortem

The release of June 13, 2023 included additions to several APIs and SQL queries that ended up throttling our Gateway and Gateway WebSocket services. Depending on the user, this throttling resulted in either degraded performance or downtime for our livechat feature.

The deployment of a fix caused a large spike in database CPU usage, which stopped chatbots from working for about 10 to 15 minutes and affected clients and end users. We have investigated and suspect that the throttling had built up a large backlog of requests and processes over time. When the fix was deployed, the restart caused that backlog to be throttled a second time, resulting in a total timeout.

During this downtime, we doubled our database storage and CPU resources so that the same symptom does not recur during our releases.

The issue with livechat, however, persisted. We could not reproduce it in the testing environments used in our earlier test phases, so we have reverted the platform to the version that was in place before the June 13, 2023 release until we can better address the issue.

We will update our testing environments so that they can reproduce the issues raised in this incident, and we hope to deliver our features in a safer, more reliable manner.

We sincerely apologize for the incident and will make sure that the same issue does not happen again.

Posted Jun 14, 2023 - 09:25 PDT

Resolved
We have confirmed that all services are back to normal and operational. We apologize for the prolonged inconvenience.
Posted Jun 14, 2023 - 06:18 PDT
Monitoring
We have reverted to an older version of our platform and are receiving reports that everything is back to normal. We will continue monitoring the issue.
Posted Jun 14, 2023 - 06:03 PDT
Update
We have identified that users with a large number of conversations may not be able to access livechat and its functionality. We are currently working on addressing this issue.
Posted Jun 14, 2023 - 03:37 PDT
Update
We are still working on a fix. Thank you for your patience.
Posted Jun 13, 2023 - 15:27 PDT
Identified
We have identified an issue and are currently working on a fix. Thank you for your patience.
Posted Jun 13, 2023 - 14:07 PDT
Update
We are still investigating the issue to identify the cause. Thank you for your patience.
Posted Jun 13, 2023 - 11:22 PDT
Investigating
Livechat is experiencing degraded performance. Users may notice slowness.
Posted Jun 13, 2023 - 08:09 PDT
This incident affected: Platform (Livechat).