The release of June 13, 2023 included additions to several APIs and SQL queries that ended up throttling our Gateway and Gateway WebSocket services. Depending on the user, this throttling resulted in degraded performance or downtime for our livechat feature.
The deployment of a fix caused a large spike in database CPU usage, which stopped chatbots from working for about 10 to 15 minutes, affecting clients and end users. We have investigated and suspect that the throttling had built up a large backlog of requests and processes over time, and that restarting the services to deploy the fix throttled this backlog a second time, resulting in a total timeout.
During this downtime, we doubled our database storage and CPU resources so that the same symptom does not recur during future releases.
The issue with livechat, however, persisted. We could not reproduce it in the earlier phases of our testing environments, so we have reverted the platform to the version that preceded the June 13, 2023 release until we can better address the issue.
We will update our testing environments to reproduce the issues raised in this incident, and we hope to deliver our features in a safer, more reliable manner.
We offer our sincerest apologies for the incident and will make sure that the same issue does not happen again.