Recently, we experienced a traffic surge of roughly 4–5 times our regular load within less than an hour, driven by several events and campaigns in the Asia region. Our database CPUs were provisioned to handle only 3–4 times the regular traffic, so the surge caused throttling.
The surge overloaded our Bot and Gateway services, causing degraded performance and, for some clients and end users of our chatbots, downtime.
The engineering team quickly upgraded our resources, raising the database CPU capacity to handle 9–10 times the regular traffic and horizontally scaling our Kubernetes pods so that database requests could be processed at full speed again.
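For readers curious what horizontal scaling of this kind looks like in practice, a Kubernetes HorizontalPodAutoscaler such as the sketch below adds pods automatically as CPU utilization climbs. The resource names, replica counts, and threshold here are illustrative placeholders, not our actual production configuration.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gateway-hpa          # hypothetical name for illustration
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gateway            # hypothetical Deployment name
  minReplicas: 4             # baseline capacity
  maxReplicas: 20            # headroom for surges well above baseline
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # scale out before pods saturate
```

With a configuration like this in place, a sudden surge triggers scale-out automatically rather than requiring a manual intervention during the incident.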
Since the upgrade, our services have returned to normal.
This incident affected our users for about one hour, and we sincerely apologize for the inconvenience it caused. We are committed to anticipating such surges better and making our systems more robust against them in the future.
Thank you for your patience and cooperation.