We recently detected a traffic surge of roughly 100 to 150 times our regular traffic, which caused the majority of our systems to crash and restart repeatedly. Our CPU-based Kubernetes auto-scaling could not keep up with how quickly the requests ramped up or with their sheer volume, so our pods remained affected even after the surge had passed.
We are also investigating the possibility of an infinite loop, runaway recursion, or a memory leak, as well as the possibility of an attack.
The engineering team quickly remediated the issue by killing and restarting all affected pods while continuing to investigate the cause.
We are currently implementing changes to our auto-scalers so that they respond more effectively to such events.
We apologize for the negative impact caused by this sudden event. The engineering team will continue to monitor the situation and resolve the areas of concern.
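For readers curious why CPU-based auto-scaling can lag behind a surge of this size, the following is a minimal sketch, not our production configuration. It assumes a Horizontal Pod Autoscaler style rule, where the desired replica count is the current count multiplied by the ratio of observed to target CPU utilization, and it assumes a pod's measured CPU saturates at a cap, so the metric understates the true load. All numbers (10 starting replicas, a 50% target, a 100% cap, the replica limits) and the helper names are hypothetical, and the real controller applies additional rules that are not modelled here.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_replicas, max_replicas, tolerance=0.1):
    """Replica count suggested by a CPU-utilization ratio rule (simplified)."""
    ratio = current_cpu_pct / target_cpu_pct
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count unchanged.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


def steps_to_absorb(surge_factor, start_replicas, target_cpu_pct,
                    cpu_cap_pct, max_replicas):
    """Count scaling rounds until the replica count stops changing.

    Assumption: each pod's observed CPU cannot exceed cpu_cap_pct, so a
    saturated pod looks the same at 5x load as at 100x load.
    """
    # Total load, expressed in "replica-percent" units, after the surge.
    load = start_replicas * target_cpu_pct * surge_factor
    replicas = start_replicas
    steps = 0
    while True:
        observed = min(cpu_cap_pct, load / replicas)
        nxt = desired_replicas(replicas, observed, target_cpu_pct,
                               1, max_replicas)
        if nxt == replicas:
            return steps, replicas
        replicas = nxt
        steps += 1


if __name__ == "__main__":
    # Hypothetical 100x surge with a generous replica limit.
    print(steps_to_absorb(100, 10, 50, 100, 2000))
    # Same surge with a low replica limit: scaling stops at the limit.
    print(steps_to_absorb(100, 10, 50, 100, 100))
```

Under these assumptions, saturated pods let the replica count at most double per evaluation round, so growing from 10 to 1,000 replicas takes seven rounds, and a low replica limit stops the growth at that limit while the pods are still overloaded. This illustrates the general limitation only; it does not describe the changes we are making.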