Resolved -
The backlog of events has been processed, and new incoming data is being processed in near real time.
TL;DR: Aside from the data backlog (which peaked at a 2-hour delay) and a ~10% error rate when loading Segments of users (in the Chameleon dashboard; no impact for your end-users), it was a relatively normal day at Chameleon relative to our baseline rate of Experiences delivered. Chameleon delivered approx. 200k downtime notices.
I hope today was not too stressful for you or your business, and as always, feel free to reach out with questions.
Oct 20, 19:13 PDT
Update -
According to AWS: "AWS services which rely on EC2 instance launches such as [Heroku] are working through their backlog of EC2 instance launches successfully and we anticipate full recovery of the backlog over the next two hours." For Chameleon, this means our data backlog should start processing more quickly, automatically, over the same time frame.
Oct 20, 15:12 PDT
Update -
Our current application and background servers are running as-is, but our ability to scale our cluster is impacted. We will need additional resources provisioned within the next few hours to bring the backlog back to a manageable length. We will continue to monitor and implement any fixes we can along the way, and plan to send another update within the next 2-3 hours.
Oct 20, 09:21 PDT
Update -
Updating this incident to "Minor" impact.
Oct 20, 06:48 PDT
Monitoring -
During the AWS outage, Chameleon remained online and was not directly affected by the ongoing issues. All Chameleon Experiences were published on schedule, and new experiences went live as expected. Many customers also used Chameleon to notify *their customers* via in-product banners linking to their own status pages.
During this period, we ingested and queued a large volume of data — typical for EU and US Eastern morning hours. However, today our auto-scaling was unable to bring additional resources online due to the AWS and Heroku issues. As a result, we are currently processing a backlog of data and background jobs. The current processing delay is approximately 30 minutes (Experience metrics are delayed by this amount, but microsurvey responses are not). This backlog will decrease quickly once auto-scaling resumes normal operation.
Oct 20, 06:48 PDT