High error rates in Chameleon APIs

Incident Report for Chameleon

Resolved

A series of overlapping updates flooded a specific write queue, leaving it blocked until the writes completed. Meanwhile, a failover caused all of these writes to retry at once when the new instance came back online.

Posted Feb 03, 2021 - 11:30 EST

Update

Database cluster health has returned to normal. We will continue to monitor this incident and will post a more in-depth analysis once we have a more complete understanding of the root cause.

Posted Feb 03, 2021 - 10:45 EST

Monitoring

Our database appears to have failed over twice in quick succession and is taking some time to fully warm up a new node for the replica set.

Posted Feb 03, 2021 - 10:23 EST

Investigating

We are seeing increased error rates on requests to identify users, load the Chameleon Builder, and similar operations.

Posted Feb 03, 2021 - 10:19 EST

This incident affected: User profile API, Sidebar API, and Data API.