Around 2018-07-12 15:20 PDT, we continued migrating to a more secure and stable API gateway as part of a larger system improvement project. At 2018-07-12 15:34 PDT, we received the first customer report that the Enterprise JS SDK was not accessible. At 2018-07-12 15:52 PDT, we published the first incident. At 2018-07-12 16:15 PDT, we pushed a fix, but it did not persist. At 2018-07-13 08:28 PDT, we opened a new incident based on new reports, although the problem occurred and could be reproduced significantly less often. At 2018-07-13 10:23 PDT, we resolved the incident.
The outage was caused by the migration to the new API gateway. We use multiple external DNS providers, and one of them had an incorrect record set. Compounding the problem, our current gateway servers were pointing to an internal DNS server that had cached the incorrect record set.
This happened while we were conducting an important project to improve the Enterprise line. We expect to complete that initiative in about a week. Once it is completed, we'll resume efforts to improve the monitoring that could have detected this problem and to automate some notifications to our status page.