Enterprise Server Issue
Incident Report for TokBox
Postmortem

Summary

On September 10th, while adding capacity to the enterprise environment, three of our new enterprise video servers experienced a network configuration issue. This caused sessions allocated to these three servers to experience connectivity issues. This incident started at 13:40 PST and was resolved by 14:50 PST.

Root Cause

While adding capacity to the enterprise environment, three of our new video servers were brought up with a network configuration issue: a specific port was not open to the internet. As a result, any session allocated to one of these three video servers would have experienced connectivity issues.

Timeline

Time Period (PST)       Major Incident Milestones
2019-09-10 13:48 PST    Issue reported to Support by customer.
2019-09-10 14:36 PST    Issue escalated to Engineering by Support team.
2019-09-10 14:50 PST    All impacted server configurations updated and issue resolved.
2019-09-10 15:21 PST    Incident posted on TokBox Status Page.
2019-09-10 16:33 PST    Incident closed on TokBox Status Page.

Preventative Measures

Before adding these three enterprise video servers into rotation, we confirmed that all of our internal spec tests passed. However, these tests were run from within the OpenTok infrastructure and did not emulate all possible customer configurations. In light of this, we will broaden our testing infrastructure to emulate additional configurations.
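As an illustration of the kind of check that would need to run from outside the infrastructure, the following is a minimal sketch of a TCP reachability probe. It is not our actual spec test. The host name and port numbers in the usage comment are hypothetical placeholders, since this report does not name the affected port, and the sketch covers TCP only, so any UDP media ports would need a separate check.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and completes the TCP handshake;
        # the context manager closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable_ports(host: str, ports: list[int], timeout: float = 3.0) -> list[int]:
    """Return the subset of ports that could not be reached on host."""
    return [p for p in ports if not port_is_reachable(host, p, timeout)]


# Hypothetical usage, run from a machine outside the provider's network:
#   missing = unreachable_ports("video-server.example.com", [443, 3478])
#   if missing:
#       print("Do not add to rotation; unreachable ports:", missing)
```

The point of the sketch is the vantage point rather than the probe itself: the same check run from inside the infrastructure could pass while external clients are still blocked.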

Remediation

As a result of our post-mortem investigation, we have identified the following areas for improvement:

  • Expand our spec tests to more accurately simulate a variety of possible configurations
  • Review the security groups which are currently in use
  • Improve monitoring and alerting
  • Shorten our response time so that customers are informed of such incidents promptly
Posted Sep 11, 2019 - 16:05 PDT

Resolved
On Sept 10th, 2019, while adding capacity to the enterprise environment, 3 of the new Video Servers experienced a network configuration issue. New sessions allocated to these servers would have experienced connection issues resulting in timeouts.

This incident started at 13:40 PST and was resolved by 14:50 PST.

We are currently working on a detailed post mortem of this incident, and we will communicate this update to our customers once it has been completed.

Please contact support@nexmo.com if you have any questions in the meantime.
Posted Sep 10, 2019 - 16:33 PDT
Monitoring
On Sept 10th, 2019, we experienced an issue with our Enterprise servers. Existing sessions may have been interrupted. We will share more information as it becomes available. Please contact our Support team if you have any further questions.
Posted Sep 10, 2019 - 15:21 PDT
This incident affected: Enterprise (Enterprise Video, Enterprise API, Enterprise Broadcast, Enterprise SIP, Enterprise Session Monitoring, Enterprise Archiving).