All Systems Operational
Boundary Premium Service: Operational
Boundary Plugins Repository: Operational
Boundary Enterprise Service: Operational
Past Incidents
Mar 4, 2015
At 1:30 PM CST today, several of the backend services restarted themselves due to a race condition. The display of data was temporarily affected, the services came back online, and the system is currently running smoothly. Neither this disruption nor the previous one resulted in any data loss.
Mar 4, 14:19 CST
At 12:00 PM today, several of the backend services restarted themselves due to a race condition. The display of data was temporarily affected, but there was no data loss. The services came back online and the system is currently running smoothly.
Mar 4, 12:12 CST
Mar 3, 2015
Resolved - While we are continuing to make updates to improve the recoverability of the backend components, we believe we have resolved the primary issues that caused this outage in the first place. As previously stated, the root cause centered on timing issues between the backend services and was exacerbated by an issue in the Linux kernel. We have updated both components and have seen no service interruption since.
Mar 3, 15:44 CST
Update - For those who have found the status updates less than illuminating, I apologize. The issue we have been dealing with stems from a kernel bug in the operating system, which exacerbated a timing issue within the product and caused the backend data store to get out of sync. We spent the entire day getting the system back to a point where the data was in sync. We should have seen little to no data loss during this outage because of the queueing technology (Kafka) that we recently implemented to prevent data loss. We understand that this is of little consolation to our customers, since the real-time nature of Boundary is its chief selling point, but I do want all of our customers to understand that this was an anomaly and not a fatal product flaw. We will continue to closely monitor the system for the rest of the week. We have modified our test bed, and will continue to do so, so that it can simulate the conditions that precipitated this outage, in an effort to put even more preventative measures in place and avoid the kind of situation we found ourselves in yesterday.
Mar 3, 08:38 CST
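The update above credits the recently added Kafka queueing layer with preventing data loss while the read path was down. Below is a minimal sketch of that buffering pattern, assuming an ingest pipeline that publishes measurements to a Kafka topic and a backend consumer that commits offsets only after processing; the broker address, topic name, group id, and process() handler are illustrative, not Boundary's actual configuration.

    # Minimal sketch (assumed names): metrics are buffered in Kafka so the
    # backend can replay them after downtime instead of losing them.
    from kafka import KafkaProducer, KafkaConsumer   # pip install kafka-python

    # Ingest side: publish each measurement to a durable topic.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("flow-metrics", value=b'{"host": "web-1", "bytes": 4096}')
    producer.flush()

    def process(payload):
        # Placeholder for the real backend write (illustrative only).
        print("stored", payload)

    # Read side: commit the offset only after a record is safely processed,
    # so a crash or an outage resumes from the last committed offset.
    consumer = KafkaConsumer(
        "flow-metrics",
        bootstrap_servers="localhost:9092",
        group_id="dashboard-backend",
        enable_auto_commit=False,
        auto_offset_reset="earliest",
    )
    for record in consumer:
        process(record.value)
        consumer.commit()

Because the broker retains records independently of the consumers, temporarily disabling the read path (as described in the updates below) only delays processing; the backlog is replayed once the consumers are re-enabled.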
Update - The read path is back online. We will continue to monitor the system.
Mar 3, 07:35 CST
Monitoring - The system is catching up on a day's worth of data. We will turn the read path back on in the morning, giving the backend a chance to process all of the data overnight. At that point we will systematically check the dashboards and make sure that the OS and application fixes have increased the system's stability.
Mar 2, 23:42 CST
Update - There are still issues with connectivity between the Boundary services, which we are currently investigating. For the moment, the read path has been disabled to allow faster processing of the data collected today. We will continue to update this incident as the situation changes.
Mar 2, 18:06 CST
Update - All of the instances have been updated, and the system is being rolled to capture the backlog of data.
Mar 2, 16:23 CST
Update - The problem appears to be related to a kernel patch released on February 24th that addresses buffer overruns in TCP/IP. We are currently applying the patch and will see whether it has the intended effect.
Mar 2, 15:54 CST
Investigating - We are investigating a problem with the dashboards not keeping up with the backend system. The problem is not on the data capture side, and no data is being lost; the issue is in the API connection to the backend. We are working to resolve it now and expect the system to be back to functioning at 100% shortly.
Mar 2, 10:26 CST
Mar 2, 2015
Resolved - This incident has been resolved.
Mar 2, 10:24 CST
Monitoring - A fix has been implemented and we are monitoring the results.
Feb 26, 01:55 CST
Identified - The issue has been identified and a fix is being implemented.
Feb 26, 01:36 CST
Investigating - We are currently investigating this issue.
Feb 26, 01:15 CST
Monitoring - A fix has been implemented and we are monitoring the results.
Feb 24, 19:00 CST
Investigating - Our Premium service is currently experiencing data issues. We are actively triaging the incident.
Feb 24, 18:48 CST
Mar 1, 2015

No incidents reported.

Feb 28, 2015

No incidents reported.

Feb 27, 2015

No incidents reported.

Feb 26, 2015
Resolved - The system has almost caught up.
Feb 26, 14:38 CST
Investigating - Backend components are deadlocking. We are investigating and working to resolve the problem.
Feb 26, 14:36 CST
Feb 25, 2015

No incidents reported.

Feb 23, 2015

No incidents reported.

Feb 22, 2015

No incidents reported.

Feb 21, 2015

No incidents reported.

Feb 20, 2015

No incidents reported.

Feb 19, 2015

No incidents reported.

Feb 18, 2015

No incidents reported.