(Resolved) Event Notification: ORD01/ALO01 (Chicago/Cedar Falls) Latency with BGP Peering 22-MAR-2010

EVENT START DATE: 3/22/2010
EVENT START TIME: 07:00 CDT
EVENT END DATE: 3/23/2010
EVENT END TIME: 11:00 CDT
EVENT IMPACT: Latency and/or loss of connectivity
EVENT DETAILS:

FINAL EVENT UPDATE 3/30/2010 10:00 CDT:  On Monday 3/22/2010 at approximately 07:00 CDT, some customers in the Cedar Falls and Chicago Data Centers began reporting higher-than-normal latency and file transfer times.  LT Network Engineering isolated the problem to one of our data center connectivity providers.  This condition caused instability in portions of our routing infrastructure and resulted in loss of connectivity for some users in the Midwest area of the LT Network.  Our carrier identified the source of the problem and replaced faulty interconnection cabling, which restored connectivity to all affected users.  We will continue to work with our connectivity providers to clarify problem isolation processes, to identify and strengthen escalation procedures and associated metrics, and to implement regular review of all key performance indicators.

EVENT UPDATE 23-MAR-10 16:15 CDT: Following a period in which Operations monitored customer connectivity and ticket volume and Network Services monitored the BGP peering session with Internap, Network Services is prepared to call this situation resolved.  All customer tickets opened as a result of this issue and all lingering routing anomalies have been resolved, and the network has regained stability.  We reiterate our gratitude to our customers for their patience and understanding during this process.

EVENT UPDATE 23-MAR-10 14:30 CDT:  Following the activation of BGP on the newly repaired Internap circuit, all outstanding intermittent routing and connectivity issues in the Chicago and Cedar Falls LT Data Centers have been resolved.  The resolution of these issues suggests an underlying routing issue at Internap in addition to the physical-layer connection problem that was isolated and resolved earlier in the day.  Operations and Network Engineering are monitoring customer connectivity and the status of the BGP peering session, respectively.  We appreciate the patience and understanding of our customers during this process.

EVENT UPDATE 23-MAR-10 10:45 CDT:  As a result of our joint investigation, Internap has discovered a problem with our primary routing connection and repaired it.  Network Services believes that this is and has been a primary contributor to the routing anomalies we have experienced over the past 36 hours.  Because of our strong desire to recover the service, we will be enabling BGP routing on the repaired circuit in an emergency window within the next 15 minutes.  While this may cause some momentary latency as routes re-converge, the effect will be transitory and should result in significantly better connectivity for LT customers in Cedar Falls and Chicago.  We appreciate your patience as we work to resolve this issue permanently.

EVENT UPDATE 23-MAR-10 05:30 CDT:  In an attempt to stabilize the networks in Cedar Falls and Chicago and to restore full connectivity to all IP subnets, the Network Services Team performed several intrusive hardware testing procedures on devices in the Chicago network.  The results of these maintenance events suggest that most of the routing anomalies have been identified and resolved.  However, some variability remains as a result of our interaction with Internap.  We have escalated these residual issues to Internap management and will pursue them to final resolution.  We appreciate your patience during this entire process.  An update will be posted as additional information becomes available.

EVENT UPDATE 22-MAR-10 22:15 CDT:  In an attempt to stabilize the networks in Cedar Falls and Chicago and to restore full connectivity to all IP subnets, the Network Services Team will perform emergency intrusive hardware testing of one of the devices in Chicago on 23-MAR-2010 at 00:00.  During this time, we expect customer connectivity to experience brief periods of unavailability as the testing progresses, with a total disruption of 20 minutes or less.  We appreciate your patience while we complete this emergency work tonight.  An update will be posted as we have additional information.

EVENT UPDATE 22-MAR-10 18:55 CDT: Network Services continues to monitor the overall status of the Chicago and Cedar Falls networks.  The Network Services Team is aware of routing issues reported by customers in the 74.200.0.0 network.  We are working to identify any common elements of these networks and to identify a solution to these issues.  We appreciate your patience as we work to identify the source of the routing anomalies.  An update will be posted as we have additional information.

EVENT UPDATE 22-MAR-10 16:45 CDT: Network Services continues to monitor the overall status of the Chicago and Cedar Falls networks.  The issue that impacted customers with route latency is, as previously reported, largely resolved.  Customers whose services reside in the 74.200.0.0 network continue to report issues when traversing Internap.  We are working with Internap to address these issues.  Thank you for your continued patience as we work through these remaining issues.  An update will be posted as we have additional information.

EVENT UPDATE 22-MAR-10 13:15 CDT: Network Services has completed the transition of routing for the Cedar Falls location, which has thus far improved route stability.  The overall situation has stabilized in Cedar Falls, although our Network Engineering team continues to monitor the situation and will provide updates to this notification posting as appropriate.

EVENT UPDATE 22-MAR-10 12:15 CDT:  At this time we have no substantial updates to provide.  Our actions to resolve the underlying route stability issues are still underway.  As soon as those actions are completed we will update this notification.

EVENT UPDATE 22-MAR-10 11:30 CDT:  Our Network Engineering team continues to work to resolve the connectivity stability issues with upstream providers.  We continue to receive reports of inbound routing issues that appear to be largely isolated to our Internap connections.  We are working with that provider while we take additional measures to isolate our upstream peering.  We will have an additional status update in approximately 45 minutes.

EVENT UPDATE 22-MAR-10 10:00 CDT:  The situation continues to impact connectivity stability in Cedar Falls on a sporadic basis.  Network Engineering is taking additional steps to eliminate a potentially problematic upstream connection from the provider mix in order to return stability and throughput to the site.  We will have an additional update shortly.

EVENT UPDATE 22-MAR-10 10:00 CDT: At this time our Network Engineering team continues to work to resolve lingering issues which are impacting connectivity in our Cedar Falls location.  The Chicago network is largely stabilized, but customers with services in both Chicago and Cedar Falls may be experiencing sporadic issues.

Another status update will be posted at 10:30 CDT.

EVENT UPDATE 22-MAR-10 09:15 CDT:  LT Network Engineering identified a condition with Internap’s upstream connectivity that was impacting routing and throughput to our Chicago and Cedar Falls locations.  Customers in Cedar Falls and a portion of Chicago would have experienced additional latency while BGP routing updates were applied to take advantage of alternate paths.  Our Network Engineering team is continuing to monitor overall network performance for the impacted sites to ensure operations return to normal.

We will update this informational posting at 10:00 CDT with any additional information.

ORIGINAL POST: Layered Technologies Network Engineering has detected a routing anomaly with Internap, our peering partner in Chicago.  While there is no actual outage, this condition may result in latency for some customers in the LT Chicago and Cedar Falls data centers.  Network Engineering is actively engaged with Internap to isolate and resolve this condition.
