#1
Virginia Datacenter outage 2006-06-25
Dear Virginia Datacenter Customers,
We have been having problems with some of the main routers in our Virginia Datacenter. The cause appears to have been either a power surge or a power loss to those routers, starting at approximately 12:05 CDT. ServerBeach.com and our ticketing system are back online.

If you have any questions, please feel free to open a ticket. We will post an update and post-mortem as soon as possible, both here and in our ticketing system, for affected customers. In addition, if you would like to speak to a technician, please feel free to call us at 800-741-9939 option 1.

We apologize for the disruption of service and hope to have all the issues resolved ASAP.

Regards,
Robert Miggins
ServerBeach Vice President
#2
Dear ServerBeach Customer,
Background:
Over the past week, we have been conducting annual maintenance on the power infrastructure in our Virginia data center. This began on Thursday, June 22nd and was completed on Sunday, June 25th.

What happened?
Starting at approximately 11:53 am CDT on Sunday, June 25th, there was an anomaly in the power feeding the network equipment and some customer servers. The anomaly occurred 30 minutes after the ATS (Automatic Transfer Switch) was put into bypass mode for testing. We are still investigating the true cause, but we strongly suspect a power frequency fluctuation as the catalyst. (The other possibility is a lightning strike from the severe thunderstorms that were moving through Northern Virginia at that time.) Our UPS is in place, but it was not able to smooth out or eliminate the power fluctuation as expected. We are continuing to investigate the cause of the fluctuation and will post additional details as they arise.

What happened next?
The power fluctuation caused both core routers and both distribution routers (which are paired for redundancy) to reboot simultaneously. Both core routers (COR1 and COR2) came back online as expected. However, one of the distribution routers (DIS1) did not come back up after the reboot because of a corrupt flash card. All traffic failed over to the second distribution router (DIS2), which then handled the traffic for all of the VLANs in the data center. A second failure then occurred: one of the five GigE cards on DIS2 failed to initialize, causing a longer network outage for the limited number of VLANs serviced by that card. VLANs served on that segment were inaccessible until the fault was discovered and the failing card replaced. This process was complete by 3:09 pm CDT.

Why were www.ServerBeach.com, our ticketing system, and our forums also down?
The internal servers hosting our web site, ticketing system, caching name servers and back office systems were also affected by the power anomaly because they are on a VLAN segment that was connected to the faulty GigE card. Incidentally, we recently moved our production servers (www.serverbeach.com, ticketing servers, forums, etc.) to our Virginia datacenter as part of a larger project called SafetyNet. Phases 2 and 3 of SafetyNet will set up our staging and failover systems in our Los Angeles datacenter (Phase 2) and our development servers here in San Antonio (Phase 3). Unfortunately, we have only completed Phase 1 of SafetyNet, so we could not fail over to our staging/failover servers in LAX. We expect to have those systems running in the next two weeks.

Bottom line and next steps?
We take our job as your hosting partner very seriously and understand that network outages, whether caused by bad weather, power glitches or anything else, can have serious negative impacts on your business. On behalf of all of us here at ServerBeach and Peer1, we sincerely apologize for this disruption of service and promise to make changes to eliminate the possibility of this happening again. Once we have the results of a more detailed investigation, we will determine the corrective actions we are taking and communicate them to you. Please stay tuned for more details.

In the meantime, if you have any questions or lingering problems with your server(s), please don’t hesitate to call or submit a ticket to our support team. Our phone number is 1-800-741-9939 option 1.

Sincerely,
Robert Miggins
Vice President
ServerBeach
and the rest of the ServerBeach support team
#3
Here is another copy of the outage letter that includes a network diagram.
__________________
Charnell Lucich
Community Evangelist
ServerBeach | By Geeks, For Geeks
Twitter: @CharnellLucich