Taking advantage of a one week recommendation
I sure hope those involved in the maintenance tonight took advantage of the one week recommendation to power their servers off and on to make sure they come back online OK.
Waiting until the last minute (today) will definitely cause confusion, frustration, and delays in bringing a server back up, because most of our staff are preparing our internal servers and DC for tonight.
For all our sakes, I hope this will be a smooth window.
knightfoo
2004-06-16, 14:54 PM
Yeah, those 5 steps weren't something we just pulled out of thin air. :) I've been through several planned and unplanned power outages, and have seen many servers bite the dust without notice. Servers that have been online for a very long time have a tendency to flake out when they get rebooted. A hard drive that has been running for 2 years straight may not spin up once it spins down. This is one of the many reasons why it is important to keep regular, offsite backups .. just because a server is on a UPS does not mean it will never lose power.
We are only 10 hours away from the maintenance window and it is not too late to prepare your server. At the very least you should make backups and perform a reboot to make sure it will come back online properly. We will have extra staff to troubleshoot hardware issues and make sure servers boot to the point where you can log in through SSH or RDC, but we will not be able to check every service on every server. The ticketing system will be one of the first systems brought back online, but you should not submit tickets until after the window is over. We will not be able to respond to tickets during the window, and servers will be checked in the order they are powered on anyway.
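For anyone who wants a starting point for the "make backups" part, here is a rough sketch in Python. It is only an illustration, not a ServerBeach procedure: the function names (`make_backup`, `verify_backup`, `sha256_of`) are made up for this example, and it assumes Python 3 is available on the server. It writes a gzip tarball of a directory, records its SHA-256 checksum, and then re-reads the archive to confirm it is intact before you trust it. Copying the archive offsite is left out.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_backup(source_dir: Path, archive_path: Path) -> str:
    """Archive source_dir into a gzip tarball and return the archive's checksum."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=source_dir.name)
    return sha256_of(archive_path)


def verify_backup(archive_path: Path, expected_sha256: str) -> bool:
    """Check the checksum, then read every regular file in the archive end to end."""
    if sha256_of(archive_path) != expected_sha256:
        return False
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if member.isfile():
                with tar.extractfile(member) as fh:
                    while fh.read(1 << 20):
                        pass
    return True
```

Typical use would be to call `make_backup` on a data directory (for example a hypothetical `/var/www`), keep the returned checksum alongside the archive, and run `verify_backup` again on the offsite copy after transferring it.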
-knightfoo
Originally posted by knightfoo
Servers that have been online for a very long time have a tendency to flake out when they get rebooted. A hard drive that has been running for 2 years straight may not spin up once it spins down.
Yep, I have definitely seen that happen before on machines with uptimes in excess of 6 months.
Definitely take precautions as he suggests and back up your data offsite!
It's an interesting fact that there is sometimes no physical difference between various hard drives from the same manufacturer.
For example, a 120 GB hard drive from Foo Harddrive Inc and their 160 GB product may in fact be the same underlying equipment. But they tossed the 120s into a different bin than the 160s because of tests performed after production.
(That's not always the case, but I am just using it to illustrate that not all hard drives are created equal!)
Back up your data regularly!!!!!!
--fork()
awsolutions
2004-06-16, 17:25 PM
Hey Knightfoo or QT,
Tonight at 12:00 CST the upgrades start. Is that when we should all power down the servers?
Also, if something happens during the upgrade so that it does not get completed, or it gets postponed, will SB check the racks to ensure the servers are back online, since most or some people will have already powered them down?
Thanks in advance!
Infopro
2004-06-16, 23:16 PM
I think all of that's been posted in the announcement.
Originally posted by awsolutions
Hey Knightfoo or QT,
Tonight at 12:00 CST the upgrades start. Is that when we should all power down the servers?
Also, if something happens during the upgrade so that it does not get completed, or it gets postponed, will SB check the racks to ensure the servers are back online, since most or some people will have already powered them down?
Thanks in advance!
It would probably be best to power down your server before midnight CDT. The power in the datacenter will probably be cut by 12:05am CDT.
More information can be found here:
http://forums.serverbeach.com/showthread.php?s=&postid=21153#post21153
:)
awsolutions
2004-06-17, 10:29 AM
I stayed up to do it, but then I was on another forum and missed it! It was 1:07 EST when I went to issue the commands; one of my servers was already down, and I issued the command on the other one.
The one I actually issued the command on had a problem coming back up, but rapid reboot got it. (That server is always a PITA. No matter what is done to it, it always fights with you.)
itxcel
2004-06-17, 11:24 AM
QT,
I had the best of plans to test that my server would reboot earlier in the week. Unfortunately my machine here (with all my server backups) went AWOL with a dead drive. I only managed to get everything back up and running yesterday and decided the first thing I needed to do was test my server (the first thing I should have done was take more backups). I was about to say this week could not get any worse, but that is tempting fate; it could still get a lot worse :(.
I'm starting to get a terrible :flush: feeling. My ticket has been open since 2004-06-16 10:02:38-05 :(
Ewan.
Originally posted by itxcel
QT,
I had the best of plans to test that my server would reboot earlier in the week. Unfortunately my machine here (with all my server backups) went AWOL with a dead drive. I only managed to get everything back up and running yesterday and decided the first thing I needed to do was test my server (the first thing I should have done was take more backups). I was about to say this week could not get any worse, but that is tempting fate; it could still get a lot worse :(.
I'm starting to get a terrible :flush: feeling. My ticket has been open since 2004-06-16 10:02:38-05 :(
Ewan.
I've taken control of your ticket and you will be hearing from me shortly. :)
awsolutions
2004-06-17, 13:44 PM
QT,
You must be exhausted... hope you're making time and a half!!
:P
Originally posted by awsolutions
QT,
You must be exhausted... hope you're making time and a half!!
:P
It's been a pretty long 29 hours so far... but I managed to catch a catnap. :) I'm sure I'll be snoozing pretty well tonight when I go to bed.
:)
What is this time and a half thing you speak of?
;)
QT,
Sorry, did you miss the memo?
The sleep() library has been removed from your path! We're working on the upgrade that will yield sleep 2.0, but it is still too buggy to release.
Perhaps you should check into clone() as that promises to alleviate some of your issues.
:dizzy:
--fork()
:rofl:
I didn't get the memo....
awsolutions
2004-06-17, 14:10 PM
QT cloneQT() {
    QT qt = best;   // start from the best
    return qt;
}

QT[] qt = new QT[numCustomers];   // one QT per customer
for (int i = 0; i < numCustomers; i++) {
    qt[i] = cloneQT();
}
There we go, now all SB customers should be happy!
Man...
I love you guys!
:emlaugh:
Originally posted by QT
I've taken control of your ticket and you will be hearing from me shortly. :)
How about taking control of my ticket and letting me know what is going on. Ticket # 111660
Originally posted by Doug
How about taking control of my ticket and letting me know what is going on. Ticket # 111660
I already have, which is why you received a phone call.
knightfoo
2004-06-17, 16:35 PM
Originally posted by Doug
How about taking control of my ticket and letting me know what is going on. Ticket # 111660
There really is no need to become sarcastic with forum members or ServerBeach staff. Not only that, these forums are user-to-user and not a method of requesting support. No employee is required to read or respond to posts on these forums. We are all very busy here and QT was kind enough to take time out of her busy schedule to offer some assistance on the forums.
As I understand it, the hard drive in your server failed completely and the support techs did what was necessary to get a working server online, albeit without your data. There is nothing we can do with a completely dead hard drive. I realize the communication could have been better, but at the time the support techs were simply concerned with getting the server online so you could log in and begin recovery.
As a side note, this is a perfect example of why backups are important. A hard drive failure can occur at any time and it could result in a drive being completely useless. If that happens, your options are a) restore from backups or b) pay thousands of dollars for data recovery and hope they can get something useful from the drive.
-knightfoo
Originally posted by knightfoo
There really is no need to become sarcastic with forum members or ServerBeach staff. As I understand it, the hard drive in your server failed completely and the support techs did what was necessary to get a working server online, albeit without your data. There is nothing we can do with a completely dead hard drive. I realize the communication could have been better, but at the time the support techs were simply concerned with getting the server online so you could log in and begin recovery.
As a side note, this is a perfect example of why backups are important. A hard drive failure can occur at any time and it could result in a drive being completely useless. If that happens, your options are a) restore from backups or b) pay thousands of dollars for data recovery and hope they can get something useful from the drive.
-knightfoo
I don't think that was sarcastic! I just wanted to know what was going on. I did everything I should have. I rebooted my server well in advance and again last night. I did a backup last night as I should have. I powered down my server at 11:50 last night. After that I still had a drive die. The problem was that SB support was not telling me what was going on. I started a support ticket at 5:37. I got a new admin password put in my ticket at 9:29 without any comment about why I had a new admin password.
At 9:35 I asked what happened and could not get an answer. I had no idea what had happened or was happening, and no one in the support department was offering up any information until Chris Miller called me about 2pm and told me what had happened. Now if SB had told me what had happened when I was given the new password, this issue would have been done then. I hope that you will work on your support department's communication skills. I have no hard feelings and I know that XXXX happens. Knowing what was going on would have made it a less stressful day for all.