Server down - RAID issue, PERC 4/SC (Read 8476 times)
jf38081
Dude


Here we go again...

Posts: 22


Server down - RAID issue, PERC 4/SC
Sep 23rd, 2008 at 9:18am
 
Hello Everyone,

I'm having trouble with a Dell PowerEdge 800 running SBS2003, with a PERC 4/SC RAID card and 3 SCSI hard drives in a RAID 5.  Things seem to have gone from bad to worse.  Fortunately the client is good about backups, so I believe the worst case scenario would be reinstalling from scratch and relying on backups.

The background is this:  I was referred to the client when the server started beeping.  Immediately I thought it must be a RAID issue.  I booted into the RAID config tool and found 1 of the 3 drives had failed (drive 2).  Because this is not hot-swappable, I shut down the server and removed the drive cage (had to disconnect all the drives to do so) to see the brand/size/model number etc. for the replacement.

Reconnected everything normally and rebooted.  The system got as far as the SBS2003 splash screen, and only on the first boot attempt.  The RAID beep code switched to a long beep and the system rebooted.  Again I went into the RAID config tool, and it now reports drive 0 has also failed, and the RAID has failed as well.  What are the odds??  It is no longer even getting to the Windows splash screen.

So now I have ordered new drives and am waiting for arrival.  I have not tried to rebuild the RAID.  I wanted to at least replace the first drive to fail before attempting.  Would anyone care to make any suggestions?  It would be really nice not to have to reinstall from scratch.  

The only documentation I found is:  

http://www.cs.uwaterloo.ca/~brecht/servers/docs/PowerEdge-2600/en/Perc4scdc/UG/h...

Also, it is a bit confusing which drive is drive 2.  According to the link, it sounds like the top one is, but the RAID software seems to indicate it's the bottom one.  Also, the Dell drive cage is labeled top to bottom 0-1-2-3 (there is no 3).

Thank you all,
Jim
 
 
 

MrMagoo
Übermensch


Resident Linux Guru

Posts: 1026
Phoenix, AZ (USA)


Re: Server down - RAID issue, PERC 4/SC
Reply #1 - Sep 23rd, 2008 at 12:14pm
 
If 2 drives have failed in a RAID 5 array, you will not be able to rebuild the array and the data is lost.  If you can keep at least 2 of the old drives running long enough to rebuild the array with 1 new drive, then you can replace the second drive and rebuild again.
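To make the "one failure is survivable, two are not" point concrete, here is a minimal sketch of XOR parity on a single stripe. It is only an illustration: the function names and the 3-drive layout are made up for the example, and it says nothing about how the PERC 4/SC firmware actually stores or rebuilds data.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_stripe(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild one stripe; None marks the block of a failed drive.

    The stripe holds the data blocks plus their parity block, so XOR-ing
    all surviving blocks reproduces the single missing one.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("two or more blocks missing: stripe is unrecoverable")
    if not missing:
        return list(stripe)
    survivors = [block for block in stripe if block is not None]
    rebuilt = list(stripe)
    rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


# Hypothetical 3-drive stripe: two data blocks and their parity.
d0, d1 = b"\x12\x34", b"\xab\xcd"
parity = xor_blocks([d0, d1])
```

With one block missing, `rebuild_stripe([d0, None, parity])` gives back the lost data block. With two missing, there is nothing left to XOR against, so the function raises instead of guessing.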
 
 
jf38081
Dude


Here we go again...

Posts: 22


Re: Server down - RAID issue, PERC 4/SC
Reply #2 - Sep 23rd, 2008 at 1:32pm
 
I suppose it's possible.  I'm not convinced the second drive is really lost, because it didn't fail until I disconnected and reconnected it.  I was hoping there is a way to 'refresh' the drive status to see if it will maybe come back to life.
 
 
 
Nigel Bree
Ex Member




Re: Server down - RAID issue, PERC 4/SC
Reply #3 - Sep 23rd, 2008 at 6:07pm
 
jf38081 wrote on Sep 23rd, 2008 at 9:18am:
What are the odds??

Actually, it's pretty common. Although it's typical to model hard drive failures by assuming statistical independence, that's mostly justified by hand-waving. In reality, you have to do additional work to ensure it's true.

Drive failures can, in many circumstances, be tightly correlated. Typical common failure modes are not just due to environmental conditions like temperature and power supply (although those matter); we've also seen in our test environments that drives from the same manufacturing batch (used in broadly similar environments) definitely do tend to fail in groups too.
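As a rough illustration of why the independence assumption matters, here is a toy model in which both drives come from one batch that is either "good" or "bad". Every number and function name below is invented for the example and is not taken from any real test data; the only point is that the same per-drive failure rate can hide a much higher chance of losing two drives together.

```python
def joint_failure_independent(p_marginal: float) -> float:
    """P(both drives fail) if the two failures are independent."""
    return p_marginal ** 2


def joint_failure_shared_batch(q_bad: float, p_bad: float, p_good: float) -> float:
    """P(both drives fail) when both drives come from the same batch.

    With probability q_bad the batch is a bad one (each drive fails with
    p_bad); otherwise each drive fails with p_good. Failures are independent
    only once the batch quality is known.
    """
    return q_bad * p_bad ** 2 + (1 - q_bad) * p_good ** 2


def marginal_failure(q_bad: float, p_bad: float, p_good: float) -> float:
    """P(one given drive fails), averaged over batch quality."""
    return q_bad * p_bad + (1 - q_bad) * p_good


# Made-up inputs: 5% of batches are bad, 30% vs 1% failure probability.
q_bad, p_bad, p_good = 0.05, 0.30, 0.01
p = marginal_failure(q_bad, p_bad, p_good)
independent = joint_failure_independent(p)
correlated = joint_failure_shared_batch(q_bad, p_bad, p_good)
```

Both calculations use the same per-drive failure probability `p`, yet `correlated` comes out larger than `independent`. That holds for any inputs where the good and bad batch rates differ, not just these.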
 
 
 
Rad.Test
Technoluster


Rad's non-Admin test-profile
in Firefox

Posts: 108


Re: Server down - RAID issue, PERC 4/SC
Reply #4 - Sep 24th, 2008 at 12:44pm
 
Quote:
we've also seen in our test environments that drives from manufacturers that come from the same manufacturing batch numbers (used in broadly similar environments) definitely do tend to fail in groups too.

Interesting. Food for thought. What is your sample size? (How many drives?)

It's hard for home users like me to come to any insightful conclusions cuz our sample size (usually 1 or 2 or maybe 3 drives) is so small.
 
 
 
jf38081
Dude


Here we go again...

Posts: 22


Re: Server down - RAID issue, PERC 4/SC
Reply #5 - Sep 24th, 2008 at 1:03pm
 
Fair enough.  FYI - I gave up on trying to repair the RAID array as-is.  Replaced all three drives and reinstalled/restored from backup.  I'm sitting here watching one of the LOB applications import its data.  This may take a while, so I'll be browsing the forums.  :)

Side note:  within the last week I also had another client (home user) bring me a PC that lost its RAID.  He had 2 drives in a striped array, which has failed.  I'm starting to wonder if there is any usefulness at all in these small RAID setups.  It seems like with a 2 or 3 drive setup, if you have any problems it's almost as much trouble as if you just had a regular hard-drive failure.  If we are talking about a large hot-swappable array where multiple drives can fail, I see the benefit, but for a small setup it just seems like trouble waiting to happen...

Many thanks to all who have posted their thoughts!
Jim
 
 
 

MrMagoo
Übermensch


Resident Linux Guru

Posts: 1026
Phoenix, AZ (USA)


Re: Server down - RAID issue, PERC 4/SC
Reply #6 - Sep 25th, 2008 at 3:07pm
 
jf38081 wrote on Sep 24th, 2008 at 1:03pm:
He had 2 drives in a striped array which has failed.  I'm starting to wonder if there is any usefulness at all for these small type RAID setups.  It seems like with a 2 or 3 drive setup, if you have any problems its almost as much trouble as if you just had a regular hard-drive failure.

The point of a striped array is speed, not redundancy.  In fact, striped arrays have *lower* reliability than single hard drives, so it isn't at all uncommon for them to fail.

And, yes, RAID 5 isn't really worth much with only 3 drives.  It would have more reliability than a single hard drive, but you aren't protected against a second drive failing before you replace the first.  RAID 6 helps some, by giving you the ability to have 2 failed drives, but that wouldn't make sense for 3 hard drives (since 2 would be used up just with parity.) 
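For a back-of-the-envelope comparison, here is a small sketch that computes the chance an array survives some period, given how many drive failures it tolerates. The per-drive failure probability is a made-up number, and the model assumes independent failures and no drive replacement during the period. As noted earlier in the thread, independence is optimistic, so treat the RAID 5 figure as a best case.

```python
from math import comb


def survival_probability(n_drives: int, tolerated: int, p_fail: float) -> float:
    """P(at most `tolerated` of n_drives fail), assuming independent failures."""
    return sum(
        comb(n_drives, k) * p_fail ** k * (1 - p_fail) ** (n_drives - k)
        for k in range(tolerated + 1)
    )


p_fail = 0.05  # hypothetical per-drive failure probability over some period
single = survival_probability(1, 0, p_fail)   # one plain disk
stripe2 = survival_probability(2, 0, p_fail)  # 2-drive striped array (RAID 0)
raid5_3 = survival_probability(3, 1, p_fail)  # 3-drive RAID 5
```

Under these assumptions the ordering is `stripe2 < single < raid5_3`: the striped pair is worse than one disk because either drive dying kills it, while the 3-drive RAID 5 is better than one disk but still dies on a second failure.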

For home users, there really is no substitute for backups. 

There are other solutions, but not for small installations.  For example, Google invented a distributed file system that keeps multiple copies of the data.  It is sort of like RAID 0+1 but stored over thousands of servers rather than many hard drives in a single computer.  A distributed disk management system keeps track of all the data and ensures it gets automatically copied to a new place if a drive failure brings the number of back-up copies of any data below the required minimum number.
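The "copy it somewhere new when the count drops below the minimum" idea can be sketched in a few lines. This is a toy model only: the function, the server names, and the data layout are all invented for illustration and are not based on how Google's system is actually built.

```python
from typing import Dict, List, Set


def plan_rereplication(
    placements: Dict[str, Set[str]],
    live_servers: Set[str],
    min_copies: int,
) -> Dict[str, List[str]]:
    """Return, per chunk, the servers that should receive a new copy.

    placements maps chunk id -> servers holding a copy. Copies on servers
    that are no longer live do not count toward min_copies.
    """
    plan: Dict[str, List[str]] = {}
    for chunk, holders in placements.items():
        alive = holders & live_servers
        shortfall = min_copies - len(alive)
        if shortfall <= 0 or not alive:
            continue  # enough copies, or no surviving copy to clone from
        # Sorted only to make the toy example deterministic.
        candidates = sorted(live_servers - alive)
        plan[chunk] = candidates[:shortfall]
    return plan


# Hypothetical cluster: server "s2" has just died.
placements = {
    "chunk-a": {"s1", "s2", "s3"},
    "chunk-b": {"s2", "s4"},
    "chunk-c": {"s1", "s3", "s4"},
}
live = {"s1", "s3", "s4", "s5"}
plan = plan_rereplication(placements, live, min_copies=3)
```

Here `chunk-c` still has three live copies and is left alone, while the chunks that lost a copy on `s2` get new target servers. A real system would also have to weigh disk space, network load, and rack placement, none of which this sketch attempts.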

Unfortunately, I don't think Google makes this file system available.  It's too bad, too.  I bet some companies would pay a handsome sum to license it.
 
 