| Author |
Message |
CMTG
Leg Humper


Joined: 23 Feb 2002 Posts: 4823
Location: On average, Cheltenham.
|
Posted:
Thu Mar 29, 2007 4:49 am Post subject: RAID keeps degrading. |
|
I have two identical SATA 250 GB Seagate drives connected to an ECS 755-A2 Socket 754 motherboard.
The motherboard and drives are very new and the disks seem to work perfectly when configured as plain old storage.
When configured as a mirrored RAID, however, md keeps dropping one as 'failed.'
Here are the errors:
WARNING: Kernel Errors Present
Additional sense: Scsi parity error...: 11 Time(s)
SCSI error : <1 0 0 0> retu...: 11 Time(s)
ata2: error=0x84 { DriveStat...: 539 Time(s)
ata2: status=0x51 { DriveReady SeekComplete Error }...: 539 Time(s)
device-mapper: error adding target to...: 8 Time(s)
end_request: I/O error, dev sdb, sector...: 11 Time(s)
raid1: sdb: unrecoverable I/O read error for block 116259...: 1 Time(s)
raid1: sdb: unrecoverable I/O read error for block 222383...: 1 Time(s)
raid1: sdb: unrecoverable I/O read error for block 259833...: 1 Time(s)
raid1: sdb: unrecoverable I/O read error for block 297969...: 1 Time(s)
raid1: sdb: unrecoverable I/O read error for block 335692...: 1 Time(s)
raid1: sdb: unrecoverable I/O read error for block 538063...: 1 Time(s)
raid1: sdb: unrecoverable I/O read error for block 602894...: 1 Time(s)
raid1: sdb: unrecoverable I/O read error for block 718786...: 1 Time(s)
raid1: sdb: unrecoverable I/O read error for block 745408...: 1 Time(s)
raid1: sdb: unrecoverable I/O read error for block 911502...: 1 Time(s)
raid1: sdb: unrecoverable I/O read error for block 955232...: 1 Time(s)
Quickly followed by:
From: mdadm monitoring <XXX>
To: XXX
Subject: Fail event on /dev/md0:ic1.matbooth.co.uk
This is an automatically generated mail message from mdadm
running on ic1.matbooth.co.uk
A Fail event had been detected on md device /dev/md0.
Faithfully yours, etc.
I can use mdadm to remove and re-add the drive to the array and it will re-build the array without complaint and function without issue for a couple of days until it bombs out again with the above messages.
I'm disinclined to think it's a hard disk fault because the drives work fine independently and in different PCs. My only theory is that it's the motherboard, which I got at the same time as the disks. However, there doesn't seem to be anything overtly wrong with that either. My Google-fu has failed me and I can't find any reason to believe that this motherboard wouldn't be able to support software RAID on its two SATA channels.
I'm using software RAID rather than the motherboard's built-in RAID because the hardware RAID can't be managed or monitored remotely and doesn't send me emails when something goes wrong.
So what's going on? Any suggestions appreciated.
(Using RHEL/CentOS 4.4, if that matters.) |
_________________ Pie. I wish I could
constrain my hungry greed but...
Sadly, defeated.
So I'm cruising in my '91 Daihatsu blasting Vanessa Carlton's rockin' smash hit "A Thousand Miles," when it suddenly occurs to me:
"Am I too gangsta? Am I too hardcore and menacing for this world?" I just might be.
- Tatsuya Ishida
|
|
|
|
|
the taz man
Butt Sniffer


Joined: 16 Nov 2002 Age: 33 Posts: 1259
Location: CT, USA
|
Posted:
Thu Mar 29, 2007 7:49 am Post subject: |
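|
Have you tried the drive manufacturer's diagnostic tools?
http://www.seagate.com/ww/v/index.jsp?locale=en-US&name=SeaTools&vgnextoid=720bd20cacdec010VgnVCM100000dd04090aRCRD
These tools will give the drives a better going-over than what Windows can do alone.
Just my $0.02 worth |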
|
|
|
|
CMTG
Leg Humper


Joined: 23 Feb 2002 Posts: 4823
Location: On average, Cheltenham.
|
Posted:
Thu Mar 29, 2007 12:36 pm Post subject: |
|
the taz man wrote: Have you tried the drive manufacturer's diagnostic tools?
http://www.seagate.com/ww/v/index.jsp?locale=en-US&name=SeaTools&vgnextoid=720bd20cacdec010VgnVCM100000dd04090aRCRD
These tools will give the drives a better going-over than what Windows can do alone.
Just my $0.02 worth

Thanks for the link. I downloaded the bootable FreeDOS image and just tried the short tests. They all came back fine.
It says the long tests take a few hours to run, so I'll perhaps run them tomorrow while I'm at work. |
|
|
|
|
|
Slymer
Butt Sniffer


Joined: 29 May 2003 Age: 30 Posts: 1840
Location: chair in front of my computer
|
Posted:
Thu Mar 29, 2007 6:22 pm Post subject: |
|
This happened to me recently...
The computer locked up; on reboot there was no hard drive, and it looked like the drive was dead. I plugged in another power connector on the same chain that fits better (doesn't slip out so easily) and it works fine. You might want to check the cables and maybe spray the contacts with some contact cleaner. It may also be a thermal expansion problem with the I/O chip. Try re-syncing the array, then swap the drives between the two SATA ports and see whether it's still sdb that fails. If it is, the fault stays with the port rather than the disk, so it's a problem with the mobo (or that port's cable).
*2 cents* |
_________________ The Sly One
======================================
Windows is like crack. It feels good, it's easy to start into, it hooks you bad, it costs a ton of money, and it makes you crazy. And you still love it. - EdisonRex
Only two things are infinite, the universe and human stupidity, and I'm not sure about the former. -- Albert Einstein
|
|
|
|
|
CMTG
Leg Humper


Joined: 23 Feb 2002 Posts: 4823
Location: On average, Cheltenham.
|
Posted:
Fri Mar 30, 2007 12:52 am Post subject: |
|
Slymer wrote: This happened to me recently...
The computer locked up; on reboot there was no hard drive, and it looked like the drive was dead. I plugged in another power connector on the same chain that fits better (doesn't slip out so easily) and it works fine. You might want to check the cables and maybe spray the contacts with some contact cleaner. It may also be a thermal expansion problem with the I/O chip. Try re-syncing the array, then swap the drives between the two SATA ports and see whether it's still sdb that fails. If it is, the fault stays with the port rather than the disk, so it's a problem with the mobo (or that port's cable).
*2 cents*

Locking up isn't a symptom; the machine carries on working without interruption. The only reason I know this is happening is because of the email it sends me when the array degrades. I'm able to rebuild the array without having to reboot.
I will try swapping the drives this evening. |
|
|
|
|
|
Dave Rave
Butt Sniffer


Joined: 13 Nov 2003 Posts: 1876
Location: Sydney Australia
|
Posted:
Fri Mar 30, 2007 2:23 am Post subject: |
|
good old HDD Health
connect both drives in plain IDE mode
look at the HDD SMART stats
check if one or the other is a pile o' crap
back to raid mode
continue |
|
|
|
|
|
|
CMTG
Leg Humper


Joined: 23 Feb 2002 Posts: 4823
Location: On average, Cheltenham.
|
Posted:
Fri Mar 30, 2007 5:16 am Post subject: |
|
Dave Rave wrote: good old HDD Health
connect both drives in plain IDE mode
look at the HDD SMART stats
check if one or the other is a pile o' crap
back to raid mode
continue

Um...
HDD Health wrote: HDD Health is a full-featured failure-prediction agent for machines using Windows 95, 98, NT, Me, 2000 and XP.
I wrote: (Using RHEL/CentOS 4.4, if that matters.) |
|
|
|
|
|
Dave Rave
Butt Sniffer


Joined: 13 Nov 2003 Posts: 1876
Location: Sydney Australia
|
Posted:
Fri Mar 30, 2007 3:07 pm Post subject: |
|
plug the drives into a Windows computer
or at least look at the SMART stats
there is a smartmontools package for Linux (it provides smartctl)
if you can't find the tool to help diagnose, work around it |
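As a rough illustration of the smartmontools route, here is a small Python sketch that asks `smartctl -H` for the drive's overall health verdict and pulls the result out of the output. It assumes smartmontools is installed, that the script runs as root, and that the drive reports the usual ATA line "SMART overall-health self-assessment test result: ...". The helper names `parse_health` and `check_drive` are invented for this example.

```python
import re
import subprocess

# The health line smartctl prints for ATA/SATA drives (assumed format).
HEALTH_RE = re.compile(
    r"SMART overall-health self-assessment test result:\s*(\S+)"
)


def parse_health(smartctl_output):
    """Return the verdict string (e.g. 'PASSED'), or None if no line is found."""
    match = HEALTH_RE.search(smartctl_output)
    return match.group(1) if match else None


def check_drive(device):
    """Run 'smartctl -H <device>' and return the parsed verdict.

    smartctl uses its exit status as a bitmask, so a non-zero return code
    is not treated as a failure of the call here.
    """
    proc = subprocess.run(
        ["smartctl", "-H", device],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_health(proc.stdout)
```

For the two mirror members in this thread that would be `check_drive("/dev/sda")` and `check_drive("/dev/sdb")`. A clean verdict only says the drive has not flagged itself as failing; the full attribute table from `smartctl -a` is still worth reading.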
|
|
|
|
|
|
CMTG
Leg Humper


Joined: 23 Feb 2002 Posts: 4823
Location: On average, Cheltenham.
|
Posted:
Mon May 21, 2007 1:16 pm Post subject: |
|
Just thought I'd report back on this.
All my problems seem to have evaporated with the installation of CentOS 5. Must be new or updated drivers in the newer kernel or something.
</Shrugs>
So, for future reference, the SATA controller on the ECS 755-A2 doesn't seem to play too well on versions of RHEL/CentOS earlier than 5. (5 has a 2.6.18 kernel, 4 was 2.6.9.) |
|
|
|
|
|
noodly2877
StormDog


Joined: 22 May 2003 Age: 30 Posts: 2536
Location: The State of Confusion
|
Posted:
Tue May 22, 2007 12:43 pm Post subject: |
|
Quote: I have two identical SATA 250 GB Seagate drives connected to an ECS 755-A2 Socket 754 motherboard.

There's your problem right there... you're using an ECS motherboard |
_________________ Genius has it's limits
But stupidity shines infinite
anglachel wrote:Also a note to any managers out there, a promotion with out a pay raise is like kicking some one in the balls and telling them that is was a blow job.
|
|
|
|
|
CMTG
Leg Humper


Joined: 23 Feb 2002 Posts: 4823
Location: On average, Cheltenham.
|
Posted:
Tue May 22, 2007 12:52 pm Post subject: |
|
Maybe, but it was going cheap. Like the budgie. |
|
|
|
|
|
anglachel
Guide Dog


Joined: 08 Nov 2003 Posts: 8131
Location: MN
|
Posted:
Tue May 22, 2007 6:48 pm Post subject: |
|
CheeseMonger The Great wrote: Just thought I'd report back on this.
All my problems seem to have evaporated with the installation of CentOS 5. Must be new or updated drivers in the newer kernel or something.
</Shrugs>
So, for future reference, the SATA controller on the ECS 755-A2 doesn't seem to play too well on versions of RHEL/CentOS earlier than 5. (5 has a 2.6.18 kernel, 4 was 2.6.9.)
My RAID was spotty before 2.6.12ish... sometimes it would just say that the mirror for the root partition was broken, attempt a repair, and fail; then after a reboot everything was fine.
I think it was just the older RAID drivers. |
_________________
Quidquid latine dictum sit, altum sonatur.
Death to Shuttleworth!
|
|
|
|
|
|
|