LITTLEBLACKDOG.COM Forum Index

 
CMTG
Leg Humper

Joined: 23 Feb 2002
Posts: 4959
Location: On average, Cheltenham.

Posted: Thu Mar 29, 2007 4:49 am   Post subject: RAID keeps degrading.

I have two identical 250GB Seagate SATA drives connected to an ECS 755-A2 Socket 754 motherboard.

The motherboard and drives are very new and the disks seem to work perfectly when configured as plain old storage.

When configured as a mirrored RAID, however, md keeps dropping one as 'failed.'

Here are the errors:

Code:
WARNING:  Kernel Errors Present
   Additional sense: Scsi parity error...:  11 Time(s)
   SCSI error : <1 0 0 0> retu...:  11 Time(s)
   ata2: error=0x84 { DriveStat...:  539 Time(s)
   ata2: status=0x51 { DriveReady SeekComplete Error }...:  539 Time(s)
   device-mapper: error adding target to...:  8 Time(s)
   end_request: I/O error, dev sdb, sector...:  11 Time(s)
   raid1: sdb: unrecoverable I/O read error for block 116259...:  1 Time(s)
   raid1: sdb: unrecoverable I/O read error for block 222383...:  1 Time(s)
   raid1: sdb: unrecoverable I/O read error for block 259833...:  1 Time(s)
   raid1: sdb: unrecoverable I/O read error for block 297969...:  1 Time(s)
   raid1: sdb: unrecoverable I/O read error for block 335692...:  1 Time(s)
   raid1: sdb: unrecoverable I/O read error for block 538063...:  1 Time(s)
   raid1: sdb: unrecoverable I/O read error for block 602894...:  1 Time(s)
   raid1: sdb: unrecoverable I/O read error for block 718786...:  1 Time(s)
   raid1: sdb: unrecoverable I/O read error for block 745408...:  1 Time(s)
   raid1: sdb: unrecoverable I/O read error for block 911502...:  1 Time(s)
   raid1: sdb: unrecoverable I/O read error for block 955232...:  1 Time(s)
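To see whether the read errors cluster on one member or one region of the disk, the raid1 lines can be tallied per device. What follows is only a rough sketch: it assumes the kernel log lines look like the ones quoted above, and the function name `tally_read_errors` is made up for illustration. The summary above looks truncated (note the "..."), so the raw kernel log is probably a better input than the digest.

```python
import re
from collections import Counter

# Matches kernel lines of the form quoted in the post, e.g.
# "raid1: sdb: unrecoverable I/O read error for block 116259"
READ_ERROR = re.compile(
    r"raid1: (?P<dev>\w+): unrecoverable I/O read error for block (?P<block>\d+)"
)


def tally_read_errors(lines):
    """Return (error count per device, block numbers per device)."""
    counts = Counter()
    blocks = {}
    for line in lines:
        match = READ_ERROR.search(line)
        if match is None:
            continue  # not a raid1 read error line
        dev = match.group("dev")
        counts[dev] += 1
        blocks.setdefault(dev, []).append(int(match.group("block")))
    return counts, blocks


if __name__ == "__main__":
    # Two lines copied from the post plus one unrelated line.
    sample = [
        "raid1: sdb: unrecoverable I/O read error for block 116259",
        "raid1: sdb: unrecoverable I/O read error for block 222383",
        "ata2: status=0x51 { DriveReady SeekComplete Error }",
    ]
    counts, blocks = tally_read_errors(sample)
    for dev, n in counts.items():
        print(f"{dev}: {n} read errors, blocks {min(blocks[dev])} to {max(blocks[dev])}")
```

Errors spread across the whole block range, as they are here, would fit a link or controller problem at least as well as a localised surface defect, though that is a guess rather than a diagnosis.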


Quickly followed by:

Code:
From: mdadm monitoring <XXX>
To: XXX
Subject: Fail event on /dev/md0:ic1.matbooth.co.uk

This is an automatically generated mail message from mdadm
running on ic1.matbooth.co.uk

A Fail event had been detected on md device /dev/md0.

Faithfully yours, etc.


I can use mdadm to remove and re-add the drive, and the array rebuilds without complaint and runs without issue for a couple of days, until it bombs out again with the above messages.
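Between rebuilds, the array state can also be read straight from /proc/mdstat instead of waiting for the mail. This is a minimal sketch, assuming the usual mdstat layout (a header line per array followed by a "[total/active] [UU]" status line). The sample text and the partition names sda1 and sdb1 are invented for illustration; the post only names /dev/md0 and sdb.

```python
import re

# An array header line looks like "md0 : active raid1 sdb1[1] sda1[0]".
ARRAY_LINE = re.compile(r"^(?P<name>md\d+) : (?P<state>\w+) (?P<rest>.*)$")
# The status line carries "[2/1] [U_]": total/active members and a per-member map.
STATUS = re.compile(r"\[(?P<total>\d+)/(?P<active>\d+)\] \[(?P<map>[U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return {array name: [members flagged (F)]} for arrays missing a member."""
    result = {}
    current = None
    failed = []
    for line in mdstat_text.splitlines():
        header = ARRAY_LINE.match(line)
        if header:
            current = header.group("name")
            # Members that md has marked faulty carry an "(F)" suffix.
            failed = re.findall(r"(\w+)\[\d+\]\(F\)", header.group("rest"))
            continue
        status = STATUS.search(line)
        if status and current:
            if int(status.group("active")) < int(status.group("total")):
                result[current] = failed
            current = None
    return result


# Hypothetical mdstat contents for a two-disk mirror with one failed member.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      244195904 blocks [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a live box the same function could be fed `open("/proc/mdstat").read()`. An empty dict would mean every array reports all of its members active.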

I'm disinclined to think it's a hard disk fault because they work fine as independent drives and in different PCs. My only theory is that it's the motherboard, which I got at the same time as the disks. However, there doesn't seem to be anything overtly wrong with that either. My Google-fu has failed me and I can't find any reason to believe that this motherboard wouldn't be able to support software RAID with its two SATA channels.

I'm using software RAID rather than the motherboard's built-in RAID because the hardware RAID can't be managed or monitored remotely and doesn't send me emails when something goes wrong.

So what's going on? Any suggestions appreciated.

(Using RHEL/CentOS 4.4, if that matters.)

_________________
Pie. I wish I could
constrain my hungry greed but...
Sadly, defeated.


Charlene's Law: There's no such thing as can't.
Charlene's Corollary: Unless it's followed by be arsed.

If only 20% of your staff is programmers, and you can save 50% on salary by outsourcing programmers to India, well, how much of a competitive advantage are you really going to get out of that 10% savings?
the taz man
Butt Sniffer

Joined: 16 Nov 2002
Age: 33
Posts: 1380
Location: CT, USA

Posted: Thu Mar 29, 2007 7:49 am

Have you tried the drive manufacturer's diagnostic tools?

http://www.seagate.com/ww/v/index.jsp?locale=en-US&name=SeaTools&vgnextoid=720bd20cacdec010VgnVCM100000dd04090aRCRD

These tools will give the drives a better going over than what Windows can do alone.

Just my $0.02 worth

_________________
"It's not the size of the cat in the fight, it's the size of the fight in the cat."
CMTG
Leg Humper

Joined: 23 Feb 2002
Posts: 4959
Location: On average, Cheltenham.

Posted: Thu Mar 29, 2007 12:36 pm

the taz man wrote:
Have you tried the drive manufacturer's diagnostic tools?

http://www.seagate.com/ww/v/index.jsp?locale=en-US&name=SeaTools&vgnextoid=720bd20cacdec010VgnVCM100000dd04090aRCRD

These tools will give the drives a better going over than what Windows can do alone.

Just my $0.02 worth


Thanks for the link. I downloaded the bootable FreeDOS image and just tried the short tests. They all came back fine.

It says the long tests take a few hours to run, so I'll run them tomorrow while I'm at work perhaps.

Slymer
Tail-Wagger

Joined: 29 May 2003
Age: 30
Posts: 2360
Location: chair in front of my computer

Posted: Thu Mar 29, 2007 6:22 pm

this happened to me recently...

Computer locks up... reboot and no hard drive... looks like the hard drive is dead... plugged in another power connector on the same chain that fits better (doesn't slip out so easily) and it works fine. Might wanna just check the cables and maybe spray the contacts with some contact cleaner. It may also be a thermal expansion problem with the I/O chip. Try re-syncing the rig, then swap the two drives between the SATA ports and see whether the failure stays with sdb. If it does, the fault is following the port rather than the disk, so it's a problem with the mobo.

*2 cents*

_________________
The Sly One
======================================
Windows is like crack. It feels good, it's easy to start into, it hooks you bad, it costs a ton of money, and it makes you crazy. And you still love it. - EdisonRex

Only two things are infinite, the universe and human stupidity, and I'm not sure about the former. -- Albert Einstein

CMTG
Leg Humper

Joined: 23 Feb 2002
Posts: 4959
Location: On average, Cheltenham.

Posted: Fri Mar 30, 2007 12:52 am

Slymer wrote:
this happened to me recently...

Computer locks up... reboot and no hard drive... looks like the hard drive is dead... plugged in another power connector on the same chain that fits better (doesn't slip out so easily) and it works fine. Might wanna just check the cables and maybe spray the contacts with some contact cleaner. It may also be a thermal expansion problem with the I/O chip. Try re-syncing the rig, then swap the two drives between the SATA ports and see whether the failure stays with sdb. If it does, the fault is following the port rather than the disk, so it's a problem with the mobo.

*2 cents*


Locking up isn't a symptom; it carries on working without interruption. The only reason I know this is happening is the email it sends me when the array degrades. I'm able to rebuild the array without having to reboot.

I will try swapping the drives this evening.

Dave Rave
Butt Sniffer

Joined: 13 Nov 2003
Posts: 1880
Location: Sydney Australia

Posted: Fri Mar 30, 2007 2:23 am

good old HDD Health
connect both drives in plain IDE mode
look at the HDD SMART stats
check if one or the other is a pile o' crap
back to raid mode
continue
CMTG
Leg Humper

Joined: 23 Feb 2002
Posts: 4959
Location: On average, Cheltenham.

Posted: Fri Mar 30, 2007 5:16 am

Dave Rave wrote:
good old HDD Health
connect both drives in plain IDE mode
look at the HDD SMART stats
check if one or the other is a pile o' crap
back to raid mode
continue


Um...

HDD Health wrote:
HDD Health is a full-featured failure-prediction agent for machines using Windows 95, 98, NT, Me, 2000 and XP.

I wrote:
(Using RHEL/CentOS 4.4, if that matters.)

Dave Rave
Butt Sniffer

Joined: 13 Nov 2003
Posts: 1880
Location: Sydney Australia

Posted: Fri Mar 30, 2007 3:07 pm

plug the drives into a Windows computer
or at least look at the SMART stats
there is a smartmontools package for Linux
if you can't find the tool to help diagnose, work around
CMTG
Leg Humper

Joined: 23 Feb 2002
Posts: 4959
Location: On average, Cheltenham.

Posted: Mon May 21, 2007 1:16 pm

Just thought I'd report back on this.

All my problems seem to have evaporated with the installation of CentOS 5. Must be new or updated drivers in the newer kernel or something.

</Shrugs>

So, for future reference, the SATA controller on the ECS 755-A2 doesn't seem to play too well on versions of RHEL/CentOS earlier than 5. (5 has a 2.6.18 kernel, 4 was 2.6.9.)
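For anyone wanting to check which side of that line a given box is on, here is a small sketch that compares a kernel release string against 2.6.18. Treating 2.6.18 as the cut-off is an assumption based on this one report, not a documented fix, and the names `kernel_tuple` and `predates_fix` and the example release strings are illustrative.

```python
import platform
import re

# 2.6.18 is the kernel the post reports for CentOS 5. Using it as the
# cut-off is an assumption drawn from this thread, not a known fix point.
FIRST_GOOD = (2, 6, 18)


def kernel_tuple(release):
    """Turn a release string such as '2.6.9-42.EL' into (2, 6, 9)."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # Missing minor or patch components are treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def predates_fix(release, first_good=FIRST_GOOD):
    """True if the release sorts before the assumed first good kernel."""
    return kernel_tuple(release) < first_good


if __name__ == "__main__":
    release = platform.release()
    verdict = "predates 2.6.18" if predates_fix(release) else "is 2.6.18 or newer"
    print(release, verdict)
```

Comparing tuples of integers rather than raw strings matters here because "2.6.9" sorts after "2.6.18" as text.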

noodly2877
StormDog

Joined: 22 May 2003
Age: 31
Posts: 2884
Location: The State of Confusion

Posted: Tue May 22, 2007 12:43 pm

Quote:
I have two identical 250GB Seagate SATA drives connected to an ECS 755-A2 Socket 754 motherboard.


There's your problem right there... you're using an ECS motherboard.

_________________
Genius has its limits
But stupidity shines infinite

anglachel wrote:
Also a note to any managers out there, a promotion without a pay raise is like kicking someone in the balls and telling them that it was a blow job.


CMTG
Leg Humper

Joined: 23 Feb 2002
Posts: 4959
Location: On average, Cheltenham.

Posted: Tue May 22, 2007 12:52 pm

Maybe, but it was going cheap. Like the budgie.

anglachel
Guide Dog

Joined: 08 Nov 2003
Posts: 8421
Location: MN

Posted: Tue May 22, 2007 6:48 pm

CheeseMonger The Great wrote:
Just thought I'd report back on this.

All my problems seem to have evaporated with the installation of CentOS 5. Must be new or updated drivers in the newer kernel or something.

</Shrugs>

So, for future reference, the SATA controller on the ECS 755-A2 doesn't seem to play too well on versions of RHEL/CentOS earlier than 5. (5 has a 2.6.18 kernel, 4 was 2.6.9.)


My RAID was spotty before 2.6.12ish... sometimes it would just say that the mirror for the root partition was broken, attempt a repair, and fail, and then after a reboot everything was fine...

Think it was just the older RAID drivers.

_________________

Quidquid latine dictum sit, altum sonatur.
Death to Shuttleworth!
All times are GMT - 8 Hours
