LISTSERV - SCIENTIFIC-LINUX-USERS Archives

January 2006

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:	Re: Strange errors
From:	Michael Mansour
Date:	Mon, 9 Jan 2006 13:43:40 +1000

Hi Ioannis,

> Michael Mansour wrote:
> > Hi Ioannis,
> > 
> > 
> >>  ---------------------- pam_unix End -------------------------
> >>
> >>Jan  4 11:16:41 hdc: dma_intr: status=0x51 { DriveReady SeekComplete 
> >>Error }
> >>Jan  4 11:16:41 hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }
> >>Jan  4 11:16:41 hdc: dma_intr: status=0x51 { DriveReady SeekComplete 
> >>Error }
> >>Jan  4 11:16:41 hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }
> >>Jan  4 11:16:41 hdc: dma_intr: status=0x51 { DriveReady SeekComplete 
> >>Error }
> >>Jan  4 11:16:41 hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }
> >>Jan  4 11:16:41 hdc: dma_intr: status=0x51 { DriveReady SeekComplete 
> >>Error }
> >>Jan  4 11:16:41 hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }
> >>Jan  4 11:16:41 hdc: dma_intr: status=0x51 { DriveReady SeekComplete 
> >>Error }
> >>Jan  4 11:16:41 hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }
> >>Jan  4 11:16:41 hdc: dma_intr: status=0x51 { DriveReady SeekComplete 
> >>Error }
> >>Jan  4 11:16:41 hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }
> >>Jan  4 11:16:41 hdc: dma_intr: status=0x51 { DriveReady SeekComplete 
> >>Error }
> >>Jan  4 11:16:41 hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }
> > 
> > 
> > I've been away for the past few days so haven't been able to contribute to
> > this discussion, but from the replies to your issue above, I feel too many
> > people are saying to turf the disk, and I'd personally recommend against it
> > until you have verified the disk is actually faulty.
> > 
> > I have been involved with smartmontools and on their mailing list for years,
> > so I know that the errors above do not (and should not) point you towards
> > replacing the disk before you have first tried to correct the problem.
> > 
> > Your query may have been better directed to the smartmontools mailing list
> > (maybe it's an idea to join it for your current issue or just search their
> > archives), where they would have directed you to a good starting resource:
> > 
> > http://smartmontools.sourceforge.net/BadBlockHowTo.txt
> > 
> > I have had many disks give me the same issue as you've had, and have corrected
> > them using the methods described in the above link, or simply by zeroing them
> > (to force re-allocation of sectors) and then re-adding them to my mirrors. At
> > that point, they continue to work flawlessly again. I've done this for years,
> > and have yet to actually turf a disk because of the errors you're currently
> > getting.
> 
> Thanks for the link Michael. Actually this type of error *email* 
> message is very rare (I must have seen it 2-4 times since July).

Are you actually using smartmontools? smartd should be configured to run a
short self-test each day and a long self-test each week. You shouldn't rely on
the messages file to report issues with your disks when smartmontools comes
with most Linux distributions; all it needs is some configuration.
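
As a rough illustration of that configuration (an assumption on my part, not
something taken from your setup), a smartd.conf entry along the lines of
"/dev/hdc -a -s (S/../.././02|L/../../6/03) -m root" asks for a short test
every day in the 02:00 hour and a long test every Saturday in the 03:00 hour.
The -s argument is a regular expression that smartd compares against a string
of the form T/MM/DD/d/HH (test type, month, day, day of week with 1 = Monday,
hour). The Python sketch below only mimics that matching so you can check a
schedule before trusting it; the function name is made up, and treating the
comparison as a whole-string match is my assumption about smartd's behaviour.

```python
import re
from datetime import datetime

# Example schedule only: short test daily at 02:xx, long test Saturdays at 03:xx.
SCHEDULE = r"(S/../.././02|L/../../6/03)"


def due_tests(now, schedule=SCHEDULE):
    """Return the test types ("S" short, "L" long) the schedule selects at `now`."""
    due = []
    for test_type in ("S", "L"):
        # Build the T/MM/DD/d/HH string; isoweekday() gives 1 = Monday .. 7 = Sunday.
        stamp = f"{test_type}/{now:%m}/{now:%d}/{now.isoweekday()}/{now:%H}"
        # Assumption: the whole string has to match the expression.
        if re.fullmatch(schedule, stamp):
            due.append(test_type)
    return due
```

Calling due_tests(datetime.now()) shows which tests, if any, the example
schedule would start in the current hour.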

> There are others more common (and probably meaning nothing), like 
> this I have just received:
> 
>   ################### LogWatch 5.2.2 (06/23/04) ####################
>         Processing Initiated: Mon Jan  9 01:54:43 2006
>         Date Range Processed: yesterday
>       Detail Level of Output: 0
>            Logfiles for Host: localhost.localdomain
>   ################################################################
> 
>   --------------------- Cron Begin ------------------------
> 
> **Unmatched Entries**
> STARTUP (V5.0)
> 
>   ---------------------- Cron End -------------------------
> 
>   --------------------- pam_unix Begin ------------------------
> 
> crond:
>     Unknown Entries:
>        session closed for user root: 28 Time(s)
>        session opened for user root by (uid=0): 28 Time(s)
> 
>   ---------------------- pam_unix End -------------------------
> 
>   --------------------- Connections (secure-log) Begin ------------------------
> 
> **Unmatched Entries**
> userhelper[5479]: running '/sbin/poweroff' with root privileges on 
> behalf of 'root' userhelper[3698]: running '/sbin/poweroff' with 
> root privileges on behalf of 'root'
> 
>   ---------------------- Connections (secure-log) End -------------------------
> 
> ------------------ Disk Space --------------------
> 
> /dev/mapper/VolGroup00-LogVol00
>                         99G  9.4G   85G  11% /
> /dev/hdc1              99M   12M   82M  13% /boot
> 
>   ###################### LogWatch End #########################

None of those are actual problems.

> In any case, I suppose ext3 takes care of bad sectors automatically, 
> if present.

ext3 doesn't really do that itself; it relies on the disk remapping bad
sectors automatically.

Every disk keeps a certain number of spare sectors in reserve, which are used
for reallocation when a bad sector is detected. If your disk runs out of these
spare sectors and you continue to get bad sectors, that's the time you'll be
forced to turf the disk.
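
One way to keep an eye on how many sectors have been remapped so far is the
Reallocated_Sector_Ct attribute in the output of "smartctl -A". The sketch
below is a minimal example, assuming the usual ten-column attribute table
with the raw value in the last column and assuming the drive reports this
attribute at all (not every drive does). The function names are made up for
illustration.

```python
import subprocess


def reallocated_sectors(smartctl_output):
    """Return the raw Reallocated_Sector_Ct value from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows look like: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[1] == "Reallocated_Sector_Ct":
            return int(fields[9])
    return None  # attribute not reported by this drive


def query_device(device):
    """Run smartctl on a device such as /dev/hdc (usually needs root) and parse the result."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True, check=False
    )
    return reallocated_sectors(result.stdout)
```

A raw value that keeps climbing between checks suggests the spare pool is
being used up, which is the situation described above.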

If you aren't already doing so, you really should be running smartd to monitor
your disks and to schedule short and long tests periodically, so that you are
told when a disk is about to fail without having to rely on the messages file.

Michael.
