SCIENTIFIC-LINUX-USERS Archives

March 2008

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From: Markos Gogoulos <[log in to unmask]>
Reply To: Markos Gogoulos <[log in to unmask]>
Date: Thu, 20 Mar 2008 11:05:42 +0200
Content-Type: text/plain
Parts/Attachments: text/plain (27 lines)

Dear all,

In our Scientific Linux cluster (SL 4.5, 64-bit) we have been seeing the
following two errors:

wn014: Mar 12 09:09:20 wn014 kernel: knot[24574] trap divide error rip:405894 rsp:7fbfffe520 error:0

wn057: Feb 27 04:31:44 wn057 kernel: fs[4851]: segfault at 00000000ed410c22 rip 00000033ed3707b0 rsp 0000007fbfffb2a8 error 4

The errors repeat many times (every 4-5 seconds) and appear every few
days on most of the nodes.
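
To make the two messages easier to compare across nodes, they can be
parsed mechanically. The Python below is only a minimal sketch under
stated assumptions: the names parse_line and decode_page_fault are
hypothetical, the regular expressions are fitted solely to the two lines
quoted above, and the error field of the segfault message is assumed to
be the hexadecimal x86 page-fault error code (bit 0: page was present,
bit 1: write access, bit 2: fault raised in user mode).

```python
import re

# Patterns fitted to the two kernel messages quoted above (an assumption:
# other kernel versions may print these fields differently).
TRAP_RE = re.compile(
    r"kernel: (?P<proc>\S+?)\[(?P<pid>\d+)\]:? trap (?P<what>.+?) "
    r"rip:(?P<rip>[0-9a-f]+) rsp:(?P<rsp>[0-9a-f]+) error:(?P<err>[0-9a-f]+)"
)
SEGV_RE = re.compile(
    r"kernel: (?P<proc>\S+?)\[(?P<pid>\d+)\]:? segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_page_fault(err):
    """Split an x86 page-fault error code into its three low bits."""
    return {
        "page_present": bool(err & 1),  # bit 0: 0 means the page was not mapped
        "write": bool(err & 2),         # bit 1: 0 means the access was a read
        "user_mode": bool(err & 4),     # bit 2: 1 means the fault came from user space
    }


def parse_line(line):
    """Return a dict describing a trap or segfault line, or None if no match."""
    m = TRAP_RE.search(line)
    if m:
        return {
            "kind": "trap",
            "what": m.group("what"),
            "proc": m.group("proc"),
            "pid": int(m.group("pid")),
            "rip": int(m.group("rip"), 16),
        }
    m = SEGV_RE.search(line)
    if m:
        return {
            "kind": "segfault",
            "proc": m.group("proc"),
            "pid": int(m.group("pid")),
            "addr": int(m.group("addr"), 16),
            "rip": int(m.group("rip"), 16),
            "fault": decode_page_fault(int(m.group("err"), 16)),
        }
    return None


if __name__ == "__main__":
    samples = [
        "wn014: Mar 12 09:09:20 wn014 kernel: knot[24574] trap divide error "
        "rip:405894 rsp:7fbfffe520 error:0",
        "wn057: Feb 27 04:31:44 wn057 kernel: fs[4851]: segfault at "
        "00000000ed410c22 rip 00000033ed3707b0 rsp 0000007fbfffb2a8 error 4",
    ]
    for s in samples:
        print(parse_line(s))
```

Read this way, error 4 in the wn057 message has only bit 2 set, which
corresponds to a user-mode read of an address that was not mapped.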

All nodes run the 2.6.9-67.0.4.ELsmp x86_64 kernel and a small set of
ordinary services (as grid worker nodes), plus GPFS client software
to mount a GPFS filesystem.


Have you experienced this before? Could you point us to how we can
trace the source of the problem or increase the level of debugging, so
that we can provide more useful information?

Thanks and regards,

IASA IT TEAM
