SCIENTIFIC-LINUX-USERS Archives

April 2009

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From: Mark Whidby <[log in to unmask]>
Reply To: Mark Whidby <[log in to unmask]>
Date: Wed, 29 Apr 2009 19:26:28 +0100
Hi,
I look after a small Beowulf cluster whose head node was rebooted last night.
Today the compute nodes have their NFS mounts from the head node mixed
up. On the head node I have this extract from /etc/fstab:

/dev/mapper/3600d02300000000000ed100540e4a000p1 /data01  ext3 defaults  1 2
/dev/mapper/3600d02300000000000ed100540e4a001p1 /data02  ext3 defaults  1 2
/dev/mapper/3600d02300000000000ed105947002a00p1 /data03  ext3 defaults  1 2
/dev/mapper/3600d02300000000000ed105947002a01p1 /data04  ext3 defaults  1 2
/dev/mapper/3600d02300000000000ed105a1384ec00p1 /data05  ext3 defaults  1 2
/dev/mapper/3600d02300000000000ed105a1384ec01p1 /data06  ext3 defaults  1 2

and this extract from /etc/exports:

/data01  10.0.0.0/24(rw,sync) 130.88.15.0/24(rw,sync) 130.88.67.0/24(rw,sync) 130.88.16.0/24(rw,sync)
/data02  10.0.0.0/24(rw,sync) 130.88.15.0/24(rw,sync) 130.88.67.0/24(rw,sync) 130.88.16.0/24(rw,sync)
/data03  10.0.0.0/24(rw,sync) 130.88.15.0/24(rw,sync) 130.88.67.0/24(rw,sync) 130.88.16.0/24(rw,sync)
/data04  10.0.0.0/24(rw,sync) 130.88.15.0/24(rw,sync) 130.88.67.0/24(rw,sync) 130.88.16.0/24(rw,sync)
/data05  10.0.0.0/24(rw,sync) 130.88.15.0/24(rw,sync) 130.88.67.0/24(rw,sync) 130.88.16.0/24(rw,sync)
/data06  10.0.0.0/24(rw,sync) 130.88.15.0/24(rw,sync) 130.88.67.0/24(rw,sync) 130.88.16.0/24(rw,sync)

To demonstrate the problem, I created a file on each /dataxx partition, named after
the partition itself, so on the head node I see this:

# for n in 1 2 3 4 5 6
 > do
 > ls /data0${n}/data??
 > done
/data01/data01
/data02/data02
/data03/data03
/data04/data04
/data05/data05
/data06/data06

On the compute nodes (which at the moment I can access only by submitting a Sun Grid
Engine job) this is in /etc/fstab:

10.0.0.254:/data01      /data01                 nfs     defaults        0 0
10.0.0.254:/data02      /data02                 nfs     defaults        0 0
10.0.0.254:/data03      /data03                 nfs     defaults        0 0
10.0.0.254:/data04      /data04                 nfs     defaults        0 0
10.0.0.254:/data05      /data05                 nfs     defaults        0 0
10.0.0.254:/data06      /data06                 nfs     defaults        0 0

The mount command shows this:

/dev/ram0 on / type ext2 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda2 on /tmp type ext3 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
10.0.0.254:/home on /home type nfs (rw,addr=10.0.0.254)
10.0.0.254:/usr on /usr type nfs (rw,addr=10.0.0.254)
10.0.0.254:/opt on /opt type nfs (rw,addr=10.0.0.254)
10.0.0.254:/data01 on /data01 type nfs (rw,addr=10.0.0.254)
10.0.0.254:/data02 on /data02 type nfs (rw,addr=10.0.0.254)
10.0.0.254:/data03 on /data03 type nfs (rw,addr=10.0.0.254)
10.0.0.254:/data04 on /data04 type nfs (rw,addr=10.0.0.254)
10.0.0.254:/data05 on /data05 type nfs (rw,addr=10.0.0.254)
10.0.0.254:/data06 on /data06 type nfs (rw,addr=10.0.0.254)

but when I run the same loop there, I see this:

/data01/data02
/data02/data03
/data03/data01
/data04/data04
/data05/data05
/data06/data06

The first three mounts are plainly wrong, and it is the same on all four compute
nodes. I am completely confused as to what has happened. Any ideas?
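
The manual comparison above could be automated so that each node reports its own mismatches. What follows is only a minimal sketch, assuming the marker files are named exactly after their mount points as described earlier; the function name check_mounts and its root-prefix argument are hypothetical and exist only so the check can be pointed at a different directory tree.

```shell
#!/bin/sh
# check_mounts ROOT N...   (hypothetical helper, not part of any package)
# For each N, look in ROOT/data0N for marker files matching data?? and
# report whether the marker name matches the mount point name.
check_mounts() {
    root=$1
    shift
    status=0
    for n in "$@"; do
        dir="$root/data0$n"
        found=""
        for f in "$dir"/data??; do
            # An unmatched glob stays literal, so test that the file exists.
            [ -e "$f" ] && found="$found ${f##*/}"
        done
        found=${found# }
        if [ "$found" = "data0$n" ]; then
            echo "OK       $dir contains $found"
        else
            echo "MISMATCH $dir contains ${found:-nothing}"
            status=1
        fi
    done
    return $status
}
```

Calling check_mounts "" 1 2 3 4 5 6 inside the Sun Grid Engine job would check /data01 to /data06 and return a non-zero status if any mount shows the wrong marker file.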

Sorry for the length of this, but I've tried to be as concise as possible.
It's probably not even SL-specific (the cluster is running SL 5.0), but
I value the knowledge and wisdom of the people on this list.

-- 
Mark Whidby
Infrastructure Coordinator (Unix) - Physics/Chemistry/EAES/Mathematics Team
Information Systems
Faculty of Engineering and Physical Sciences
