SCIENTIFIC-LINUX-USERS Archives

July 2015

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Nico Kadel-Garcia <[log in to unmask]>
Reply To:
Nico Kadel-Garcia <[log in to unmask]>
Date:
Tue, 14 Jul 2015 20:21:43 -0400
The new SL 7 installer is not your friend. But what your sysadmin friend was missing, in order to edit the root drive's /etc/fstab, was the command "mount -o remount,rw /", or "mount -o remount,rw /mnt/sysimage", depending on the state of the system.
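
As a rough sketch of how that fits together: once the root filesystem is writable, the failing data drive's line in fstab can be commented out so the boot no longer stops on its fsck. The helper name disable_fstab_entry and the /data mount point below are hypothetical, made up for illustration; only the two remount commands come from the advice above.

```shell
#!/bin/sh
# Sketch only: disable_fstab_entry and /data are hypothetical names.
# Comments out the fstab line whose second field (the mount point)
# matches, after saving a copy of the file as <fstab>.bak.
disable_fstab_entry() {
    fstab="$1"
    mountpoint="$2"
    cp -p "$fstab" "$fstab.bak" || return 1
    awk -v mp="$mountpoint" '
        # Prefix "#" to active (not already commented) lines for this mount point
        $0 !~ /^[ \t]*#/ && $2 == mp { print "#" $0; next }
        { print }
    ' "$fstab.bak" > "$fstab"
}

# From the emergency shell, as root (assumed usage):
#   mount -o remount,rw /
#   disable_fstab_entry /etc/fstab /data
#
# From the installer's rescue environment (assumed usage):
#   mount -o remount,rw /mnt/sysimage
#   disable_fstab_entry /mnt/sysimage/etc/fstab /data
```

The backup copy means the original entry can be restored once the drive has been repaired or replaced.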


Nico Kadel-Garcia
Email: [log in to unmask]
Sent from iPhone

> On Feb 12, 2015, at 16:50, Yasha Karant <[log in to unmask]> wrote:
> 
> I always run an "enterprise" environment on any server, including our GPU compute engine for research applications.  This is not a
> testbed machine per se, although we must load new drivers and new concurrent/GPU implementation methodologies as these evolve.  The base of the
> GPU engine is CUDA.  New compute applications, often from other problem domain areas, typically are run (sometimes ported) to this compute engine.
> 
> We recently started the transition from SL 6 to SL  7; a colleague here was doing the work.  He has numerous comments, posted below,
> and is now insisting that SL (e.g., RHEL) 7 is not suitable for production use in our environment, but that OpenSuSE, Debian, or Mint are more suitable environments.
> I personally disagree, but I would greatly appreciate commentary, particularly from anyone who runs other Linux distributions in a production server environment.
> We must support CUDA, some variety of MPI, and operational Infiniband drivers and services.
> 
> Comments (only lightly "cleaned up"):
> 
> so i verified that the drive indeed has a bad superblock - open suse did not hesitate to mount
> because the drive was not in fstab, sl6 had mounted it previously because drives only
> get fsckd every (usually) 20 reboots
> 
> so this drive reached the 20 reboot threshold and fsck failed with bad superblock -
> 
> so far so good. the problem is - sl6 refused to mount the root drive rw in the emergency shell,
> but also refused to do anything other than reboot once a drive that is known not to be the
> system drive failed fsck (and it knew this was not the sys drive because it had already mounted
> root to get at fstab)
> 
> the sane, competent, safe solution to a drive problem is to not mount that drive, not refuse
> to boot. drive failure with bad superblock - however it is a data drive, in no way needed to
> boot the system.
> 
> over many trials, it became clear that:
> 
> 1> the drive is in fstab, so system tries fsck which fails into a shell - there appears to be no
> way to tell the system to continue to load, since manual fsck also fails - reboot leads to the same
> problem
> 
> 2> removed the drive - does not help, still tries to fsck and fails, and refuses to continue to load
> 
> 3> tried to edit fstab from shell to rem out drive - could not edit, drive was mounted readonly,
> could not change
> 
> 4> tried to boot from the sl7 live/install usb key - did not let me get to a shell, did not want to
> go ahead and install on top of current system
> 
> 5> created open suse usb key - this allowed me to boot, mount the raid1 drives, edit fstab -
> whereupon the system was able to boot
> 
> What kind of system is unable to deal gracefully with a failed data drive?
> 
> My conclusion - Scientific Linux is too fragile a system for serious use
