SCIENTIFIC-LINUX-USERS Archives

December 2023

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Konstantin Olchanski <[log in to unmask]>
Reply To:
Konstantin Olchanski <[log in to unmask]>
Date:
Wed, 6 Dec 2023 09:51:20 -0800
For your system, I would not agonize over the choice of ext4 vs XFS;
in practice you will see little difference.

Some practical advice for your system:

- bump the OS + home dirs from 1 TB to 2 TB (the incremental cost of two 2 TB SSDs over two 1 TB SSDs is small)
- run this system on UPS power. Your 200 TB data array will be RAID-6 built from 12-16 20 TB HDDs; the probability of a single HDD failure is high, and a RAID rebuild will take about 2 days. If power goes out during a rebuild, Bad Things Will Happen. (See the monitoring sketch after this list.)
- we have been using XFS for large data arrays since the late 1990s (on SGI machines); it is very reliable and will not corrupt itself unless you have defective hardware (ZFS was developed to deal with exactly that: checksums, self-healing, etc.)
- your server should have ECC DRAM. This is a standard feature on all server-class machines; use it. All our server machines have 64 GB of ECC memory.
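
For the RAID rebuild point above, a minimal monitoring sketch, assuming a Linux md array at the hypothetical device /dev/md0:

  # overall state of all md arrays, including rebuild progress and estimated finish time
  cat /proc/mdstat

  # per-array detail: state, failed members, rebuild percentage
  mdadm --detail /dev/md0

  # run the mdadm monitor as a daemon so it emails you when a member drops out
  # (requires MAILADDR to be set in /etc/mdadm.conf)
  mdadm --monitor --scan --daemonise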

If I were building this system, I would make both the 2TB SSD array and the 200TB data array ZFS.
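
A minimal sketch of that layout, with hypothetical pool and disk names (use /dev/disk/by-id paths on a real system; a ZFS root pool also needs the distribution's ZFS-root install procedure, so this only shows the pool geometry):

  # mirrored pair of 2 TB SSDs for OS + home directories
  zpool create -o ashift=12 syspool mirror /dev/sda /dev/sdb

  # 12 x 20 TB HDDs as raidz2 (two-disk redundancy, the ZFS analogue of RAID-6)
  zpool create -o ashift=12 datapool raidz2 /dev/sd{c..n}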

Also, you do not say how you will back up your data. You must have both backups and archives. Backups protect you against "oops, I deleted the wrong file"; archives protect you against "oops, I deleted the wrong file 2 years ago".

Without backups and archives, if you have a fire, a flood, or a crypto-ransomware attack,
or if your server is stolen, you lose everything.
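
If the data array does end up on ZFS, one hedged sketch of the backup plumbing, with hypothetical dataset and host names:

  # instant local snapshot; a series of old snapshots doubles as an archive of past states
  zfs snapshot datapool/mri@2023-12-06

  # replicate the snapshot to a second machine, ideally off-site
  # (later runs can use an incremental send: zfs send -i <old-snap> <new-snap>)
  zfs send datapool/mri@2023-12-06 | ssh backuphost zfs receive -u backuppool/mri

Snapshots on the same pool are not a backup against fire, theft or ransomware; the off-site copy is what covers those cases.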


K.O.


On Wed, Dec 06, 2023 at 11:37:54AM -0500, Edward Zuniga wrote:
> Cc'ing supervisor to loop him in as well.
> 
> On Wed, Dec 6, 2023, 9:18 AM Edward Zuniga <[log in to unmask]> wrote:
> 
> > Thanks everyone for the feedback! I've learned so much from reading the
> > discussions.
> >
> > For our application, we will have a LAN with a single server (1TB RAID1
> > array for OS, 200TB RAID5 array for data) and up to 16 workstations (1TB
> > RAID1 array for OS). Our IT department is more familiar with Rocky Linux 8,
> > which I assume will perform the same as AlmaLinux 8. Some of our MRI
> > processing can take weeks to finish, so we need a system that is very
> > reliable. We also work with individual files in the hundreds of gigabytes.
> >
> > While reading the Red Hat 8 manual
> > <https://urldefense.proofpoint.com/v2/url?u=https-3A__access.redhat.com_documentation_en-2Dus_red-5Fhat-5Fenterprise-5Flinux_8_html_managing-5Ffile-5Fsystems_overview-2Dof-2Davailable-2Dfile-2Dsystems-5Fmanaging-2Dfile-2Dsystems&d=DwIFaQ&c=gRgGjJ3BkIsb5y6s49QqsA&r=gd8BzeSQcySVxr0gDWSEbN-P-pgDXkdyCtaMqdCgPPdW1cyL5RIpaIYrCn8C5x2A&m=NWrdCkO_Rv2xr06ZFmX0tmfqqeYNrrwhynckTqel03PtwMXxPfTvgwA0pa8NEDQP&s=PuCPAQ38-YIaby8e4N7dh0ORNT6UbvsXS04mQ0wfKnw&e= >,
> > I found a few possible issues regarding XFS. I'm curious to see if anyone
> > has experienced these as well.
> >
> > 1. Metadata error behavior
> >
> > In ext4, you can configure the behavior when the file system encounters
> > metadata errors. The default behavior is to simply continue the operation.
> > When XFS encounters an unrecoverable metadata error, it shuts down the
> > file system and returns the EFSCORRUPTED error. *This could be problematic
> > for processing that takes several weeks.*
> >
> > 2. Inode numbers
> >
> > The ext4 file system does not support more than 2^32 inodes.
> >
> > XFS dynamically allocates inodes. An XFS file system cannot run out of
> > inodes as long as there is free space on the file system.
> >
> > Certain applications cannot properly handle inode numbers larger than 2^32
> > on an XFS file system. These applications might cause the failure of 32-bit
> > stat calls with the EOVERFLOW return value. Inode numbers exceed 2^32 under
> > the following conditions:
> >
> >    - The file system is larger than 1 TiB with 256-byte inodes.
> >    - The file system is larger than 2 TiB with 512-byte inodes.
> >
> > If your application fails with large inode numbers, mount the XFS file
> > system with the -o inode32 option to enforce inode numbers below 2^32.
> > Note that using inode32 does not affect inodes that are already allocated
> > with 64-bit numbers.
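> >
> > A concrete form of that workaround, with a hypothetical device and mount
> > point (the same option can also go in the options column of /etc/fstab):
> >
> >   mount -o inode32 /dev/md0 /data
> >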
> > *Has anyone encountered this issue?*
> >
> > 3. The Red Hat 8 manual also warns that using xfs_repair -L might cause
> > significant file system damage and data loss, and should only be used as
> > a last resort. The manual does not mention a similar warning about using
> > e2fsck to repair an ext4 file system.
> > Has anyone experienced issues repairing a corrupt XFS file system?
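> >
> > For reference, a common repair sequence (hypothetical device name) is a
> > read-only check first, keeping -L strictly as the last resort:
> >
> >   xfs_repair -n /dev/md0   # no-modify mode: only report problems
> >   xfs_repair /dev/md0      # normal repair (needs a clean log; mount/unmount replays it)
> >   xfs_repair -L /dev/md0   # last resort: zeroes the log, may lose recent metadata
> >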
> > Thanks,
> > Eddie
> >
> > On Tue, Dec 5, 2023 at 8:46 PM Konstantin Olchanski <[log in to unmask]>
> > wrote:
> >
> >> On Mon, Dec 04, 2023 at 03:03:46PM -0500, Edward Zuniga wrote:
> >> >
> >> > We are upgrading our MRI Lab servers and workstations to AlmaLinux 8. We
> >> > have used ext4 for the past 10 years, however we are considering using
> >> XFS
> >> > for its better performance with larger files. Which file system do you
> >> use
> >> > for your lab?
> >> >
> >>
> >> Historical background.
> >>
> >> The XFS filesystem, with its companion XLV logical volume manager (aka
> >> "partitioning tool"), came to Linux from SGI IRIX, where it was developed
> >> in the late 1990s. XFS was copied to Linux verbatim (initially with shims
> >> and kludges, later fully integrated). XLV was reimplemented as LVM.
> >>
> >> The EXT series of filesystems was developed together with the Linux
> >> kernel (the first ext filesystem may have originated with MINIX, look it
> >> up). As improvements were made (journaling, no need to fsck after a
> >> crash, online grow/shrink, etc.), it was renamed ext2/ext3/ext4, and
> >> these versions are still largely compatible with one another.
> >>
> >> For many purposes, both filesystems are obsoleted by ZFS, which added:
> >>
> >> - added metadata and data checksums - to detect silent bit rot on
> >> current-generation HDDs and SSDs
> >> - added online filesystem check - for broken data, it gives you a list of
> >> filenames instead of inode numbers
> >> - added "built-in" mirroring - together with checksums, the online fsck (zfs
> >> scrub) and a monthly zfs scrub cron job allow automatic healing of bit rot
> >> (see the scrub sketch after this list)
> >> - added "built-in" raid-5 and raid-6 - again, together with checksums and
> >> online fsck, allows automatic healing and robust operation in the presence
> >> of disk bad sectors, I/O errors, corruption and single-disk failure
> >> - other goodies like snapshots, a large RAM cache, dedup, online
> >> compression, etc. are taken for granted in current-generation filesystems.
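> >>
> >> A minimal sketch of that scrub routine, assuming a hypothetical pool named
> >> datapool:
> >>
> >>   # online check of every block against its checksum, repairing from redundancy
> >>   zpool scrub datapool
> >>
> >>   # shows progress, repaired bytes, and names of files with unrecoverable errors
> >>   zpool status -v datapool
> >>
> >>   # typical monthly cron entry (root crontab): 03:00 on the 1st of the month
> >>   0 3 1 * * /sbin/zpool scrub datapool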
> >>
> >> On current generation HDDs and SSDs, use of bare XFS and ext4 is
> >> dangerous: an SSD failure or an "HDD grows bad sectors" event will destroy
> >> your data completely.
> >>
> >> On current generation HDDs, use of mirrored XFS and ext4 (using mdadm or
> >> LVM mirroring) is dangerous: (a) bit rot inevitably causes differences
> >> between the data on the two disks, and lacking checksums, mdadm and LVM
> >> mirroring cannot decide which of the two copies is the correct one; (b)
> >> after a crash, the mirror rebuild will fail if both disks happen to have
> >> bad sectors (or throw random I/O errors).
> >>
> >> Ditto for RAID5 and RAID6: the probability of a RAID rebuild failing
> >> because multiple disks have bad sectors and I/O errors goes up with the
> >> number of disks.
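> >>
> >> For comparison, the closest thing plain md offers is a consistency check;
> >> it can count mismatches between copies but, having no checksums, cannot
> >> tell which copy is correct (hypothetical array /dev/md0):
> >>
> >>   echo check > /sys/block/md0/md/sync_action
> >>   cat /sys/block/md0/md/mismatch_cnt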
> >>
> >> ZFS was invented to resolve all these problems. (BTRFS was invented as a
> >> NIH ersatz ZFS and is still incomplete wrt RAID5/RAID6.)
> >>
> >> Bottom line: if you can, use ZFS. The current Ubuntu installer has an
> >> "install on ZFS" button; use it!
> >>
> >> --
> >> Konstantin Olchanski
> >> Data Acquisition Systems: The Bytes Must Flow!
> >> Email: olchansk-at-triumf-dot-ca
> >> Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
> >>
> >
> >
> > --
> > Edward A. Zuniga
> > Senior Research Analyst/CARI MRI Lab Manager
> > Virginia Commonwealth University
> > C. Kenneth and Dianne Wright Center for Clinical and Translational Research
> > Institute for Drug and Alcohol Studies
> > Collaborative Advanced Research Imaging (CARI) Facility
> > 203 East Cary Street, Suite 202
> > Richmond, Virginia 23219
> > Phone: (804) 828-4184
> > Fax: (804) 827-2565
> > [log in to unmask]
> >

-- 
Konstantin Olchanski
Data Acquisition Systems: The Bytes Must Flow!
Email: olchansk-at-triumf-dot-ca
Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
