SCIENTIFIC-LINUX-USERS Archives

August 2021

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

From: Yasha Karant <[log in to unmask]>
Reply-To: Yasha Karant <[log in to unmask]>
Date: Tue, 10 Aug 2021 17:20:12 -0700

Not all locations are as strict as BC, and equipment below 50V (DC as well?) can
still cause fires, etc. -- recall those pesky Li battery issues.
My reference to power supplies was to all power supplies that have a 
risk of "high" temperature, fire, or explosion.

On 8/10/21 4:43 PM, Konstantin Olchanski wrote:
> 
> 
> In the Province of British Columbia, AC-side electrical equipment
> is designed by registered Professional Engineers (PE) and worked
> on by licensed electricians. At TRIUMF these people are better
> than average.
> 
> Equipment we have built for SNOLAB and CERN was certified
> by an outside licensed inspector and has the CSA and CE marks. Inspectors
> were pretty strict and required a few changes in order to pass muster.
> 
> Equipment below 50V generally does not need to be inspected/certified,
> but I, as an end-user, always ask "where is the fuse?" and "is this
> a fuse of the right type?" (I know there are different types of fuses
> but it is not my job to select the correct one).
> 
> We take electrical safety very seriously.
> 
> P.S. You cannot use a quarter to jumper blown fuses on current
> generation electronics. You are lucky if you can even see
> the fuse without a microscope. Replacing blown fuses is generally
> done by electronics technicians and they will not honor the request
> to "just jumper it!".
> 
> P.P.S. A physics lab is not Boeing with their "only flies down" airplanes
> and "what time is now?" space ships.
> 
> 
> K.O.
> 
> 
> On Tue, Aug 10, 2021 at 04:10:26PM -0700, Yasha Karant wrote:
>> A proper circuit breaker, hopefully with external or simple panel
>> removal access (not remove from rack, open chassis, remove ... ),
>> will work fine and typically is better than a fuse.  A "soldered in
>> place" fusible link also will work, but is much more difficult to
>> service and replace.  Anyone who puts a jumper over an overcurrent
>> device (the "coin in the fuse box"), other than for diagnostic
>> testing, needs to be both educated and reprimanded.  Note that if
>> there is a power supply (typically from the mains), one needs a
>> circuit breaker both for the power supply and for the items that the
>> power supply is supplying. Clearly, the safety engineering unit is
>> not verifying that any custom apparatus meets basic fail-safe
>> practices -- I am not suggesting actual UL, etc., testing and
>> certification for a specific experimental data or control device
>> (although I do look for such certifications on the actual circuit
>> breaker -- accidents are not nice).
>>
>> On 8/10/21 4:00 PM, Konstantin Olchanski wrote:
>>> On Tue, Aug 10, 2021 at 03:34:00PM -0700, Yasha Karant wrote:
>>>> One SSD had an internal short and turned into a space heater,
>>>> luckily there was no fire. End excerpt.
>>>>
>>>> Clearly, there is very poor safety engineering and/or quality
>>>> control
>>>
>>> you will not be amused to learn how many electronics lack
>>> proper fuses and protections against internal and external
>>> shorts. even here, I have seen good people forget to put fuses
>>> on newly built boards.
>>>
>>>> (as with certain Li batteries that did similar things in
>>>> personal devices being operated by the user).
>>>
>>> that's different. SSD stores bits, Li battery stores Joules,
>>> and "bits do not burn".
>>>
>>>> If that SSD had been inside a laptop (presumably, it was inside a rack mounted
>>>> disk farm with fire extinguishers and possibly a machine room fire
>>>> suppression system), things could have had a much worse outcome
>>>> (most laptops have combustible materials).
>>>
>>> tangled server rooms, laptops, men, guns, horses all together.
>>>
>>> laptop battery probably will not have enough oomph for a good SSD fire,
>>> cannot supply enough Amps, will shut down before things get hot. ditto
>>> for laptop power supply (60 W vs 600 W PC power supply).
>>>
>>> server chassis with rack mounted SSD in a server room has such good cooling
>>> that the shorted SSD will only get slightly warm. also server power supply
>>> will probably shut down quickly because of undervoltage condition. so no fire.
>>>
>>> in this particular case, the computer was in an experimental area,
>>> that has combustible materials, etc.
>>>
>>>>
>>>> As for the small amount of storage, the commentator is at a
>>>> reasonably well funded (through government sources and possible
>>>> tax-deductible or glamour philanthropy) HEP facility.
>>>>
>>>
>>> We also have a $$$ printing press in our basement (I have a key!) and
>>> we can transmute lead into gold (only slightly radioactive).
>>>
>>> K.O.
>>>
>>>>
>>>> Much of the world, including non-collaboration funded university research
>>>> facilities at most entities within the USA, has rather poor funding
>>>> (not all faculty members can be at Harvard, Stanford, etc.) --
>>>> administrative and some instructional facilities typically can get
>>>> much more.  Many universities now outsource to paid "cloud" storage,
>>>> with all of the issues that may entail.
>>>>
>>>> On 8/10/21 3:08 PM, Konstantin Olchanski wrote:
>>>>> Hi, Larry, thank you for this information, it is always good to see
>>>>> how other people do things.
>>>>>
>>>>> I am surprised at how little storage you have, only a handful of TBs.
>>>>>
>>>>> Here, for each experiment data acquisition station, we now configure
>>>>> 2x1TB SSD for os, home dirs, apps, etc and 2x8-10-12TB HDD for recording
>>>>> experiment data. We use "sort by price" NAS CMR HDDs (WD red, etc).
>>>>>
>>>>> All disks are doubled up as linux mdadm raid1 (mirror) or ZFS mirror. This is
>>>>> to prevent any disruption of data taking from single-disk failure.
>>>>>
>>>>> (it is important to configure the boot loader on both SSDs to boot
>>>>> even if the other SSD is dead).
>>>>>
>>>>> I am surprised you use 1TB HDDs. We switched to SSD up to 2TB size (WD blue SATA SSDs).
>>>>>
>>>>> Failure rates of HDDs, the only reliable data is from backblaze:
>>>>> https://www.backblaze.com/b2/hard-drive-test-data.html
>>>>> and
>>>>> https://www.backblaze.com/blog/backblaze-drive-stats-for-q2-2021/
>>>>>
>>>>> Failure rates of SSDs seem to be very low, I only have 2-3 failed SSDs. One SSD had an
>>>>> internal short and turned into a space heater, luckily there was no fire.
>>>>>
>>>>> For backups of os and home dirs we use amanda and rsync+zfs snapshots. Backups
>>>>> of experiment data are not our responsibility (many experiments use usb hdds).
>>>>>
>>>>>
>>>>> K.O.
>>>>>
>>>>>
>>>>> On Tue, Aug 10, 2021 at 10:55:35AM -0400, Larry Linder wrote:
>>>>>> There are 25 systems in our shop, all linux based, a linux based server,
>>>>>> and a Synology Disk Station running raid 1.   The Disk Station has 12 TB
>>>>>> of space, 6 TB for each raid level.
>>>>>>
>>>>>> We buy only one brand of disk with the black label.  They are typically
>>>>>> 1 TB.
>>>>>>
>>>>>> User boxes have an SSD drive for the OS, a 2 TB disk for the users'
>>>>>> space, 32 GB RAM, and a quad or six core AMD processor.  The graphics
>>>>>> boxes get a video card with lots of RAM.  3D rendering on a slow video
>>>>>> card wastes a lot of users' time.
>>>>>>
>>>>>> The server has a SSD for the OS and 6 TB for user apps /
>>>>>> library /usr/local and /opt.  It also has a mirror disk that keeps a
>>>>>> copy of the server locally.
>>>>>>
>>>>>> These systems are on 24 / 7 and accumulate a lot of hours.  No matter
>>>>>> what the make, mechanical disks have a life span.  For grins I used to do
>>>>>> a post mortem on disks that failed.   There were two types of failures.
>>>>>> In the first, the spring that returns the arm holding the heads cracks.
>>>>>> The second type of failure is the main bearings.  Newer disks seem to
>>>>>> have a lower bearing failure rate.
>>>>>> To prevent operational problems we just swap out the disk on each box at
>>>>>> about 5,000 to 7,000 hr.  The manufacturer says they are good for 10,000
>>>>>> hr. See the fine print in the warranty.  You have to remember this is a
>>>>>> money making operation and down time is costly.
>>>>>>
>>>>>> Backups run at 12:29 and 0:29 in the AM.  At the end of the morning
>>>>>> backup a copy is sent to a remote site.
>>>>>>
>>>>>> For security we shut down the network at 6:20 PM, bring it up at 0:01 AM
>>>>>> and shut it down after the backup is complete.  We bring it back up at 6:45
>>>>>> AM.
>>>>>> 10 years ago we had a fixed IP and the Chinese found it by just
>>>>>> continually pounding on the door.  The return IP was 4 hops to a city
>>>>>> northeast of Shanghai.  They had installed a rootkit on our server and
>>>>>> disabled cron.  When you changed the passwd on the server, a few
>>>>>> milliseconds later it was sent to China.  We got rid of the fixed IP and
>>>>>> reloaded all the systems.  So when you shut down the network to your
>>>>>> provider, the next time you start it you get a different IP.
>>>>>>
>>>>>> We don't give the disks away as they contain a lot of design data,
>>>>>> SW, CAD programs, part programs for our mill etc.  We donate them to a
>>>>>> charity that drills the disks and recycles the rest.
>>>>>>
>>>>>> Larry Linder
>>>>>
>>>
> 
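The replacement policy quoted in Larry Linder's message (swap disks at about 5,000 to 7,000 power-on hours, against a stated 10,000 hours, on machines that run 24/7) is easier to judge once the hours are converted to calendar time. The following is only a minimal sketch of that arithmetic, assuming continuous operation as described in the thread. The function names and the three status labels are hypothetical and are not anything the posters say they use.

```python
HOURS_PER_DAY = 24  # assumption from the thread: machines are on 24/7


def hours_to_days(power_on_hours: float) -> float:
    """Convert power-on hours to calendar days for a disk that never powers down."""
    return power_on_hours / HOURS_PER_DAY


def replacement_status(power_on_hours: float,
                       swap_low: float = 5_000,
                       swap_high: float = 7_000) -> str:
    """Classify a disk against the 5,000 to 7,000 hour swap window from the thread.

    The labels returned here are illustrative only.
    """
    if power_on_hours < swap_low:
        return "ok"
    if power_on_hours < swap_high:
        return "schedule swap"
    return "overdue"


if __name__ == "__main__":
    for hours in (5_000, 7_000, 10_000):
        days = hours_to_days(hours)
        print(f"{hours} h is about {days:.0f} days of 24/7 operation")
```

Under the 24/7 assumption, the swap window works out to roughly seven to ten months of calendar time, and the stated 10,000 hours to a little under fourteen months.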

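The Backblaze drive statistics linked in the thread report an annualized failure rate (AFR). As far as I understand their published method, it is the number of failures divided by the number of drive-years in service, expressed as a percentage. The sketch below shows that calculation under that assumption. The function name is hypothetical, and the fleet numbers in the example are made up for illustration and come neither from the thread nor from Backblaze.

```python
def annualized_failure_rate(failures: int, drive_days: int) -> float:
    """Return the AFR in percent: failures per drive-year of service, times 100.

    drive_days is the sum over all drives of the days each drive was in service.
    """
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return 100.0 * failures / drive_years


if __name__ == "__main__":
    # Hypothetical fleet: 50 drives in service for 365 days each, 1 failure.
    afr = annualized_failure_rate(failures=1, drive_days=50 * 365)
    print(f"AFR: {afr:.1f}%")
```

Counting drive-days rather than drives keeps the rate comparable between a small shop and a large fleet, since drives added or retired mid-year only contribute the days they were actually running.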