SCIENTIFIC-LINUX-USERS Archives

August 2007

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
John Summerfield <[log in to unmask]>
Reply To:
John Summerfield <[log in to unmask]>
Date:
Fri, 31 Aug 2007 08:51:37 +0800
Nathan Moore wrote:
> Hi,
> 
> I've been using rsync as a primitive backup tool on a small cluster of SL45
> and SL5 machines.  Lately, there is an intermittent error when I run rsync
> to back up a large (20GB) directory of mixed file types.  The error isn't a
> loud failure, but rather just that the file transfer stalls and the node the
> files are being copied from locks up (the lockup is complete - the "server"
> node is unavailable via NIS, ssh, or console login - it has to be
> power-cycled).
> 
> Is there a known bug in rsync?  Is there a way to trouble-shoot my "server"
> machine?

Volume of data isn't the only measure of "large" - the number of files 
is important too.
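
To see which kind of "large" applies, it can help to count the files as well as measure the bytes. This is only a rough sketch: the function name tree_summary is made up here, and the path in the example is a placeholder.

```shell
# Hypothetical helper: report file count and total size for a tree.
tree_summary() {
    dir=$1
    # Number of regular files under the tree.
    files=$(find "$dir" -type f | wc -l)
    # Total size in kilobytes (du -sk prints "SIZE<TAB>PATH").
    kbytes=$(du -sk "$dir" | cut -f1)
    # $(( )) strips any padding some wc implementations add.
    echo "files=$((files)) kbytes=$((kbytes))"
}

# Example (placeholder path):
# tree_summary /path/to/backup/source
```

File names containing newlines would be miscounted, which is unlikely to matter for a rough estimate.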

Some time ago (Debian Woody + RHL 7.3) I had a problem with rsync timing 
out when backing up (most of) my Woody filesystem over ADSL.

I took up the issue on the rsync list, where the folk were very helpful. 
The thread was "reliability and robustness problems", around October 2004.

By default, rsync does not time out, so one really needs to specify a 
timeout value.
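
As a minimal sketch of setting one, rsync's --timeout option takes a number of seconds of I/O inactivity after which it gives up. The wrapper name backup_with_timeout, the RSYNC override variable, the 600-second value, and the paths and host in the example are all assumptions for illustration.

```shell
# Hypothetical wrapper: run rsync in archive mode with an I/O timeout.
# RSYNC can be overridden (e.g. RSYNC=echo) to preview the arguments.
backup_with_timeout() {
    secs=$1
    src=$2
    dest=$3
    # --timeout=SECONDS makes rsync exit if no data moves for that long.
    "${RSYNC:-rsync}" -a --timeout="$secs" "$src" "$dest"
}

# Example (placeholder paths and host):
# backup_with_timeout 600 /data/ backuphost:/backups/data/
```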

Then I found it timing out too readily.

It also used an enormous amount of RAM: it's the only program I know 
that can cause Linux to use swap (many times real RAM) and not cause 
thrashing.

As best I can figure it, rsync was building a filtered view of the 
target files area and (no doubt) the source files area, and neither side 
talks to the other while this is happening. I think this was taking an 
hour or so, but this _was_ a few years ago.

The rsync gurus opined that it was better to back up this way than to 
back up a single file, but my experience suggests otherwise; I now create 
a filtered filesystem image and use rsync to update that.

While rsync is building its lists of what to transfer, systems at both 
ends can get rather busy, particularly if something else is running 
interference on the use of RAM.


This, of course, can cause a bit of distress to both computers, but if 
they really are locked up as opposed to being seriously overtaxed, then 
you have either a kernel bug or a hardware problem. Nothing rsync can do 
should cause the system to actually lock up.

I think I would start by directing syslog (kernel messages at least) to 
another box, or to a printer on the parallel port. Look for signs of the 
oom killer at work.
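
Once the kernel messages are landing somewhere safe, something like the following might pick out the interesting lines. It is only a sketch: the function name oom_lines is made up, and the patterns are a guess at common kernel wording, which varies between kernel versions.

```shell
# Hypothetical helper: pick out lines that look like OOM-killer activity.
# Reads log text on stdin; matching is case-insensitive.
oom_lines() {
    grep -iE 'oom-killer|out of memory'
}

# Example, run on the box that receives the logs:
# oom_lines < /var/log/messages
```

With the stock syslogd, forwarding is usually a line such as "kern.*  @loghost" in /etc/syslog.conf on the troubled machine (loghost is a placeholder name), and the receiving syslogd normally has to be started with -r to accept remote messages.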

You might also do something as crude as adapting and running this:
   # log the currently running processes once a minute
   while :
   do
     ps xar | logger -i
     sleep 1m
   done
while making sure the logged messages go to Somewhere Else.



-- 

Cheers
John

Please do not reply off-list