SCIENTIFIC-LINUX-USERS Archives

July 2014

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Date: Mon, 14 Jul 2014 16:51:03 -0500
Content-Type: text/plain
On Jul 14, 2014, at 4:37 PM, Konstantin Olchanski <[log in to unmask]> wrote:

> On Mon, Jul 14, 2014 at 04:33:03PM -0500, Kevin K wrote:
>> I guess I don't understand the part about how files can be different sizes on different filesystems.
>> 
>> They can obviously use up more or less disk space on different filesystems.  For instance, a FAT disk with 32KB clusters will use up a minimum of 32KB even for a 10-byte file, while NTFS will probably put the 10 bytes in the directory entry or use up at most 4KB with 4KB clusters.
>> 
>> But I don't see why rsync would care about the unused data.  It should just sync the 10 accessible bytes.  I'm ignoring alternate streams here.
> 
> 
> This is the usual confusion between the "st_size" and "st_blocks" entries in "struct stat" returned by lstat() and co.

Is what I was missing the complexity of files that may, for example, be sparse?

I was thinking of the case where, when you do an ls -l, you normally get a size in bytes.  Depending on your options, you can also get the size in blocks, which is what du reports.
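
Something like this quick, untested sketch should show the two numbers side by side for a given path: st_size is the byte count that ls -l prints, and st_blocks (counted in 512-byte units) is what du is working from.

#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    struct stat st;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <path>\n", argv[0]);
        return 1;
    }
    if (lstat(argv[1], &st) != 0) {
        perror("lstat");
        return 1;
    }
    /* st_size: logical length in bytes (what ls -l shows) */
    /* st_blocks: allocated storage in 512-byte units (what du counts) */
    printf("st_size   = %lld bytes\n", (long long)st.st_size);
    printf("st_blocks = %lld (= %lld bytes allocated)\n",
           (long long)st.st_blocks, (long long)st.st_blocks * 512LL);
    return 0;
}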

So, if I'm not going off the deep end, a quick determination of whether a file has changed probably has to check both values.  It may show 1000000 bytes, but if the file is sparse, most of it may be nulls with no on-disk storage allocated to it.  If either value changes, even on the same filesystem, something may have changed and data may have to be synced.  And with different cluster sizes, the blocks used will normally differ between filesystems.
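
For the sparse case specifically, something like this (again untested; the file name and the 1000000-byte length are made up for illustration) should end up with a large st_size and almost no st_blocks behind it:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;
    int fd = open("sparse-test.tmp", O_CREAT | O_RDWR | O_TRUNC, 0644);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* Extend the file to 1000000 bytes without writing any data; the
       filesystem records the length but allocates little or no storage
       for the hole. */
    if (ftruncate(fd, 1000000) != 0) {
        perror("ftruncate");
        close(fd);
        return 1;
    }
    if (fstat(fd, &st) != 0) {
        perror("fstat");
        close(fd);
        return 1;
    }
    printf("st_size   = %lld bytes (what ls -l shows)\n",
           (long long)st.st_size);
    printf("st_blocks = %lld x 512 = %lld bytes (what du counts)\n",
           (long long)st.st_blocks, (long long)st.st_blocks * 512LL);
    close(fd);
    return 0;
}

Which I assume is also why rsync has the -S/--sparse option, so that holes get recreated on the destination instead of being written out as real blocks of nulls.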
