SCIENTIFIC-LINUX-USERS Archives

July 2014

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Konstantin Olchanski <[log in to unmask]>
Reply To:
Konstantin Olchanski <[log in to unmask]>
Date:
Mon, 14 Jul 2014 14:58:38 -0700
Content-Type:
text/plain
On Mon, Jul 14, 2014 at 04:51:03PM -0500, Kevin K wrote:
> On Jul 14, 2014, at 4:37 PM, Konstantin Olchanski <[log in to unmask]> wrote:
> 
> > On Mon, Jul 14, 2014 at 04:33:03PM -0500, Kevin K wrote:
> >> I guess I don't understand the part about how files can be different sizes on different filesystems.
> >> 
> >> They can obviously use up more or less disk space on different filesystems.  For instance, a FAT disk with 32KB clusters will use up a minimum of 32KB even for a 10 byte file.  While NTFS will probably put the 10 bytes in the directory entry or use up a maximum of 4KB for 4KB clusters.
> >> 
> >> But I don't see why rsync would care about the unused data.  It should just sync the 10 bytes accessible.  I'm ignoring alternate streams here.
> > 
> > 
> > This is the usual confusion between the "st_size" and "st_blocks" entries in "struct stat" returned by lstat() and co.
> 
> Is what I was missing the complexity of files that, for example, may be sparse?
> 
> I was thinking of the case that, when you do a ls -l, you normally get a byte size value.  Depending on your options, you can also get block size, which du would also return.
> 
> So, if I'm not going off the deep end, a quick determination of whether a file is different probably has to check both values, since it may show 1000000 bytes, but if it is sparse, most of the file may be nulls with no on-disk storage allocated to them.  If that changes, even on the same filesystem, something may have changed and data may have to be synced.  And with different cluster sizes, the blocks used will normally be different.


No, this will not work. You cannot rely on "st_blocks" to compare file contents.

For example, some filesystems implement "tail packing", where the contents of multiple small files
(or the tails of larger ones) are packed into a single block. (I think ReiserFS was the first to do
this, and I have no idea what it returned as "st_blocks" for tail-packed files.)

Anyhow, with tail packing, different versions of the same filesystem may use different heuristics
to decide whether a file gets packed, and how.

Not deterministic and not reliable.

Kind of like checking if this is the same person by counting the coins in their pockets.
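
To make the point concrete, here is a rough sketch in Python (an illustration only, not anything rsync itself does). It writes the same 1000000 zero bytes twice, once densely and once as a hole created with truncate(), and then prints "st_size" and "st_blocks" for both. The helper names (describe, make_pair, same_contents) are made up for this example, and the 512-byte unit for "st_blocks" is the Linux convention, assumed here rather than guaranteed everywhere.

```python
import os
import tempfile


def describe(path):
    """Return (st_size, allocated_bytes) for path, without following symlinks."""
    st = os.lstat(path)
    # Assumption: st_blocks counts 512-byte units (true on Linux).
    return st.st_size, st.st_blocks * 512


def make_pair(directory, size=1_000_000):
    """Create two files with identical contents: `size` zero bytes each."""
    dense = os.path.join(directory, "dense.bin")
    sparse = os.path.join(directory, "sparse.bin")
    with open(dense, "wb") as f:
        f.write(b"\0" * size)   # every byte is actually written
    with open(sparse, "wb") as f:
        f.truncate(size)        # extends the file; the filesystem may leave a hole
    return dense, sparse


def same_contents(a, b):
    """Compare the two files byte for byte."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        return fa.read() == fb.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        dense, sparse = make_pair(d)
        print("dense :", describe(dense))
        print("sparse:", describe(sparse))
        print("same contents:", same_contents(dense, sparse))
```

Both files report the same "st_size" and compare equal byte for byte. Whether the allocated sizes differ, and by how much, depends on the filesystem, which is exactly why "st_blocks" tells you nothing about the contents.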


-- 
Konstantin Olchanski
Data Acquisition Systems: The Bytes Must Flow!
Email: olchansk-at-triumf-dot-ca
Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
