[geeks] listing identical files

Phil Stracchino alaric at caerllewys.net
Fri Nov 19 13:29:25 CST 2004


On Fri, Nov 19, 2004 at 02:17:28PM -0500, velociraptor wrote:
> I was thinking about this myself, as my backup "system" (hardly
> applicable) has been duplicating files onto different hard drives, and
> now I have more than just two copies over time.

I need to introduce you to Bacula.  :)

> The only way I can think to do it is by using some kind of
> checksumming (e.g. md5), but I'm not sure if checksumming takes into
> account the names of the files as I've only used it on identically
> named files.  The downside is that it would be very disk and time
> intensive.

md5sum does not take into account filenames or any other file metadata.
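For illustration, here is a minimal Python sketch of the checksum-everything approach. It is not from the original thread, and the function names are hypothetical. Because MD5 is computed over file contents only, files with different names but identical bytes land in the same group. Matching digests are treated as duplicates here; a cautious tool might confirm with a byte-for-byte comparison before deleting anything.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file's contents (name and metadata are ignored)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates_by_md5(root):
    """Hash every regular file under root; return sorted groups of paths sharing a digest."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_digest[md5_of_file(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Every file under the tree gets read in full, which is where the disk and time cost comes from.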

> Anyone else with any more useful ideas?

Well, one could write a program that catalogued the sizes of all files,
sorted the files to identify those of the same size, and then md5summed
(or diffed, or xdelta'd) only those.  However, I think this would
consume more resources overall than the md5sum method.  Certainly it'd
consume gobs of memory.
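The size-first idea can be sketched the same way. This is a hypothetical Python illustration, not an existing tool: it keeps a dictionary keyed by size for every file under the tree (the memory cost mentioned above), then computes MD5 only within size groups that have more than one member, since files of different sizes cannot be identical.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file's contents."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates_size_first(root):
    """Group files by size first, then hash only same-size candidates."""
    # Pass 1: catalogue every regular file by size (one stat per file, no reads).
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files whose size is shared with at least one other file.
    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[md5_of_file(path)].append(path)
        duplicates.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return duplicates
```

For example, `find_duplicates_size_first("/backups")` (a hypothetical path) returns a list of path groups; files whose sizes are unique are never opened.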


-- 
  ========== Fight Back!  It may not be just YOUR life at risk. ==========
  alaric at caerllewys.net : phil-stracchino at earthlink.net : phil at novylen.net
   phil stracchino : unix ronin : renaissance man : mystic zen biker geek
     2000 CBR929RR, 1991 VFR750F3 (foully murdered), 1986 VF500F (sold)
           Linux Now!  ...Friends don't let friends use Microsoft.


