[geeks] script language advice

Nadine Miller velociraptor at gmail.com
Tue Feb 5 17:27:12 CST 2008


Jonathan C. Patschke wrote:
> On Sat, 2 Feb 2008, Nadine Miller wrote:
> 
>> num of dupes * filesize  /path/to/file/filename /path/to/file/filename2
>> /path/to/filename3 [...]
> 
> Ah, the perl code you'd want is something like:
> 
>    while (<>) {                              # Snag a line from stdin
>          chomp;                              # Trim the line-ending
>          my @components = split(/\s/);       # Split into a list at spaces
>          my $dupcount   = shift @components; # Remove the first field.
>          my $trash1     = shift @components; # Remove the second field.
>          my $trash2     = shift @components; # Remove the third field.
>          my $fileToKeep = shift @components; # Remove the fourth field.
> 
>          unlink @components;                 # Delete everything else.
>    }
> 
> If you wanted to test this, you could replace the last two lines with:
> 
>    print "Keeping $fileToKeep, nuking: ";
>    foreach my $filename (@components) { print "$filename," }
>    print "\n";

Thanks for the educational code, Jonathan.  I really need to force 
myself to sit down and do some serious work in Perl.

In this case, though, these file paths are not sane.  Since the files 
are on FAT32, there are a lot of spaces in the paths.

All of the partitions are mounted under the same sub-directory on the 
Xubuntu box, so had I continued with the scripting, I would have split 
the lines at that sub-directory "prefix" since it would be easy to add 
the prefix back.
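
A minimal sketch of that prefix-based split, for illustration only.  The 
`/subdir/` prefix, the `split_dupe_line` helper, and the sample line are 
all hypothetical placeholders.  It assumes every path in a line starts 
with the same prefix and that a space followed by the prefix never 
occurs inside a path.  It only prints what it would do; nothing is 
deleted.

```perl
use strict;
use warnings;

# Hypothetical mount prefix; substitute the real top-level sub-directory.
my $prefix = '/subdir/';

# Split one report line into its leading fields and a list of paths.
# Assumption: every path starts with $prefix, and whitespace followed by
# $prefix never appears inside a path.
sub split_dupe_line {
    my ($line) = @_;
    chomp $line;
    # Split only at whitespace that is immediately followed by the prefix,
    # so spaces inside a path are left alone.
    my ($header, @paths) = split /\s+(?=\Q$prefix\E)/, $line;
    return ($header, @paths);
}

# Made-up sample line in the "count * size  path path ..." layout.
my $sample = "2 * 1024  /subdir/computer1/partition1/My Pictures/a.jpg "
           . "/subdir/computer2/partition2/My Pictures/a.jpg";

my ($header, $keep, @dupes) = split_dupe_line($sample);
print "Keeping $keep\n";
print "Would delete $_\n" for @dupes;
```

Swapping the last `print` for an `unlink` would make it destructive, so 
a dry run like this is the safer first step.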

I copied the files from each of the WinXP computers' individual 
partitions into a sub-dir on one external drive.  For some reason, my 
dad felt the need to partition his HDs into a bunch of smaller 
partitions, so I had C:, D:, E:, etc. all the way up to K: on one of the 
computers.

So now it looks like:
   /subdir/computer1/partition1
   /subdir/computer1/partition2
   etc.
   /subdir/computer2/partition1
   /subdir/computer2/partition2
   etc.

All mounted on a box where I shuffled data to make room for a Xubuntu 
install.

I decided not to build the script after some testing of fslint with a 
couple of smaller filesets with no recursion; it seemed to DTRT and 
performed reasonably well.  Being conservative, I am running it with 
recursion on each sub-directory I created for the individual FAT32 file 
systems.  When those are complete, I'll re-run it over the top 
sub-directory with recursion to get down to a single copy across all 
file systems.  I feel a little sketchy trusting it, but to be honest, 
I'd feel even sketchier trusting my own code. :-/

I haven't re-installed any of the original computers, so I can fall back 
to the original data if this blows up.  I am fairly certain that most of 
the data is also backed up on removable media, too, but that would be a 
bear to deal with unless absolutely necessary.  I can tell my dad was 
starting to doubt his own reliability, as I've found many, many 
duplicates of things like digital photos.

Bottom line, my dad was a pack rat, and unfortunately didn't segregate 
"important" data (e.g. pictures, financial info, personal IP, contact 
info, login info) from "interesting" but non-essential info (music 
collection, info downloaded for later reading, etc).  Hopefully this 
will prod some other pack rats I know here to think about what might 
happen at home with their computers if they were seriously injured or 
passed away and a relative had to deal with the data.

I have already started a "home wiki" so that my husband and I can manage 
all our website business related info.  I think I'll be extending that 
to cover some other things that we both should have access to, as well 
as thinking about how to re-organize the rest of my data to make it 
clear what is important and what is not.  This experience also 
reinforces my support of open formats.

Food for thought--

=Nadine=


