[geeks] script language advice

Nadine Miller velociraptor at gmail.com
Fri Feb 1 21:53:09 CST 2008


What language would the collective brain recommend for a script to parse 
lines of up to 7500 chars in length?  I'm leaning towards shell or php 
since I've been doing a lot of tinkering with those of late, and my perl 
is very weak.

I'm trying to sort out duplicate files from 3 computers that I've 
consolidated on one.  The output is from fslint, which I ran on the 
command line because I was afraid the GUI would not handle the large 
number of duplicates (>135K lines of output, which works out to a lot 
more duplicate files than that).

My general idea is to split the output into files based on the number of 
duplicates, e.g. separate files for groups that have 2 duplicates, 3 
duplicates, etc.  I was actually surprised that it only took about 12 
hours to process, given that md5sums were generated for every file.
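For what it's worth, here is a minimal Python sketch of the splitting step (Python rather than shell or PHP; either of those could do the same job). It ASSUMES fslint writes one path per line with a blank line between groups of identical files. That layout is an assumption, so check the real output and adjust parse_groups() if it differs. The function names and the dups_2.txt, dups_3.txt, ... output names are made up for illustration. Python's line reader has no length limit, so 7500-character lines need no special handling.

```python
import os
import sys
from collections import defaultdict


def parse_groups(lines):
    """Yield one list of paths per duplicate group.

    ASSUMPTION: one path per line, with a blank line between groups.
    Adjust this function if the real fslint output is laid out differently.
    """
    group = []
    for line in lines:
        path = line.rstrip("\r\n")
        if path:
            group.append(path)
        elif group:  # a blank line closes the current group
            yield group
            group = []
    if group:  # the last group may not be followed by a blank line
        yield group


def split_by_count(lines, out_dir):
    """Write each group to out_dir/dups_<N>.txt, N = copies in the group.

    Returns a dict mapping N to the number of groups written for it.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}  # one open output file per group size
    counts = defaultdict(int)
    try:
        for group in parse_groups(lines):
            n = len(group)
            if n not in handles:
                name = os.path.join(out_dir, f"dups_{n}.txt")
                # surrogateescape writes undecodable bytes back unchanged
                handles[n] = open(name, "w", encoding="utf-8",
                                  errors="surrogateescape")
            handles[n].write("\n".join(group) + "\n\n")
            counts[n] += 1
    finally:
        for fh in handles.values():
            fh.close()
    return dict(counts)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python split_dups.py fslint_output.txt out_dir
    with open(sys.argv[1], encoding="utf-8", errors="surrogateescape") as src:
        print(split_by_count(src, sys.argv[2]))
```

Here N counts every copy in a group, so a file with one extra copy lands in dups_2.txt; subtract one if you'd rather count only the extras.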

Aside from the line lengths, the biggest bear is that the filesystems 
are FAT32, so there are a lot of unusual characters (rsync choked on "?", 
for example) and spaces in the file paths.
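On the odd-character problem, one approach (again a Python sketch; the helper names are hypothetical) is to read the output as raw bytes and decode with the surrogateescape error handler, so bytes that aren't valid UTF-8 survive the round trip, and to hand paths to other programs as an argument list so no shell ever interprets spaces or "?". The `ls -l` call is only a stand-in for whatever you'd actually run on each file, and the sketch assumes a POSIX system where UTF-8 is the filesystem encoding.

```python
import os
import subprocess
import sys


def decode_line(raw):
    """Decode one raw output line without losing undecodable bytes."""
    # surrogateescape maps each bad byte to a lone surrogate character,
    # which to_bytes() below turns back into the original byte.
    return raw.rstrip(b"\r\n").decode("utf-8", errors="surrogateescape")


def to_bytes(path):
    """Recover the exact on-disk bytes for a path from decode_line()."""
    return path.encode("utf-8", errors="surrogateescape")


def list_file(path):
    """Run `ls -l` on one path without going through a shell."""
    # With an argument list no shell parses the path, so spaces, '?',
    # '*' and quotes reach ls unchanged. '--' stops option parsing in
    # case a name begins with '-'.
    result = subprocess.run(["ls", "-l", "--", to_bytes(path)],
                            capture_output=True)
    return result.returncode


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_paths.py fslint_output.txt
    with open(sys.argv[1], "rb") as src:
        for raw in src:
            path = decode_line(raw)
            # os functions accept bytes paths on POSIX, so nothing is mangled
            if path and not os.path.exists(to_bytes(path)):
                print("missing:", ascii(path))
```

Printing with ascii() keeps the terminal readable even when a name contains bytes your locale can't display.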

Thanks for any suggestions--
=Nadine=


