[rescue] reading old unix disks from Linux

Mouse mouse at Rodents-Montreal.ORG
Fri Apr 26 10:46:26 CDT 2019


> I have old SCSI drives I'm trying to read, and I'm running into a
> number of different issues I'd welcome feedback on.

> I've got drives from PCs, Macs, Suns, and DEC machines, and I'm using
> a 32 bit linux box (3.x kernel) to read them all.  One thing I'm
> wondering is if I'd have fewer problems booting off a FreeBSD or
> NetBSD liveCD.

"Maybe."  _I_ would, at least with NetBSD, but that's in large part
because I know the NetBSD tools far better than I know the Linux tools.
(For example, I can't help with your item (1) because I don't know
enough about the Linux partitioning tools.  I suspect you'd be better
off using sdc instead of sdc1, sdc2, etc., but that's a guess based on
fuzzy memory of watching other people use Linux.)
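
One raw-level check that doesn't depend on any partitioning tool,
though: on a Sun disk, the label lives in sector 0, and the front of
it is plain ASCII text, so a peek at it (device name assumed to be
sdc) will at least tell you what the machine that wrote the disk
thought the partitions were:

  # the Sun disklabel, if there is one, occupies the first 512
  # bytes; the first part of it is human-readable ASCII
  dd if=/dev/sdc bs=512 count=1 | od -c | head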

> (2) 
> Mounting is weird. 

> Often, a disk will have overlapping partitions that will mount.  For
> example, on one disk, I have 7 partitions.  Mounting partitions 1 and
> 3 looks the same at the mount point, as do partitions 4 and 7.  (3
> seems to be the entire disk, which is of course normal for old
> unixes).

Or even some not-so-old ones. :-)

> Except with the filesystem mounted via sdc1, some files will throw
> i/o errors if I try to read them, but when mounted via sdc3, the same
> files won't.

That is not surprising.  As I think I saw someone else suggest, I'd
guess that the partitions begin at the same point, but sdc3 is larger,
and the filesystem on the disk extends past the end of sdc1.
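
That guess is easy to check, at least under Linux: the kernel exports
each partition's start and length (in 512-byte sectors) under /sys.
A minimal check, assuming the drive really is sdc and I'm remembering
the sysfs layout right:

  # identical starts but a larger size for sdc3 would confirm that
  # the filesystem simply runs past the end of sdc1
  cat /sys/block/sdc/sdc1/start /sys/block/sdc/sdc1/size
  cat /sys/block/sdc/sdc3/start /sys/block/sdc/sdc3/size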

> (3)
> Some stuff won't mount. 

> So, I have disks of unknown partitions. Some of them are probably
> swap.  How do I tell?

As far as I can tell, the answer is "you guess".  Some disk
partitioning schemes have types attached to partitions, in which case
the type can be a hint.  But, if the type is unrecognized, or the
partitioning scheme is one that doesn't have types, you have to guess.

I've found that reading the beginning of a partition (with
general-purpose tools, as in dd if=/dev/whatever bs=512 count=64 | less)
will often help educate such guesses.
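
file(1) knows a good many filesystem signatures, and the FFS
superblock magic is easy to check by hand too.  A sketch, with the
partition name assumed and the 1372-byte offset from my memory of the
UFS1 superblock layout:

  # file(1) will often name the filesystem outright
  file -s /dev/sdc2
  # or look for the UFS1 magic number 0x00011954 by hand; it sits
  # 1372 bytes into the superblock, which itself starts 8192 bytes
  # into the partition - expect 00 01 19 54 on Sun-written
  # (big-endian) disks
  dd if=/dev/sdc2 bs=1 skip=9564 count=4 2>/dev/null | od -tx1

Swap, on systems that old, usually has no signature at all, so
"nothing recognizable at the front" is itself weak evidence of swap.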

> Some of them are labeled as SunOS (possibly as old as 3.5), and
> running strings on them gives output that looks believable, but they
> refuse to
> mount (error is "wrong fs type, bad option, bad superblock on
> /dev/loop0, missing codepage or helper program, or other error") and
> the ufstype options don't help.  What next?

Depends on how far you're willing to go setting up things you otherwise
have little use for.

For example, you could set up a SunOS install to read such disks.  But
that's likely to be a lot of work, especially if you aren't experienced
with Sun stuff and/or don't have Sun hardware to do it on.
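
Before going that far, though, I'd make sure the loop device is
pointing at the start of the filesystem.  If the image is of the
whole disk rather than of one partition, the superblock won't be at
offset 0, and every ufstype will fail the same way.  A sketch, with
the filename and start sector as assumptions (the sector should come
from the disk label):

  # aim a loop device at the partition's true start, then try a
  # read-only mount
  losetup -o $((START_SECTOR * 512)) /dev/loop1 sunos-disk.img
  mount -t ufs -o ro,ufstype=sun /dev/loop1 /mnt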

> (4) 
> Some stuff is empty. 

> In a few cases, I've mounted partitions and seen only a lost+found
> directory that's empty.  And dated sometime in the 1990s.  But if I
> run strings(1) on the dd files of the raw partitions, I see tons of
> stuff there.  So, am I seeing the remains of deleted files, or is the
> UFS driver buggy or having a poor interaction with the kernel's
> determination of partitions?

Either is possible.  If it really is Berkeley FFS (what Sun calls UFS -
for the purposes of this paragraph the two are the same), deleted
files' contents stay on disk until something else is created which
happens to re-use those data blocks, so fragments of deleted files are
to be expected.  This is not to say that your UFS support isn't
responsible too.
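
If it's the deleted remains you're after, scavenging the raw
partition is about the best available: crude, but it gets the text
out where you can search it (device and output names assumed):

  # pull every printable string of 8+ characters out of the raw
  # partition, deleted files included
  dd if=/dev/sdc4 bs=64k | strings -n 8 > sdc4-strings.txt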

> Is there an undelete tool for antique UFS?

No.  I once looked into it: it's impossible in the case you have, and
support would be moderately difficult to add.  The major problem is
that too
much is destroyed upon deletion.  FFS uses a multi-level tree to find
file data: the inode has a list of so-called "direct" data block
pointers, typically twelve of them.  If the file is bigger than can
be stored in that many direct blocks, one data block is allocated and
filled with more data block pointers - blocksize/4 of them, so
usually 1024 to 16384; this is an
"indirect" data block - a single-indirect data block.  The inode has
one such single-indirect data block pointer.  For files too big for
_that_, each inode also has a double-indirect data block pointer; this
points to a data block which contains block numbers of single-indirect
data blocks.

There is, in the implementations I've seen, also a
triple-indirect data block, but the code I've seen generally has a
comment saying that triple-indirect blocks are untested.  I don't know
whether that's actually true or whether the comment is out of date;
I've certainly worked with files far larger than anything that could
have been stored on a single disk back when those comments were
probably written, sometimes larger than double-indirect can support
with relatively small block/frag sizes.  (This does mean FFS has a hard
maximum file size, but it is very large, even by modern standards.  For
example, a handy FFS filesystem I have has a file size limit of
0x000400400402ffff, just a smidgen over one petabyte.  Another one, set
up with larger frag/block sizes, has a file size limit of
0x04001000400bffff, just over a quarter of an exabyte.)
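
Those limits are just the block-mapping arithmetic, if you want to
check them: twelve direct blocks plus one, two, and three levels of
indirection, with blocksize/4 pointers per indirect block.  A sketch,
assuming UFS1's 4-byte block pointers:

  # maximum file size for FFS with 16KB blocks: direct blocks plus
  # single-, double-, and triple-indirect, 4-byte pointers
  bs=16384; n=$((bs / 4))
  printf '%#018x\n' $(( (12 + n + n*n + n*n*n) * bs - 1 ))
  # prints 0x000400400402ffff; bs=65536 gives the other figure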

But, for the moment, what's relevant is that, upon file destruction,
the block pointers in the inode are cleared.  This means that there is
no longer any record of which of the (usually) many unused data blocks
used to belong to that file.  Data kept behind indirect blocks can
often be identified as a single blob, assuming of course that none of
it has been reused since, but there is still no indication which file
it used to belong to.

In principle, the data block pointers could be left untouched in freed
inodes, which could make some form of undelete possible.  As far as I
know, no implementation actually does that, so undeleting on the disks
you have is unlikely to be an option.

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse at rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

