Would you be able to boot from a USB medium? If so, perhaps install a rescue ISO image onto an appropriately sized USB memory stick, boot it, and then salvage the information from the OS disk?
I'm going to try that. I'll see if I can get a running system that is as independent from the disk I/O system as possible. Unfortunately, I still really need a disk to write recovered data out to.
Currently, I've put the failing disk behind a USB-IDE adapter, so I can boot from CD correctly, and then see what happens when the failing disk is plugged in later.
How about disconnecting both the data RAID and the dying system disk, installing a replacement for the system disk, and performing a new OS installation? Once that is complete, reconnect the dying system disk and mount it read-only. Assuming success up to this point, use any technique (e.g. dd conv=noerror) to salvage as much as possible.
The failures that I'm seeing happen while the kernel is trying to identify the disk, and make the partitions available. It seems the drivers do some sanity checking (is the disk really that big?) that interferes with attempts to read the data by user-level programs. It appears my biggest issues come before I can attempt to manipulate the failing disk.
Let me explain ... Yesterday's attempts got somewhere with this, but only so far...
When I plug the failing disk in, the kernel messages show the USB device detection as normal, followed by the storage device identification too. It identifies the disk as a 160GB disk with 6 partitions (two within the extended partition; all old-school MBR format). The disk starts clicking away...
Unfortunately, things don't stay like that for long. After a few USB resets (tens of seconds apart), the identification happens again ... and again ... and again, several times over. At some point the messages start to complain that the partitions go off the end of the device (EOD), and the reported size gets smaller. Eventually it seems to settle on 5 partitions and just 8GB, and stops repeating the identification process.
Before things have settled, basic commands (like "fdisk -l /dev/sda", or "ddrescue -n /dev/sda /mnt/rescue/sda.img /mnt/rescue/sda.log") often fail because the device has disappeared from under them.
However, once the kernel has settled on the 5-partition, 8 GB device, user commands start to work.
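One way to avoid starting a rescue too early is to wait until the reported device size stops changing. The following is only a rough Python sketch of that idea, assuming Linux, where seeking to the end of a block device returns its current size. The helper names (device_size, wait_until_stable) and the default check count and interval are made up for illustration.

```python
import os
import time

def device_size(path):
    """Return the size in bytes that the kernel currently reports for path."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end works for regular files and Linux block devices.
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)

def wait_until_stable(path, checks=5, interval=10.0):
    """Poll until the same size has been seen `checks` times in a row
    after the first reading, then return that size.

    A device that has vanished (OSError) resets the count.
    The defaults are guesses, not tuned values.
    """
    last = None
    same = 0
    while same < checks:
        try:
            size = device_size(path)
        except OSError:
            size = None  # device disappeared, e.g. during a USB reset
        if size is not None and size == last:
            same += 1
        else:
            same = 0
        last = size
        time.sleep(interval)
    return last

# Example (needs read permission on the device):
# print(wait_until_stable("/dev/sda"))
```

This only tells you that the kernel has stopped re-identifying the disk, not that the size it settled on is the right one.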
In this state, partitions 1-3 are within the first 8GB; partition 4 is large enough to span the full (original-size) disk, and partition 5 curtails itself at 8GB (which I don't think is the proper endpoint of partition 5). No sign of partition 6.
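The pattern above (some partitions intact, one cut short, one missing) is what you would expect if each partition is simply checked against the shrunken device size. A small Python sketch of that check follows. The sector numbers are invented for illustration and are NOT the real table from this disk, and the 512-byte sector size is an assumption.

```python
SECTOR = 512  # bytes per logical sector (assumed)

def classify(partitions, device_bytes):
    """Classify each partition against the reported device size.

    partitions: list of (name, start_sector, sector_count) tuples.
    Returns a dict mapping name to 'fits', 'truncated' or 'beyond'.
    """
    device_sectors = device_bytes // SECTOR
    result = {}
    for name, start, count in partitions:
        end = start + count  # first sector past the partition
        if end <= device_sectors:
            result[name] = "fits"
        elif start < device_sectors:
            result[name] = "truncated"  # starts inside, ends outside
        else:
            result[name] = "beyond"     # starts past the end of the device
    return result

# Hypothetical layout, not the real partition table.
example = [
    ("p1", 63, 2_000_000),
    ("p2", 2_000_063, 4_000_000),
    ("p5", 10_000_000, 20_000_000),
    ("p6", 40_000_000, 100_000_000),
]
report = classify(example, 8 * 1000**3)  # device reported as 8 GB
print(report)
```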
While I was in this state yesterday, I managed to "ddrescue" the individual partitions for 1-3. I then asked it to rescue the whole of /dev/sda; this pulled the full 8GB off ... and then terminated because that was the size of the device. I couldn't get it to read further down the disk, even by specifying a size manually.
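Since the rescued image includes sector 0, the original primary partition table can be read from the image rather than from the kernel's shrunken view of the disk. A minimal Python sketch, assuming a classic MBR layout: it decodes only the four primary entries and does not follow the extended partition chain to the logical partitions. The function name is hypothetical.

```python
import struct

def read_primary_partitions(sector0):
    """Decode the four primary MBR entries from the first 512 bytes.

    Returns a list of (number, type_byte, start_lba, sector_count)
    for every non-empty entry.
    """
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    entries = []
    for i in range(4):
        raw = sector0[446 + 16 * i : 446 + 16 * (i + 1)]
        # Entry layout: status, CHS start (3 bytes, skipped), type,
        # CHS end (3 bytes, skipped), start LBA, sector count.
        status, ptype, start_lba, count = struct.unpack("<B3xB3xII", raw)
        if ptype != 0:  # type 0 marks an unused slot
            entries.append((i + 1, ptype, start_lba, count))
    return entries

# Example, using the image path from the ddrescue command above:
# with open("/mnt/rescue/sda.img", "rb") as f:
#     for entry in read_primary_partitions(f.read(512)):
#         print(entry)
```

Comparing start_lba + sector_count for the last entry against the 8GB the kernel reports would show how much of the disk is still unread.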
While rescuing this data, I had zero blocks with errors. The disk went through occasional phases of the head clicking away, but it didn't seem to affect extraction.
So ... my main requirement now is to get the disk through the identification process properly.
My intended tricks, right now, are to try plugging it in with different physical orientations, with different sides facing upwards.
Later tricks might involve banging the drive physically in a certain direction, or freezing it. I'm not sure I want to go there just yet...
Ddrescue has worked miracles for me in the past. AFAIK it improves upon simple 'dd' recovery by narrowing down the damaged data to just the few sectors that can't be read, rather than the entire block. It sounds like here, though, maybe the partition table is corrupt?
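The narrowing-down idea can be sketched roughly as follows: read in large chunks, and only when a chunk fails, retry it one sector at a time so that just the unreadable sectors are lost. This is a simplified Python illustration, not ddrescue's actual algorithm (which works in several passes and keeps a mapfile). The function name, the read_at callback and the in-memory demo are all made up for the example.

```python
import io

def rescue(read_at, total, out, chunk=64 * 512, sector=512):
    """Copy `total` bytes into the seekable file object `out`.

    read_at(offset, length) must return bytes or raise OSError.
    Returns a list of (offset, length) ranges that could not be read.
    """
    bad = []
    pos = 0
    while pos < total:
        n = min(chunk, total - pos)
        try:
            data = read_at(pos, n)  # fast path: one big read
            out.seek(pos)
            out.write(data)
        except OSError:
            # Slow path: retry the failed chunk sector by sector.
            for off in range(pos, pos + n, sector):
                m = min(sector, pos + n - off)
                try:
                    data = read_at(off, m)
                    out.seek(off)
                    out.write(data)
                except OSError:
                    bad.append((off, m))  # leave a gap in the output
        pos += n
    return bad

# In-memory demo: a fake 4096-byte "disk" with one unreadable sector.
source = bytes(range(256)) * 16
BAD_START, BAD_END = 1024, 1536

def fake_read(offset, length):
    # Fail any read that overlaps the bad sector.
    if offset < BAD_END and offset + length > BAD_START:
        raise OSError("simulated read error")
    return source[offset:offset + length]

image = io.BytesIO()
bad_ranges = rescue(fake_read, len(source), image, chunk=2048, sector=512)
print(bad_ranges)
```

With plain dd and a large block size, the whole failed block would be lost instead of the single sector.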
I love ddrescue too. It is a shame, though, that I never seem to learn enough to never want to use it again!
In this case, I think something is going wrong with the head movement inside the disk, in a way that prevents it from reading the full disk. The Linux disk driver, or the disk itself, then sanity-checks the last partitions out of existence.
I'll take a look at testdisk - it's not something I've seen before.