Saturday, August 31, 2019

linux - Maximum recovery of data from old floppy discs with padded bad sectors and multiple passes


I've a collection of old 3.5" floppy disks that I'm looking to recover as much data as possible from.


The issue is that, because of the structure of some of the files, I need the length of every file to be preserved, meaning any bad sectors should be padded. (The tl;dr reason is that some files are Acorn ADFS files in which data and code are combined, and the code references the data as an offset from the start of the file. Reading the ADFS format isn't a problem in Linux; padding the bad sectors is.)


The discs haven't been read in 25 years, so I'm expecting unpredictable reads, regular bad sectors, and possibly that reading them renders the disks unreadable - which I don't mind as long as data recovery is maximised.


To do this I'm expecting multiple passes to be needed to read as much as possible.


dd


I looked at dd; this command looks promising for a first run:


dd if=/dev/fd0 of=adfs.img conv=noerror,sync

Followed by subsequent calls of


dd if=/dev/fd0 of=adfs.img conv=noerror,notrunc

Where:


noerror means errors are ignored


sync means bad sectors are padded with the null character


notrunc means the (already existing) output file is not truncated when dd is called.


However, as I read the man page and this explanation of notrunc, even with notrunc set dd overwrites the output on each run, so the output still only represents what was read in the last pass. Any sectors that were previously read correctly but are now bad (e.g. because the old floppy disk is degrading) will be overwritten with nulls.


So dd doesn't look suitable.
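The overwriting concern can be checked without touching a real disk. The following is a minimal sketch using ordinary temporary files as stand-ins for the floppy; the function name `demo_dd_notrunc` and the file names are made up for illustration. A "first pass" image holds two good 512-byte sectors, and a "second pass" image has the first sector nulled out, as a padded bad read would be.

```shell
#!/bin/sh
# Sketch only: regular files stand in for /dev/fd0, no real device is read.
demo_dd_notrunc() {
    workdir=$(mktemp -d)

    # "First pass": two 512-byte sectors, both read successfully (filled with 'A').
    head -c 1024 /dev/zero | tr '\0' 'A' > "$workdir/pass1.img"
    cp "$workdir/pass1.img" "$workdir/out.img"

    # "Second pass": sector 0 now comes back as nulls (a padded bad read),
    # sector 1 is still good.
    {
        head -c 512 /dev/zero
        head -c 512 /dev/zero | tr '\0' 'A'
    } > "$workdir/pass2.img"

    # notrunc keeps the existing output file, but every block is still rewritten.
    dd if="$workdir/pass2.img" of="$workdir/out.img" bs=512 conv=notrunc 2>/dev/null

    # Count the non-null bytes left in the output.
    tr -d '\0' < "$workdir/out.img" | wc -c | tr -d ' '

    rm -rf "$workdir"
}

demo_dd_notrunc
```

Only 512 of the original 1024 good bytes survive, because the nulled sector from the second pass replaces the good copy from the first.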


ddrescue


ddrescue looks promising in that it can be used with multiple passes as long as a logfile is used to record what was successfully written and then referred to when the next pass is done.


The first pass would read only the non-error blocks:


ddrescue -d -p --no-split /dev/fd0 output.img log/output.logfile

followed by subsequent passes to fill in the errors:


ddrescue -d -r3 /dev/fd0 output.img log/output.logfile

Where


-d direct disk access. Ignore system cache


--no-split or -n do not try to split or retry failed blocks, i.e. only read good blocks. This is to get only good data in the first run, to avoid the disk failing while trying to recover bad blocks.


-p (--preallocate) preallocates disk space before recovery, i.e. the output file will be the same size as the input file/device


-r3 retry bad sectors 3 times (used on 2nd pass onward)


But the gotcha is that ddrescue seems not to pad bad sectors as they occur. With -p set, it seems to put all the padding at the end of the file, which does not maintain the offsets of data from the start of the file as required.


This seems to be the case because ddrescue is written to try hard to save disk space, so bad sectors are truncated and then added back if they are successfully read in subsequent passes. Setting -p just creates an output file the same size as the input file to reserve the space, not to pad the data. The contents of the output with -p set or unset will therefore be identical, i.e. not padded until the end of the file.


Question


So my question is a three parter


1) is it correct that ddrescue does NOT pad the recovered file even with -p set?


2) Is there any way to get it to pad? In my internet searching, I read a comment (will find it again and add) that the log file created by ddrescue could be used by a script to pad the relevant places. Any idea how?


and


3) do you know any better command / program / script to do what I'm trying to do - maximum data recovery via multiple passes of reading a corrupt disk with padding of bad sectors?


I'm using Ubuntu 18.04, dd version is (coreutils) 8.28 and GNU ddrescue version is 1.22, both from the Ubuntu repositories.


Thanks as always for any help


Answer



GNU ddrescue is the right tool for the recovery you are attempting.





1) is it correct that ddrescue does NOT pad the recovered file even with -p set?



This presumption is incorrect. You might be confused by the manual saying this:



Ddrescue does not write zeros to the output when it finds bad sectors in the input, and does not truncate the output file if not asked to.



It just means that zeros aren't written in place of bad sectors. Whatever should be at those bad sectors just isn't filled in. If you're writing to a blank disk or a file, the unwritten areas at the destination will read back as zeros (null bytes).


Also, the -p/--preallocate option does not have anything to do with "padding". It means "preallocate". On supported file systems, the option ensures that you have enough disk space on the destination to store the source disk.
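The claim that unwritten areas of a file read back as zeros can be checked with ordinary tools. This is a small sketch, not ddrescue itself: the function name `demo_hole` and the file name are hypothetical, and dd with seek= is used only to imitate a tool that writes one good sector at its original offset while skipping two earlier ones.

```shell
#!/bin/sh
# Sketch only: imitates "write the good sector where it belongs, skip the rest".
demo_hole() {
    workdir=$(mktemp -d)
    img="$workdir/out.img"

    # Write one 512-byte "sector" of 'B' at sector index 2.
    # Sectors 0 and 1 are never written.
    head -c 512 /dev/zero | tr '\0' 'B' |
        dd of="$img" bs=512 seek=2 conv=notrunc 2>/dev/null

    # Total file size in bytes.
    size=$(wc -c < "$img" | tr -d ' ')

    # Number of non-null bytes found in the two skipped sectors.
    skipped=$(head -c 1024 "$img" | tr -d '\0' | wc -c | tr -d ' ')

    echo "size=$size nonzero_in_skipped=$skipped"
    rm -rf "$workdir"
}

demo_hole
```

The file is three sectors long and the two skipped sectors contain only null bytes, so the good sector stays at its original offset.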





2) Is there any way to get it to pad?



The file output by GNU ddrescue is logically the same layout as the source disk. A block read from the source goes in the same position in the destination. You could even reverse the whole recovery (-R/--reverse) and the blocks will be filled in backwards, still in the right places.





3) do you know any better command / program / script to do what I'm trying to do - maximum data recovery via multiple passes of reading a corrupt disk with padding of bad sectors?



GNU ddrescue does exactly what you want. It can do multiple passes (-r/--retry-passes=n), and the desired "padding" of bad sectors is the default behavior of GNU ddrescue. From the manual:



If the output file is a regular file created by ddrescue, the areas marked as bad-sector will contain zeros.
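A two-stage run along these lines could be wrapped in a small script. This is a sketch under assumptions: the function name `rescue_floppy` and the `RUN` variable are made up for illustration, -n is used as the short option the question mentions for a quick first pass, and -d and -r3 are taken from the question's own commands. Setting RUN=echo prints the commands instead of running them, which is a safe way to inspect them before putting a fragile disk in the drive.

```shell
#!/bin/sh
# Hypothetical wrapper: rescue_floppy DEVICE IMAGE MAPFILE
# Set RUN=echo to print the ddrescue commands instead of executing them.
rescue_floppy() {
    dev=$1
    img=$2
    map=$3

    # Stage 1 grabs the easily readable areas and records the outcome in the mapfile.
    # Stage 2 reuses the same image and mapfile, retrying only the areas
    # that are not yet marked as recovered.
    ${RUN:-} ddrescue -d -n "$dev" "$img" "$map" &&
        ${RUN:-} ddrescue -d -r3 "$dev" "$img" "$map"
}

# Example (dry run):
#   RUN=echo rescue_floppy /dev/fd0 output.img output.map
```

Both stages must be given the same image and mapfile, since the mapfile is what lets the second stage skip data that has already been recovered.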



To be perfectly clear, and to address your concern that a successful read followed by a bad read would become "padded" with null: ddrescue will not try to re-read a successful read, because there is no need to; the data has already been recovered. The mapfile is how ddrescue knows what it has already recovered and what it failed to recover.
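Since the mapfile is plain text, a script can read it to find out which areas of the image are still zero-filled. The sketch below assumes the layout described in the GNU ddrescue manual: comment lines start with #, the first non-comment line is the current-status line, and each following line holds a position, a size (both hexadecimal) and a status character, with - marking bad sectors. The function name `list_bad_areas` and the sample mapfile contents are invented for illustration.

```shell
#!/bin/sh
# list_bad_areas MAPFILE
# Prints "offset length" in decimal bytes for every area marked '-' (bad sector).
list_bad_areas() {
    seen_status_line=0
    while read -r pos size status rest; do
        # Skip blank lines and comment lines.
        case "$pos" in
            ''|'#'*) continue ;;
        esac
        # The first non-comment line is the current-status line, not a data block.
        if [ "$seen_status_line" -eq 0 ]; then
            seen_status_line=1
            continue
        fi
        if [ "$status" = "-" ]; then
            # Shell arithmetic converts the 0x... values to decimal.
            echo "$(($pos)) $(($size))"
        fi
    done < "$1"
}

# Made-up sample mapfile for a 1.44 MB floppy with one bad 512-byte sector.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# Mapfile. Created by GNU ddrescue version 1.22
# current_pos  current_status  current_pass
0x00002400     +               1
#      pos        size  status
0x00000000  0x00002400  +
0x00002400  0x00000200  -
0x00002600  0x00165A00  +
EOF

list_bad_areas "$sample"
rm -f "$sample"
```

For the sample above this reports one bad area of 512 bytes starting at byte 9216. Other status characters (areas not yet tried or only partly processed) are ignored here and would need separate handling.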

