Basics
There are two kinds of data-recovery programs:
- Those that look at the file-system for clues about files to recover
- Those that use signatures
File-system Programs
The former kind look at the file-system on the disk for information about files and folders to recover. They may look at the FAT/MFT, but usually when a file is lost, that information is lost and so these programs will often examine the clusters on the disk to find what look like directories. Then they will examine the directory entries to identify files that are marked as deleted. If the indicated files have not been overwritten, then there should be enough data in the directory (name, starting cluster, and size) to recover the file. The same goes for subdirectories: there should be enough information (name, starting cluster) to identify the subdirectory and repeat the aforementioned process for files in there.
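As a rough illustration of that directory scan, here is a minimal sketch that assumes a FAT-style directory with short (8.3) names, where each entry is 32 bytes and a deleted entry has its first name byte replaced by 0xE5. The function name `find_deleted_entries` is made up for this example and is not taken from any particular recovery tool.

```python
import struct

DELETED_MARKER = 0xE5    # first name byte of a deleted FAT entry
END_OF_DIRECTORY = 0x00  # first name byte of an entry that was never used
ATTR_LONG_NAME = 0x0F    # long-filename entries, skipped in this sketch
ATTR_DIRECTORY = 0x10    # entry describes a subdirectory

def find_deleted_entries(cluster: bytes):
    """Yield (name, is_dir, start_cluster, size) for deleted 32-byte entries."""
    for offset in range(0, len(cluster) - 31, 32):
        entry = cluster[offset:offset + 32]
        first = entry[0]
        if first == END_OF_DIRECTORY:
            break  # nothing after this point was ever used
        attr = entry[11]
        if first != DELETED_MARKER or (attr & 0x3F) == ATTR_LONG_NAME:
            continue
        # The first character of the name was overwritten by the marker.
        base = "?" + entry[1:8].decode("ascii", "replace").rstrip()
        ext = entry[8:11].decode("ascii", "replace").rstrip()
        name = f"{base}.{ext}" if ext else base
        (high,) = struct.unpack_from("<H", entry, 20)      # high word (FAT32)
        low, size = struct.unpack_from("<HI", entry, 26)   # low word, file size
        yield name, bool(attr & ATTR_DIRECTORY), (high << 16) | low, size
```

Note that the first letter of the name is gone for good, which is why recovered names from this kind of program often start with a placeholder character.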
There are two major drawbacks to this method:
- This only works if the files and folders are contiguous. That is, if they are fragmented, then the program can only recover the start of the file/folder up until the end of the contiguous chain. It has no way of finding the next cluster in the chain.
- It has no way of knowing which files were recently deleted and to be recovered, so it just lists everything it finds that is marked as deleted. As such, it could contain files/folders that were deleted a long time ago and have thus long been overwritten.
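The contiguity problem can be sketched as follows. This is a toy example, not how any specific program is written: it assumes FAT-style cluster numbering (data clusters start at 2), and `recover_contiguous`, `data_start`, and the 4-byte cluster size are made up for illustration. All the program knows is the starting cluster and the size, so it can only read straight ahead.

```python
import io

def recover_contiguous(image, data_start, cluster_size, start_cluster, size):
    """Read `size` bytes starting at `start_cluster`, assuming no fragmentation."""
    # Data clusters are numbered from 2, so cluster 2 sits at data_start.
    offset = data_start + (start_cluster - 2) * cluster_size
    image.seek(offset)
    return image.read(size)

# Toy image with 4-byte clusters: the deleted file occupied clusters 2 and 4,
# while cluster 3 belongs to some other file.
toy_image = io.BytesIO(b"AAAA" + b"xxxx" + b"BBBB")
recovered = recover_contiguous(toy_image, data_start=0, cluster_size=4,
                               start_cluster=2, size=8)
print(recovered)  # b'AAAAxxxx': the second half is the other file's data
```

The first cluster comes back correctly, but the rest is whatever happens to follow it on disk.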
Signature Programs
The latter type of data-recovery program ignores the file-system altogether and instead looks for file types. It will usually contain a list of file signatures (e.g., headers, magic numbers, etc.) which are typical of different kinds of files. It then scans the disk, looking for these patterns of bytes, and whenever it finds one, it treats that cluster and a number of subsequent clusters as a file, then displays the list of files it found.
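A minimal sketch of such a scan is shown below. It assumes the disk image fits in memory and that files start on cluster boundaries; the three signatures are the well-known headers for JPEG, PNG, and PDF, while the function name `scan_for_signatures` and the 4096-byte default cluster size are just choices for this example. Real tools know far more file types than this.

```python
SIGNATURES = {
    "jpg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "pdf": b"%PDF-",
}

def scan_for_signatures(image: bytes, cluster_size: int = 4096):
    """Return (offset, file_type) for every cluster that starts with a known signature."""
    hits = []
    for offset in range(0, len(image), cluster_size):
        for file_type, magic in SIGNATURES.items():
            # Compare the bytes at the start of this cluster with the magic number.
            if image.startswith(magic, offset):
                hits.append((offset, file_type))
                break
    return hits
```

Nothing in this loop ever consults a directory entry, which is why the results carry a type and an offset but no name, date, or folder.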
This method has a different set of drawbacks:
- It has no information about the file’s name, location, date, or even size because it finds them according to the raw file contents instead of file-system data.
- Because it has no information about the names, it will give them a contrived name such as file0001, file0002, etc.
- Because it has no information about the location, it will dump them all in a single, giant folder (though some may sort them according to file-type).
- Because it has no information about the file’s size, it will round it up to a whole number of clusters, thus adding up to a cluster’s worth of junk to the end (as much as 511 bytes with 512-byte clusters, or 65,535 bytes with 64 KB clusters).
- It may find old files that were deleted a long time ago, or even files that are almost identical but different versions. For example, if you were working on a file and saved it numerous times after each batch of modifications, then you may end up finding many of those changes as separate files and would then have to figure out which is the most recent version.
- Like the file-system method, it requires files to be contiguous because it has no information about cluster chains.
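To make the naming and size points concrete, here is a small worked sketch. The helper names are hypothetical, and the "true size" is something the recovery program does not actually know; it is used here only to show how much junk ends up appended for a given cluster size.

```python
def carved_name(index: int, file_type: str) -> str:
    """Build a contrived sequential name, since the real one is unknown."""
    return f"file{index:04d}.{file_type}"

def padded_size(true_size: int, cluster_size: int) -> int:
    """Size after rounding up to a whole number of clusters."""
    clusters = -(-true_size // cluster_size)  # ceiling division
    return clusters * cluster_size

def junk_bytes(true_size: int, cluster_size: int) -> int:
    """Extra bytes tacked onto the end of the recovered file."""
    return padded_size(true_size, cluster_size) - true_size

print(carved_name(1, "jpg"))   # file0001.jpg
print(junk_bytes(1, 512))      # 511   (worst case with 512-byte clusters)
print(junk_bytes(1, 65536))    # 65535 (worst case with 64 KB clusters)
print(junk_bytes(1024, 512))   # 0     (file ends exactly on a cluster boundary)
```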
Comparison
The two methods have their pros and cons, and which you use is up to you because it will depend on your disk and files. You are probably best served by using at least one program of each type to maximize your success. That way, you recover the most content (from the signature program) as well as the most meta-data, such as filenames (from the file-system program). You may need to do some comparisons and manual work to match the correct meta-data to the correct content, but it is your best bet at maximum recovery. On the down-side, using multiple programs will result in a lot of clutter and false-positives that need to be sorted through, so it is up to you to decide whether the lost files are worth the effort.
Application
Why do recovery programs often recover files without the original names/structure?
For instance, I use PhotoRec, I think it is a very good application for recovery purposes but everything it finds it recovers without their names and structure.
Why is that?
Because PhotoRec is a signature-type of data-recovery program.
I understand the 'source' for the names is the Master File Table where it stores names/structures/attributes etc. Is this true? For true recovery, with structure and names, is it necessary to have the MFT intact?
Yes and yes.
Recovering deleted files is often more successful with NTFS than with FAT for the reasons you described; however, PhotoRec is a signature-style program, so it does not refer to the MFT for meta-data about files/folders.
Moreover, unless you immediately stop using the disk containing the deleted files and perform the recovery right away, there is a chance that the clusters containing the files and/or the MFT entries referring to them may be recycled and overwritten. (Windows has a nasty and quite baffling habit of mysteriously overwriting deleted files immediately. I have on many occasions gone so far as to press the Reset button—to avoid Windows overwriting during shutdown—after accidentally deleting a file, and still found that the file in question was overwritten.)