After Using PhotoRec

From CGSecurity


Sorting the files recovered by PhotoRec can be difficult. Here are some ideas to help with the process.

Sort files by extension

  • builtBackwards created an open source standalone executable script for Windows with AutoIt v3 called PhotoRec Sorter.

PhotoRec Sorter is run from the directory that contains the "recup_dir" folders. It moves each file into a new folder named after the file extension (in upper case, e.g. PDF, DOC, PPT).

You end up with all the recovered files being sorted into folders by file extension.

Download Source and Compiled Executable: PhotoRec Sorter Project Page --BuiltBackwards 02:10, 25 October 2008 (UTC)

  • You can use this Python script to sort found files by extension.
  • Save the following code as a file (recovery.py), then run it with the source and destination directories as parameters.

Example: $ python3 recovery.py /home/me/recovered_files /home/me/sorted_files

#!/usr/bin/env python3
import os
import shutil
import sys

source = sys.argv[1]
destination = sys.argv[2]

# Keep asking until both directories exist
while not os.path.exists(source):
    source = input('Enter a valid source directory\n')
while not os.path.exists(destination):
    destination = input('Enter a valid destination directory\n')

for root, dirs, files in os.walk(source, topdown=False):
    for file in files:
        # Folder name is the upper-case extension without the dot (e.g. PDF)
        extension = os.path.splitext(file)[1][1:].upper()
        destinationPath = os.path.join(destination, extension)

        if not os.path.exists(destinationPath):
            os.mkdir(destinationPath)
        # Never overwrite a file that is already in the destination
        if os.path.exists(os.path.join(destinationPath, file)):
            print('WARNING: this file was not copied: ' + os.path.join(root, file))
        else:
            shutil.copy2(os.path.join(root, file), destinationPath)


Mac OS X and Linux Implementation

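Here is an alternative implementation which "copies" much more quickly by creating hard links instead of duplicating the data. Hard links only work when the destination is on the same filesystem as the recovered files: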

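#!/bin/bash

# Strip a trailing slash from the directory name
recup_dir="${1%/}"

[ -d "$recup_dir" ] || {
    echo "Usage: $0 recup_dir"
    echo "Mirror files from recup_dir into recup_dir.by_ext, organized by extension"
    exit 1
}

# IFS= and -r keep leading spaces and backslashes in file names intact
find "$recup_dir" -type f | while IFS= read -r k; do
    name="${k##*/}"
    ext="${name##*.}"
    # Files without any extension go into a common folder
    [ "$ext" = "$name" ] && ext="no_ext"
    ext_dir="$recup_dir.by_ext/$ext"
    [ -d "$ext_dir" ] || mkdir -p "$ext_dir"
    echo "${k%/*}"
    ln "$k" "$ext_dir"
done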

Save it as photorec-sort-by-ext and run

   $ bash photorec-sort-by-ext /home/me/recovered_files

This will create a folder called /home/me/recovered_files.by_ext
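
For readers who prefer Python, the same hard-link idea can be sketched as below. This is only an illustration, not part of the scripts above: the function name link_by_extension and the "no_ext" folder for files without an extension are assumptions made for this example. As with the shell version, the destination is assumed to be on the same filesystem as the recovered files and outside the directory being scanned.

```python
import os
import tempfile


def link_by_extension(recup_dir, dest_dir):
    """Hard-link every file under recup_dir into dest_dir/<extension>/.

    Returns the number of links created. Names that already exist in the
    destination are skipped rather than overwritten.
    """
    created = 0
    for root, _dirs, files in os.walk(recup_dir):
        for name in files:
            # Extension without the dot; "no_ext" is a hypothetical fallback
            ext = os.path.splitext(name)[1][1:] or "no_ext"
            ext_dir = os.path.join(dest_dir, ext)
            os.makedirs(ext_dir, exist_ok=True)
            target = os.path.join(ext_dir, name)
            if os.path.exists(target):
                continue
            # os.link creates a second directory entry for the same data
            os.link(os.path.join(root, name), target)
            created += 1
    return created


if __name__ == "__main__":
    # Self-contained demonstration inside a temporary directory
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "recup_dir.1")
        os.makedirs(src)
        for name in ("f0001.jpg", "f0002.jpg", "f0003.pdf"):
            with open(os.path.join(src, name), "w") as handle:
                handle.write(name)
        dst = src + ".by_ext"
        print(link_by_extension(src, dst), sorted(os.listdir(dst)))
```

Because no data is copied, the links take almost no extra disk space, and deleting the sorted tree leaves the original recup_dir files untouched.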


If you are only interested in files with a specific extension (e.g. only .jpg) you can use the following *nix command to find all files in the recovered directories and copy them to a new location:

$ find /path/to/recovered/directories -name \*.jpg -exec cp {} /path/to/new/folder/ \;

JPEG

  • JPEG file sorting using Exif meta-data.
  • Canon PowerShot models store their image sequence numbers in the Exif data, so using a program that can dump Exif data to text like jhead, and the following Perl script, you can essentially restore all the JPG files to their original names. --Vees 01:59, 8 January 2007 (CET)
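#!/usr/bin/perl
$working_dir = '.';
$jhead_bin = '/usr/local/bin/jhead';

@recovered_files = `ls $working_dir`;
foreach $file (@recovered_files) {
        chomp $file;
        # Dump the Exif data verbosely; the path is quoted in case it contains spaces
        @exif = `$jhead_bin -v "$working_dir/$file"`;
        foreach $line (@exif) {
                # The digits captured after "100" are used as the image sequence number
                if ($line =~ /Canon maker tag 0008 Value = 100(\d{1,8})$/) {
                        rename("$working_dir/$file", "$working_dir/IMG_$1.JPG")
                                or warn "Could not rename $file: $!\n";
                        print "IMG_$1.JPG from $file\n";
                        last;
                }
        }
}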

Or use this script to scan all recup_dir directories for JPEG files larger than 800 KB and create a directory named after each photo's Exif date. The line that moves the photo is commented out; uncomment it once the printed output looks right:

$working_dir = '/home/myhome/';
$result_dir = '/home/myhome/photos/';
$jhead_bin = '/usr/bin/jhead';

@rec_dirs = `ls ${working_dir} | grep recup_dir`;
foreach $recup_dir (@rec_dirs) {
	chomp $recup_dir;
	print "Scanning ${recup_dir}...\n";
	# Only consider JPEG files larger than 800 KB
	@photos_in_recup = `find ${working_dir}${recup_dir}/*jpg -type f -size +800k`;
	foreach $photo_file (@photos_in_recup) {
		chomp $photo_file;
		@exif = `$jhead_bin -v $photo_file`;
		foreach $line (@exif) {
			# Capture year, month and day from the Exif time stamp
			if ($line =~ /Time\s*:\s*([0-9]{4}):([0-9]{2}):([0-9]{2})\s[0-9:]{8}$/) {
				print "IMG $photo_file $1-$2-$3\n";
				# -p avoids an error when the directory already exists
				system("mkdir -p ${result_dir}$1-$2-$3");
#				system("mv $photo_file ${result_dir}$1-$2-$3/");
				last;
			}
		}
	}
}
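  • The following command recreates the original directory layout and file names present on the card (for Canon cameras, tested with numerous photos from an EOS 20D), using the file number EXIF info. ExifTool works under both Windows and Linux.
exiftool -r "-FileName<IMG_${FileIndex}%c.%e" DIR

On Linux or Mac OS X, use single quotes instead of double quotes so that the shell does not try to expand ${FileIndex} itself:
exiftool -r '-FileName<IMG_${FileIndex}%c.%e' DIR

The command reads FileIndex from the EXIF information in each file and renames the file to its original name. The %c code appends a copy number when the new name already exists, %e keeps the original extension, and -r makes the command work recursively.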

  • Issue the following command using Exiv2 to rename all JPEGs to their respective date (the program will ask what to do if conflicts occur):
$ exiv2 -t rename *.jpg

Finding duplicates

  • FSlint: duplicate file finder for Linux (very simple to use, includes a GUI)
  • Under Linux or Mac OS X (or anywhere with perl and 'sum'), you can find duplicates in a hierarchy using find_dup.
  • Under Linux or Mac OS X, md5sum can be used to find duplicate files (possibly checksumming only the first x bytes to save time).

In this example, we checksum the first 80 KB (20 blocks of 4 KB) of each recup_dir.*/*.sib file:

for file in recup_dir.*/*.sib; do MD5=`dd count=20 bs=4k if="$file" 2> /dev/null|md5sum`; echo "$MD5 $file"; done|sort
1a07198de3486ff2ecab7859612fe7ba  - Box Clever.sib
33105f4a7997b2e2681e404b3ac895f2  - Random, Matching - 2 bars.sib
376e0c53e78e56ba6f2858d9680f8c6b  - 01aIdentifyCommonInst.sib
b0b40a516a1e26660748a0a09cdf3207  - 01ArticulationFlashcards.sib

Each checksum is unique, so there are no duplicates among these files.
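
The same prefix-checksum idea can be sketched in Python, which also groups the matching files instead of leaving the comparison to the eye. This is a minimal illustration under stated assumptions: the names prefix_md5 and group_by_prefix are made up for this example, and a shared checksum only shows that the first 80 KB match, so candidates should still be compared in full before anything is deleted.

```python
import hashlib
import os
import tempfile
from collections import defaultdict

# Same 80 KB window as the dd example above (count=20, bs=4k)
PREFIX_BYTES = 20 * 4096


def prefix_md5(path, limit=PREFIX_BYTES):
    """Return the MD5 hex digest of the first `limit` bytes of a file."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read(limit)).hexdigest()


def group_by_prefix(paths, limit=PREFIX_BYTES):
    """Map each digest to the paths sharing it, keeping only groups of 2+."""
    groups = defaultdict(list)
    for path in paths:
        groups[prefix_md5(path, limit)].append(path)
    return {digest: found for digest, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    # Demonstration with three small files, two of which are identical
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, data in (("a.sib", b"same"), ("b.sib", b"same"), ("c.sib", b"other")):
            path = os.path.join(tmp, name)
            with open(path, "wb") as handle:
                handle.write(data)
            paths.append(path)
        for digest, found in group_by_prefix(paths).items():
            print(digest, [os.path.basename(p) for p in found])
```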

  • On Windows you can use the fc utility to find duplicates. The following batch file (does not work on Win9x/ME) might help: --Joey 08:36, 17 July 2008 (CEST)
@echo off
SETLOCAL ENABLEEXTENSIONS ENABLEDELAYEDEXPANSION
SET FILELIST=
FOR %%i IN (*) DO (
	FOR %%j IN (!FILELIST!) DO (
		IF %%~zi EQU %%~zj (
			fc /b "%%~i" "%%~j">NUL && echo "%%~i" = "%%~j"
		)
	)
	SET FILELIST=!FILELIST! "%%~i"
)
ENDLOCAL
  • On Windows you may add a "/r" (without the quotes) after both "for"s in the above batch file.
  • On Unix machines, you can use fdupes and the following script to generate a shell script with rm statements to remove all duplicate files:
#!/bin/sh
OUTF='rm-dups.sh'

if [ -e $OUTF ]; then
  echo "File $OUTF already exists."
  exit 1;
fi

echo "#!/bin/sh" > $OUTF
fdupes -r -f . | sed -r 's/(.+)/rm \"&\"/' >> $OUTF
chmod +x $OUTF
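
The two approaches above (compare sizes first, then bytes, then generate a removal script for review) can be combined in a short Python sketch that needs neither fc nor fdupes. It is an illustration only: find_duplicates and write_rm_script are hypothetical names, and the generated script is meant to be read before it is run.

```python
import filecmp
import os
import shlex
import tempfile
from collections import defaultdict


def find_duplicates(top):
    """Return (kept, duplicate) pairs of byte-identical files under top."""
    by_size = defaultdict(list)
    for root, dirs, files in os.walk(top):
        dirs.sort()  # visit subdirectories in a predictable order
        for name in sorted(files):
            path = os.path.join(root, name)
            by_size[os.path.getsize(path)].append(path)

    pairs = []
    for paths in by_size.values():
        originals = []  # one representative per distinct content
        for path in paths:
            for kept in originals:
                # shallow=False forces a byte-by-byte comparison
                if filecmp.cmp(kept, path, shallow=False):
                    pairs.append((kept, path))
                    break
            else:
                originals.append(path)
    return pairs


def write_rm_script(pairs, out_path):
    """Write a shell script that removes the second file of each pair."""
    # Mode "x" refuses to overwrite an existing script
    with open(out_path, "x") as out:
        out.write("#!/bin/sh\n")
        for _kept, dup in pairs:
            # shlex.quote protects spaces and quotes in file names
            out.write("rm -- %s\n" % shlex.quote(dup))
    os.chmod(out_path, 0o755)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        top = os.path.join(tmp, "recup_dir.1")
        os.makedirs(top)
        for name, data in (("a.txt", b"hello"), ("b.txt", b"hello"), ("c.txt", b"world")):
            with open(os.path.join(top, name), "wb") as handle:
                handle.write(data)
        script = os.path.join(tmp, "rm-dups.sh")
        write_rm_script(find_duplicates(top), script)
        with open(script) as handle:
            print(handle.read())
```

Only files of equal size are ever compared, which keeps the number of full reads small; the first file found with a given content is kept and later copies are listed for removal.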

MP3, MP4, Ogg Vorbis...

Most MP3, MP4 and Ogg files have embedded information about Title, Album and Author. You can use EasyTag to automatically rename the recovered audio and video files using this information.
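
To illustrate what "embedded information" means, the sketch below reads the simplest such tag, ID3v1, which occupies the last 128 bytes of some MP3 files, and proposes a file name from it. It is not how EasyTag works internally and it covers only ID3v1; many files carry ID3v2, MP4 or Vorbis tags instead, for which a dedicated tool remains the better choice. The function names read_id3v1 and suggested_name are assumptions made for this example.

```python
import os
import tempfile


def read_id3v1(path):
    """Return (title, artist, album) from an ID3v1 tag, or None if absent."""
    if os.path.getsize(path) < 128:
        return None
    with open(path, "rb") as handle:
        handle.seek(-128, os.SEEK_END)
        tag = handle.read(128)
    if tag[:3] != b"TAG":
        return None

    def text(raw):
        # Fields are fixed width and padded with NUL bytes or spaces
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    # Layout: "TAG", then title, artist and album of 30 bytes each
    return text(tag[3:33]), text(tag[33:63]), text(tag[63:93])


def suggested_name(path):
    """Build 'Artist - Title.ext' from the tag, or None when there is no title."""
    tags = read_id3v1(path)
    if tags is None or not tags[0]:
        return None
    title, artist, _album = tags
    base = "%s - %s" % (artist, title) if artist else title
    # Replace characters that are unsafe in file names on common systems
    safe = "".join("_" if c in '/\\:*?"<>|' else c for c in base)
    return safe + os.path.splitext(path)[1]


if __name__ == "__main__":
    # Build a fake recovered file that ends with a hand-made ID3v1 tag
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "f1234567.mp3")
        tag = (b"TAG" + b"Song".ljust(30, b"\x00") + b"Band".ljust(30, b"\x00")
               + b"Album".ljust(30, b"\x00") + b"2008" + b"\x00" * 30 + b"\xff")
        with open(path, "wb") as handle:
            handle.write(b"\x00" * 256 + tag)
        print(suggested_name(path))
```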

MS Office

  • To read a broken MS Office document (doc/xls/ppt/...) that MS Office cannot open, try OpenOffice. OpenOffice.org is an open-source, multiplatform and multilingual office suite. It is compatible with all other major office suites and is free to download, use, and distribute.
  • Some MS Office documents (xls/ppt/...) may be recovered with a Word .doc extension - you may need to rename these files.

MS Outlook

  • To recover a broken Outlook PST file, try Microsoft's Inbox Repair Tool (Scanpst.exe).