After Using PhotoRec

From CGSecurity

The files recovered by PhotoRec can be hard to sort. Here are some ideas to help you with this process.

Sort files by extension

Using a PowerShell script under Windows

https://github.com/lconte/Copy-PhotoRecFilesbyExtension.ps1

Using a Python script

  • You can use this Python script to sort the recovered files by extension.
  • Save the following code as recovery.py, then run it with the source and destination directories as arguments.

Example: $ python recovery.py /home/me/recovered_files /home/me/sorted_files

#!/usr/bin/env python3
import os
import shutil
import sys

# Source and destination directories come from the command line;
# a missing argument falls through to the prompts below.
source = sys.argv[1] if len(sys.argv) > 1 else ''
destination = sys.argv[2] if len(sys.argv) > 2 else ''

# Keep asking until both directories exist.
while not os.path.exists(source):
    source = input('Enter a valid source directory\n')
while not os.path.exists(destination):
    destination = input('Enter a valid destination directory\n')

for root, dirs, files in os.walk(source, topdown=False):
    for file in files:
        # The upper-cased extension (without the dot) names the target folder.
        extension = os.path.splitext(file)[1][1:].upper()
        destinationPath = os.path.join(destination, extension)

        if not os.path.exists(destinationPath):
            os.mkdir(destinationPath)
        # Never overwrite a file that was already copied under the same name.
        if os.path.exists(os.path.join(destinationPath, file)):
            print('WARNING: this file was not copied: ' + os.path.join(root, file))
        else:
            shutil.copy2(os.path.join(root, file), destinationPath)

Using a more complex Python script

A more elaborate Python program, sort-PhotorecRecoveredFiles, does the following with your recovered data:

  • Sorts all files by extension into their own folders.
  • Limits the number of files per folder by creating subfolders once a certain number is exceeded. The limit can be customized.
  • For all JPEGs: puts them into one folder per year, based on the creation date in the EXIF data. Within each year, a folder is created for every event, e.g. all photos taken during one weekend or vacation end up in the same folder.
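
The two less obvious ideas in the list above (capping the number of files per folder and grouping photos into events) can be sketched as follows. This is only an illustration of the idea and not the code of sort-PhotorecRecoveredFiles: the function names, the limit of 500 files and the three-day gap that starts a new event are all assumptions made for this example.

```python
import os
from datetime import datetime, timedelta


def batch_path(extension_dir, index, name, max_files=500):
    """Place the file at 0-based position `index` in a numbered subfolder."""
    batch = index // max_files + 1  # files 0..499 -> "1", 500..999 -> "2", ...
    return os.path.join(extension_dir, str(batch), name)


def group_into_events(timestamps, max_gap=timedelta(days=3)):
    """Split timestamps into events; a gap longer than `max_gap` starts a new one."""
    events = []
    for stamp in sorted(timestamps):
        if events and stamp - events[-1][-1] <= max_gap:
            events[-1].append(stamp)   # close enough to the previous photo
        else:
            events.append([stamp])     # first photo, or a long pause before it
    return events
```

Each list returned by group_into_events would then become one event folder inside the folder of its year.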

Using a shell script for Mac OS X and Linux

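Here is an alternative implementation that "copies" much faster by creating hard links instead of duplicating the data. Hard links require the source and the destination to be on the same file system: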

#!/bin/bash

recup_dir="${1%/}"

[ -d "$recup_dir" ] || {
    echo "Usage: $0 recup_dir";
    echo "Mirror files from recup_dir into recup_dir.by_ext, organized by extension";
    exit 1
};
find "$recup_dir" -type f | while IFS= read -r k; do
    ext="${k##*.}";
    ext_dir="$recup_dir.by_ext/$ext";
    [ -d "$ext_dir" ] || mkdir -p "$ext_dir";
    echo "${k%/*}"
    ln "$k" "$ext_dir";

done

Save it as photorec-sort-by-ext and run

   $ bash photorec-sort-by-ext /home/me/recovered_files

This will create a folder called /home/me/recovered_files.by_ext
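
If hard links are not available (for example when the output folder is on another file system), the same idea can be sketched in Python with a fallback to a normal copy. This is a minimal sketch under assumptions: the function names link_or_copy and mirror_by_extension and the folder name no_extension are made up for this example.

```python
import os
import shutil


def link_or_copy(src, dst_dir):
    """Hard-link src into dst_dir; fall back to a real copy if linking fails."""
    os.makedirs(dst_dir, exist_ok=True)
    dst = os.path.join(dst_dir, os.path.basename(src))
    try:
        os.link(src, dst)
    except FileExistsError:
        print("WARNING: not linked, name already taken:", src)
    except OSError:
        # Typically raised when src and dst are on different file systems.
        shutil.copy2(src, dst)
    return dst


def mirror_by_extension(recup_dir):
    """Mirror every file under recup_dir into recup_dir.by_ext/<extension>/."""
    out_root = os.path.normpath(recup_dir) + ".by_ext"
    for root, _dirs, files in os.walk(recup_dir):
        for name in files:
            ext = os.path.splitext(name)[1][1:] or "no_extension"
            link_or_copy(os.path.join(root, name), os.path.join(out_root, ext))
    return out_root
```

Calling mirror_by_extension('/home/me/recovered_files') would fill /home/me/recovered_files.by_ext in the same way as the shell script.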


If you are only interested in files with a specific extension (e.g. only .jpg) you can use the following *nix command to find all files in the recovered directories and copy them to a new location:

$ find /path/to/recovered/directories -name \*.jpg -exec cp {} /path/to/new/folder/ \;

JPEG

  • JPEG file sorting using Exif meta-data.
  • Canon PowerShot models store their image sequence numbers in the Exif data, so using a program that can dump Exif data to text like jhead, and the following Perl script, you can essentially restore all the JPG files to their original names. --Vees 01:59, 8 January 2007 (CET)
$working_dir = '.';
$jhead_bin = '/usr/local/bin/jhead';

@recovered_files = `ls $working_dir`;
foreach $file (@recovered_files) {
        chomp $file;
        @exif = `$jhead_bin -v $working_dir/$file`;
        foreach $line (@exif) { 
                if ($line =~ /Canon maker tag 0008 Value = 100(\d{1,8})$/) {
                        system("mv $working_dir/$file $working_dir/IMG_$1.JPG");
                        print "IMG_$1.JPG from $file\n";
                        last;
                }
        }
}

Or use this script to scan all recup_dir directories, select the JPEG files larger than 800 KB, and create a directory named after the date each photo was taken (uncomment the mv line to actually move the files there):

$working_dir = '/home/myhome/';
$result_dir = '/home/myhome/photos/';
$jhead_bin = '/usr/bin/jhead';

@rec_dirs = `ls ${working_dir} | grep recup_dir`;
foreach $recup_dir (@rec_dirs) {
	print "Scanning ${recup_dir}...";
	chomp $recup_dir;
	@photos_in_recup = `find ${working_dir}${recup_dir}/*jpg -type f -size +800k`;
	foreach $photo_file (@photos_in_recup) {
		chomp $photo_file;
#print "IMG $photo_file in $recup_dir\n";
		@exif = `$jhead_bin -v $photo_file`;
#print "$jhead_bin -v $photo_file\n";
		foreach $line (@exif) {
			if ($line =~ /Time\s*:\s*([0-9]{4}):([0-9]{2}):([0-9]{2})\s[0-9:]{8}$/) {
				print "IMG $photo_file $1-$2-$3\n";
				system("mkdir -p ${result_dir}$1-$2-$3");
#				system("mv $photo_file $result_dir/$1-$2-$3/");
				last;
			}
		}

	}
}
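
The date extraction used by the script above can be tried without running jhead by applying the same regular expression to sample text. The sketch below is in Python; the function name date_folder and the sample line in the usage note are assumptions for illustration, not captured jhead output.

```python
import re

# Same pattern as in the Perl script: "...Time : YYYY:MM:DD HH:MM:SS".
TIME_LINE = re.compile(r"Time\s*:\s*([0-9]{4}):([0-9]{2}):([0-9]{2})\s[0-9:]{8}$")


def date_folder(jhead_lines):
    """Return 'YYYY-MM-DD' from the first matching line, or None if none matches."""
    for line in jhead_lines:
        match = TIME_LINE.search(line.rstrip("\n"))
        if match:
            return "-".join(match.groups())
    return None
```

For a line such as "Date/Time    : 2007:01:08 01:59:00" the function returns 2007-01-08, which is the directory name the script would create under $result_dir.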
  • The following command recreates the original directory layout and file names present on the card (for Canon cameras, tested with numerous photos from an EOS 20D), using the file number EXIF info. ExifTool works under both Windows and Linux.
exiftool -r '-FileName<IMG_${FileIndex}%c.%e' DIR

It uses the FileIndex tag from each file's EXIF information to restore the original file name. The %c code appends a copy number to the name if a file with that name already exists, and -r makes the command work recursively.

  • Issue the following command using Exiv2 to rename all JPEGs to their respective date (the program will ask what to do if conflicts occur):
$ exiv2 -t rename *.jpg
  • When using the above exiv2 renaming and you have multiple thousands of files to rename, some shells might issue an error like "Argument list too long". In that case, use the following workaround:
$ find ./ -exec exiv2 -t rename {}  \;

When the number of files is very large, it is advisable to specify a default action, e.g. always rename duplicates (-F):

$ find ./ -exec exiv2 -F -t rename {}  \;

Finding duplicates

  • FSlint: duplicate file finder for Linux (very easy to use, includes a GUI).
  • Under Linux or Mac OS X (or with perl and 'sum'), you can find duplicates in a hierarchy using find_dup.
  • Under Linux or Mac OS X, md5sum can be used to find duplicate files (possibly hashing only the first x bytes of each file).

In this example, we compute the checksum of the first 80 KB (20 blocks of 4 KB) of each recup_dir.*/*.sib file:

for file in recup_dir.*/*.sib; do MD5=`dd count=20 bs=4k if="$file" 2> /dev/null|md5sum`; echo "$MD5 $file"; done|sort
1a07198de3486ff2ecab7859612fe7ba  - Box Clever.sib
33105f4a7997b2e2681e404b3ac895f2  - Random, Matching - 2 bars.sib
376e0c53e78e56ba6f2858d9680f8c6b  - 01aIdentifyCommonInst.sib
b0b40a516a1e26660748a0a09cdf3207  - 01ArticulationFlashcards.sib

Each checksum is unique - there are no duplicates.
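
The same check can be sketched in Python, which also makes it easy to confirm a match afterwards: two files whose first 80 KB hash alike are only candidates until their full contents have been compared. The function names below are assumptions made for this example.

```python
import filecmp
import hashlib
from collections import defaultdict


def head_md5(path, num_bytes=80 * 1024):
    """MD5 of the first num_bytes of a file (80 KB by default, as in the example)."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read(num_bytes)).hexdigest()


def duplicate_candidates(paths, num_bytes=80 * 1024):
    """Group paths whose leading bytes hash alike; only groups of two or more are kept."""
    groups = defaultdict(list)
    for path in paths:
        groups[head_md5(path, num_bytes)].append(path)
    return [group for group in groups.values() if len(group) > 1]


def is_true_duplicate(first, second):
    """Confirm a candidate pair with a full byte-by-byte comparison."""
    return filecmp.cmp(first, second, shallow=False)
```

Hashing only the start of each file keeps the scan fast on large recoveries; the byte-by-byte comparison is then needed only for the few candidates that remain.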

  • On Windows you can use the fc utility to find duplicates. The following batch file (does not work on Win9x/ME) might help: --Joey 08:36, 17 July 2008 (CEST)
@echo off
SETLOCAL ENABLEEXTENSIONS ENABLEDELAYEDEXPANSION
SET FILELIST=
FOR %%i IN (*) DO (
	FOR %%j IN (!FILELIST!) DO (
		IF %%~zi EQU %%~zj (
			fc /b "%%~i" "%%~j">NUL && echo "%%~i" = "%%~j"
		)
	)
	SET FILELIST=!FILELIST! "%%~i"
)
ENDLOCAL
  • To also process subdirectories, you may add "/r" (without the quotes) after both "for"s in the above batch file.
  • On Unix machines, you can use fdupes and the following script to generate a shell script with rm statements to remove all duplicate files:
#!/bin/sh
OUTF='rm-dups.sh'

if [ -e $OUTF ]; then
  echo "File $OUTF already exists."
  exit 1;
fi

echo "#!/bin/sh" > $OUTF
fdupes -r -f . | sed -r 's/(.+)/rm \"&\"/' >> $OUTF
chmod +x $OUTF
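
The sed expression above wraps each file name in double quotes, so a name containing a double quote, a dollar sign or a backtick would produce a broken or unsafe rm line. As a hedged alternative, the script can be generated in Python with shlex.quote, which quotes any name safely for a POSIX shell. The function name rm_script is an assumption; its input is the list of lines printed by fdupes -r -f.

```python
import shlex


def rm_script(fdupes_lines):
    """Build the text of a shell script that removes every listed file."""
    lines = ["#!/bin/sh"]
    for path in fdupes_lines:
        path = path.rstrip("\n")
        if path:  # fdupes separates duplicate sets with blank lines
            lines.append("rm -- " + shlex.quote(path))
    return "\n".join(lines) + "\n"
```

As with the shell version, review the generated script before running it, since it deletes files.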

MP3, MP4, Ogg Vorbis...

Most MP3, MP4 and Ogg files contain embedded information about the title, album and author (artist). To automatically rename the recovered audio and video files using this information, you can use ExifTool:

exiftool -r -ext mp3 "-filename<D:\NewDirPath\${artist;} - ${album;}\${title;}%-c.%e" "D:\recup_dir.1"
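
Tag values often contain characters that are not allowed in file names, so a path built from artist, album and title has to be cleaned first. The Python sketch below illustrates that clean-up step only; it is not ExifTool's implementation. The function names, the "Unknown" fallback for empty tags and the exact set of removed characters (those forbidden in Windows file names) are assumptions.

```python
import os
import re

# Characters that Windows does not allow in file names, plus control characters.
ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_component(value, fallback="Unknown"):
    """Strip illegal characters; use `fallback` when nothing usable is left."""
    cleaned = ILLEGAL.sub("", value or "").strip(" .")
    return cleaned or fallback


def target_name(artist, album, title, extension):
    """Build '<artist> - <album>/<title>.<extension>' from cleaned tag values."""
    folder = safe_component(artist) + " - " + safe_component(album)
    return os.path.join(folder, safe_component(title) + "." + extension)
```

With these assumptions, a track tagged with the artist AC/DC is filed under a folder starting with "ACDC - ", and a file without a title tag is named Unknown.mp3.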

Microsoft Office

  • To read a broken Microsoft Office document (doc/xls/ppt/...) that MS Office could not read, you can try LibreOffice.

LibreOffice is a multiplatform and multilingual office suite and an open-source project. Compatible with all other major office suites, the product is free to download, use, and distribute.

  • Some Microsoft Office documents (xls/ppt/...) may be recovered with a Word .doc extension - you may need to rename these files.

MS Outlook

  • To recover a broken Outlook PST file, try Microsoft Scanpst.

Messenger Log File