After Using PhotoRec


Sorting the files recovered by PhotoRec can be hard. Here are some ideas to help with the process.

Sort files by extension

Using a PowerShell script under Windows

https://github.com/lconte/Copy-PhotoRecFilesbyExtension.ps1

Using a Python script

You can use this Python script to sort the recovered files by extension.
Save the following code as a file (recovery.py), then run it with the source and destination directories as parameters.

Example: $ python recovery.py /home/me/recovered_files /home/me/sorted_files

#!/usr/bin/env python
import os
import shutil
import sys

try:
    input = raw_input  # Python 2
except NameError:
    pass  # Python 3: input() already returns a string

if len(sys.argv) != 3:
    sys.exit('Usage: python recovery.py source destination')

source = sys.argv[1]
destination = sys.argv[2]

# Keep asking until both directories exist.
while not os.path.isdir(source):
    source = input('Enter a valid source directory\n')
while not os.path.isdir(destination):
    destination = input('Enter a valid destination directory\n')

for root, dirs, files in os.walk(source, topdown=False):
    for file in files:
        # Folder name is the upper-cased extension without the dot;
        # files without an extension are copied straight into destination.
        extension = os.path.splitext(file)[1][1:].upper()
        destinationPath = os.path.join(destination, extension)

        if not os.path.exists(destinationPath):
            os.mkdir(destinationPath)
        if os.path.exists(os.path.join(destinationPath, file)):
            print('WARNING: this file was not copied: ' + os.path.join(root, file))
        else:
            shutil.copy2(os.path.join(root, file), destinationPath)

Using a more complex Python script

More extensive Python programs are also available. They do the following with your recovered data:

Sort all files by extension into separate folders.
Limit the number of files per folder by creating subfolders once a certain number is exceeded. The maximum number of files per folder can be customized.
For all JPEGs: sort them into one folder per year of creation (taken from the EXIF data). Within each year, a folder is created for every event, e.g. all photos taken during one weekend or vacation end up in the same folder.
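
To make the last two points more concrete, here is a minimal sketch of how such a program might pick a numbered subfolder and group photos into events. It is not the code of any particular program: the function names, the default limit of 500 files, and the rule "a gap of more than two days starts a new event" are all assumptions chosen for illustration, and the timestamps are assumed to have been read from the EXIF data already.

```python
from datetime import datetime, timedelta


def subfolder_number(file_index, max_files_per_folder=500):
    """Return the 1-based subfolder for the file at 0-based position file_index."""
    return file_index // max_files_per_folder + 1


def group_into_events(timestamps, max_gap=timedelta(days=2)):
    """Split timestamps into events.

    A new event starts whenever the gap to the previous photo
    exceeds max_gap (hypothetical rule, two days by default).
    """
    events = []
    for ts in sorted(timestamps):
        if events and ts - events[-1][-1] <= max_gap:
            events[-1].append(ts)   # close enough: same event
        else:
            events.append([ts])     # first photo or large gap: new event
    return events


def event_folder(event):
    """Return a 'year/date-of-first-photo' folder name for one event."""
    first = event[0]
    return f"{first.year}/{first:%Y-%m-%d}"
```

With these assumptions, photos taken on a Saturday and the following Sunday land in one event folder, while a photo taken months later starts a new one.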

Using a shell script for Mac OS X and Linux

Here is an alternative implementation which "copies" much more quickly by creating hard links (the source and destination must therefore be on the same filesystem):

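#!/bin/bash

recup_dir="${1%/}"

[ -d "$recup_dir" ] || {
    echo "Usage: $0 recup_dir"
    echo "Mirror files from recup_dir into recup_dir.by_ext, organized by extension"
    exit 1
}
# IFS= and -r keep leading spaces and backslashes in file names intact
find "$recup_dir" -type f | while IFS= read -r k; do
    name="${k##*/}"
    # Look at the file name only, so that a dot in a directory name
    # (e.g. recup_dir.1) is not mistaken for an extension;
    # files without an extension go to "no_ext"
    case "$name" in
        *.*) ext="${name##*.}" ;;
        *)   ext="no_ext" ;;
    esac
    ext_dir="$recup_dir.by_ext/$ext"
    [ -d "$ext_dir" ] || mkdir -p "$ext_dir"
    echo "${k%/*}"
    ln "$k" "$ext_dir"
done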

Save it as photorec-sort-by-ext and run

   $ bash photorec-sort-by-ext /home/me/recovered_files

This will create a folder called /home/me/recovered_files.by_ext

If you are only interested in files with a specific extension (e.g. only .jpg) you can use the following *nix command to find all files in the recovered directories and copy them to a new location:

$ find /path/to/recovered/directories -name \*.jpg -exec cp {} /path/to/new/folder/ \;
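
If you prefer Python, the same idea can be sketched as follows. This is only an illustration, not an existing tool: collect_by_extension is a hypothetical name, the comparison ignores case (so .JPG files are picked up as well), and a file whose name already exists in the target folder gets a numeric suffix instead of overwriting the earlier copy.

```python
import shutil
from pathlib import Path


def collect_by_extension(source, destination, extension=".jpg"):
    """Copy every file under source whose suffix matches extension
    (ignoring case) into destination; return the created paths."""
    source, destination = Path(source), Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() lists all candidates before the first copy is made.
    for path in sorted(source.rglob("*")):
        if not path.is_file() or path.suffix.lower() != extension.lower():
            continue
        target = destination / path.name
        counter = 1
        while target.exists():
            # Name already taken: append _1, _2, ... instead of overwriting.
            target = destination / f"{path.stem}_{counter}{path.suffix}"
            counter += 1
        shutil.copy2(path, target)
        copied.append(target)
    return copied
```

For example, collect_by_extension("/path/to/recovered/directories", "/path/to/new/folder") would gather the JPEG files of all recup_dir folders in one place.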

https://github.com/danthem/PRECsort is another shell script that moves files based on their extension, removes duplicate files, renames JPEGs, and more.

JPEG

JPEG file sorting using Exif meta-data. (Archived version)
Canon PowerShot models store their image sequence numbers in the Exif data, so using a program that can dump Exif data to text like jhead, and the following Perl script, you can essentially restore all the JPG files to their original names. --Vees 01:59, 8 January 2007 (CET) - You may have to install jhead first, e.g.: sudo apt install jhead --UlfZibis 16:00, 17 January 2018 (CET)

#!/usr/bin/perl -w
# read optional working directory from the command line:
$dir = (@ARGV > 0) ? $ARGV[0] : '.';
$dir =~ s/\/*$//;    # truncate trailing '/'s

foreach $file (glob "$dir/*") {
    chomp $file;
    open(EXIF, '-|', 'jhead', '-v', $file) or die "Not found jhead $!";
    if (defined(<EXIF>)) {
        foreach $line (<EXIF>) {
            if ($line =~ /Canon maker tag 0008 Value = [1-9]\d\d(\d{1,8})$/) {
                rename($file, "$dir/IMG_$1.JPG");
                print "$dir/IMG_$1.JPG from $file\n";
                last;
            }
        }
        close EXIF or die "EXIF: $! $?";
    }
}

Additionally restore the files into their original folder tree: --UlfZibis 16:10, 17 January 2018 (CET)

#!/usr/bin/perl -w
# read optional working directory from the command line:
$dir = (@ARGV > 0) ? $ARGV[0] : '.';
$dir =~ s/\/*$//;    # truncate trailing '/'s

foreach $file (glob "$dir/*") {
    chomp $file;
    open(EXIF, '-|', 'jhead', '-v', $file) or die "Not found jhead $!";
    if (defined(<EXIF>)) {
        $folder_no = '000';
        $picture_no = '0000';
        $date = '0000';
        foreach $line (<EXIF>) {
            if ($line =~ /Canon maker tag 0008 Value = ([1-9]\d\d)(\d{1,8})$/) {
                $folder_no = $1;
                $picture_no = $2;
            }
            if ($line =~ /Time\s*:\s*\d{4}:(\d\d):(\d\d)\s[\d:]{8}$/) {
                $date = "$1$2";
#                $date = "$2$1";    # if camera language setting = german
                last;
            }
        }
        print "$dir/$folder_no\_$date/IMG_$picture_no.JPG from $file\n";
        mkdir("$dir/$folder_no\_$date");
        rename($file, "$dir/$folder_no\_$date/IMG_$picture_no.JPG");
        close EXIF or die "EXIF: $! $?";
    }
}
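
The Perl scripts above rely on a single regular expression to split the Canon value into a folder number and a picture number. The following Python sketch applies the same pattern to one line of text so the split is easier to see. The function name original_name is hypothetical, and the sample line in the usage note is constructed from the pattern itself, not copied from real jhead output.

```python
import re

# Same pattern as in the Perl scripts: three folder digits (the first one
# non-zero) followed by one to eight picture digits at the end of the line.
CANON_TAG = re.compile(r"Canon maker tag 0008 Value = ([1-9]\d\d)(\d{1,8})$")


def original_name(jhead_line):
    """Return (folder_no, file_name) for a matching line, else None."""
    match = CANON_TAG.search(jhead_line)
    if match is None:
        return None
    folder_no, picture_no = match.groups()
    return folder_no, f"IMG_{picture_no}.JPG"
```

Under these assumptions, a line ending in "Canon maker tag 0008 Value = 1000042" yields folder 100 and file name IMG_0042.JPG.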

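Another script looks through every recup_dir directory for JPEG files larger than 800 KB and creates a directory named after each photo's date. As written it only creates the directories; uncomment the mv line to move the photos into them: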

#!/usr/bin/perl -w
$working_dir = '/home/myhome/';
$result_dir = '/home/myhome/photos/';
$jhead_bin = '/usr/bin/jhead';

@rec_dirs = `ls ${working_dir} | grep recup_dir`;
foreach $recup_dir (@rec_dirs) {
	print "Scanning ${recup_dir}...";
	chomp $recup_dir;
	@photos_in_recup = `find ${working_dir}${recup_dir}/*jpg -type f -size +800k`;
	foreach $photo_file (@photos_in_recup) {
		chomp $photo_file;
#print "IMG $photo_file in $recup_dir\n";
		@exif = `$jhead_bin -v $photo_file`;
#print "$jhead_bin -v $photo_file\n";
		foreach $line (@exif) {
			if ($line =~ /Time\s*:\s*([0-9]{4}):([0-9]{2}):([0-9]{2})\s[0-9:]{8}$/) {
				print "IMG $photo_file $1-$2-$3\n";
				system("mkdir ${result_dir}$1-$2-$3");
#				system("mv $photo_file $result_dir/$1-$2-$3/");
				last;
			}
		}

	}
}

The following command recreates the original directory layout and file names present on the card (for Canon cameras, tested with numerous photos from an EOS 20D), using the file number EXIF info. ExifTool works under both Windows and Linux.

exiftool -r '-FileName<IMG_${FileIndex}%c.%e' DIR

It renames each file to its original name using the FileIndex value from the file's EXIF information. %c appends a copy number when the name already exists, %e keeps the original extension, and -r makes the command work recursively.

Use the following Exiv2 command to rename all JPEGs after their respective Exif date (the program will ask what to do if conflicts occur):

$ exiv2 -t rename *.jpg

When renaming several thousand files with the exiv2 command above, some shells may fail with an error like "Argument list too long". In that case, let find pass the files to exiv2 one at a time:

$ find ./ -type f -name '*.jpg' -exec exiv2 -t rename {} \;

With that many files, answering a prompt for every conflict is impractical, so specifying a default action, e.g. always rename duplicates (-F), seems advisable:

$ find ./ -type f -name '*.jpg' -exec exiv2 -F -t rename {} \;

Finding duplicates

FSlint: duplicate file finder for Linux (very simple to use, includes a GUI).
Under Linux or Mac OS X (or anywhere with perl and 'sum'), you can find duplicates in a hierarchy using find_dup.
Under Linux or Mac OS X, md5sum can be used to find duplicate files (possibly checksumming only the first x bytes to save time).

In this example, we checksum the first 80k (20 blocks of 4k) of each recup_dir.*/*.sib file:

for file in recup_dir.*/*.sib; do MD5=`dd count=20 bs=4k if="$file" 2> /dev/null|md5sum`; echo "$MD5 $file"; done|sort
1a07198de3486ff2ecab7859612fe7ba  - Box Clever.sib
33105f4a7997b2e2681e404b3ac895f2  - Random, Matching - 2 bars.sib
376e0c53e78e56ba6f2858d9680f8c6b  - 01aIdentifyCommonInst.sib
b0b40a516a1e26660748a0a09cdf3207  - 01ArticulationFlashcards.sib

Each checksum is unique - there are no duplicates.
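
The same check can be sketched in Python, which also groups the matches for you. This is a minimal illustration with hypothetical names (find_duplicate_candidates, prefix_bytes). Like the shell loop, it hashes only the first 80k of each file, so files in the same group are candidates that should still be compared in full before anything is deleted.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_candidates(paths, prefix_bytes=80 * 1024):
    """Group files by the MD5 of their first prefix_bytes bytes and
    return only the groups that contain more than one file."""
    groups = defaultdict(list)
    for path in paths:
        with open(path, "rb") as handle:
            digest = hashlib.md5(handle.read(prefix_bytes)).hexdigest()
        groups[digest].append(Path(path))
    return {digest: files for digest, files in groups.items() if len(files) > 1}
```

A possible call would be find_duplicate_candidates(Path(".").glob("recup_dir.*/*.sib")); an empty result means that no two files share the same first 80k.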

On Windows you can use the fc utility to find duplicates - the following batch file (does not work on Win9x/ME) might help: --Joey 08:36, 17 July 2008 (CEST)

@echo off
SETLOCAL ENABLEEXTENSIONS ENABLEDELAYEDEXPANSION
SET FILELIST=
FOR %%i IN (*) DO (
	FOR %%j IN (!FILELIST!) DO (
		IF %%~zi EQU %%~zj (
			fc /b "%%~i" "%%~j">NUL && echo "%%~i" = "%%~j"
		)
	)
	SET FILELIST=!FILELIST! "%%~i"
)
ENDLOCAL

To include subdirectories on Windows, you may add a "/r" (without the quotes) after both "for"s in the above batch file.
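
For comparison, the approach of the batch file (compare sizes first, then compare contents byte by byte) can be sketched in Python, which runs on any platform. The function name find_identical_files and its recursive parameter are hypothetical and only serve this illustration.

```python
import filecmp
from collections import defaultdict
from pathlib import Path


def find_identical_files(directory, recursive=False):
    """Return (first, duplicate) pairs of byte-identical files."""
    directory = Path(directory)
    candidates = directory.rglob("*") if recursive else directory.iterdir()
    by_size = defaultdict(list)
    for path in sorted(candidates):
        if path.is_file():
            by_size[path.stat().st_size].append(path)
    pairs = []
    # Only files of equal size can be identical, so compare within each group.
    for same_size in by_size.values():
        for i, first in enumerate(same_size):
            for other in same_size[i + 1:]:
                # shallow=False forces a byte-by-byte comparison, like fc /b.
                if filecmp.cmp(first, other, shallow=False):
                    pairs.append((first, other))
    return pairs
```

Grouping by size first keeps the number of full comparisons small, because most recovered files differ in size.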
On Unix machines, you can use fdupes and the following script to generate a shell script with rm statements to remove all duplicate files:

#!/bin/sh
OUTF='rm-dups.sh'

if [ -e $OUTF ]; then
  echo "File $OUTF already exists."
  exit 1;
fi

echo "#!/bin/sh" > $OUTF
fdupes -r -f . | sed -r 's/(.+)/rm \"&\"/' >> $OUTF
chmod +x $OUTF

MP3, mp4, Ogg vorbis...

Most mp3, mp4 and ogg files have embedded information about Title, Album and Author. To automatically rename the recovered audio and video files using this information, you can use ExifTool. The following example handles mp3 files:

exiftool -r -ext mp3 "-filename<D:\NewDirPath\${artist;} - ${album;}\${title;}%-c.%e" "D:\recup_dir.1"

Microsoft Office

To read a broken Microsoft Office document (doc/xls/ppt/...) that MS Office could not read, you can try LibreOffice.

LibreOffice is a multiplatform and multilingual office suite and an open-source project. Compatible with all other major office suites, the product is free to download, use, and distribute.

Some Microsoft Office documents (xls/ppt/...) may be recovered with a Word .doc extension - you may need to rename these files.
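
To decide which extension such a file should get, one rough heuristic is to look for the name of the application's main stream inside the file. The sketch below assumes the legacy binary formats (doc/xls/ppt), which are OLE compound files that store stream names as UTF-16LE text; the function name guess_office_extension is hypothetical. It is a heuristic only: a document with an embedded object from another application may be misidentified, so check the result before renaming in bulk.

```python
# Signature found at the start of every OLE compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Main stream name of each application mapped to the usual extension.
# Checked in this order; the first name found wins.
STREAM_TO_EXTENSION = [
    ("WordDocument", ".doc"),
    ("Workbook", ".xls"),
    ("PowerPoint Document", ".ppt"),
]


def guess_office_extension(path):
    """Return '.doc', '.xls' or '.ppt' for a legacy Office file, else None."""
    with open(path, "rb") as handle:
        data = handle.read()
    if not data.startswith(OLE_MAGIC):
        return None  # not an OLE compound file at all
    for stream_name, extension in STREAM_TO_EXTENSION:
        # Stream names are stored as UTF-16LE in the file's directory.
        if stream_name.encode("utf-16-le") in data:
            return extension
    return None
```

A file recovered as f0001234.doc for which this function returns ".xls" is a candidate for renaming to f0001234.xls.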

MS Outlook

To recover a broken Outlook PST file, try Microsoft Scanpst.

Messenger Log File

To recover a broken Messenger log file, try Vital. Here is a description of the Messenger Log file format.
