Pruning Canon EOS image folders

I have had my Canon EOS 20D since early 2005. For the first few weeks I set the camera to save images as JPEG only. After a while I switched to saving both JPEG and RAW, and kept doing so until this year. Until last year I used Windows XP and Windows Vista, where having JPEGs around made it easier to look at the photos. About a year ago, however, I switched to a Mac and now use Aperture 2 for my photo cataloging. There, having both JPEG and RAW versions of the same image is nothing but annoying.

Photos with both RAW and JPG files

To avoid having both formats in Aperture, I want to import the RAW images where available and the JPEGs otherwise. But I can’t simply delete the JPEG files folder by folder, because some images exist only as JPEGs. And with literally tens of thousands of images, I didn’t want to sort them out manually.

The attached Perl script solves the issue. It takes a source and a target folder as arguments, walks the source directory hierarchy, and copies all image files to the target, except that when an image is available as both RAW and JPEG it copies only the RAW file. It uses embedded EXIF tags (the time the photo was taken plus the serial number of the image) to judge whether two files are the same image. It also retains the folder structure, except that it removes certain folders to flatten the target hierarchy: I had originally put the RAW files one folder down so that they wouldn’t interfere with the JPEGs when viewing them in Vista’s image viewer.
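For readers who want to see the selection rule spelled out, here is a minimal sketch in Python. It is not the Perl script itself, only an illustration of the idea under a few assumptions: the EXIF key (capture time plus image number) is assumed to have been read already and is passed in as a plain tuple, the function names select_files and target_path are made up for this example, and the folder name "raw" in FLATTEN_FOLDERS is a hypothetical stand-in for whatever subfolder you kept your RAW files in.

```python
from pathlib import PurePosixPath

RAW_EXTENSIONS = {".cr2"}
JPEG_EXTENSIONS = {".jpg", ".jpeg"}
# Hypothetical folder names that are dropped to flatten the target structure.
FLATTEN_FOLDERS = {"raw"}


def select_files(records):
    """Pick the files to copy.

    records is an iterable of (relative_path, exif_key) pairs, where
    exif_key is a (capture_time, image_number) tuple, or None if the
    tags could not be read. A JPEG is skipped only when a RAW file
    with the same key exists.
    """
    records = list(records)
    # Keys of all RAW files; files without a readable key never match.
    raw_keys = {
        key
        for path, key in records
        if key is not None
        and PurePosixPath(path).suffix.lower() in RAW_EXTENSIONS
    }
    selected = []
    for path, key in records:
        suffix = PurePosixPath(path).suffix.lower()
        if suffix in JPEG_EXTENSIONS and key in raw_keys:
            continue  # a RAW twin exists, so skip the JPEG
        if suffix in RAW_EXTENSIONS or suffix in JPEG_EXTENSIONS:
            selected.append(path)
    return selected


def target_path(relative_path):
    """Keep the folder structure but drop the folders in FLATTEN_FOLDERS."""
    path = PurePosixPath(relative_path)
    folders = [p for p in path.parts[:-1] if p.lower() not in FLATTEN_FOLDERS]
    return str(PurePosixPath(*folders, path.name))


records = [
    ("2006/trip/IMG_0001.JPG", ("2006:05:01 10:00:00", 1)),
    ("2006/trip/raw/IMG_0001.CR2", ("2006:05:01 10:00:00", 1)),
    ("2005/first/IMG_0002.JPG", ("2005:02:01 09:00:00", 2)),
]
for path in select_files(records):
    print(path, "->", target_path(path))
```

With the three sample records, the JPEG of image 1 is skipped because a CR2 with the same key exists, the CR2 lands in 2006/trip/IMG_0001.CR2 with the "raw" folder removed, and the JPEG-only image 2 is kept where it was.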

Please note that I can only vouch that this works on CR2 files and JPEG files from a Canon EOS 20D as that is the only thing I have tested it with. It should be simple to adapt it for other cameras. Also note that the script does not test whether the target folder is empty. You are advised to test the program on some files that you don’t mind losing before you apply it to your entire image library.
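If you adapt the idea and want a guard against copying into a folder that already has content, a check along the following lines might do. This is a Python sketch of something the script does not do, and the function name target_is_safe is made up for this example.

```python
from pathlib import Path


def target_is_safe(target):
    """Return True if target does not exist yet or is an empty directory."""
    target = Path(target)
    if not target.exists():
        return True
    # any() stops at the first entry, so large folders are not fully listed.
    return target.is_dir() and not any(target.iterdir())
```

You would call it before copying anything and stop if it returns False.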

I called the script photo_prune, despite the fact that it doesn’t actually prune the source data. To avoid data loss it instead copies the data to a new location.

Download script