Sunday, April 11, 2010

Automating batch video transcoding

I would like to share a script I wrote to automate batch transcoding of raw video files. I wrote this with DVDs in mind, but I'm sure it could be adapted for other sources.

For a few years now I've been backing up my DVDs. It all began with my first daughter. We started building a library of movies for her, and quickly learned how much she enjoyed rubbing them on any rough surface, usually rendering them useless. Having paid full price for re-re-re-releases of Disney and other titles, we were not excited about re-purchasing them. Not to mention we had to keep a variety of movies in our vehicle, so as not to make her watch the same movie 10 times in a row, so it would be a shame to lose the pack of original DVDs.

I'm not going to talk about decrypting DVDs. The law is nebulous here, but in the end, I am backing up my own DVDs for the reasons listed above. There are several good software solutions to decrypt your DVDs. I use AnyDVD, since in the hundreds of DVDs I've ripped, only one failed. However, MacTheRipper on the Mac and other tools for Linux also work.

The idea here is that I ripped all my DVDs and put them in a single subdirectory. Then I run the following script. It traverses all subdirectories and transcodes each rip with HandBrake into an mkv/mp4 file using H.264 video and AAC audio. The end result is a video file that is around 600-900MB and can be streamed to a PS3/Xbox, or played on iPhones and TiVos, and of course by media players like VLC.

I originally wrote this in bash, but found there were times I wanted to run this from Windows or Mac, so I ported it to Python. It will handle any movies which were ripped and contain 1 or more titles. It looks for directories with "video_ts" subdirectories - usually DVD rips.
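To make the directory search concrete, here is a minimal sketch of how it could be done in Python. This is not the code from my script; the function name find_dvd_rips and the case-insensitive match on the directory name are assumptions made for illustration.

```python
import os


def find_dvd_rips(input_dir):
    """Return sorted paths of directories that contain a VIDEO_TS subdirectory."""
    rips = []
    for root, dirs, _files in os.walk(input_dir):
        # Compare case-insensitively: rips may use VIDEO_TS or video_ts.
        if any(d.lower() == "video_ts" for d in dirs):
            rips.append(root)
            # Assumption: a rip never contains another rip, so stop descending.
            dirs[:] = []
    return sorted(rips)
```

Clearing dirs in place tells os.walk not to descend into a rip that has already been found, which avoids walking through thousands of VOB files for nothing.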

If only 1 title is found, the file name is the same as the directory name (e.g. Transformers becomes Transformers.m4v). However, if there are several titles, such as the main movie plus several extras, it appends "-t#" to the name (e.g. Transformers-t1.m4v, Transformers-t2.m4v, t3, t4, etc).
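The naming rule is simple enough to sketch in a few lines. Again, this is an illustration rather than the script itself: output_names and its extension parameter are hypothetical names, and I am assuming titles are numbered from 1.

```python
import os


def output_names(rip_dir, title_count, extension="m4v"):
    """Build output file names for a rip directory with title_count titles."""
    # normpath drops a trailing slash so basename returns the directory name.
    base = os.path.basename(os.path.normpath(rip_dir))
    if title_count == 1:
        # A single title keeps the plain directory name.
        return [f"{base}.{extension}"]
    # Several titles get a -t1, -t2, ... suffix.
    return [f"{base}-t{n}.{extension}" for n in range(1, title_count + 1)]
```

For example, output_names("videos/input/Transformers/", 1) gives ["Transformers.m4v"], while a count of 2 gives ["Transformers-t1.m4v", "Transformers-t2.m4v"].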

Currently HandBrake is called first to scan the directory and discover the number of titles. Then HandBrake is called again to transcode each title. If you would like to change the transcoder, edit the script, and also change the TRANSCODER_OPTIONS.
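A rough sketch of the scan-then-transcode flow might look like this. It assumes the HandBrakeCLI binary is on your PATH and that its scan output lists each title on a line starting with "+ title N:", so check that against your HandBrake version. The TRANSCODER_OPTIONS value shown is only a placeholder, not the options my script uses, and the function names are made up for this example.

```python
import re
import subprocess

HANDBRAKE = "HandBrakeCLI"  # assumption: the CLI binary is on PATH
TRANSCODER_OPTIONS = ["-e", "x264", "-q", "20"]  # placeholder options only

# Assumed scan output format: one "+ title N:" line per title.
TITLE_LINE = re.compile(r"^\s*\+ title (\d+):", re.MULTILINE)


def parse_titles(scan_output):
    """Extract title numbers from HandBrake scan output."""
    return [int(n) for n in TITLE_LINE.findall(scan_output)]


def scan_titles(rip_dir):
    """Ask HandBrake to scan every title (-t 0 --scan) and return their numbers."""
    result = subprocess.run(
        [HANDBRAKE, "-i", rip_dir, "-t", "0", "--scan"],
        capture_output=True, text=True, errors="replace",
    )
    # Search both streams, since the scan report may not be on stdout.
    return parse_titles(result.stdout + result.stderr)


def transcode_command(rip_dir, title, output_path):
    """Build the argument list for transcoding one title."""
    return [HANDBRAKE, "-i", rip_dir, "-t", str(title), "-o", output_path] + TRANSCODER_OPTIONS
```

Building the command as a list and passing it to subprocess.run avoids shell quoting problems with movie names that contain spaces, and it behaves the same on Windows, Mac, and Linux.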

At the end, I print some stats such as the number of movies transcoded, total time spent scanning, total time transcoding, and the average time for each. If you run with --verbose, you will also get output from HandBrake, such as frames per second.
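The bookkeeping behind those stats could be sketched as follows. The helper names timed and summarize, and the dictionary keys, are hypothetical; the idea is just to record how long each scan and each transcode took and average at the end.

```python
import time


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def summarize(scan_seconds, transcode_seconds):
    """Summarize per-movie timings given as lists of seconds."""
    def average(values):
        # Avoid dividing by zero when nothing was processed.
        return sum(values) / len(values) if values else 0.0

    return {
        "movies": len(transcode_seconds),
        "scan_total": sum(scan_seconds),
        "scan_average": average(scan_seconds),
        "transcode_total": sum(transcode_seconds),
        "transcode_average": average(transcode_seconds),
    }
```

Each scan or transcode call would be wrapped in timed, with the elapsed value appended to the matching list before summarize is called once at the end.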

Good luck, and please let me know if this is useful, or what I can change.

Sample usage:
[vinny ~]$ ./ -i videos/input/ -o videos/output/
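For reference, the options in that command line could be parsed along these lines. This is a sketch using argparse from the standard library, not the parsing code in the script; the help strings and the attribute names input_dir and output_dir are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the -i, -o, and --verbose options."""
    parser = argparse.ArgumentParser(description="Batch transcode DVD rips")
    parser.add_argument("-i", dest="input_dir", required=True,
                        help="directory containing the ripped DVDs")
    parser.add_argument("-o", dest="output_dir", required=True,
                        help="directory to write transcoded files to")
    parser.add_argument("--verbose", action="store_true",
                        help="show transcoder output such as frames per second")
    return parser
```

Calling build_parser().parse_args() with no arguments reads the options from the command line.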

Removed code from here and created a project at gitorious:

Added the ability to read ISOs instead of directories with --isos
Also added a wiki with some details: