
mkvstrip

a guest
Aug 28th, 2014
#!/usr/bin/env python

# Welcome to mkvstrip.py.  This script scans a folder for extraneous audio and subtitle tracks and removes them.  It can also overwrite the title field of each MKV.

# Version = 0.9 (8/21/2014)
# The latest version can always be found at https://github.com/cyberjock/mkvstrip

# This Python script has the following requirements:
# 1.  MKVToolNix installed: 7.0.0 and 7.1.0 were tested (recent earlier versions such as 6.x and 5.x should work too).
# 2.  Python installed: 2.7.8_2 in a FreeBSD/FreeNAS jail and 2.7.8 on Windows were tested (the entire 2.7.x series should work).

# Note that this script checks every MKV file in the directory (or the single file) given on the command line or in the DIR variable below, and remuxes each one that needs a change.  If you point it at a large number of files, it can keep the storage near its maximum throughput until it completes.  In tests this made the storage holding my movies and TV shows so busy that it became nearly unresponsive until the script finished (I was able to remux at over 1GB/sec).  Remuxing is not CPU intensive, so your limiting factor is likely to be the throughput of the storage holding the video files being checked/remuxed.  For example, remuxing a 15GB movie file took 9 minutes over a 1Gb LAN but just 32 seconds locally from a FreeNAS jail.

# Keep in mind that because remuxing only copies the existing streams into a new file, it won't affect the quality of the included video/audio streams.

# Using this over a file share is ***STRONGLY*** discouraged, as it could take considerable time (days/weeks?) to complete because of the network throughput bottleneck.

# Use this script at your own risk (or reward).  Unknown bugs could result in this script eating your files.  There are a few "seatbelts" to ensure that nothing too undesirable happens.  For example, if none of a file's audio tracks match the languages you chose, an ERROR is logged and the video file is skipped.  I tested this extensively and I've used it on my own collection, but there is no guarantee that bugs don't exist for you.  ALWAYS TEST ON A SMALL SAMPLE OF COPIES OF YOUR FILES TO ENSURE THE EXPECTED RESULT IS OBTAINED BEFORE UNLEASHING THIS SCRIPT ON YOUR MOVIES OR TV SHOWS BLINDLY.

# Default values for the settings are provided below, and all of them can be overridden by command line parameters.  Any setting without a usable default (such as DIR) MUST be passed on the command line.

# A remux should only occur if a change needs to be made to the file.  If no change is required, the file isn't remuxed.

# For help with the command line parameters use the -h parameter.
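
# Sketch (hypothetical helpers, not called anywhere in mkvstrip): the timing
# example in the notes above is just size divided by throughput.  Assuming
# decimal units (1GB = 1000**3 bytes), 15GB in 32 seconds works out to roughly
# 470MB/s, while 15GB in 9 minutes is only about 28MB/s, which is why running
# this over a network share is discouraged.
def implied_throughput(size_bytes, seconds):
    # Bytes per second implied by an observed remux time.
    return float(size_bytes) / seconds


def estimate_remux_seconds(size_bytes, throughput_bytes_per_sec):
    # When storage is the bottleneck, remux time is roughly the file size
    # divided by the throughput the storage can sustain.
    if throughput_bytes_per_sec <= 0:
        raise ValueError('throughput must be positive')
    return float(size_bytes) / throughput_bytes_per_sec

# e.g. estimate_remux_seconds(15 * 1000**3, implied_throughput(15 * 1000**3, 32)) gives 32.0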

import os, re, sys, atexit, subprocess
from datetime import datetime
from StringIO import StringIO
from argparse import ArgumentParser

# Locations of the mkvmerge, mkvinfo and mkvpropedit executables.
# Always use forward slashes (/) in these paths, even on Windows.
# On Windows this is usually something like C:/Program Files (x86)/MkvToolNix/mkvmerge.exe
# In a FreeNAS jail (and on FreeBSD) this is usually something like /usr/local/bin/mkvmerge
MKVMERGE_BIN = '/usr/local/bin/mkvmerge'
MKVINFO_BIN = '/usr/local/bin/mkvinfo'
MKVPROPEDIT_BIN = '/usr/local/bin/mkvpropedit'

# Log errors to a file.  The log file is written to the same directory as mkvstrip.py, and its name includes the year, month, day and time that mkvstrip was invoked.
LOG = True

# Directory to process.
# Always use forward slashes (/) in the path, whatever the OS (*cough* Windows).
# On Windows this is usually something like C:/Movies
# In FreeNAS jails (and on FreeBSD) it should be something like /mnt/tank/Movies.
# DIR = os.path.dirname(os.path.realpath(__file__))
# DIR = '/mnt/tank/Entertainment/Movies'
# None means there is no default, so the directory (or a single .mkv file) must be passed with -d/--dir.
DIR = None

# DRY_RUN lets mkvstrip go through the motions of what it would do without actually changing any files.  This lets you review the logs and make sure everything in them is what you actually want before doing it for real. (see bug list)
DRY_RUN = False

# PRESERVE_TIMESTAMP keeps the timestamps of the original file when set.  This prevents your entire library from ending up with today's date/time stamp.  Enabling it is recommended.
# Note that if Plex has already inventoried your files, it may or may not like this setting, and may or may not like you suddenly remuxing your entire library.  I recommend stopping Plex and then running an analysis of your library afterwards.
PRESERVE_TIMESTAMP = True

# List of audio languages to retain.  The default is English (eng) and undetermined (und).  Keeping 'und' (undetermined) is always recommended in case audio tracks aren't tagged with a language.
AUDIO_LANG = [ 'eng', 'und' ]

# List of subtitle languages to retain.  The default is English (eng) and undetermined (und).  Keeping 'und' (undetermined) is always recommended in case subtitle tracks aren't tagged with a language.
SUBTITLE_LANG = [ 'eng', 'und' ]
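
# Sketch (hypothetical helper, not called by mkvstrip): the track patterns
# further down only capture language codes of exactly three lowercase letters
# (ISO 639-2 style, such as 'eng' or 'und'), so an entry like 'en' or 'ENG' in
# AUDIO_LANG or SUBTITLE_LANG can never match any track.  A check like this
# could flag such entries before any file is touched.
import re


def invalid_language_codes(codes):
    # Return the entries that are not exactly three lowercase ASCII letters.
    return [code for code in codes if not re.match(r'[a-z]{3}\Z', code)]

# e.g. invalid_language_codes(['en', 'eng']) returns ['en']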

# Log files that have no subtitles in the chosen languages.  This keeps you informed of videos that are missing subtitles so you can take action as necessary.  A WARNING message is logged if no subtitle track in the chosen languages is found.
LOG_MISSING_SUBTITLE = True

# Rewrite the title field of MKV files to include the immediate parent directory.  If set to True, the title field of the MKV is set to "(parent directory) - (name of video file without .mkv extension)", as this is the most common organization of TV shows for Plex.  This setting is mutually exclusive with RENAME_MOVIE.
# Note: If RENAME_TV is set and your files *only* need a title change, the file will still be modified (its title is rewritten with mkvpropedit).  So use with caution.  (See bug list)
RENAME_TV = False

# Rewrite the title field of MKV files to the video file name without the .mkv extension.  This setting is mutually exclusive with RENAME_TV.
# Note: If RENAME_MOVIE is set and your files *only* need a title change, the file will still be modified (its title is rewritten with mkvpropedit).  So use with caution.  (See bug list)
RENAME_MOVIE = True
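
# Sketch (hypothetical helper, not called by mkvstrip): the titles the two
# RENAME options above describe.  RENAME_MOVIE uses the file name without the
# .mkv extension; RENAME_TV prefixes it with the immediate parent directory,
# giving "(parent directory) - (file name)".
import os
import re


def desired_title(path, tv=False):
    base = os.path.basename(path)
    # Strip a trailing .mkv extension regardless of case.
    name = re.sub(r'\.mkv\Z', '', base, flags=re.IGNORECASE)
    if tv:
        parent = os.path.basename(os.path.dirname(path))
        return '%s - %s' % (parent, name)
    return name

# e.g. desired_title('/mnt/tank/TV/Firefly/Firefly S01E01.mkv', tv=True) returns 'Firefly - Firefly S01E01'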

# Known bugs and limitations are available from http://github.com/cyberjock/mkvstrip

# Stop immediately if any of the configuration variables above has been deleted or commented out.
for i in [
        'MKVMERGE_BIN',
        'MKVINFO_BIN',
        'MKVPROPEDIT_BIN',
        'LOG',
        'DIR',
        'DRY_RUN',
        'PRESERVE_TIMESTAMP',
        'AUDIO_LANG',
        'SUBTITLE_LANG',
        'LOG_MISSING_SUBTITLE',
        'RENAME_TV',
        'RENAME_MOVIE' ]:
    if i not in globals():
        raise RuntimeError('%s configuration variable is required.' % (i))

class Logger(object):
    _files = dict()

    @staticmethod
    def init(*args):
        for path in args:
            if path not in Logger._files:
                Logger._files[path] = open(path, 'w')
                Logger.write('Log file opened at', path)
                Logger.write('--')

    @staticmethod
    def write(*args, **kwargs):
        for k in kwargs:
            if k not in [ 'stderr', 'indent' ]:
                raise TypeError('write() got an unexpected keyword argument \'%s\'' % (k))

        ts = datetime.now().strftime('[ %Y-%m-%d %I:%M:%S %p ] ')
        msg = ' '.join(str(i) for i in args)

        # Build the list of files to log to: the console (stderr when stderr=True, otherwise stdout) plus any open log files
        files = list()
        if kwargs.get('stderr'):
            files.append(sys.stderr)
        else:
            files.append(sys.stdout)
        files += Logger._files.values()

        for f in files:
            print >> f, ts,
            if 'indent' in kwargs:
                Logger._indent(kwargs['indent'], f)
            print >> f, msg

    @staticmethod
    def destroy():
        Logger.write('--')
        Logger.write('Finished processing.')
        for f in Logger._files.values():
            f.close()

    @staticmethod
    def _indent(x, file):
        for i in range(x):
            print >> file, ' ',

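# Sketch (an alternative technique, not what mkvstrip uses): the Logger class
# above sends every message to the console and to any open log files.  The
# standard library's logging module can do the same tee with one handler per
# destination.  The logger name 'mkvstrip_sketch' and the format string are
# assumptions chosen to resemble Logger's '[ timestamp ] message' lines.
import logging


def make_tee_logger(streams, name='mkvstrip_sketch'):
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Remove old handlers so repeated calls don't duplicate every message.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
    formatter = logging.Formatter('[ %(asctime)s ] %(message)s',
                                  datefmt='%Y-%m-%d %I:%M:%S %p')
    for stream in streams:
        handler = logging.StreamHandler(stream)
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    # Don't pass records on to the root logger as well.
    logger.propagate = False
    return logger

# For a real log file, logging.FileHandler(path) could replace one of the streams.
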
class Track(object):
    def __init__(self, id, name, language, default=None):
        super(Track, self).__init__()
        self.id = int(id)
        self.name = name
        self.language = language
        self.default = default

    def __str__(self):
        return 'Track #%i (%s): %s' % (self.id, self.language, self.name)

class VideoTrack(Track):
    pass

class AudioTrack(Track):
    pass

class SubtitleTrack(Track):
    pass

def stringifyLanguages(tracks, filtered_tracks=None):
    # Sorted list of the languages in tracks, skipping any track that is in filtered_tracks; 'None' if there are none
    if filtered_tracks is None:
        t = set([ i.language for i in tracks ])
    else:
        t = set([ i.language for i in tracks if i not in filtered_tracks ])

    if t:
        return sorted(t)
    return str(None)

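# Sketch (hypothetical helper, not called by mkvstrip): the filtering step in
# the main loop keeps a track when its language is in the configured list and
# drops the rest, which stringifyLanguages() then reports.  Returning both
# halves at once makes the kept/removed split explicit.  SketchTrack is a
# stand-in for the Track classes above.
import collections

SketchTrack = collections.namedtuple('SketchTrack', ['id', 'language'])


def partition_by_language(tracks, keep_languages):
    kept = [t for t in tracks if t.language in keep_languages]
    removed = [t for t in tracks if t.language not in keep_languages]
    return kept, removed

# mkvstrip skips a file when the kept list of audio tracks would be empty.
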
atexit.register(Logger.destroy)

parser = ArgumentParser(description='Strips unnecessary tracks from MKV files.')
group = parser.add_mutually_exclusive_group()
group.add_argument('-l', '--log', default=None, action='store_true', help='Log to file in addition to STDOUT and STDERR.')
group.add_argument('--no-log', default=None, action='store_false', dest='log')
parser.add_argument('-d', '--dir', default=DIR, required=(DIR is None), help='Directory (or single .mkv file) to process.')
group = parser.add_mutually_exclusive_group()
group.add_argument('-y', '--dry-run', default=None, action='store_true', help='Log what would be done without changing any files.')
group.add_argument('--no-dry-run', default=None, action='store_false', dest='dry_run')
group = parser.add_mutually_exclusive_group()
group.add_argument('-p', '--preserve-timestamp', default=None, action='store_true', help='Keep the original timestamps on remuxed files.')
group.add_argument('--no-preserve-timestamp', default=None, action='store_false', dest='preserve_timestamp')
parser.add_argument('-a', '--audio-language', action='append', help='Audio languages to retain. May be specified multiple times.')
parser.add_argument('-s', '--subtitle-language', action='append', help='Subtitle languages to retain. May be specified multiple times.')
group = parser.add_mutually_exclusive_group()
group.add_argument('-m', '--log-subtitle', default=None, action='store_true', help='Log a warning if a file has no subtitle track in the retained languages.')
group.add_argument('--no-log-subtitle', default=None, action='store_false', dest='log_subtitle')
rename = parser.add_mutually_exclusive_group()
rename.add_argument('-r', '--rename-tv', default=None, action='store_true', help='Set the MKV title to "(parent directory) - (file name without .mkv)".')
rename.add_argument('-e', '--rename-movie', default=None, action='store_true', help='Set the MKV title to the file name without .mkv.')
parser.add_argument('--no-rename-tv', default=None, action='store_false', dest='rename_tv')
parser.add_argument('--no-rename-movie', default=None, action='store_false', dest='rename_movie')
parser.add_argument('-b', '--mkvmerge-bin', default=MKVMERGE_BIN, help='Path to mkvmerge binary.')
parser.add_argument('-i', '--mkvinfo-bin', default=MKVINFO_BIN, help='Path to mkvinfo binary.')
parser.add_argument('-t', '--mkvpropedit-bin', default=MKVPROPEDIT_BIN, help='Path to mkvpropedit binary.')
args = parser.parse_args()

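# Sketch (hypothetical, not part of mkvstrip): the parser above uses a
# three-state pattern for on/off options.  A '--flag' (store_true) and a
# '--no-flag' (store_false) share one dest whose default is None, so None
# means "not given on the command line, keep the value configured at the top
# of the script".  '--dry-run' is reused here only for illustration.
import argparse


def resolve_flag(argv, configured):
    flag_parser = argparse.ArgumentParser()
    flag_group = flag_parser.add_mutually_exclusive_group()
    flag_group.add_argument('--dry-run', dest='dry_run', default=None, action='store_true')
    flag_group.add_argument('--no-dry-run', dest='dry_run', default=None, action='store_false')
    cli_value = flag_parser.parse_args(argv).dry_run
    # Fall back to the configured value only when neither option was given.
    return configured if cli_value is None else cli_value
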
LOG = LOG if args.log is None else args.log
if LOG:
    LOG_DIR = os.path.dirname(os.path.realpath(__file__))
    LOG_FILE = datetime.now().strftime('log_%Y%m%d-%H%M%S.log')
    LOG_PATH = os.path.abspath(os.path.join(LOG_DIR, LOG_FILE))
    Logger.init(LOG_PATH)

Logger.write('Running', os.path.basename(__file__), 'with configuration:')

MKVMERGE_BIN = os.path.abspath(args.mkvmerge_bin)
Logger.write('MKVMERGE_BIN =', MKVMERGE_BIN)

MKVINFO_BIN = os.path.abspath(args.mkvinfo_bin)
Logger.write('MKVINFO_BIN =', MKVINFO_BIN)

MKVPROPEDIT_BIN = os.path.abspath(args.mkvpropedit_bin)
Logger.write('MKVPROPEDIT_BIN =', MKVPROPEDIT_BIN)

DIR = os.path.abspath(args.dir)
Logger.write('DIR =', DIR)

DRY_RUN = DRY_RUN if args.dry_run is None else args.dry_run
Logger.write('DRY_RUN =', DRY_RUN)

PRESERVE_TIMESTAMP = PRESERVE_TIMESTAMP if args.preserve_timestamp is None else args.preserve_timestamp
Logger.write('PRESERVE_TIMESTAMP =', PRESERVE_TIMESTAMP)

AUDIO_LANG = AUDIO_LANG if args.audio_language is None else args.audio_language
Logger.write('AUDIO_LANG =', AUDIO_LANG)
if len(AUDIO_LANG) == 0:
    raise RuntimeError('At least one audio language to retain must be specified.')

# We don't need to check for subtitles to retain since some people might choose to retain nothing at all.
SUBTITLE_LANG = SUBTITLE_LANG if args.subtitle_language is None else args.subtitle_language
Logger.write('SUBTITLE_LANG =', SUBTITLE_LANG)

LOG_MISSING_SUBTITLE = LOG_MISSING_SUBTITLE if args.log_subtitle is None else args.log_subtitle
Logger.write('LOG_MISSING_SUBTITLE =', LOG_MISSING_SUBTITLE)

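RENAME_TV = RENAME_TV if args.rename_tv is None else args.rename_tv
RENAME_MOVIE = RENAME_MOVIE if args.rename_movie is None else args.rename_movie

# Choosing one rename mode on the command line switches off the other mode's configured default,
# so -r works on its own even though RENAME_MOVIE = True above.
if args.rename_tv is True and args.rename_movie is None:
    RENAME_MOVIE = False
elif args.rename_movie is True and args.rename_tv is None:
    RENAME_TV = False

Logger.write('RENAME_TV =', RENAME_TV)
Logger.write('RENAME_MOVIE =', RENAME_MOVIE)

if RENAME_TV is True and RENAME_MOVIE is True:
    raise RuntimeError('Setting RENAME_TV = True and RENAME_MOVIE = True at the same time is not allowed.')
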
TITLE_RE = re.compile(r'Title: (.+?)\n')
NAME_RE = re.compile(r'^(.+)\.mkv$', flags=re.IGNORECASE)
# Patterns for the track lines printed by 'mkvmerge --identify-verbose'.  The '[' that opens the property
# list is escaped so it is matched literally instead of starting a character class, the codec name in
# parentheses may contain spaces or other punctuation, and codec_private_data is optional because only
# some codecs store private data.
VIDEO_RE = re.compile(r'^Track ID (?P<id>\d+): video \([^)]+\) \[number:\d+ uid:\d+ codec_id:[\w/]+ codec_private_length:\d+(?: codec_private_data:[a-f\d]+)? language:(?P<language>[a-z]{3})(?: track_name:(?P<name>.+))? pixel_dimensions')
AUDIO_RE = re.compile(r'^Track ID (?P<id>\d+): audio \([^)]+\) \[number:\d+ uid:\d+ codec_id:[\w/]+ codec_private_length:\d+(?: codec_private_data:[a-f\d]+)? language:(?P<language>[a-z]{3})(?: track_name:(?P<name>.+))? default_track:(?P<default>[01])')
SUBTITLE_RE = re.compile(r'^Track ID (?P<id>\d+): subtitles \([^)]+\) \[number:\d+ uid:\d+ codec_id:[\w/]+ codec_private_length:\d+(?: codec_private_data:[a-f\d]+)? language:(?P<language>[a-z]{3})(?: track_name:(?P<name>.+))? default_track:(?P<default>[01]) forced_track:[01]')

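# Sketch (illustration only): how one line of 'mkvmerge --identify-verbose'
# output is taken apart.  The pattern is a variant of AUDIO_RE with the '['
# that opens the property list escaped (so it is matched literally instead of
# starting a character class) and with an optional codec_private_data field.
# The sample line is written by hand to fit the pattern; it is not captured
# from a real file, and real output carries more properties after
# default_track.
import re

AUDIO_LINE_SKETCH_RE = re.compile(
    r'^Track ID (?P<id>\d+): audio \([^)]+\) \[number:\d+ uid:\d+ '
    r'codec_id:[\w/]+ codec_private_length:\d+(?: codec_private_data:[a-f\d]+)? '
    r'language:(?P<language>[a-z]{3})(?: track_name:(?P<name>.+))? '
    r'default_track:(?P<default>[01])')

SAMPLE_AUDIO_LINE = ('Track ID 1: audio (AC3/EAC3) [number:2 uid:1234 '
                     'codec_id:A_AC3 codec_private_length:0 language:eng '
                     'track_name:Commentary default_track:0 forced_track:0]')


def parse_audio_line(line):
    # Return the named groups as a dict, or None if the line doesn't match.
    match = AUDIO_LINE_SKETCH_RE.match(line)
    if match is None:
        return None
    return match.groupdict()

# parse_audio_line(SAMPLE_AUDIO_LINE) returns
# {'id': '1', 'language': 'eng', 'name': 'Commentary', 'default': '0'}
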
processList = list()
if os.path.isfile(DIR):
    processList.append(DIR)
else:
    # Walk through the directory and sort by filename
    unsortedList = list()
    for dirpath, dirnames, filenames in os.walk(DIR):
        mkvFilenames = [filename for filename in filenames if filename.lower().endswith('.mkv')]
        mkvFilenames.sort()
        unsortedList.append((dirpath, mkvFilenames))

    # Now sort by directory and append to processList
    unsortedList.sort(key=lambda dirTuple: dirTuple[0])
    for dirpath, filenames in unsortedList:
        for filename in filenames:
            processList.append(os.path.join(dirpath, filename))

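# Sketch (hypothetical helper, not called by mkvstrip): the block above walks
# DIR, keeps *.mkv files (case-insensitive), sorts the names within each
# folder and then sorts the folders by path.  The same ordering falls out of
# sorting (directory, filename) pairs in a single list.
import os


def sorted_mkv_paths(root):
    pairs = []
    for walk_dir, _subdirs, walk_files in os.walk(root):
        for walk_name in walk_files:
            if walk_name.lower().endswith('.mkv'):
                pairs.append((walk_dir, walk_name))
    # Tuples sort by directory first, then by filename.
    pairs.sort()
    return [os.path.join(d, f) for d, f in pairs]
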
totalMKVs = len(processList)
Logger.write('Starting processing of %s Videos' % totalMKVs)
counter = 0

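# Sketch (hypothetical helper, not called by mkvstrip): the loop below builds
# the mkvmerge command inline; as a pure function the same logic is easy to
# test.  It uses the mkvmerge options the loop uses (--output, --audio-tracks,
# --subtitle-tracks, --default-track) plus --no-subtitles, because without any
# subtitle option mkvmerge copies every subtitle track.
def build_strip_command(mkvmerge_bin, source, target, audio_ids, subtitle_ids):
    cmd = [mkvmerge_bin, '--output', target]
    if audio_ids:
        cmd += ['--audio-tracks', ','.join(str(i) for i in audio_ids)]
        # The first retained audio track becomes the default; the rest are not.
        for n, track_id in enumerate(audio_ids):
            cmd += ['--default-track', '%d:%d' % (track_id, 0 if n else 1)]
    if subtitle_ids:
        cmd += ['--subtitle-tracks', ','.join(str(i) for i in subtitle_ids)]
        for track_id in subtitle_ids:
            cmd += ['--default-track', '%d:0' % track_id]
    else:
        cmd.append('--no-subtitles')
    # Track-selection options apply to the input file that follows them.
    cmd.append(source)
    return cmd
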
for path in processList:
    counter += 1
    Logger.write('============')

    # Attempt to identify file
    Logger.write('Identifying video (%s/%s)' % (counter, totalMKVs), path)
    cmd = [ MKVMERGE_BIN, '--identify-verbose', path ]
    try:
        result = subprocess.check_output(cmd)
    except subprocess.CalledProcessError:
        Logger.write('Failed to identify', path, stderr=True)
        continue

    # Work out the title requested by RENAME_TV / RENAME_MOVIE (None if neither is set)
    new_title = None
    name_match = NAME_RE.match(os.path.basename(path))
    if name_match is not None:
        name = name_match.group(1)
        if RENAME_TV is True:
            # "(parent directory) - (file name without .mkv)"
            parent = os.path.basename(os.path.dirname(path))
            new_title = '%s - %s' % (parent, name)
        elif RENAME_MOVIE is True:
            new_title = name

    if new_title is not None:
        # Reading the current title with mkvinfo doesn't modify the file, so it also runs in a dry run
        try:
            info = subprocess.check_output([MKVINFO_BIN, path])
        except subprocess.CalledProcessError:
            Logger.write('Failed to read the title of', path, stderr=True)
        else:
            title = TITLE_RE.findall(info)
            if not title or title[0].strip() != new_title:
                Logger.write('Rename title of mkv to %s' % new_title)
                if DRY_RUN is False:
                    modifyCMD = [MKVPROPEDIT_BIN, path, '--set', 'title=%s' % new_title]
                    try:
                        subprocess.check_output(modifyCMD)
                    except subprocess.CalledProcessError as e:
                        Logger.write('Modifying of', path, 'failed!', stderr=True)
                        Logger.write(e.cmd, stderr=True)
                        Logger.write(e.output, stderr=True)

    # Find video, audio, and subtitle tracks
    video = list()
    audio = list()
    subtitles = list()
    unparsed = 0
    Logger.write('Searching for video, audio, and subtitle tracks...')
    for line in StringIO(result):
        matches = AUDIO_RE.match(line)
        if matches is not None:
            audio.append(AudioTrack(**matches.groupdict()))
            continue

        matches = SUBTITLE_RE.match(line)
        if matches is not None:
            subtitles.append(SubtitleTrack(**matches.groupdict()))
            continue

        matches = VIDEO_RE.match(line)
        if matches is not None:
            video.append(VideoTrack(**matches.groupdict()))
            continue

        # An audio or subtitle track the patterns didn't recognise would be silently dropped by a remux
        if line.startswith('Track ID') and (': audio (' in line or ': subtitles (' in line):
            unparsed += 1

    Logger.write('Found video track(s):')
    for i in video:
        Logger.write(i, indent=4)
    Logger.write('Found audio track(s):')
    for i in audio:
        Logger.write(i, indent=4)
    Logger.write('Found subtitle track(s):')
    for i in subtitles:
        Logger.write(i, indent=4)

    # Skip files with tracks we couldn't parse rather than risk losing them
    if unparsed > 0:
        Logger.write('ERROR: Could not parse', unparsed, 'audio/subtitle track line(s) for', path, '... Skipping.', stderr=True)
        continue

    # Filter out tracks that don't match the languages specified
    Logger.write('Filtering audio track(s)...')
    audio_lang = [ x for x in audio if x.language in AUDIO_LANG ]
    Logger.write('Removing audio language(s):', stringifyLanguages(audio, audio_lang))
    Logger.write('Retaining audio language(s):', stringifyLanguages(audio_lang))

    # Skip files that don't have any audio track in the specified languages
    if len(audio_lang) == 0:
        Logger.write('ERROR: No audio tracks matching specified language(s) for', path, '... Skipping.', stderr=True)
        continue

    Logger.write('Filtering subtitle track(s)...')
    subtitles_lang = [ x for x in subtitles if x.language in SUBTITLE_LANG ]
    Logger.write('Removing subtitle language(s):', stringifyLanguages(subtitles, subtitles_lang))
    Logger.write('Retaining subtitle language(s):', stringifyLanguages(subtitles_lang))

    # Log that the file doesn't have any subtitle track in the specified languages
    if len(subtitles_lang) == 0 and LOG_MISSING_SUBTITLE is True:
        Logger.write('WARNING: No subtitle tracks matching specified language(s) for', path, stderr=True)

    # Print tracks to retain
    Logger.write('Number of audio tracks retained:', len(audio_lang))
    Logger.write('Remuxing with the following audio track(s):')
    for i in audio_lang:
        Logger.write(i, indent=4)
    Logger.write('Number of subtitle tracks retained:', len(subtitles_lang))
    Logger.write('Remuxing with the following subtitle track(s):')
    for i in subtitles_lang:
        Logger.write(i, indent=4)

    # Skip files that don't need processing
    if len(audio) == len(audio_lang) and len(subtitles) == len(subtitles_lang):
        Logger.write('Nothing to do for', path)
        continue

    # Build command
    target = path + '.tmp'
    cmd = [ MKVMERGE_BIN, '--output', target ]

    if len(audio_lang) > 0:
        cmd += [ '--audio-tracks', ','.join([ str(i.id) for i in audio_lang ]) ]
        # The first retained audio track becomes the default; the others are marked as non-default
        for i in range(len(audio_lang)):
            cmd += [ '--default-track', ':'.join([ str(audio_lang[i].id), '0' if i else '1' ]) ]

    if len(subtitles_lang) > 0:
        cmd += [ '--subtitle-tracks', ','.join([ str(i.id) for i in subtitles_lang ]) ]
        for i in range(len(subtitles_lang)):
            cmd += [ '--default-track', ':'.join([ str(subtitles_lang[i].id), '0' ]) ]
    else:
        # Without --subtitle-tracks mkvmerge copies every subtitle track, so drop them explicitly
        cmd.append('--no-subtitles')

    cmd.append(path)

    # Attempt to process file
    Logger.write('Processing %s...' % (path))
    if DRY_RUN is False:
        try:
            result = subprocess.check_output(cmd)
        except subprocess.CalledProcessError as e:
            Logger.write('Remux of', path, 'failed!', stderr=True)
            Logger.write(e.cmd, stderr=True)
            Logger.write(e.output, stderr=True)
            # Don't leave a partial temporary file behind
            if os.path.exists(target):
                os.unlink(target)
            continue
        else:
            Logger.write('Remux of', path, 'successful.')

    # Preserve timestamp
    if PRESERVE_TIMESTAMP is True:
        Logger.write('Preserving timestamp of', path)
        if DRY_RUN is False:
            stat = os.stat(path)
            os.utime(target, (stat.st_atime, stat.st_mtime))

    # Overwrite original file (it is deleted first because os.rename() won't replace an existing file on Windows)
    if DRY_RUN is False:
        try:
            os.unlink(path)
        except OSError:
            # The original couldn't be removed: discard the remuxed copy and keep the original
            os.unlink(target)
            Logger.write('Could not replace', path, 'with', target, '- original left untouched.', stderr=True)
        else:
            try:
                os.rename(target, path)
            except OSError:
                Logger.write('Renaming of', target, 'to', path, 'failed! The remuxed file is still at', target, stderr=True)
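
# Sketch (hypothetical helper, not called by mkvstrip): the final steps above
# copy the original's access/modification times onto the remuxed .tmp file,
# delete the original and rename the .tmp file into its place.  Deleting first
# is needed on Windows, where os.rename() won't overwrite an existing file
# (on Python 3.3 and later, os.replace() overwrites the destination on every
# platform).  This version refuses to delete the original unless the remuxed
# file exists, and if the final rename fails the remuxed copy stays at its
# temporary path.
import os


def replace_with_remux(original, remuxed, preserve_timestamp=True):
    if not os.path.isfile(remuxed):
        raise IOError('remuxed file is missing: %s' % remuxed)
    if preserve_timestamp:
        st = os.stat(original)
        os.utime(remuxed, (st.st_atime, st.st_mtime))
    os.unlink(original)
    os.rename(remuxed, original)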