ExtractMHT

Last Modified: Mon, 03/05/2007 - 02:33

Introduction

ExtractMHT is a fairly simple program that extracts individual files from MHTML files.  MHTML groups multiple individual files into a single .mht file.  It is most often used to save a web page to a single file for offline viewing, and is supported by Opera, Internet Explorer, and various other Microsoft applications.  While MHTML files can be convenient, they are rarely supported outside of Windows, and even when using Windows one may be more interested in a single component of the MHTML file than in the entire archive.  ExtractMHT solves this problem by allowing users to extract all components from an MHTML file and save them as individual files.

ExtractMHT began as a request to add support for MHTML files to Universal Extractor.  I had planned on simply incorporating an existing program into Universal Extractor to handle MHTML support, but I was unable to find any freely redistributable programs for Windows that were capable of doing this.  I was about to give up on adding support, but after doing a bit more research into the actual structure of MHTML files I realized that it would be possible to write an extractor myself.  Thus, ExtractMHT was born.

ExtractMHT has since been incorporated into Universal Extractor, but it is also available here as a standalone binary.  You can use it if you need to extract the contents of an MHTML file, but don't want or need to install Universal Extractor to do the job.

ExtractMHT, like most of my Windows programs, is written in AutoIt, a free and powerful open source scripting language.

Screenshots

ExtractMHT Application
ExtractMHT file/destination GUI

Download

Current Version: 1.0, Released: 09/10/2006

ExtractMHT Binary Archive (281.96 KB) - This archive contains the ExtractMHT executable, as well as all source code.

ChangeLog - ExtractMHT development details

Installation and Usage

ExtractMHT does not include an installer.  To use it, download the archive, extract the files to your computer, and double-click ExtractMHT.exe to launch the ExtractMHT GUI.  Enter (or use the file browser to select) the file you wish to extract and the destination directory, then click OK.

ExtractMHT also supports command line usage.  Please run ExtractMHT.exe /help to view usage instructions.

Technical Details

When a file is passed to ExtractMHT, it begins by checking the file to ensure it's a valid MHTML file.  ExtractMHT then begins processing each of the parts contained in the file.  Each part will be written to an individual file in the specified output directory.
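To make the structure concrete: an MHTML file is a MIME multipart message, so the "check, then process each part" flow described above can be sketched with Python's standard email package.  This is only an illustration, not ExtractMHT's actual AutoIt code.  The function name iter_parts, the sample data, and the validity check (simply requiring a multipart top-level type) are my assumptions.

```python
import email

# A tiny hand-made MHTML sample (hypothetical content, for illustration only).
SAMPLE_MHT = (
    b"From: <Saved by a browser>\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="----=_Part"; type="text/html"\r\n'
    b"\r\n"
    b"------=_Part\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"Content-Location: http://example.com/index.html\r\n"
    b"\r\n"
    b'<html><body><img src=3D"http://example.com/logo.gif"></body></html>\r\n'
    b"------=_Part\r\n"
    b"Content-Type: image/gif\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Content-Location: http://example.com/logo.gif\r\n"
    b"\r\n"
    b"R0lGODlhAQABAAAAACw=\r\n"
    b"------=_Part--\r\n"
)


def iter_parts(raw):
    """Yield (content_location, content_type, decoded_bytes) for each part.

    Assumption: a file counts as "valid" here if its top-level MIME type
    is multipart.  The real program may apply a different check.
    """
    msg = email.message_from_bytes(raw)
    if msg.get_content_maintype() != "multipart":
        raise ValueError("not a multipart MHTML file")
    for part in msg.walk():
        if part.is_multipart():
            continue  # container nodes hold no file data of their own
        # decode=True undoes base64 / quoted-printable transfer encoding
        yield (part.get("Content-Location"),
               part.get_content_type(),
               part.get_payload(decode=True))


for location, ctype, data in iter_parts(SAMPLE_MHT):
    print(location, ctype, len(data))
```

Each yielded tuple corresponds to one file that would be written to the output directory.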

ExtractMHT will attempt to use the original filenames as described in the MHTML file, but it will ensure that each file is given a unique filename to prevent any files from being overwritten.  It may also be impossible in some circumstances to determine the original filename.  In this event, the file will be named "unknown".
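One possible way to implement this naming behavior is sketched below in Python.  It assumes the original filename is taken from the last path segment of a part's Content-Location header, and that collisions are resolved with a numeric suffix.  Both choices, and the helper names original_name and unique_name, are hypothetical; the scheme ExtractMHT really uses may differ.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def original_name(location):
    """Return the last path segment of a Content-Location value.

    Falls back to "unknown" when no usable name can be determined.
    """
    if not location:
        return "unknown"
    path = unquote(urlsplit(location).path)
    name = posixpath.basename(path)
    if name in ("", ".", ".."):
        return "unknown"
    return name


def unique_name(name, used):
    """Return `name`, adding a numeric suffix if it was already handed out.

    `used` is a set of lowercased names; comparison is case-insensitive
    because Windows file systems usually are.
    """
    stem, ext = posixpath.splitext(name)
    candidate, n = name, 1
    while candidate.lower() in used:
        candidate = f"{stem}_{n}{ext}"
        n += 1
    used.add(candidate.lower())
    return candidate


used = set()
for loc in ["http://example.com/logo.gif",
            "http://example.org/img/logo.gif",
            "http://example.com/",
            None]:
    print(unique_name(original_name(loc), used))
```

This sketch does not strip characters that are illegal in Windows filenames; a real extractor would need to handle those as well.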

Known Limitations

ExtractMHT does not perform any analysis or rewriting of URLs to make links point to the extracted files rather than the original file.  For example, if the Microsoft home page is saved as an MHT file and then extracted, the resulting index.html will still reference images on http://www.microsoft.com/... rather than the local copies.

I currently do not plan on implementing this capability, as the primary purpose of ExtractMHT is to simply extract the nested files from MHTML files.
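For readers who need local links anyway, the missing step could be done as a separate pass after extraction.  The Python sketch below is not part of ExtractMHT.  It assumes you have already built a mapping from each part's original URL to the filename it was saved under, and it performs a naive textual replacement.  The name rewrite_links is hypothetical, and relative URLs and cid: references are not handled.

```python
def rewrite_links(html, url_to_local):
    """Replace each known original URL in `html` with its local filename.

    Longer URLs are replaced first so that a URL which is a prefix of
    another one does not corrupt the longer match.
    """
    for url in sorted(url_to_local, key=len, reverse=True):
        html = html.replace(url, url_to_local[url])
    return html


page = ('<img src="http://example.com/logo.gif">'
        '<a href="http://example.com/logo.gif.txt">notes</a>')
mapping = {
    "http://example.com/logo.gif": "logo.gif",
    "http://example.com/logo.gif.txt": "logo.gif.txt",
}
print(rewrite_links(page, mapping))
```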

Credits

ExtractMHT would not exist without the following contributions from the Free Software community:

  • AutoIt (Jonathan Bennett, Open Source) - General-purpose Windows scripting language; used to write ExtractMHT
  • Crystal SVG (Everaldo Coelho, Free) - Collection of extremely high-quality icons for Linux/KDE; used as the source graphics for the ExtractMHT icon
  • GIMP (Spencer Kimball and Peter Mattis, Open Source) - The GNU Image Manipulation Program; used to create the icons used by ExtractMHT

Additionally, ExtractMHT uses the following code to perform Base64 decoding: http://www.autoitscript.com/forum/index.php?showtopic=21399&view=findpost&p=148460
