Java Forum / General / September 2007

Detecting mime types

smcardle@smcardle.com - 20 Sep 2007 11:14 GMT
Hi All,

I have recently been working on a project where we needed to detect
the mime type of files. This in itself is not too hard when you
consider the available choice of libraries that make a good guess at
the mime type based on the file extension.

However, in my case we had a CMS that stored images (of any type) in
its repository and for some unknown reason renamed them all with
a .img extension. As the images consisted of ICON, JPEG, GIF and SVG
mime types, we needed another way to detect mime types over and above
extension matching. Extension matching also only works if there is an
extension to map; some files, such as Makefiles, don't have one.

On Unix the OS has a utility called file which makes a good guess at
the mime type of a file. Mime type detection is not bulletproof, but
under normal conditions it should be pretty close. The file
command uses a couple of text files containing matching rules that are
available on all flavors of UNIX. These files are called magic and
magic.mime.
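
To make the idea of rule-based matching concrete, here is a minimal sketch
in which a rule pairs a byte offset with the bytes expected there, and the
first rule that matches wins. The class and method names (MagicRuleSketch,
Rule, firstMatch) are hypothetical and the three rules are hand-picked
examples; this is a simplified in-memory table, not the syntax of the
magic.mime file and not the mime-util source.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of offset-based magic number rules. */
final class MagicRuleSketch {

    /** One rule: "these bytes at this offset imply this mime type". */
    static final class Rule {
        final int offset;
        final byte[] magic;
        final String mimeType;

        Rule(int offset, byte[] magic, String mimeType) {
            this.offset = offset;
            this.magic = magic;
            this.mimeType = mimeType;
        }

        boolean matches(byte[] data) {
            // The file must be long enough to hold the magic bytes at the offset.
            if (offset + magic.length > data.length) {
                return false;
            }
            for (int i = 0; i < magic.length; i++) {
                if (data[offset + i] != magic[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    MagicRuleSketch add(int offset, byte[] magic, String mimeType) {
        rules.add(new Rule(offset, magic, mimeType));
        return this;
    }

    /** Returns the mime type of the first matching rule, or null if none match. */
    String firstMatch(byte[] data) {
        for (Rule rule : rules) {
            if (rule.matches(data)) {
                return rule.mimeType;
            }
        }
        return null;
    }

    /** A tiny example rule set (assumed for illustration only). */
    static MagicRuleSketch demo() {
        return new MagicRuleSketch()
                .add(0, "GIF8".getBytes(StandardCharsets.US_ASCII), "image/gif")
                .add(0, new byte[] {(byte) 0xFF, (byte) 0xD8}, "image/jpeg")
                .add(257, "ustar".getBytes(StandardCharsets.US_ASCII), "application/x-tar");
    }
}
```

Because the rules are tried in order, more specific rules would normally be
added before more general ones.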

Having looked at other solutions, I didn't find one that really
fitted my needs, so I wrote a new one called mime-util. I have put
this useful little utility on SourceForge at
http://sourceforge.net/projects/mime-util if anybody is interested.

This little utility uses two methods to detect the mime type of a
file. First, it tries to match the file extension and, if found,
reports the registered list of mime types for that extension. You can
modify this behaviour by placing a mime properties file on your
classpath, which lets you override any of my mappings and add new
mappings for extensions missing from the internal property file.

Second, if it cannot determine the mime type from the file extension
(i.e. the extension is not registered or the file has no extension),
it uses a parsed version of the Unix magic.mime file (on Windows it
uses the internally supplied copy). This file contains rules that
look for various magic numbers at known offsets into a file, and the
utility reports the first match. Again, you can provide your own
version of this file on the classpath, allowing changes to the
existing matches or new matches without changing the Unix magic.mime
file itself.
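
The extension-first lookup with user overrides and a magic number fallback
can be sketched roughly as follows. This is not the mime-util API: the class
name ExtensionFirstSketch, its constructor arguments and the single-string
return value are assumptions made for illustration (the utility itself
reports a list of mime types per extension). The mappings are passed in as
properties text here; a real implementation would more likely read them from
a classpath resource.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;
import java.util.function.Function;

/** Hypothetical illustration: extension lookup first, magic matching second. */
final class ExtensionFirstSketch {

    private final Properties mappings;
    private final Function<byte[], String> magicFallback;

    /**
     * @param builtIn       default "extension=mime/type" lines
     * @param overrides     user-supplied lines that replace or extend the defaults
     * @param magicFallback detector used when the extension is unknown or missing
     */
    ExtensionFirstSketch(String builtIn, String overrides,
                         Function<byte[], String> magicFallback) {
        try {
            Properties defaults = new Properties();
            defaults.load(new StringReader(builtIn));
            // Lookups consult the override entries first, then the defaults.
            this.mappings = new Properties(defaults);
            this.mappings.load(new StringReader(overrides));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.magicFallback = magicFallback;
    }

    String detect(String fileName, byte[] head) {
        int dot = fileName.lastIndexOf('.');
        if (dot >= 0 && dot < fileName.length() - 1) {
            String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            String mime = mappings.getProperty(ext);
            if (mime != null) {
                return mime; // cheap path: the extension is registered
            }
        }
        // No extension, or an unregistered one such as ".img".
        return magicFallback.apply(head);
    }
}
```

With this shape, a file named logo.img falls through to the fallback because
"img" is not a registered extension, while a user entry such as
svg=image/svg+xml adds a mapping without touching the built-in table.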

You can use the methods to force only the second kind of match, i.e.
magic number matching, on all files if you want, but the intention is
to provide a fast utility that makes a best-effort guess. In my tests
I have been able to achieve a 100% match on a wide range of files
using the more expensive magic number matching and over a 90% match
using a mixture of both extension and magic number matching. For me
this was sufficient and did not require me to make too many changes
or additions to compensate for the existing magic.mime file. I did,
however, create an extension to the rules which allows a fuzzy match
to occur, i.e. a rule can say "I think this information should be
somewhere within the first 1K of data in this file".
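
One way to picture such a fuzzy rule is a search for a byte pattern anywhere
within the first 1024 bytes instead of at one fixed offset. The sketch below
is only an illustration of that idea; the names FuzzyMagicSketch and
containsWithinWindow are hypothetical, and the real rule extension in
mime-util may work differently.

```java
/** Hypothetical illustration of a "somewhere in the first 1K" match. */
final class FuzzyMagicSketch {

    /** Size of the search window in bytes (the "first 1K"). */
    static final int WINDOW = 1024;

    /**
     * Returns true if the pattern occurs entirely within the first
     * WINDOW bytes of the data.
     */
    static boolean containsWithinWindow(byte[] data, byte[] pattern) {
        if (pattern.length == 0) {
            return true;
        }
        // Last start index at which the pattern still fits inside the window.
        int lastStart = Math.min(data.length, WINDOW) - pattern.length;
        for (int start = 0; start <= lastStart; start++) {
            int i = 0;
            while (i < pattern.length && data[start + i] == pattern[i]) {
                i++;
            }
            if (i == pattern.length) {
                return true;
            }
        }
        return false;
    }
}
```

An SVG file is a natural candidate: the "<svg" tag usually follows an XML
declaration or a comment, so its offset varies from file to file.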

Anyway, if anybody wants to use it, it's there under an Apache 2.0
license, i.e. FREE for ALL, and if you have any requests for changes,
additions etc. please feel free to comment.

Roedy Green - 23 Sep 2007 08:57 GMT
>I have recently been working on a project where we needed to detect
>the mime type of files. This in itself is not too hard when you
>consider the available choice of libraries that make a good guess at
>the mime type based on the file extension.

see http://mindprod.com/jgloss/mime.html

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com


