
Java Forum / Tools / March 2006


Compression Utilities

Roedy Green - 27 Feb 2006 07:28 GMT
I have benchmarked a number of compression utilities and have posted
the results at http://mindprod.com/jgloss/compressionutilities.html

The bottom line is 7-zip is the clear champ for both ease of use and
maximum compression. WinZip has the best speed.

Microsoft is preposterously inept. Their simple uncompressed copy
takes considerably longer than others' compression, and their
compression barely makes a dent in the size. Further, their properties
dialog to display file sizes keeps giving the wrong results. It
erroneously caches results for other files in Win2K.


Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Richard F.L.R.Snashall - 27 Feb 2006 08:00 GMT
> I have benchmarked a number of compression utilities and have posted
> the results at http://mindprod.com/jgloss/compressionutilities.html
[quoted text clipped - 3 lines]
>
> Microsoft is preposterously inept. Their simple uncompressed copy

While I love to hear this of m$, is this a question of I/O time
rather than compression time?  I ran a study a while back using
gzip and compress (on Unix).  In the end, most of the time was used
reading and writing to disk.  How are you removing this time from
the test?

> takes considerably longer than others' compression, and their
> compression barely makes a dent in the size. Further their properties
> dialog to display file sizes keeps giving the wrong results. It
> erroneously caches results for other files in Win2K.
>
>  
Roedy Green - 27 Feb 2006 12:54 GMT
On Mon, 27 Feb 2006 03:00:53 -0500, "Richard F.L.R.Snashall"
<rflrs@notnotrcn.com> wrote, quoted or indirectly quoted someone who
said :

>While I love to hear this of m$, is this a question of I/O time
>rather than compression time?  I ran a study a while back using
>gzip and compress (on Unix).  In the end, most of the time was used
>reading and writing to disk.  How are you removing this time from
>the test?

Why would you remove it?  If you do your i/o in dainty chunks it takes
a lot longer. Managing i/o is part of the skill of writing a good
archiver.
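
A minimal Java sketch of the chunk-size point, assuming an in-memory stream stands in for a file. The class ChunkedCopy and its copy method are hypothetical names for illustration, not part of any benchmarked tool. It only counts how many read calls a copy needs for a given buffer size; on a real disk each of those calls carries overhead, which is where the dainty chunks hurt.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class ChunkedCopy {
    /**
     * Copies everything from in to out using a buffer of bufferSize bytes.
     * Returns the number of read calls that delivered data.
     */
    static int copy(InputStream in, OutputStream out, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        int reads = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                reads++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return reads;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20]; // 1 MiB stand-in payload (assumption)
        for (int size : new int[] {64, 64 * 1024}) {
            int reads = copy(new ByteArrayInputStream(data),
                             new ByteArrayOutputStream(), size);
            System.out.println(size + "-byte buffer: " + reads + " read calls");
        }
    }
}
```

To compare real timings you would swap the in-memory streams for file streams and wrap the call in System.nanoTime() measurements.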

Granted, you would expect a non-compressing archiver to beat a straight
copy, because when you create an archive entry there is no need to
flush it to disk right away, the way that is traditional in an O/S close.  I
don't think Windows, though, guarantees i/o is physically complete on
close. IIRC it has the option of a delayed write.

I think it was WinRAR that wasted time by using a single thread.
It purely reads for a while (wasting CPU time), then purely thinks for
a while (wasting I/O time).

Pack200 is specially for class files, not even resources, so it
would be a bit cruel to hold it up to ridicule on general
compression. It might be reasonable, though, to test jar.exe or some
simple ZipOutputStream utility.
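
For reference, a simple ZipOutputStream utility could look roughly like the sketch below. TinyZip and its method names are made up for illustration, and it works on in-memory byte arrays rather than files to keep it short. A benchmark version would read and write real files instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class TinyZip {
    /** Packs the given name-to-content map into a zip archive held in memory. */
    static byte[] zip(Map<String, byte[]> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.setLevel(Deflater.BEST_COMPRESSION);
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(file.getKey()));
                zos.write(file.getValue());
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The central directory is written when the stream closes above.
        return bytes.toByteArray();
    }

    /** Unpacks an in-memory zip archive back into a name-to-content map. */
    static Map<String, byte[]> unzip(byte[] archive) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        byte[] buffer = new byte[8192];
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int n;
                while ((n = zis.read(buffer)) != -1) {
                    content.write(buffer, 0, n);
                }
                files.put(entry.getName(), content.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            text.append("the quick brown fox jumps over the lazy dog\n");
        }
        byte[] raw = text.toString().getBytes(StandardCharsets.UTF_8);
        files.put("sample.txt", raw); // hypothetical entry name
        System.out.println(raw.length + " bytes in, " + zip(files).length + " bytes out");
    }
}
```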


Oliver Wong - 27 Feb 2006 16:46 GMT
> On Mon, 27 Feb 2006 03:00:53 -0500, "Richard F.L.R.Snashall"
> <rflrs@notnotrcn.com> wrote, quoted or indirectly quoted someone who
[quoted text clipped - 9 lines]
> a lot longer. Managing i/o is part of the skill of writing a good
> archiver.

   It might be interesting for people who want to develop a file transfer
utility. They would be interested in the speed and ratio of compression, but
not so much on the disk access time, because they won't be writing to disk;
rather, they'll be writing to a socket across a network.
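
A rough Java sketch of that use case, assuming gzip as the format: the hypothetical StreamCompressor.compressTo writes compressed data to any OutputStream, so the sink could be a socket's output stream. Here a ByteArrayOutputStream stands in for the socket, which also keeps disk time out of the picture.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class StreamCompressor {
    /**
     * Gzips everything from source into sink and returns the number of
     * uncompressed bytes consumed. The sink is left open, as you would
     * want for a socket that carries further traffic.
     */
    static long compressTo(InputStream source, OutputStream sink) {
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        try {
            GZIPOutputStream gzip = new GZIPOutputStream(sink);
            int n;
            while ((n = source.read(buffer)) != -1) {
                gzip.write(buffer, 0, n);
                total += n;
            }
            // finish() writes the gzip trailer without closing the sink.
            gzip.finish();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Receiving side: inflates a complete gzip stream held in memory. */
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[64 * 1024];
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = new byte[1 << 20]; // stand-in payload (assumption)
        ByteArrayOutputStream fakeSocket = new ByteArrayOutputStream();
        long start = System.nanoTime();
        long consumed = compressTo(new ByteArrayInputStream(payload), fakeSocket);
        long micros = (System.nanoTime() - start) / 1000;
        System.out.println(consumed + " bytes -> " + fakeSocket.size()
                + " bytes in " + micros + " us, no disk involved");
    }
}
```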

   - Oliver
Roedy Green - 27 Feb 2006 19:02 GMT
>    It might be interesting for people who want to develop a file transfer
>utility. They would be interested in the speed and ratio of compression, but
>not so much on the disk access time, because they won't be writing to disk;
>rather, they'll be writing to a socket across a network.

There is one called BZip2 that comes with source. The install
instructions looked too complicated, so I passed on it.

I think the key will be to develop different sorts of compactors, or
preconditioners for different sorts of file e.g.
class
html
text
jpg
png
gif
au
wav
xml

I think, too, that when you install software, both ends should get a common
dictionary for the languages they communicate in for the compacting
algorithms to use, and possibly even some aux dictionaries for people
who keep communicating with each other.
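
The shared-dictionary idea can be tried with the preset dictionary support in java.util.zip. The sketch below is only an illustration: the class name SharedDictionary and the sample phrases are assumptions. Both ends must hold the identical dictionary bytes, and short messages made of phrases from the dictionary are where it pays off most.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class SharedDictionary {
    /** Deflates input, priming the compressor with dictionary if non-null. */
    static byte[] deflate(byte[] input, byte[] dictionary) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        if (dictionary != null) {
            deflater.setDictionary(dictionary);
        }
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Inflates data, supplying dictionary when the stream asks for one. */
    static byte[] inflate(byte[] compressed, byte[] dictionary) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && !inflater.finished()) {
                    if (inflater.needsDictionary() && dictionary != null) {
                        inflater.setDictionary(dictionary);
                    } else {
                        throw new IllegalStateException(
                            "missing dictionary or truncated input");
                    }
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical phrases both correspondents are assumed to share.
        byte[] dict = ("Thank you for your order. Your invoice is attached "
                + "to this message. Best regards").getBytes(StandardCharsets.UTF_8);
        byte[] msg = "Thank you for your order. Your invoice is attached to this message."
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("plain:           " + deflate(msg, null).length + " bytes");
        System.out.println("with dictionary: " + deflate(msg, dict).length + " bytes");
    }
}
```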

For example to compact jars you could extract the strings from classes
and resources and sort them alphabetically and then hand that to the
compressor.

You might also write "chunker" plugins for new file formats to help
the compressor break the file up at logical places for optimal
searching for repeating strings, or for doing compression by creating
deltas.

See http://mindprod.com/projects/deltacreator.html

The other possibility is tidying compactors, e.g. ones that remove
whitespace from HTML or XML and put it back on the other end, but not
necessarily in exactly the same spots.

Similarly a compressor might notice that a gif file only used 200
colours, so it could reduce the colour depth, but only transforms that
did not degrade the image, unless you specifically told it that was
ok.

neznam neznam - 01 Mar 2006 15:48 GMT
> While I love to hear this of m$, is this a question of I/O time
> rather than compression time?  I ran a study a while back using
> gzip and compress (on Unix).  In the end, most of the time was used
> reading and writing to disk.  How are you removing this time from
> the test?

Use a RAM disk ;-)
Oliver Wong - 27 Feb 2006 16:59 GMT
>I have benchmarked a number of compression utilities and have posted
> the results at http://mindprod.com/jgloss/compressionutilities.html

   You mention that it's important that you compress to a format that the
recipient can decompress from; have you considered adding to the benchmarks
compressions where you create self-extracting archives? E.g. Winzip and
WinRar offer to compress to a .EXE instead of .ZIP and .RAR files.

   Obviously, the resulting self-extracting archives would probably be
platform specific, but this might make sense if what you were archiving were
the distribution for a platform specific program anyway.

> Microsoft is preposterously inept.. [...] their
> compression barely makes a dent in the size.

   To be fair, the intent of their "compact" utility is to allow the files
to be used without an explicit decompression step, which (AFAIK) none of the
other formats (.ZIP, .RAR, .7z, etc) allow. I believe they use something
really simple like run-length encoding, to allow for fast decompression,
random seeking within the file, and other stuff that one would typically
want to do with the uncompressed contents of a file, that might be
prohibitively expensive or difficult to do with the compression schemes used
by the other formats.

   I have some images of CDs on my harddrive which I mount using a CD drive
emulator. The "useful contents" of the CDs are relatively small (100MB), but
they contain padding files of sizes around 600MB which just contain the byte
0x00 over and over again; the reason for this is to place the useful content
near the outer edge of the CD, thus allowing for faster data reads (because
when the CD spins at a constant angular velocity, the drive can read from
the outer edge faster than the inner edge).

   This trick doesn't do anything for when the CD is stored as an image on
my harddrive though, so the file is 600MB bigger than it needs to be. If I
use the "compact" utility, it does RLE on the padding file to reduce it to
just a few kilobytes, and so the image file is down to a more reasonable
100MB size.
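
To illustrate why run-length encoding collapses such padding, here is a generic RLE sketch in Java. It is not a description of what the "compact" utility does internally; the class RunLength and its (count, value) pair format are assumptions chosen for simplicity.

```java
import java.io.ByteArrayOutputStream;

final class RunLength {
    /** Encodes input as (count, value) byte pairs, with count in 1..255. */
    static byte[] encode(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < input.length) {
            byte value = input[i];
            int run = 1;
            while (i + run < input.length && run < 255 && input[i + run] == value) {
                run++;
            }
            out.write(run);
            out.write(value);
            i += run;
        }
        return out.toByteArray();
    }

    /** Expands (count, value) pairs back into the original bytes. */
    static byte[] decode(byte[] encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i + 1 < encoded.length; i += 2) {
            int run = encoded[i] & 0xFF; // count is stored as an unsigned byte
            for (int j = 0; j < run; j++) {
                out.write(encoded[i + 1]);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] padding = new byte[1 << 20]; // 1 MiB of 0x00, a small stand-in
        byte[] packed = encode(padding);
        System.out.println(padding.length + " bytes of padding -> "
                + packed.length + " bytes encoded");
    }
}
```

The flip side of so simple a scheme is that data without long runs doubles in size, since every isolated byte costs a two-byte pair.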

   - Oliver
Roedy Green - 27 Feb 2006 19:15 GMT
>    You mention that it's important that you compress to a format that the
>recipient can decompress from; have you considered adding to the benchmarks
>compressions where you create self-extracting archives?

Running benchmarks is as exciting as watching paint dry.  I have had
my fill for a while with Signum and the compressors.  If you want to
run some, I would be happy to format and post the results at
http://mindprod.com/jgloss/compressionutilities.html

The reason I did that last batch is I was so under the weather I was
not up to watching videos. I needed a task that required only IQ 50.

Self extracting is really a nutty idea, because the whole reason you
do this is to minimise download time.  You end up downloading the code
for the extract over and over.

What should be done instead is a special extension set up for self
extracting files that install software.  When you install the
compression software, it should set up the association. When such a
file arrives and you "execute" it, it behaves just like a conventional
self-extractor without the overhead.

The Jet people have invented a nice self-extracting scheme.  They have
such huge downloads that the overhead of the decompressor is negligible.
It has a traditional install dialog with icon and splash and a query
for where to install. It can also set up associations as a side effect. It
also has a delta scheme. You can create downloads to take you from
version N to N+1 that contain just the differences.  It figures out
what is needed. The big saving is that the incremental downloads omit
the runtime, which is 16 MB.

I wish install people would get clever and stop bundling run times
with the application, and instead have the installer sniff around to
see if the run time is already installed, and if not arrange to get it installed
from the run-time vendor's site. The benefit to the vendor would be
they could monitor that people were living up to their run-time
license agreements.

You really should bundle the runtime/JVM only for CD distribution.

Roedy Green - 27 Feb 2006 19:50 GMT
On Mon, 27 Feb 2006 19:15:17 GMT, Roedy Green
<my_email_is_posted_on_my_website@munged.invalid> wrote, quoted or
indirectly quoted someone who said :

>I wish install people would get clever and stop bundling run times
>with the application, and instead have the installer sniff around to
>see if the run time is already installed, and if not arrange to get it installed
>from the run-time vendor's site. The benefit to the vendor would be
>they could monitor that people were living up to their run-time
>license agreements.

I go on at length about how an installer should work at
http://mindprod.com/jgloss/installer.html
http://mindprod.com/projects/installer.html
http://mindprod.com/projects/sanitychecker.html


