I am finishing up a little tool to create a Google Sitemap: a list of
all your files in XML, with their last update dates, how frequently you
update them, and how important they are.
Google prefers the whole file be gzipped.
Is there a way to plug together GZIP and OutputStreamWriter so that
you compress on the fly?
It seems I may need to create the file, then read it back as bytes and
create the gzip, or use two passes with a ByteArrayOutputStream.

Signature
Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.
JScoobyCed - 05 Jan 2006 01:28 GMT
> I am finishing up a little tool to create a Google Sitemap: a list of
> all your files in XML, their last updates, how frequently you update
[quoted text clipped - 7 lines]
> It seems I may need to create the file, then read it back as bytes and
> create the gzip, or use two passes with a ByteArrayOutputStream.
Yes, there is: GZIPOutputStream. But if you do so, I doubt you want to
use an OutputStreamWriter.
<code>
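// Wraps any OutputStream so that everything written to it is gzip-compressed.
// Needs: import java.io.*; import java.util.zip.GZIPOutputStream;
// NullPointerException is unchecked, so it does not need to be declared.
public OutputStream toGzipOutputStream(OutputStream os)
        throws IOException {
    return new GZIPOutputStream(os);
}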
</code>
If you really want the OutputStreamWriter:
<code>
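OutputStreamWriter osw = null;
try {
    // Name the charset explicitly; the one-argument constructor
    // falls back to the platform default encoding.
    osw = new OutputStreamWriter(toGzipOutputStream(outStream), "UTF-8");
} catch (IOException ioe) {
    // TODO
} catch (NullPointerException npe) {
    // TODO (thrown if outStream is null)
}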
</code>
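A minimal sketch of the whole chain in use, assuming Java 8 or later (try-with-resources, StandardCharsets and UncheckedIOException did not exist when this thread was written). The class and method names are hypothetical, and an in-memory ByteArrayOutputStream stands in for the real file so the round trip can be checked:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper, not from the thread: gzip text on the fly through a Writer. */
final class GzipWriterSketch {

    /** Encodes the text as UTF-8 and gzips it in one pass, with no temporary file. */
    static byte[] gzipText(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // chars -> OutputStreamWriter (UTF-8 bytes) -> GZIPOutputStream -> sink
        try (Writer out = new OutputStreamWriter(
                new GZIPOutputStream(sink), StandardCharsets.UTF_8)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the writer has also finished the gzip stream (trailer written).
        return sink.toByteArray();
    }

    /** Reverses gzipText, to check that nothing was lost on the way. */
    static String gunzipText(byte[] gzipped) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzipped)),
                StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<urlset><url><loc>http://example.com/</loc></url></urlset>";
        byte[] gz = gzipText(xml);
        System.out.println(gunzipText(gz).equals(xml));
    }
}
```

The point to take away is that the writer must be closed (or the gzip stream finished) before the compressed bytes are used; otherwise the gzip trailer is missing.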

Signature
JSC
John C. Bollinger - 05 Jan 2006 03:47 GMT
>> Is there a way to plug together GZIP and OutputStreamWriter so that
>> you compress on the fly?
[For creating a GZIPped XML file.]
> yes there is: GZIPOutputStream. But if you do so, I doubt you want to
> use an OutputStreamWriter.
Why do you doubt that? Roedy has character data that he wants to
deliver to a byte stream. OutputStreamWriter is the bridge between
character data and binary streams. Why /wouldn't/ he want to use one?
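To make the "bridge" concrete, here is a small sketch (the class and method names are made up for illustration) that pushes the same four characters through an OutputStreamWriter with two different charsets and counts the bytes that come out the other side:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class: OutputStreamWriter turns chars into bytes. */
final class BridgeSketch {

    /** Writes the text through an OutputStreamWriter and returns the resulting bytes. */
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, charset)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String s = "caf\u00e9"; // four characters, the last one is e-acute
        System.out.println(encode(s, StandardCharsets.UTF_8).length);      // 5
        System.out.println(encode(s, StandardCharsets.ISO_8859_1).length); // 4
    }
}
```

Whatever OutputStream sits underneath, a GZIPOutputStream included, only ever sees those encoded bytes, which is why the Writer belongs on the outside of the chain.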

Signature
John Bollinger
jobollin@indiana.edu
JScoobyCed - 05 Jan 2006 03:51 GMT
>> yes there is: GZIPOutputStream. But if you do so, I doubt you want to
>> use an OutputStreamWriter.
>
> Why do you doubt that? Roedy has character data that he wants to
> deliver to a byte stream. OutputStreamWriter is the bridge between
> character data and binary streams. Why /wouldn't/ he want to use one?
Yes, I missed the point here :) I had the direction of the data flow
reversed and assumed the Writer would receive bytes as input (from the
gzipped stream). In fact, the data is gzipped after the conversion.
Still under the influence of the New Year's party, maybe :)

Roedy Green - 05 Jan 2006 01:51 GMT
On Thu, 05 Jan 2006 01:07:35 GMT, Roedy Green
<my_email_is_posted_on_my_website@munged.invalid> wrote, quoted or
indirectly quoted someone who said :
>Is there a way to plug together GZIP and OutputStreamWriter so that
>you compress on the fly?
It's pretty obvious.
<code>
FileOutputStream fos = new FileOutputStream(
        new File( webRoot, "googlesitemap.gz" ) );
// the second argument is the size of the gzip output buffer, in bytes
GZIPOutputStream gzos = new GZIPOutputStream( fos, 10 * 1024 );
OutputStreamWriter eosw = new OutputStreamWriter( gzos, "UTF-8" );
</code>
I think my problem was getting into a mindset of applying the gzip
last, by thinking of the creation of the FileOutputStream and
OutputStreamWriter as if they were an atomic pair.
When you start combining layers like this, I wonder what rules of
thumb there are for where to put the buffering.

John C. Bollinger - 05 Jan 2006 03:43 GMT
> FileOutputStream fos = new FileOutputStream( new File( webRoot,
> "googlesitemap.gz" ) );
> GZIPOutputStream gzos = new GZIPOutputStream( fos, 10 * 1024 );
>
> OutputStreamWriter eosw = new OutputStreamWriter( gzos, "UTF-8" );
[...]
> When you start combining layers like this, I wonder what rules of
> thumb there are for where you put the buffering.
The usual rule of thumb is to put buffering as close as possible to the
external device. In this case that would mean inserting a
BufferedOutputStream between the GZIPOutputStream and the FileOutputStream.
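A sketch of that arrangement, assuming Java 8 or later. The class name, the method names, the 16 KB buffer size and the temporary file are all illustrative choices, not anything from the posts above:

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: the buffer sits next to the file, below the compressor. */
final class BufferNearFileSketch {

    /** Writes text as gzipped UTF-8: Writer -> GZIP -> buffer -> file. */
    static void writeGzipped(Path target, String text) {
        try (OutputStream file = new FileOutputStream(target.toFile());
             Writer out = new OutputStreamWriter(
                     new GZIPOutputStream(new BufferedOutputStream(file, 16 * 1024)),
                     StandardCharsets.UTF_8)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the file back through GZIPInputStream to verify the result. */
    static String readGzipped(Path source) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(source)),
                StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    /** Writes to a temporary file, reads it back, and cleans up afterwards. */
    static String roundTripViaTempFile(String text) {
        try {
            Path tmp = Files.createTempFile("sitemap-sketch", ".xml.gz");
            try {
                writeGzipped(tmp, text);
                return readGzipped(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<urlset><url><loc>http://example.com/</loc></url></urlset>";
        System.out.println(roundTripViaTempFile(xml).equals(xml));
    }
}
```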

Signature
Chris Uppal - 05 Jan 2006 11:24 GMT
> The usual rule of thumb is to put buffering as close as possible to the
> external device. In this case that would mean inserting a
> BufferedOutputStream between the GZIPOutputStream and the
> FileOutputStream.
In this case the rule of thumb might be misleading. GZIPOutputStream does a
fair bit of buffering of the compressed output (in the underlying zlib
implementation), so omitting the buffering around the file will do much less
damage than would normally be the case. On the other hand, the cost of writing
a single byte/character to a GZIPOutputStream may be higher than is usual for a
stream which is not connected to an external device. If each write() results
in crossing the JNI barrier into zlib (even if the supplied data is just copied
into zlib's internal buffers, as would typically be the case), then the writes
will take a performance hit.
I admit that I've never tried to measure the various overheads in this
situation, but there is at least a chance that putting buffering around
GZIPOutputStream would bring greater benefits than putting it around the raw
file. (In practice, I would put the buffering around the file only, but that's
only on a "well, at the worst it won't be /too/ far off optimal, and I can
always tune it later if I feel the need" basis.)
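For anyone who wants to try the other arrangement, here is a sketch with buffering on both sides of the compressor, assuming Java 8 or later. The names are hypothetical and no timings are claimed. The BufferedWriter on top follows the suggestion in the OutputStreamWriter Javadoc to wrap it for efficiency, so single-character writes are collected before they reach the encoder and the GZIPOutputStream:

```java
import java.io.BufferedOutputStream;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: buffering both above and below the GZIPOutputStream. */
final class BufferAroundGzipSketch {

    /** Deliberately writes one char per call, the worst case for an unbuffered chain. */
    static byte[] gzipCharByChar(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the file
        try (Writer out = new BufferedWriter(            // collects small char writes
                new OutputStreamWriter(                  // chars -> UTF-8 bytes
                        new GZIPOutputStream(            // bytes -> compressed bytes
                                new BufferedOutputStream(sink)), // buffer next to the "device"
                        StandardCharsets.UTF_8))) {
            for (int i = 0; i < text.length(); i++) {
                out.write(text.charAt(i)); // Writer.write(int): a single character
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Decompresses and decodes, to confirm the layering did not change the data. */
    static String gunzip(byte[] gzipped) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzipped)),
                StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<urlset><url><loc>http://example.com/</loc></url></urlset>";
        System.out.println(gunzip(gzipCharByChar(xml)).equals(xml));
    }
}
```

Whether the extra layer pays off is exactly the question left open above; timing both variants on real data is the only way to tell.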
-- chris