Java Forum / General / January 2006
Gzip each chunk separately
Lior Knaany - 02 Jan 2006 17:38 GMT
Hi all,
I need some help understanding chunked & gzipped data in the HTTP/1.1 protocol, in particular headers like "Content-Encoding" vs. "Transfer-Encoding". (I am doing this in order to develop a web server filter.)
I noticed that when the server sends gzipped content in chunks, the response headers look like this:
Content-Encoding: gzip
Transfer-Encoding: chunked
The browser waits for all the chunks, concatenates them & runs GUnZip on them to get the content.
But why Gzip the entire data before sending ? Is there a way that the server can Gzip the chunk & then send it (doing the same for all the chunks)? Meaning the Gzip will not be on the entire content all together, but for each chunk. This way the browser could read one chunk, GUnZip it, display the result & continue to the next chunk.
If there is a way, what should the response headers look like ? Maybe like this: "Transfer-Encoding: Gzip,Chunked" with no Content-Encoding header?
I have searched "RFC 2616 - Hypertext Transfer Protocol -- HTTP/1.1 " but could not find any meaningful information for this question.
Please help,
Thanks in advance, Lior.
Barry Margolin - 03 Jan 2006 05:42 GMT
> But why Gzip the entire data before sending ? Is there a way that the
> server can Gzip the chunk & then send it (doing the same for all the
[quoted text clipped - 3 lines]
> This way the browser could read one chunk, GUnZip it, display the
> result & continue to the next chunk.
Unless the chunks are really big, you're not going to get very good compression that way. Gzip uses an adaptive compression algorithm, so it gets better as the amount of data increases.
But since gzip is also a stream compression algorithm, it can be done on the fly as each chunk is sent and received.
 Signature Barry Margolin, barmar@alum.mit.edu Arlington, MA *** PLEASE post questions in newsgroups, not directly to me *** *** PLEASE don't copy me on replies, I'll read them in the group ***
Lior Knaany - 03 Jan 2006 11:05 GMT
Thanks Barry,
I know that Gzip will work poorly on smaller content, but can it be done (gzip on each chunk separately)? & if so, what should the headers look like?
Chris Smith - 04 Jan 2006 00:01 GMT
> I know that Gzip will work poorly on a smaller content, but can it be
> done (gzip on each chunk separately)?
> & if so, what should the headers look like ?
No, it can't be done. (Or rather, if you do it then general-purpose browsers won't understand.)
 Signature www.designacourse.com The Easiest Way To Train Anyone... Anywhere.
Chris Smith - Lead Software Developer/Technical Trainer MindIQ Corporation
Lior Knaany - 05 Jan 2006 17:44 GMT
Thanks Chris,
That is exactly what I am experiencing when producing such a page, I just thought, maybe I am doing something wrong with the headers.
Well thanks again for the info Chris.
Michael Wojcik - 06 Jan 2006 15:15 GMT
> > I know that Gzip will work poorly on a smaller content, but can it be
> > done (gzip on each chunk separately)?
> > & if so, what should the headers look like ?
>
> No, it can't be done. (Or rather, if you do it then general-purpose
> browsers won't understand.)
Though as Barry pointed out, you can achieve essentially the same effect; neither the sender nor the receiver need buffer all the data and compress or decompress it at once, since gzip is a streaming compressor.
There's nothing to stop the server from reading N bytes of the file it's sending, initializing the compressor, compressing those N bytes to M bytes, sending an M-byte chunk, reading the next N bytes, compressing those without reinitializing the compressor, and so forth. The receiver can treat that just as it would a content-body that was compressed in its entirety before chunking. The only difference, as far as the receiver can tell, is that the chunks will probably vary in size if the sender compresses each chunk in turn.
By the same token, the receiver can initialize the decompressor before processing the first chunk, then pass it each chunk as it's received. It needn't buffer the entire compressed content-body.
 Signature Michael Wojcik michael.wojcik@microfocus.com
I gave my love some irises. (She was sick with viruses.) -- Charlie Gibbs
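A minimal Java sketch of the scheme Michael describes above, assuming java.util.zip's GZIPOutputStream with its syncFlush constructor argument (Java 7 or later). The names StreamedGzipChunks, compressInChunks and decompress are made up for illustration, and the HTTP chunk framing is left out: each byte array simply stands for the payload of one chunk. The receiver here joins the chunks before decoding only to keep the sketch short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class StreamedGzipChunks {

    /** Compresses data in slices of n bytes using ONE gzip stream; one output array per slice. */
    static List<byte[]> compressInChunks(byte[] data, int n) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        List<byte[]> chunks = new ArrayList<>();
        try {
            // syncFlush = true: flush() forces out everything compressed so far.
            GZIPOutputStream gz = new GZIPOutputStream(sink, true);
            for (int off = 0; off < data.length; off += n) {
                gz.write(data, off, Math.min(n, data.length - off));
                gz.flush();                      // compressor is NOT reinitialized
                chunks.add(sink.toByteArray());  // M compressed bytes for this slice
                sink.reset();
            }
            gz.close();                          // final block plus gzip trailer
            chunks.add(sink.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    /** Decodes the chunks as one ordinary gzip stream, as a receiver would see them. */
    static byte[] decompress(List<byte[]> chunks) {
        try {
            ByteArrayOutputStream joined = new ByteArrayOutputStream();
            for (byte[] c : chunks) {
                joined.write(c);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPInputStream in =
                     new GZIPInputStream(new ByteArrayInputStream(joined.toByteArray()))) {
                byte[] buf = new byte[4096];
                int r;
                while ((r = in.read(buf)) != -1) {
                    out.write(buf, 0, r);
                }
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "some response body that is sent in several chunks"
                .getBytes(StandardCharsets.UTF_8);
        List<byte[]> chunks = compressInChunks(data, 16);
        for (byte[] c : chunks) {
            System.out.println("chunk of " + c.length + " compressed bytes");
        }
        System.out.println(new String(decompress(chunks), StandardCharsets.UTF_8));
    }
}
```

As the post says, the compressed chunks come out with varying sizes, and the receiver cannot tell them apart from a body that was gzipped in one go and chunked afterwards.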
Lior Knaany - 16 Jan 2006 10:16 GMT
Thanks Michael,
That was very enlightening.
Rogan Dawes - 09 Jan 2006 07:48 GMT
>>I know that Gzip will work poorly on a smaller content, but can it be
>>done (gzip on each chunk separately)?
>>& if so, what should the headers look like ?
>
> No, it can't be done. (Or rather, if you do it then general-purpose
> browsers won't understand.)
In fact, the gzip algorithm allows for independently gzipped content to be concatenated, and it will still unzip just fine.
$ echo file 1 > file1
$ echo file 2 > file2
$ gzip file1 file2
$ cat file1.gz file2.gz > file3.gz
$ gunzip file3.gz
$ cat file3
file 1
file 2
$
So, if you created a gzipped stream by concatenating gzipped output, the browser SHOULD read it as the concatenation of the uncompressed files.
Regards,
Rogan
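The same experiment can be sketched in Java. This assumes java.util.zip.GZIPInputStream keeps reading into the next gzip member, which recent JDKs do as far as I know (older ones reportedly stopped after the first member). The class ConcatenatedGzipMembers and its helper methods are made up for illustration, and the result says nothing about how any particular browser treats such a stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ConcatenatedGzipMembers {

    /** Compresses text into one complete, independent gzip member. */
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Plain byte concatenation, the equivalent of "cat file1.gz file2.gz". */
    static byte[] concat(byte[] a, byte[] b) {
        byte[] joined = new byte[a.length + b.length];
        System.arraycopy(a, 0, joined, 0, a.length);
        System.arraycopy(b, 0, joined, a.length, b.length);
        return joined;
    }

    /** Reads everything GZIPInputStream is willing to produce from the input. */
    static String gunzip(byte[] compressed) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPInputStream in =
                     new GZIPInputStream(new ByteArrayInputStream(compressed))) {
                byte[] buf = new byte[4096];
                int r;
                while ((r = in.read(buf)) != -1) {
                    out.write(buf, 0, r);
                }
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file3 = concat(gzip("file 1\n"), gzip("file 2\n"));
        System.out.print(gunzip(file3));
    }
}
```

Every member carries its own gzip header and trailer and starts with an empty compression history, so this costs both bytes and compression ratio compared with a single stream.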
Chris Smith - 09 Jan 2006 16:49 GMT
> $ echo file 1 > file1
> $ echo file 2 > file2
[quoted text clipped - 5 lines]
> file 2
> $
Interesting...
Chris Uppal - 10 Jan 2006 11:13 GMT
[irrelevant and/or non-existent x-postings trimmed]
> In fact, the gzip algorithm allows for independently gzipped content to be
> concatenated, and it will still unzip just fine.
More accurately, the gzip /program/ will act as you describe. The compressed format itself, the GZIP format as specified in RFC 1952, does naturally concatenate, but only in the sense that a file in that format consists of a number of elements, each of which is an independently compressed "file" (the format even includes an embedded file name!).
It's difficult to state how a browser should interpret a gzip-format stream which consists of several compressed elements. If the browser's decompression is based on the zlib library, then that library does not automatically hide the boundaries between the separate "files" in the stream (and nor should it), so it is quite possible -- even probable -- that the browser would stop decompressing at the end of the first compressed "file" in the stream.
OTOH (reverting to the original poster's question), I don't see any reason why the server cannot send chunked and compressed data, nor any reason (except, perhaps, convenience) why the browser should not decompress such data incrementally. The underlying compression format (shared by "GZIP" and "DEFLATE") is capable of being flushed and/or reset in mid-stream, so the server could flush the compression algorithm at the end of each chunk, and that would be transparent to the browser as it was decompressing it (assuming the use of a library at least as well-designed as zlib).
In point of fact, however, I'm not sure I see any real reason why the server should even bother to flush the compression algorithm -- it could just accumulate compressed data until it had enough for one chunk (possibly leaving some data in the compression code's buffers). Send that as one chunk. The client would decompress in the same incremental way.
-- chris
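A rough Java sketch of the flush-per-chunk idea from the post above, assuming java.util.zip.Deflater and Inflater (thin wrappers over zlib) and the Deflater.SYNC_FLUSH mode available since Java 7. It uses the zlib-wrapped deflate format rather than gzip, purely to keep the code short. The class FlushPerChunk and its methods are hypothetical names, not part of any library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FlushPerChunk {

    /** Sender: compress one slice, then sync-flush so the output covers all input so far. */
    static byte[] deflateSlice(Deflater def, byte[] slice) {
        def.setInput(slice);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        do {
            n = def.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
            out.write(buf, 0, n);
        } while (n == buf.length); // a full buffer means more output may be pending
        return out.toByteArray();
    }

    /** Sender: finish the stream; the result is the last chunk (final block and checksum). */
    static byte[] deflateEnd(Deflater def) {
        def.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!def.finished()) {
            int n = def.deflate(buf);
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Receiver: feed one chunk to the long-lived Inflater and return what it yields. */
    static byte[] inflateSlice(Inflater inf, byte[] chunk) {
        inf.setInput(chunk);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            int n;
            while ((n = inf.inflate(buf)) > 0) {
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt deflate data", e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Deflater def = new Deflater();
        Inflater inf = new Inflater();
        for (String part : new String[] {"first part, ", "second part, ", "third part"}) {
            byte[] chunk = deflateSlice(def, part.getBytes(StandardCharsets.UTF_8));
            String seen = new String(inflateSlice(inf, chunk), StandardCharsets.UTF_8);
            System.out.println(chunk.length + " compressed bytes -> \"" + seen + "\"");
        }
        inflateSlice(inf, deflateEnd(def));
        System.out.println("stream finished: " + inf.finished());
        def.end();
        inf.end();
    }
}
```

With the flush, each chunk decodes to exactly the slice that went into it. Without it, the decompressor still works incrementally but may hold back the tail of a slice until a later chunk arrives, which is the trade-off described in the last paragraph of the post.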