Java Forum / General / March 2006

Chunked GZIP processing using Java Sockets

aztechnology@gmail.com - 23 Mar 2006 16:29 GMT
Hi,

My Java client program reads web sites using low-level sockets.
The HTML response from the web site is chunked and gzipped.  I am aware
of the HttpClient and the JRE's HttpURLConnection APIs that can handle this
directly; however, I must use low-level sockets because the error
control that I need to implement is not available via HttpClient/JRE.

Can anyone be kind enough to point me to how to read an HTTP response that is
chunked and gzipped using Java sockets?  Are there any classes that
provide the ability to coalesce the chunked stream and then inflate the
gzipped contents?

Thanks
Chris Smith - 23 Mar 2006 17:29 GMT
> Can anyone be kind enough to point me to how to read an HTTP response that is
> chunked and gzipped using Java sockets?  Are there any classes that
> provide the ability to coalesce the chunked stream and then inflate the
> gzipped contents?

You'll probably have to implement your own class to handle chunking as a
subclass of FilterInputStream, if you can't use a higher-level API like
HttpClient.  Chunking isn't difficult, so this shouldn't take long.  
There is already a GZIPInputStream in java.util.zip.  So you'd do this:

   InputStream base = ...;   // raw stream, positioned just past the HTTP headers
   InputStream logical = new GZIPInputStream(
       new MyChunkedInputStream(base));   // un-chunk first, then gunzip

   ...
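
A minimal sketch of what such a de-chunking filter might look like, assuming the stream is already positioned just past the HTTP headers. The class name MyChunkedInputStream is taken from the snippet above; the implementation is hypothetical and not code from any poster. Chunk extensions are ignored and trailer lines are discarded.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical implementation: strips HTTP/1.1 chunk framing from the wrapped stream.
class MyChunkedInputStream extends FilterInputStream {
    private int remaining = 0;      // unread data bytes in the current chunk
    private boolean first = true;   // true until the first chunk-size line is read
    private boolean eof = false;    // true once the zero-size last chunk is seen

    MyChunkedInputStream(InputStream in) { super(in); }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) == -1 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (eof) return -1;
        if (len == 0) return 0;
        if (remaining == 0) {
            nextChunk();
            if (eof) return -1;
        }
        int n = in.read(buf, off, Math.min(len, remaining));
        if (n == -1) throw new EOFException("stream ended inside a chunk");
        remaining -= n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // Skip by reading, so chunk framing is never mistaken for data.
        byte[] scratch = new byte[1024];
        long done = 0;
        while (done < n) {
            int r = read(scratch, 0, (int) Math.min(scratch.length, n - done));
            if (r == -1) break;
            done += r;
        }
        return done;
    }

    @Override
    public int available() throws IOException {
        return eof ? 0 : Math.min(in.available(), remaining);
    }

    @Override
    public boolean markSupported() { return false; }

    // Reads the next chunk-size line; on the zero-size chunk, discards trailers.
    private void nextChunk() throws IOException {
        if (!first) readLine();                  // CRLF that follows the previous chunk's data
        first = false;
        String sizeLine = readLine();
        int semi = sizeLine.indexOf(';');        // drop any chunk extension
        if (semi >= 0) sizeLine = sizeLine.substring(0, semi);
        remaining = Integer.parseInt(sizeLine.trim(), 16);
        if (remaining == 0) {
            while (!readLine().isEmpty()) { /* discard trailer lines */ }
            eof = true;
        }
    }

    // Reads one CRLF-terminated line directly from the wrapped stream.
    private String readLine() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != '\n') {
            if (c == -1) throw new EOFException("stream ended inside chunk framing");
            if (c != '\r') sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new MyChunkedInputStream(new ByteArrayInputStream(wire));
        System.out.println(new String(in.readAllBytes(), StandardCharsets.US_ASCII)); // prints Wikipedia
    }
}
```

Because the framing lines are read one byte at a time, it is probably worth wrapping the socket's stream in a single BufferedInputStream before handing it to this class. The demo uses InputStream.readAllBytes(), which needs Java 9 or later.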

Signature

www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation

aztechnology@gmail.com - 23 Mar 2006 19:29 GMT
Makes sense. Does anyone have chunked input stream logic that I can readily
use?  Any pointers there?

I tried using the ones from other libraries and for some reason they
choke. Most likely the HTTP headers need to be stripped before
passing the stream on to the ChunkedInputStream class. I do not think these
implementations expect the headers to be intact (at the beginning of the
response), so I will need to strip the headers before passing the stream on
to the stream handler.
Roedy Green - 23 Mar 2006 21:59 GMT
>Makes sense - anyone has chunked input stream logic that I can readily
>use?  any pointers there.

see http://mindprod.com/applets/fileio.html
and http://mindprod.com/products1.html#FILETRANSFER
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

aztechnology@gmail.com - 23 Mar 2006 22:28 GMT
HTTP/1.1 200 OK
Cache-Control: private,max-age=0
Date: Thu, 23 Mar 2006 21:26:29 GMT
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Expires: Wed, 01 Jan 1997 12:00:00 GMT
Vary: Accept-Encoding
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Server: Unauthorized-Use-Prohibited

These are the headers I get from the output. If I try to jam this
input stream through just Gzip, of course it is not going to work, so I need
to take the input stream, wrap it in Gzip, and then wrap that in Chunked,
right?

Any reference for a chunked stream reader?  Also, do I need to strip the
HTTP headers (as above) before passing to the stream handlers?

Thanks
Chris Smith - 24 Mar 2006 01:57 GMT
> HTTP/1.1 200 OK
> Cache-Control: private,max-age=0
[quoted text clipped - 12 lines]
> to take the input stream, wrap it in Gzip, and then wrap it on Chunked,
> right?

The other way around.  You need to un-chunk it first, then take that
result and gunzip it.  Yes, you definitely need to remove the HTTP
headers.
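
To illustrate why the order matters: the server gzips the page first (Content-Encoding) and only then splits the compressed bytes into chunks (Transfer-Encoding), so the client has to undo the steps in reverse. The following self-contained sketch builds such a body in memory and decodes it. The names OrderDemo, chunk and dechunk are made up for illustration, and dechunk assumes well-formed input with no chunk extensions or trailers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class OrderDemo {
    // Server side, step 1: compress the entity (Content-Encoding: gzip).
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(plain);
        }
        return bos.toByteArray();
    }

    // Server side, step 2: frame the compressed bytes (Transfer-Encoding: chunked).
    static byte[] chunk(byte[] data, int chunkSize) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        for (int off = 0; off < data.length; off += chunkSize) {
            int n = Math.min(chunkSize, data.length - off);
            byte[] head = (Integer.toHexString(n) + "\r\n").getBytes(StandardCharsets.US_ASCII);
            bos.write(head, 0, head.length);
            bos.write(data, off, n);
            bos.write('\r');
            bos.write('\n');
        }
        byte[] end = "0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        bos.write(end, 0, end.length);
        return bos.toByteArray();
    }

    // Client side, step 1: remove the chunk framing.
    static byte[] dechunk(byte[] framed) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            int eol = pos;
            while (framed[eol] != '\r') eol++;          // end of the hex size line
            int size = Integer.parseInt(
                new String(framed, pos, eol - pos, StandardCharsets.US_ASCII), 16);
            if (size == 0) break;                        // last chunk
            pos = eol + 2;                               // skip CRLF after the size
            bos.write(framed, pos, size);
            pos += size + 2;                             // skip data and its CRLF
        }
        return bos.toByteArray();
    }

    // Client side, step 2: decompress the reassembled bytes.
    static byte[] gunzip(byte[] zipped) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(zipped))) {
            return gz.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = chunk(gzip("<html>hello</html>".getBytes(StandardCharsets.UTF_8)), 16);
        String page = new String(gunzip(dechunk(wire)), StandardCharsets.UTF_8);
        System.out.println(page); // prints <html>hello</html>
    }
}
```

Passing the chunked bytes straight to gunzip fails, because the first bytes on the wire are the hex chunk size rather than the gzip magic number.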


tom fredriksen - 24 Mar 2006 13:59 GMT
> Hi,
>
[quoted text clipped - 8 lines]
> provide the ability to coalesce the chunked stream and then deflate the
> zipped contents?

Could you give a little more information as to why you need to use the
socket? It seems to me like bad design doing it that way, though maybe I am
wrong. Could you not split it into two separate tasks, where the socket
operation handles any errors while the API handles all non-error
situations? That way you would have automatic de-chunking and gunzipping,
while at the same time keeping error control.

I am sorry if this sounds like a bad idea, but I don't have any info on
the architecture, so it's difficult to say if there is a better way of doing
it. It just sounds to me like what you have now is not good architecture :(

/tom
aztechnology@gmail.com - 25 Mar 2006 03:22 GMT
Sockets give me control over handling network problems and server
responses the way I like.  I need total control, so for my
needs I know I cannot use the JRE HttpURLConnection API.

So I guess I need to do this:

InputStream is = new
GZIPInputStream(new MyChunkedInputStream(socket.getInputStream()));

And then I need to find the de-chunking stream code.

Also, how do I strip the headers before calling the above line?  Just
advance the stream until it is past the headers?  I can
do readLine() perhaps 13 or 14 times to get the stream to point to the
chunked/gzipped data in the stream.
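
Rather than calling readLine() a fixed number of times, it is probably more robust to read until the empty line that ends the header block, since the number of header lines can vary between responses. The sketch below reads bytes directly from the InputStream instead of going through a BufferedReader, because a BufferedReader may read ahead past the blank line and swallow part of the binary body. The names HeaderReader, readLine and readHeaders are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class HeaderReader {
    // Reads one CRLF-terminated line byte by byte, consuming nothing beyond it.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != '\n') {
            if (c == -1) throw new EOFException("connection closed inside the headers");
            if (c != '\r') sb.append((char) c);
        }
        return sb.toString();
    }

    // Consumes the status line and all headers, up to and including the blank line.
    // On return the stream is positioned at the first byte of the body.
    static List<String> readHeaders(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while (!(line = readLine(in)).isEmpty()) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] response = ("HTTP/1.1 200 OK\r\n"
                + "Transfer-Encoding: chunked\r\n"
                + "Content-Encoding: gzip\r\n"
                + "\r\n"
                + "BODY").getBytes(StandardCharsets.ISO_8859_1);
        InputStream in = new ByteArrayInputStream(response);
        List<String> headers = readHeaders(in);
        System.out.println(headers.size() + " header lines");                           // prints 3 header lines
        System.out.println(new String(in.readAllBytes(), StandardCharsets.ISO_8859_1)); // prints BODY
    }
}
```

With a real socket, the same InputStream object (ideally one BufferedInputStream around socket.getInputStream()) would be passed first to readHeaders and then to the de-chunking and gzip wrappers.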

Thanks

> > Hi,
> >
[quoted text clipped - 21 lines]
>
> /tom
tom fredriksen - 25 Mar 2006 23:35 GMT
> Sockets give me the control on  handling network problems and server
> responses the way I like it to.  I need the total control - so for my
[quoted text clipped - 11 lines]
> do readLine() perhaps 13/14 times to get the stream to point to the
> chunked/gzipped data in the stream.

Chunking is part of the HTTP RFC and, as far as I remember, it's not
all that difficult to implement; at least the basic principle isn't.
The problem is that you need the headers to be able to do the de-chunking
properly, so what you actually need to do is create a sort of filter
applied at the socket level which is activated when you see a chunked
transmission. Perhaps just looking at the code for HttpURLConnection
could help you; if it's possible you could even copy it. Other than that,
I think your best bet is to search the web for an HTTP protocol
implementation in Java which you can use some stuff from. The irony is
that you are, in effect, creating an HTTP socket on an HTTP socket.
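
One way to decide when such a filter should be activated is to inspect the header lines for Transfer-Encoding and Content-Encoding before wrapping the stream. This is only a sketch: the class name BodyEncoding is invented here, it takes the header lines as plain strings, and it recognises only the chunked and gzip tokens.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class BodyEncoding {
    final boolean chunked;   // Transfer-Encoding lists "chunked"
    final boolean gzip;      // Content-Encoding lists "gzip"

    BodyEncoding(List<String> headerLines) {
        boolean c = false;
        boolean g = false;
        for (String line : headerLines) {
            int colon = line.indexOf(':');
            if (colon < 0) continue;   // status line or malformed line
            // Header names and these coding tokens are compared case-insensitively.
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim().toLowerCase(Locale.ROOT);
            if (name.equals("transfer-encoding") && hasToken(value, "chunked")) c = true;
            if (name.equals("content-encoding") && hasToken(value, "gzip")) g = true;
        }
        chunked = c;
        gzip = g;
    }

    // True if the comma-separated header value contains the given token.
    private static boolean hasToken(String value, String token) {
        for (String part : value.split(",")) {
            if (part.trim().equals(token)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        BodyEncoding enc = new BodyEncoding(Arrays.asList(
                "HTTP/1.1 200 OK",
                "Transfer-Encoding: chunked",
                "Content-Type: text/html; charset=utf-8",
                "Content-Encoding: gzip"));
        System.out.println("chunked=" + enc.chunked + " gzip=" + enc.gzip); // prints chunked=true gzip=true
    }
}
```

The caller would then wrap the body stream in a de-chunking filter only if chunked is true, and wrap the result in GZIPInputStream only if gzip is true, in that order.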

/tom
aztechnology@gmail.com - 29 Mar 2006 08:26 GMT
This is special-purpose code (known client and server), so in this
case the HTTP response is always going to be gzipped and chunked, and I am
OK to hard-code the assumption of a gzipped, chunked response.  So the HTTP
headers do not carry anything meaningful in my case; I know the response
type beforehand.

