How to parse and manipulate a binary stream

topcat.nyc@googlemail.com - 28 Nov 2006 20:32 GMT
Apologies in advance if my question is silly or trivial. I'm trying to
write a servlet that reads data from another source in byte[] form and,
having parsed this data stream and made a couple of modifications,
sends the modified data to an appropriate application where it can be
rendered in Excel or PDF format.

What I need to find out is how I can parse the incoming data, detect
certain string patterns in the data, and manipulate that information to
generate a new data stream.

TIA,
tc
Mark Jeffcoat - 28 Nov 2006 22:27 GMT
> Apologies in advance if my question is silly or trivial. I'm trying to
> write a servlet that reads data from another source in byte[] form and,
[quoted text clipped - 5 lines]
> certain string patterns in the data, and manipulate that information to
> generate a new data stream.

Not a trivial question, exactly, but a bit on the
vague side. It's difficult for me to tell what you're
having difficulty with.

I'm going to assume that you're stuck getting started.

First, you need to read binary data from a source. That's
exactly the job of an InputStream. It has a read() method
that lets you read directly into a byte array.
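A minimal sketch of that reading step, assuming you want the whole payload in memory (the class name StreamUtil is hypothetical). A single read() call may return fewer bytes than the array holds, so the code loops until read() returns -1. The stream could come from, for example, ServletRequest.getInputStream() or URLConnection.getInputStream(); on Java 9 and later, InputStream.readAllBytes() does the same job.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class StreamUtil {
    /** Reads until end of stream and returns everything read as one byte array. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read() returns how many bytes it actually stored (possibly fewer than
        // chunk.length), or -1 once the end of the stream is reached.
        while ((n = in.read(chunk)) != -1) {
            collected.write(chunk, 0, n);
        }
        return collected.toByteArray();
    }
}
```

The caller remains responsible for closing the stream.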

To parse a portion of a byte[] as a String, you can
just use the constructors in the String class.
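For example, the String(byte[], int, int, Charset) constructor decodes just a slice of the array. The charset is the critical assumption: it has to match how the producer encoded the text, and the constructors without a charset argument fall back to the platform default, which may not be what you want. The class name below is hypothetical.

```java
import java.nio.charset.Charset;

class DecodeSlice {
    /**
     * Decodes data[offset] .. data[offset + length - 1] as text.
     * The caller must know (or assume) the charset the bytes were written in.
     */
    static String decode(byte[] data, int offset, int length, Charset charset) {
        return new String(data, offset, length, charset);
    }
}
```

Given the bytes 00 01 'H' 'e' 'l' 'l' 'o' FF, decoding offset 2, length 5 as US-ASCII yields "Hello" and ignores the surrounding binary bytes.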

To write the output, you need an OutputStream. You may want
to subclass it and implement write() so that it asks for its next
byte of output from the object responsible for doing the
search-and-replace manipulation.
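A minimal sketch of such a subclass. It swaps in a simpler push-style design rather than the pull-style one described above: it buffers everything written to it and, on close(), replaces every occurrence of one byte pattern with another before handing the result to the wrapped stream. The class name and the buffer-everything approach are assumptions; buffering keeps matches that span two write() calls intact, but it holds the whole payload in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical search-and-replace stream over raw bytes. */
class ReplacingOutputStream extends FilterOutputStream {
    private final byte[] target;
    private final byte[] replacement;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean finished = false;

    ReplacingOutputStream(OutputStream out, byte[] target, byte[] replacement) {
        super(out);
        if (target.length == 0) {
            throw new IllegalArgumentException("target must not be empty");
        }
        this.target = target.clone();
        this.replacement = replacement.clone();
    }

    @Override
    public void write(int b) {
        buffer.write(b); // nothing reaches the wrapped stream until close()
    }

    @Override
    public void write(byte[] b, int off, int len) {
        buffer.write(b, off, len);
    }

    @Override
    public void close() throws IOException {
        if (finished) {
            return;
        }
        finished = true;
        out.write(replaceAll(buffer.toByteArray(), target, replacement));
        out.close();
    }

    /** Left-to-right, non-overlapping replacement of target with replacement. */
    static byte[] replaceAll(byte[] data, byte[] target, byte[] replacement) {
        ByteArrayOutputStream result = new ByteArrayOutputStream(data.length);
        int i = 0;
        while (i < data.length) {
            if (matchesAt(data, i, target)) {
                result.write(replacement, 0, replacement.length);
                i += target.length;
            } else {
                result.write(data[i]);
                i++;
            }
        }
        return result.toByteArray();
    }

    private static boolean matchesAt(byte[] data, int pos, byte[] target) {
        if (pos + target.length > data.length) {
            return false;
        }
        for (int j = 0; j < target.length; j++) {
            if (data[pos + j] != target[j]) {
                return false;
            }
        }
        return true;
    }
}
```

The naive scan is O(n·m) in the data and pattern lengths, which is usually fine for short patterns.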


Mark Jeffcoat
Austin, TX

topcat.nyc@googlemail.com - 29 Nov 2006 09:10 GMT
> Not a trivial question, exactly, but a bit on the
> vague side. It's difficult for me to tell what you're
> having difficulty with.

Sorry, my question *was* very vaguely worded.

> I'm going to assume that you're stuck getting started.
>
[quoted text clipped - 9 lines]
> byte of output from the object responsible for doing the
> search-and-replace manipulation.

My problem is that the conversion of the input stream into
character/String data doesn't give me anything meaningful - not enough
to parse and manipulate, at any rate. I suppose what I'm wondering is
whether there's any reference material that describes how an encoded
input stream of data (be it for Excel or PDF) can be "translated" into
a String representation in order to do basic String manipulations, and
then re-encoded and passed on to the next application.
Matt Humphrey - 29 Nov 2006 12:42 GMT
<snip>

>> To parse a portion of a byte[] as a String, you can
>> just use the constructors in the String class.
[quoted text clipped - 11 lines]
> a String representation in order to do basic String manipulations, and
> then re-encoded and passed on to the next application.

Binary formats like Excel and GIF don't have any natural string equivalent and
cannot be "parsed" in the sense of parsing strings. PDF is largely text but
may have some segments in binary; I don't know offhand how the binary parts
work. To "parse" true binary files you have to know the file structure.
You can go to http://www.wotsit.org/ to get information on file formats.
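Knowing the file structure usually starts with recognizing which format you have. A minimal sketch that checks a few well-known leading signatures ("magic numbers"): PDF files begin with %PDF-, legacy .xls workbooks are OLE2 compound files beginning with D0 CF 11 E0 A1 B1 1A E1 (the same container as legacy .doc and .ppt), .xlsx files are ZIP archives beginning with PK 03 04, and GIFs begin with GIF87a or GIF89a. The class and enum names are hypothetical, and a matching signature only tells you the container type, not that the rest of the file is well-formed.

```java
class FormatSniffer {
    enum Format { PDF, OLE2, ZIP, GIF, UNKNOWN }

    private static final byte[] PDF_SIG = {'%', 'P', 'D', 'F', '-'};
    private static final byte[] OLE2_SIG = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    private static final byte[] ZIP_SIG = {'P', 'K', 0x03, 0x04};
    private static final byte[] GIF_SIG = {'G', 'I', 'F', '8'}; // GIF87a or GIF89a

    /** Classifies data by its first few bytes only. */
    static Format sniff(byte[] data) {
        if (startsWith(data, PDF_SIG)) return Format.PDF;
        if (startsWith(data, OLE2_SIG)) return Format.OLE2;
        if (startsWith(data, ZIP_SIG)) return Format.ZIP;
        if (startsWith(data, GIF_SIG)) return Format.GIF;
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```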

Matt Humphrey matth@ivizNOSPAM.com http://www.iviz.com/
Mark Jeffcoat - 29 Nov 2006 16:03 GMT
> My problem is that the conversion of the input stream into
> character/String data doesn't give me anything meaningful - not enough
[quoted text clipped - 3 lines]
> a String representation in order to do basic String manipulations, and
> then re-encoded and passed on to the next application.

Yeah, okay. I gave you a strategy that will work if you've
got some Strings in an encoding you already understand surrounded
by other miscellaneous bytes that you can ignore; if that's
not the case (which it surely can be, if the binary format
is trying to be clever with how it stores text), you have
a harder problem.
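A minimal sketch of that happy case, assuming the text you want to change is stored as plain single-byte characters (ASCII, for instance). Decoding with ISO-8859-1 maps every byte value 0x00 to 0xFF to exactly one char, so the surrounding non-text bytes survive the decode, replace, and re-encode round trip unchanged. The class name is hypothetical. One caveat for formats like PDF: the cross-reference table records byte offsets and each stream carries a /Length entry, so a replacement whose length differs from the original can leave those values wrong.

```java
import java.nio.charset.StandardCharsets;

class LatinRoundTrip {
    /** Replaces literal text inside otherwise arbitrary binary data. */
    static byte[] replaceText(byte[] data, String target, String replacement) {
        // ISO-8859-1 is a 1:1 mapping between bytes and chars U+0000..U+00FF,
        // so bytes that are not text come back out exactly as they went in.
        String asText = new String(data, StandardCharsets.ISO_8859_1);
        String modified = asText.replace(target, replacement);
        // Chars above U+00FF in the replacement cannot be encoded and would be
        // written as '?', so keep replacements within Latin-1.
        return modified.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```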

The first thing I'd do is run the Unix program "strings"
(which you can surely find for Windows, if you have to) on
some of the files you're interested in, and see if you're
in the happy case. (It sounds like you've already done
something like that, but a quick second opinion won't hurt.)
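If you can't run strings, a rough Java approximation is easy to write. This sketch reports runs of printable ASCII bytes (space through tilde) of at least a given length; 4 matches the default minimum of GNU strings, though strings itself has more options and its rules for which bytes count as printable differ slightly. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PrintableRuns {
    /** Returns every run of printable ASCII bytes at least minLength long. */
    static List<String> find(byte[] data, int minLength) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xFF; // treat the byte as unsigned 0..255
            if (c >= 0x20 && c <= 0x7E) {
                current.append((char) c);
            } else {
                if (current.length() >= minLength) {
                    runs.add(current.toString());
                }
                current.setLength(0); // a non-printable byte ends the run
            }
        }
        if (current.length() >= minLength) {
            runs.add(current.toString()); // run that reaches end of data
        }
        return runs;
    }
}
```

If the runs you expect to edit show up here, you are in the happy case; if the output is mostly noise, the text is probably compressed or encoded.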

If not, you'll have to handle each format you want to
parse separately. I really like the POI library for handling
Excel documents in Java.

    http://jakarta.apache.org/poi/

There is surely something similar for PDF, but I've
never had the need of it; your Google will be as
good as mine.


Mark Jeffcoat
Austin, TX

topcat.nyc@googlemail.com - 30 Nov 2006 08:55 GMT
> Yeah, okay. I gave you a strategy that will work if you've
> got some Strings in an encoding you already understand surrounded
> by other miscellaneous bytes that you can ignore; if that's
> not the case (which it surely can be, if the binary format
> is trying to be clever with how it stores text), you have
> a harder problem.

I figured out what the problem with the PDF data was. The binary stream
that I read in gives me PDF data in compressed form, which I discovered
after running a few tests on it. I downloaded a free tool, pdftk, to
help me uncompress the source data stream, perform my text
manipulations, and then recompress the modified data before passing
them on.
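For doing the same step in Java rather than with pdftk: compressed PDF content streams most commonly use the FlateDecode filter, which is zlib/deflate data, and java.util.zip's Inflater and Deflater handle that format. A minimal sketch of the decompress and recompress steps, assuming you have already cut the raw bytes out from between a stream's "stream" and "endstream" keywords. This sketch does not do that extraction, nor does it update the stream's /Length entry or the cross-reference offsets afterwards; a PDF-aware tool or library is needed for that bookkeeping. The class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FlateCodec {
    /** Decompresses zlib-wrapped deflate data (what FlateDecode streams contain). */
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater(); // default mode expects the zlib header
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported stream");
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end(); // release native zlib resources
        }
        return out.toByteArray();
    }

    /** Recompresses modified data in the same zlib format. */
    static byte[] deflate(byte[] plain) {
        Deflater deflater = new Deflater();
        deflater.setInput(plain);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
        } finally {
            deflater.end();
        }
        return out.toByteArray();
    }
}
```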

Thanks for your help, guys! I really appreciate it.

- tc

