Java Forum / General / April 2006

recommendation for dealing with legacy data

Jeff Kish - 27 Apr 2006 21:55 GMT
Greetings.

I am not very advanced when it comes to Java programming, but I have
done a fair amount of C/C++.

I have some legacy data files which are fixed-length binary.
I'd like to figure out the best way to read the files from a Java program that
may be running on any of a variety of platforms, and subsequently process the data.

The data fields in each record in the file may contain '\n', nulls or any other
data.

Can someone recommend the best way, or even a good way, to go about reading
this file, taking the first n bytes, processing them, and so on? I'll need to
find/recognize/skip '\n' etc.

The data files will always have come from a Wintel machine, probably using the
default Western character set, whatever the heck that is.

Any gotchas/watch out fors/etc would be appreciated.

Thanks
Jeff Kish
ducnbyu@aol.com - 27 Apr 2006 22:18 GMT
The DataInputStream class seems appropriate.
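One caveat worth checking before relying on DataInputStream: its readInt(), readShort() and related methods assume big-endian (network) byte order, while programs on x86 Windows machines normally write integers little-endian. A minimal sketch of the workaround, assuming the file contains 32-bit and 16-bit little-endian integer fields (the thread does not give the actual layout):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class LittleEndianFields {
    /** Reads a 32-bit little-endian int; readInt() on its own would interpret it as big-endian. */
    static int readIntLE(DataInputStream in) throws IOException {
        return Integer.reverseBytes(in.readInt());
    }

    /** Reads an unsigned 16-bit little-endian value as an int in the range 0..65535. */
    static int readUnsignedShortLE(DataInputStream in) throws IOException {
        return Short.reverseBytes(in.readShort()) & 0xFFFF;
    }

    public static void main(String[] args) throws IOException {
        // Bytes as an x86 program would write the int 1 followed by the 16-bit value 0xFFFE.
        byte[] data = {1, 0, 0, 0, (byte) 0xFE, (byte) 0xFF};
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            System.out.println(readIntLE(in));           // 1 (readInt() alone would give 16777216)
            System.out.println(readUnsignedShortLE(in)); // 65534
        }
    }
}
```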
Thomas Weidenfeller - 28 Apr 2006 08:56 GMT
> I have some legacy data files

Your description does not match:

(a)

> which are fixed length binary.

(b)

>  I'll need to
> find/recognize/skip '\n' etc.

(c)

> using probably the
> default western character set

(a) says you have fixed-length records, (b) suggests you have
variable-length records. Which is it?

If you have binary fixed-length records, then RandomAccessFile might
be a good start for accessing and skipping records.
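With fixed-length records, RandomAccessFile lets you jump directly to record n by seeking to n * recordLength, so there is no need to scan for delimiters. A minimal sketch; the 16-byte RECORD_LENGTH is a hypothetical value to be replaced with the real record size:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

class RandomRecordAccess {
    // Hypothetical record size; substitute the real one.
    static final int RECORD_LENGTH = 16;

    /** Number of complete records in the file (a trailing partial record is ignored). */
    static long recordCount(RandomAccessFile file) throws IOException {
        return file.length() / RECORD_LENGTH;
    }

    /** Reads record number index (0-based) by seeking straight to its byte offset. */
    static byte[] readRecord(RandomAccessFile file, long index) throws IOException {
        byte[] record = new byte[RECORD_LENGTH];
        file.seek(index * RECORD_LENGTH);
        file.readFully(record); // throws EOFException if the record is incomplete
        return record;
    }

    public static void main(String[] args) throws IOException {
        File path;
        if (args.length > 0) {
            path = new File(args[0]);
        } else {
            // Demo data: three zeroed records, with 42 as the first byte of record #2.
            path = File.createTempFile("legacy", ".dat");
            path.deleteOnExit();
            byte[] data = new byte[RECORD_LENGTH * 3];
            data[2 * RECORD_LENGTH] = 42;
            Files.write(path.toPath(), data);
        }
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            // Read the records in reverse to show that access order does not matter.
            for (long i = recordCount(file) - 1; i >= 0; i--) {
                System.out.println("record " + i + " starts with " + readRecord(file, i)[0]);
            }
        }
    }
}
```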

If you have binary variable-length records, a FileInputStream plus a
BufferedInputStream are a good start. Depending on the individual data
encoding, a DataInputStream might help, too. Or you need to program the
decoding by hand.
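For variable-length binary records, the usual stack is FileInputStream wrapped in a BufferedInputStream, optionally wrapped again in a DataInputStream. The sketch below assumes a purely hypothetical encoding in which each record starts with a 2-byte little-endian length followed by that many payload bytes; the real format would need its own decoding logic:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class VariableRecordReader {
    /** Reads one length-prefixed record (hypothetical format), or returns null at a clean end of input. */
    static byte[] readRecord(DataInputStream in) throws IOException {
        int lo = in.read();
        if (lo == -1) {
            return null; // no more records
        }
        int hi = in.read();
        if (hi == -1) {
            throw new EOFException("truncated length prefix");
        }
        int length = lo | (hi << 8); // read() returns 0..255, so there is no sign extension
        byte[] payload = new byte[length];
        in.readFully(payload);       // throws EOFException on a truncated payload
        return payload;
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = args.length > 0
                ? new FileInputStream(args[0])
                : new ByteArrayInputStream(new byte[] {3, 0, 'a', 'b', 'c', 1, 0, 0x0A});
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(raw))) {
            byte[] record;
            while ((record = readRecord(in)) != null) {
                System.out.println("record of " + record.length + " bytes");
            }
        }
    }
}
```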

(a) says you have binary files, while in (b) and (c) you seem to be talking
about text files. Which is it?

If you do indeed have a text file, then a FileReader and a BufferedReader are
a good start.
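One portability caveat for the text case: FileReader decodes with the platform's default charset (before Java 11 it has no constructor that accepts a charset), so the same file can decode differently depending on where the program runs. A sketch that pins the charset instead, assuming the files use "windows-1252", the usual Western European code page on Windows; that code page is an assumption, not something confirmed in the thread:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

class TextLines {
    // Assumption: the data was written with the Western European Windows code page.
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    /** Reads all lines; readLine() strips "\r\n", "\n" or "\r" terminators. */
    static List<String> readLines(InputStream raw) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(raw, WINDOWS_1252))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = args.length > 0
                ? new FileInputStream(args[0])
                : new ByteArrayInputStream(new byte[] {'c', 'a', 'f', (byte) 0xE9, '\r', '\n', 'x'});
        for (String line : readLines(raw)) {
            System.out.println(line);
        }
    }
}
```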

> Any gotchas/watch out fors/etc would be appreciated.

Clarify your requirements.

/Thomas

The comp.lang.java.gui FAQ:
ftp://ftp.cs.uu.nl/pub/NEWS.ANSWERS/computer-lang/java/gui/faq
http://www.uni-giessen.de/faq/archiv/computer-lang.java.gui.faq/

Jeff Kish - 28 Apr 2006 13:50 GMT
>> I have some legacy data files
>
[quoted text clipped - 36 lines]
>
>/Thomas
It's a really nasty fixed-length binary format with some bytes corresponding
to text data.
But it is binary, and it is fixed length, and some sections of each record hold
text data.
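For a fixed-length binary record with embedded text fields, one approach is to read each whole record into a byte[] and then extract fields by offset, decoding only the text slices as text. The sketch below assumes a made-up 26-byte layout (a 32-bit little-endian id, a 20-byte NUL-padded text field in code page 1252, then an unsigned 16-bit little-endian count); all field names, offsets and the code page are hypothetical placeholders for the real layout:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;

class LegacyRecord {
    // Hypothetical layout: replace these offsets with the real record description.
    static final int RECORD_LENGTH = 26;
    static final int ID_OFFSET = 0;      // int32, little-endian
    static final int NAME_OFFSET = 4;    // 20 bytes of text, NUL-padded
    static final int NAME_LENGTH = 20;
    static final int COUNT_OFFSET = 24;  // uint16, little-endian
    static final Charset CP1252 = Charset.forName("windows-1252"); // assumed code page

    final int id;
    final String name;
    final int count;

    LegacyRecord(byte[] record) {
        if (record.length != RECORD_LENGTH) {
            throw new IllegalArgumentException("expected " + RECORD_LENGTH + " bytes, got " + record.length);
        }
        // x86-written integers are little-endian; ByteBuffer defaults to big-endian.
        ByteBuffer buf = ByteBuffer.wrap(record).order(ByteOrder.LITTLE_ENDIAN);
        id = buf.getInt(ID_OFFSET);
        count = buf.getShort(COUNT_OFFSET) & 0xFFFF; // treat as unsigned 16-bit
        name = decodeText(record, NAME_OFFSET, NAME_LENGTH);
    }

    /** Decodes a NUL-padded text field, stopping at the first NUL byte. */
    static String decodeText(byte[] record, int offset, int length) {
        int end = offset;
        while (end < offset + length && record[end] != 0) {
            end++;
        }
        return new String(record, offset, end - offset, CP1252);
    }

    public static void main(String[] args) {
        byte[] record = new byte[RECORD_LENGTH];
        record[ID_OFFSET] = 7;                    // id = 7 (low byte first)
        byte[] text = "Kish\n".getBytes(CP1252);  // a '\n' inside a field is just data
        System.arraycopy(text, 0, record, NAME_OFFSET, text.length);
        record[COUNT_OFFSET] = (byte) 0xE8;       // count = 0x03E8 = 1000
        record[COUNT_OFFSET + 1] = 0x03;
        LegacyRecord r = new LegacyRecord(record);
        System.out.println(r.id + " [" + r.name.trim() + "] " + r.count);
    }
}
```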

Sorry, it was badly designed about 13 years ago by someone who didn't know what
they were doing (and who shall go unnamed ;> )

regards
Jeff Kish
Chris Uppal - 28 Apr 2006 09:54 GMT
> Any gotchas/watch out fors/etc would be appreciated.

Take some time out to get your head properly around the difference between
textual information and binary data.  As a C (or C++) programmer, you have
probably spent your life thus far without having to consider the difference
between the two (assuming you use char, or unsigned char, for both).  In Java
the two are (correctly) not conflated, and you will /have/ to be aware at all
times which you are dealing with.
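One concrete place where the C habit bites: a Java byte is always signed (there is no equivalent of unsigned char), so a byte with the high bit set reads as a negative number unless it is masked with 0xFF. A small illustration; 0xE9 is just an example value:

```java
class SignedBytes {
    /** Returns the byte's value as 0..255, the way an unsigned char would see it. */
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE9;                     // e.g. a byte from the file with the high bit set
        System.out.println(b);                    // -23, not 233
        System.out.println(unsigned(b));          // 233
        // Comparisons with literals above 0x7F need the mask as well:
        System.out.println(b == 0xE9);            // false: b is promoted to the int -23
        System.out.println(unsigned(b) == 0xE9);  // true
    }
}
```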

A recent thread entitled "Strings and binary data":
http://groups.google.co.uk/group/comp.lang.java.programmer/browse_frm/thread/1df0f881855c6d8f

might be an effective starting point.  It contains an over-long post from
myself on the topic, plus -- probably more helpful -- several links to further
info.

Beyond that, it's simple as long as you keep your head straight.  Represent
binary data as binary (byte[] arrays, or ints, etc), and textual data as text
(Strings, char[] arrays, etc).  If the input is binary, read it using an
InputStream (or one of its variants); if it is textual, use a Reader (or one of
the variants).  You can't easily mix the two, so if the data is actually mixed,
then read it as binary, treat it as binary while you identify the textual
subsequences, and then convert them to some suitable textual representation (for
which you will need to understand the basics of charsets/character
encodings/code-pages).
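The final step described above, converting an identified byte range to text, needs an explicit charset, because the same bytes decode to different strings under different code pages. A small sketch, assuming the Windows files use "windows-1252" (an assumption; other code pages are possible):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeTextSlice {
    static final Charset CP1252 = Charset.forName("windows-1252"); // assumed source code page

    /** Decodes bytes [offset, offset + length) of a binary record as text in the given charset. */
    static String decode(byte[] data, int offset, int length, Charset charset) {
        return new String(data, offset, length, charset);
    }

    public static void main(String[] args) {
        // Two binary header bytes, then a two-byte text field.
        byte[] data = {0x00, 0x01, (byte) 0x80, (byte) 0xE9};
        String asCp1252 = decode(data, 2, 2, CP1252);
        String asLatin1 = decode(data, 2, 2, StandardCharsets.ISO_8859_1);
        System.out.println(asCp1252.equals("\u20AC\u00E9")); // true: euro sign, e-acute
        System.out.println(asLatin1.equals("\u0080\u00E9")); // true: 0x80 is a control character in Latin-1
        // new String(bytes) without a charset uses the platform default, which is not portable.
    }
}
```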

DataInputStream is occasionally useful, but not -- in my experience -- very
often.

   -- chris
Oliver Wong - 28 Apr 2006 17:56 GMT
> Greetings.
>
[quoted text clipped - 21 lines]
>
> Any gotchas/watch out fors/etc would be appreciated.

   There are too many "ors" and "etcs." here for me to make much sense of
the nature of your files. If it's arbitrary binary data, what's wrong with
using java.io.FileInputStream to turn your file into a stream of bytes, and
working from there?

   I don't understand what you mean by "I'll need to find/recognize/skip
'\n' etc." Why would the '\n' character be treated specially in arbitrary
binary data? It's fixed-length, so surely this character isn't acting as a
separator, right?
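In a fixed-length format the record boundaries come from the byte count alone, so a 0x0A byte is simply data. A minimal sketch that reads the whole file into memory (reasonable only for modest file sizes, which is an assumption here) and splits it by a hypothetical record length:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitFixedRecords {
    /** Splits raw bytes into records of exactly recordLength bytes; 0x0A gets no special treatment. */
    static List<byte[]> split(byte[] data, int recordLength) {
        if (recordLength <= 0 || data.length % recordLength != 0) {
            throw new IllegalArgumentException(
                    "file size " + data.length + " is not a multiple of " + recordLength);
        }
        List<byte[]> records = new ArrayList<>();
        for (int start = 0; start < data.length; start += recordLength) {
            records.add(Arrays.copyOfRange(data, start, start + recordLength));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        int recordLength = 4; // hypothetical; use the real record size
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : new byte[] {'a', '\n', 'b', 0, '\n', '\n', 'c', 'd'};
        for (byte[] record : split(data, recordLength)) {
            System.out.println(Arrays.toString(record));
        }
    }
}
```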

   - Oliver

