Java Forum / General / May 2007

Another topic on how to read a binary file.

Knitter - 20 May 2007 19:07 GMT
Hi,
I know this has been asked a few zillion times but I couldn't find a
good answer for my problem.

I have a binary file, the Ant Movie Catalog database, if anyone knows
the software. It is a file where information about movies is stored.
The software was written in Delphi, so the binary file contains Pascal
strings and integers.

I know the file format. For example, I know that strings are stored
as a 4-byte integer giving the string length, followed by the actual
string. What I'm failing to understand is how to read the file.

I've been using a BufferedInputStream created from a FileInputStream.
If I use the read(byte[]) method, which fills the passed array with up to
array.length bytes, how can I transform that array of bytes into the
integer that I need?

I'm creating a simple test application to learn how to read the
binary file. I'm starting with the header, which is represented as:

strFileHeader35 = ' AMC_3.5 Ant Movie Catalog 3.5.x   www.buypin.com www.antp.be ';
OwnerName:          string;
OwnerSite:          string;
OwnerMail:          string;
OwnerDescription:   string;

So I thought of reading the 4 bytes that tell me how long each string
is, converting that 4-byte array into the needed integer, and
then reading the string into another array whose size is the
integer I found.

I'm stuck on how to correctly read the file and how to convert the
bytes into integers.

Am I going the wrong way?

Thanks.
Arne Vajhøj - 20 May 2007 19:28 GMT
> I have a binary file, the Ant Movie Catalog database if anyone knows
> the software. It is a file where information about movies is stored,
[quoted text clipped - 10 lines]
> array.length how can I transform that array of bytes into the integer
> that I need?

> So I thought of reading the 4 bytes that tell me how long each string
> is, convert the array with the 4 bytes into the needed integer and
[quoted text clipped - 3 lines]
> I'm stuck with how to correctly read the file, how to convert the
> bytes into integers.

If you wrap your stream in a BinaryReader you will get some
convenient methods.

Arne
Knitter - 20 May 2007 19:47 GMT
Isn't BinaryReader C#? I can't find that class on the Java API, maybe
I'm not seeing the correct package.
I'll look more closely into it, thanks.
Knitter - 20 May 2007 20:09 GMT
Thanks for all the replies. My previous post should have been the
third reply, as I was replying to Arne's post.

"Little-endian, big-endian": that is something I'll really have to
learn. I have never had to understand the issues in integer
representation.
I know that the file was created with Delphi 7 for the 32-bit
Windows platform. Does that help? :)

Thanks again. I believe I can manage now; that byte-to-int conversion
helped, but I can see my lack of knowledge getting in the way.

Best regards,

Sergio
Knitter - 20 May 2007 20:34 GMT
This is what the help on the Ant Movie Catalog file format states:

All types are Pascal types(...)

"Each "string" field is preceded by an integer (4 bytes, signed) that
gives the size of the vector (size = 0 if no vector, i.e. empty
string). So strings are string (char = 1 byte, unsigned) without
ending delimiter."

If I try to read 4 bytes, using a byte array of size 4, or if I try
to read an integer using DataInputStream's readInt() method, I
get a number that can't be the length of any string in the catalog: I
get "541150531". I haven't tried to create a byte array of that
size, but I think that is not a correct length for a string. I'll
most likely end up with an out-of-memory exception :)

I really can't see how to read the file... thanks, I'll go and think
about this a bit...
Arne Vajhøj - 20 May 2007 20:37 GMT
> If I try to read 4 bytes, using a byte array with size 4, or if I try
> to read an integer, using the DataInputStream's readInt() method, I
> get a number that can't represent any string present in the catalog, I
> get "541150531". I haven't tried to create a byte array with that
> number but I think that is not a correct length for a string, I'll
> most likely end up with an out of memory exception :)

541150531 is " ADC".

Arne
Knitter - 20 May 2007 20:58 GMT
> 541150531 is " ADC".
>
> Arne

That number should represent the length of the string, not the string.
Either way that string does not exist in the catalog.
The format is: <length as a 4 byte integer><string as 1 char sequence,
no ending delimiter>.
Arne Vajhøj - 20 May 2007 21:01 GMT
>> 541150531 is " ADC".
>
> That number should represent the length of the string, not the string.
> Either way that string does not exist in the catalog.
> The format is: <length as a 4 byte integer><string as 1 char sequence,
> no ending delimiter>.

I think you must have misread, unless the length is supposed
to be in the hundreds of MB range.

And I find it suspicious that it matches ASCII text.

Arne
Knitter - 20 May 2007 21:21 GMT
I might have misread it, but I copied and pasted the help on the
file format a few posts above. I'll post it here again:

"Each "string" field is preceded by an integer (4 bytes, signed) that
gives the size of the vector (size = 0 if no vector, i.e. empty
string). So strings are string (char = 1 byte, unsigned) without
ending delimiter."

The above text is what is written in Ant Movie Catalog format. I may
be reading it wrong though.
Patricia Shanahan - 20 May 2007 21:41 GMT
>>> 541150531 is " ADC".
>>
[quoted text clipped - 9 lines]
>
> Arne

541150531 decimal is hex 20414D43. If it were " ADC" I would have
expected to see hex 20414443, decimal 541148227.

However, 541150531 is much more likely to be ASCII " AMC" than a string
length.

If the file does have a length in the previous four bytes, I would
expect it to read out as either 4 if all is well, or a much bigger power
of two if there is a byte order issue.
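
To make the byte-order point concrete, here is a small sketch of what happens when a length of 4 is stored low byte first but read high byte first. The class and method names (ByteOrderDemo, decode) are made up for illustration; only ByteBuffer and ByteOrder come from the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {
    // Interprets the first four bytes of raw using the given byte order.
    static int decode(byte[] raw, ByteOrder order) {
        return ByteBuffer.wrap(raw).order(order).getInt();
    }

    public static void main(String[] args) {
        // The value 4 as a little-endian writer would store it on disk.
        byte[] four = {0x04, 0x00, 0x00, 0x00};
        System.out.println(decode(four, ByteOrder.LITTLE_ENDIAN)); // 4
        System.out.println(decode(four, ByteOrder.BIG_ENDIAN));    // 67108864, i.e. 2^26
    }
}
```

Neither result looks like 541150531, which is one more hint that the four bytes in question are not a length at all.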

Patricia
Knitter - 20 May 2007 21:57 GMT
How are you converting the 541150531 to ASCII? Maybe a silly question
but I don't know the answer...
Mark Space - 20 May 2007 23:08 GMT
> How are you converting the 541150531 to ASCII? Maybe a silly question
> but I don't know the answer...

Do you see it now?  Patricia and Arne hashed it out pretty well.

541150531 = 20414D43 hex, which is the same as the sequence of bytes 20
41 4D 43 also in hex.  In ASCII that's " " "A" "M" "C".  Check an ASCII
table at Wikipedia or something.

Short strings, fixed length, are often used as markers and "magic
numbers" in file formats.  Don't expect every string to need a length,
just the "data" ones that might actually vary...
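
The same conversion can be done in code rather than with an ASCII table. This is only a sketch; the class and method names (MagicDecode, asAscii) are invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class MagicDecode {
    // Renders a 32-bit value as the four ASCII characters its bytes spell,
    // most significant byte first (the order readInt() consumed them in).
    static String asAscii(int value) {
        // A freshly allocated ByteBuffer is big endian by default.
        byte[] bytes = ByteBuffer.allocate(4).putInt(value).array();
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(541150531));  // 20414d43
        System.out.println("[" + asAscii(541150531) + "]");  // [ AMC]
    }
}
```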
Knitter - 21 May 2007 00:41 GMT
Thank you all for the help.
I have learned something new :) and have been able to read the damn
file. After I got past my initial problem I found out how the file is
actually laid out. It turns out that "every string" doesn't include the
first string. So those 4 bytes were the start of the initial string,
which has a fixed size and no length prefix.
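
For anyone landing here later, the layout described above (a fixed-size marker, then length-prefixed strings) could be parsed roughly as follows. This is a sketch under several assumptions that are not confirmed by the format documentation quoted in this thread: that lengths are little endian, that text is single-byte ISO-8859-1, and that the caller knows the marker length. The names CatalogHeaderSketch, readString, readOwnerFields and the headerLength parameter are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class CatalogHeaderSketch {
    // Reads one length-prefixed string: a 4-byte signed length, then that many 1-byte chars.
    static String readString(ByteBuffer buf) {
        int length = buf.getInt();
        // A negative or oversized length usually means the parser is misaligned.
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalStateException("Implausible string length: " + length);
        }
        byte[] raw = new byte[length];
        buf.get(raw);
        return new String(raw, StandardCharsets.ISO_8859_1); // encoding is an assumption
    }

    // Skips the fixed-size marker, then reads the four owner fields in file order.
    static String[] readOwnerFields(byte[] fileBytes, int headerLength) {
        // Little endian is an assumption (Delphi 7 on 32-bit Windows).
        ByteBuffer buf = ByteBuffer.wrap(fileBytes).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(headerLength); // hypothetical value: take it from the real header string
        String[] fields = new String[4]; // OwnerName, OwnerSite, OwnerMail, OwnerDescription
        for (int i = 0; i < fields.length; i++) {
            fields[i] = readString(buf);
        }
        return fields;
    }
}
```

The length check is there because a misaligned read tends to show up as a nonsense length, exactly like the 541150531 seen earlier.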

I wasn't seeing the conversion to hex, after Mark pin-pointed it, it's
obvious :D

Thanks again, best regards,

Sergio
Arne Vajhøj - 20 May 2007 22:00 GMT
>>>> 541150531 is " ADC".
>>>
[quoted text clipped - 10 lines]
> 541150531 decimal is hex 20414D43. If it were " ADC" I would have
> expected to see hex 20414443, decimal 541148227.

Ooops.

I missed the D<>4 alias M<>D.

Arne
Arne Vajhøj - 20 May 2007 20:09 GMT
> Isn't BinaryReader C#? I can't find that class on the Java API, maybe
> I'm not seeing the correct package.
> I'll look more closely into it, thanks.

Sorry.

Yes.

DataInputStream in Java !

Arne
Arne Vajhøj - 20 May 2007 20:35 GMT
>> Isn't BinaryReader C#? I can't find that class on the Java API, maybe
>> I'm not seeing the correct package.
[quoted text clipped - 5 lines]
>
> DataInputStream in Java !

Note that Java DataInputStream uses network order (big
endian) while .NET BinaryReader uses Little Endian.

Arne
Knitter - 20 May 2007 20:46 GMT
> Note that Java DataInputStream uses network order (big
> endian) while .NET BinaryReader uses Little Endian.
>
> Arne

.Net? I'm not using any .Net type. I'm trying to read Delphi types,
from Delphi 7, not Delphi.Net. So I'm trying to read Pascal types in
Java :)
Arne Vajhøj - 20 May 2007 20:52 GMT
>> Note that Java DataInputStream uses network order (big
>> endian) while .NET BinaryReader uses Little Endian.
[quoted text clipped - 4 lines]
> from Delphi 7, not Delphi.Net. So I'm trying to read Pascal types in
> Java :)

I know.

I brought .NET in and just wanted to clarify that there is a
small difference between .NET BinaryReader and Java DataInputStream.

My guess is that Delphi would save in little endian, but the
docs should really say so.

Arne
Joshua Cranmer - 20 May 2007 19:46 GMT
> Hi,
> I know this has been asked a few zillion times but I couldn't find a
[quoted text clipped - 9 lines]
> the actual string. What I'm failing to understand is how to read the
> file.

Little-endian, big-endian, or what other composition of integer? There
is a big difference between the various representations that can
horribly break a program.

> I've been using BufferedInputStream created with a FileInputStream.
> If I use the read(byte[]) method, that fills the passed array with the
> array.length how can I transform that array of bytes into the integer
> that I need?

Assuming a big-endian format (where the integer 0x0a0b0c0d is stored as
the bytes 0x0a, 0x0b, 0x0c, 0x0d), then java.io.DataInput can be used to
parse the data.

If it is little-endian, or another less-standard format, then some magic
will have to be used:

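// read() may return fewer than 4 bytes; check its return value in real code
byte[] temp = new byte[4];
in.read(temp);
// mask each byte to undo sign extension, then assemble low byte first
return ((temp[0] & 0xff)) | ((temp[1] & 0xff) << 8) |
       ((temp[2] & 0xff) << 16) | ((temp[3] & 0xff) << 24);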
Gordon Beaton - 20 May 2007 19:46 GMT
> I've been using BufferedInputStream created with a FileInputStream.
> If I use the read(byte[]) method, that fills the passed array with
> the array.length how can I transform that array of bytes into the
> integer that I need?

If you have an array of 4 bytes, you can convert it to an integer by
combining the 4 values in the right order (which depends on the byte
order in the file):

 int n = (arr[0] & 0xff) | ((arr[1] & 0xff) << 8) | ((arr[2] & 0xff) << 16) | ((arr[3] & 0xff) << 24);

or

 int n = (arr[3] & 0xff) | ((arr[2] & 0xff) << 8) | ((arr[1] & 0xff) << 16) | ((arr[0] & 0xff) << 24);

DataInputStream has methods to do this for you (if your data order is
big endian).
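
A caveat when combining bytes by hand: Java's byte type is signed, so a byte such as 0xC8 has the value -56 unless it is masked with & 0xff first. The sketch below compares an unmasked and a masked little-endian assembly; the class and method names are illustrative only.

```java
class SignedByteDemo {
    // Little-endian assembly without masking: negative bytes sign-extend and corrupt the sum.
    static int unmasked(byte[] arr) {
        return arr[0] + (arr[1] << 8) + (arr[2] << 16) + (arr[3] << 24);
    }

    // Same assembly with each byte first masked to its unsigned value 0..255.
    static int masked(byte[] arr) {
        return (arr[0] & 0xff) | ((arr[1] & 0xff) << 8)
             | ((arr[2] & 0xff) << 16) | ((arr[3] & 0xff) << 24);
    }

    public static void main(String[] args) {
        byte[] length200 = {(byte) 0xC8, 0, 0, 0}; // the value 200 stored little endian
        System.out.println(unmasked(length200)); // -56
        System.out.println(masked(length200));   // 200
    }
}
```

Short strings never trigger this, since their length bytes stay below 0x80, which is why the problem can go unnoticed until a field of 128 characters or more shows up.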

/gordon

Mark Space - 20 May 2007 21:10 GMT
> If you have an array of 4 bytes, you can convert it to an integer by
> combining the 4 values in the right order (which depends on the byte
[quoted text clipped - 8 lines]
> DataInputStream has methods to do this for you (if your data order is
> big endian).

Does the static method Integer.reverseBytes(int) do the same thing?  It
seems like it should, although I didn't check it.  DataInputStream seems
like the better choice for this case, just wanted to point out there was
another method...
Arne Vajhøj - 20 May 2007 21:20 GMT
>> If you have an array of 4 bytes, you can convert it to an integer by
>> combining the 4 values in the right order (which depends on the byte
[quoted text clipped - 13 lines]
> like the better choice for this case, just wanted to point out there was
> another method...

big endian : DataInputStream readInt

little endian : DataInputStream readInt + Integer reverseBytes

would do nicely.
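
A minimal sketch of the little-endian combination, using an in-memory stream so it can be run without a catalog file. The helper name readIntLE and the class name are made up for the example; readInt() and Integer.reverseBytes() are the real JDK methods.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class LittleEndianRead {
    // readInt() assembles four bytes as big endian; reversing them yields the little-endian value.
    static int readIntLE(DataInputStream in) throws IOException {
        return Integer.reverseBytes(in.readInt());
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = {0x0d, 0x0c, 0x0b, 0x0a}; // 0x0a0b0c0d stored little endian
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(sample))) {
            System.out.println(Integer.toHexString(readIntLE(in))); // a0b0c0d
        }
    }
}
```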

Arne
Mark Space - 20 May 2007 21:24 GMT
> Does the static method Integer.reverseBytes(int) do the same thing?  It
> seems like it should, although I didn't check it.  DataInputStream seems
> like the better choice for this case, just wanted to point out there was
> another method...

Answer: yes it does, although my quick test seems to have been bit by a
silent cast conversion.  To the language designers: dear sweet Jesus and
God in Heaven, why?

package integertest;
import java.lang.Integer;

public class Main
{
    public static void main(String[] args)
    {
        System.out.println( "128 reversed is " + Integer.reverseBytes(128));
        System.out.println( "-1 reversed is " + Integer.reverseBytes(-1));
        System.out.println( "-129 reversed is " + Integer.reverseBytes(-129));
        System.out.println( "256 reversed is " + Integer.reverseBytes(256));
    }
}

compile:
run:
128 reversed is -2147483648
-1 reversed is -1
-129 reversed is 2147483647
256 reversed is 65536
BUILD SUCCESSFUL (total time: 0 seconds)
Mark Space - 20 May 2007 21:33 GMT
>  To the language designers: dear sweet Jesus and
> God in Heaven, why?

> 256 reversed is 65536

Oops, math error on my part, I thought 256 reversed would result in a
different bit pattern.  Wailing and moaning at Sun can now cease.  Move
along, nothing to see here...
Arne Vajhøj - 20 May 2007 21:34 GMT
> Answer: yes it does, although my quick test seems to have been bit by a
> silent cast conversion.  To the language designers: dear sweet Jesus and
[quoted text clipped - 24 lines]
> 256 reversed is 65536
> BUILD SUCCESSFUL (total time: 0 seconds)

Which one is puzzling you ?

Arne
Nigel Wade - 21 May 2007 09:34 GMT
> Hi,
> I know this has been asked a few zillion times but I couldn't find a
[quoted text clipped - 36 lines]
>
> Thanks.

Use a ByteBuffer. After you have read a buffer of data into your app, you wrap a
ByteBuffer around it:

ByteBuffer bb = ByteBuffer.wrap(buffer);

then you can read from the ByteBuffer using its methods: getShort(), getInt(),
getLong(), getFloat() etc.

If the data is little-endian rather than big endian then, prior to reading data
from the ByteBuffer, set the order using bb.order(ByteOrder.LITTLE_ENDIAN).

So, to read a string you could do something like (I prefer to use
DataInputStream as it has the readFully() method, removing the necessity to
loop over a read() method) :

  // "file" is the catalog file (a java.io.File or a path String)
  FileInputStream in = new FileInputStream( file );
  DataInputStream dataIn = new DataInputStream( in );
 
  // read the string length
  byte[] lengthBytes = new byte[4];
  dataIn.readFully( lengthBytes );
       
  // wrap the byte array in a ByteBuffer
  ByteBuffer bb = ByteBuffer.wrap( lengthBytes );

  // if reading little endian data
  bb.order( ByteOrder.LITTLE_ENDIAN );
       
  int stringLength = bb.getInt();
  byte[] stringBytes = new byte[stringLength];
  dataIn.readFully( stringBytes );

You now have the string as a byte array. You still need to convert that to a Java
String; how you do that depends on the string encoding.
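
As a sketch of that last step: the String constructor that takes a Charset does the conversion. Which charset is right for this file is an assumption here. ISO-8859-1 is used below because it maps every byte value to a character and so never fails; a Delphi 7 program on Windows may well have written the system ANSI code page (for example windows-1252) instead. The class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeBytes {
    // Decodes the raw bytes; the charset must match what the writing program used.
    static String decode(byte[] stringBytes, Charset charset) {
        return new String(stringBytes, charset);
    }

    public static void main(String[] args) {
        byte[] raw = {'S', 'e', 'r', 'g', 'i', 'o'};
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1)); // Sergio

        // A byte above 0x7f: 0xE9 is U+00E9 (e with acute accent) in ISO-8859-1.
        byte[] accented = {(byte) 0xE9};
        System.out.println(decode(accented, StandardCharsets.ISO_8859_1).equals("\u00e9")); // true
    }
}
```

Avoid the String(byte[]) constructor without a charset argument for file data, since it uses the platform default and the result then depends on the machine running the program.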

Nigel Wade, System Administrator, Space Plasma Physics Group,
           University of Leicester, Leicester, LE1 7RH, UK
E-mail :    nmw@ion.le.ac.uk
Phone :     +44 (0)116 2523548, Fax : +44 (0)116 2523555


