Java Forum / General / May 2007
Another topic on how to read a binary file.
Knitter - 20 May 2007 19:07 GMT Hi, I know this has been asked a few zillion times but I couldn't find a good answer for my problem.
I have a binary file, the Ant Movie Catalog database if anyone knows the software. It is a file where information about movies is stored. The software was created in Delphi, so the binary file contains Pascal strings and integers.
I know the file format. For example, I know that the strings are stored using a 4-byte integer representing the string length, followed by the actual string. What I'm failing to understand is how to read the file.
I've been using a BufferedInputStream created with a FileInputStream. If I use the read(byte[]) method, which fills the passed array with up to array.length bytes, how can I transform that array of bytes into the integer that I need?
I'm creating a simple test application to learn how to read the binary file. I'm starting with the header, which is represented as:
strFileHeader35 = ' AMC_3.5 Ant Movie Catalog 3.5.x www.buypin.com www.antp.be '; OwnerName: string; OwnerSite: string; OwnerMail: string; OwnerDescription: string;
So I thought of reading the 4 bytes that tell me how long each string is, converting the array with the 4 bytes into the needed integer, and then reading the string into another array with the size of the integer I have found.
I'm stuck on how to correctly read the file and how to convert the bytes into integers.
Am I going the wrong way?
Thanks.
Arne Vajhøj - 20 May 2007 19:28 GMT > I have a binary file, the Ant Movie Catalog database if anyone knows > the software. It is a file where information about movies is stores, [quoted text clipped - 10 lines] > array.lenght how can I transform that array of bytes into the integer > that I need?
> So I thought of reading the 4 byte that tells me how long each string > is, convert the array with the 4 bytes into the needed integer and [quoted text clipped - 3 lines] > I'm stuck with how to correctly read the file, how to convert the > bytes into integers. If you wrap your stream in a BinaryReader you will get some convenient methods.
Arne
Knitter - 20 May 2007 19:47 GMT Isn't BinaryReader C#? I can't find that class on the Java API, maybe I'm not seeing the correct package. I'll look more closely into it, thanks.
Knitter - 20 May 2007 20:09 GMT Thanks for all the replies. My previous post should have been the third reply, as I was replying to Arne's post.
"Little-endian, big-endian": that is something I'll have to really learn. I have never had to understand the issues in integer representation. I know that the file was created with Delphi 7 for the 32-bit Windows platform, does that help? :)
Thanks again, I believe I can manage now, that byte to int conversion helped but I'm seeing my lack of knowledge getting in the way.
Best regards,
Sergio
Knitter - 20 May 2007 20:34 GMT This is what the help on the Ant Movie Catalog file format states:
All types are Pascal types(...)
"Each "string" field is preceded by an integer (4 bytes, signed) that gives the size of the vector (size = 0 if no vector, i.e. empty string). So strings are string (char = 1 byte, unsigned) without ending delimiter."
If I try to read 4 bytes, using a byte array with size 4, or if I try to read an integer, using the DataInputStream's readInt() method, I get a number that can't represent the length of any string present in the catalog: I get "541150531". I haven't tried to create a byte array with that number but I think that is not a correct length for a string, I'll most likely end up with an out of memory exception :)
I really can't see how to read the file... thanks, I'll go and think about this a bit...
Arne Vajhøj - 20 May 2007 20:37 GMT > If I try to read 4 bytes, using a byte array with size 4, or if I try > to read an integer, using the DataInputStream's readInt() method, I > get a nunber that can't represent any string present in the catalog, I > get "541150531". I haven't tried to create a byte array with that > number but I think that is not a correct length for a string, I'll > most likely end up with an out of memory exception :) 541150531 is " ADC".
Arne
Knitter - 20 May 2007 20:58 GMT > 541150531 is " ADC". > > Arne That number should represent the length of the string, not the string. Either way that string does not exist in the catalog. The format is: <length as a 4 byte integer><string as 1 char sequence, no ending delimiter>.
Arne Vajhøj - 20 May 2007 21:01 GMT >> 541150531 is " ADC". > > That number should represent the length of the string, not the string. > Either way that string does not exist in the catalog. > The format is: <length as a 4 byte integer><string as 1 char sequence, > no ending delimiter>. I think you must have misread. Unless the length is supposed to be in the hundreds of MB range.
And I find it suspicious that it matches ASCII text.
Arne
Knitter - 20 May 2007 21:21 GMT I might have misread it, but I copied and pasted the help on the file format a few posts above. I'll post it here again:
"Each "string" field is preceded by an integer (4 bytes, signed) that gives the size of the vector (size = 0 if no vector, i.e. empty string). So strings are string (char = 1 byte, unsigned) without ending delimiter."
The above text is what is written in Ant Movie Catalog format. I may be reading it wrong though.
Patricia Shanahan - 20 May 2007 21:41 GMT >>> 541150531 is " ADC". >> [quoted text clipped - 9 lines] > > Arne 541150531 decimal is hex 20414D43. If it were " ADC" I would have expected to see hex 20414443, decimal 541148227.
However, 541150531 is much more likely to be ASCII " AMC" than a string length.
If the file does have a length in the previous four bytes, I would expect it to read out as either 4 if all is well, or a much bigger power of two if there is a byte order issue.
Patricia
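To make the byte order point concrete, here is a minimal sketch (the class and method names are illustrative only, not from the thread) that reads the same four bytes in both orders. It assumes the length 4 was written by a little-endian writer, which is a guess about the file and not something the format help confirms.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderDemo {
    /** Interprets the first four bytes of data as an int in the given byte order. */
    public static int readInt(byte[] data, ByteOrder order) {
        return ByteBuffer.wrap(data).order(order).getInt();
    }

    public static void main(String[] args) {
        // The length 4 as a little-endian writer would store it (assumption).
        byte[] four = {0x04, 0x00, 0x00, 0x00};

        // Read with the matching byte order: the expected small length.
        System.out.println(readInt(four, ByteOrder.LITTLE_ENDIAN)); // 4

        // Read with the wrong byte order: 0x04000000, a large power of two.
        System.out.println(readInt(four, ByteOrder.BIG_ENDIAN));    // 67108864
    }
}
```

A value like 541150531 fits neither pattern, which is one more hint that those bytes are text and not a length.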
Knitter - 20 May 2007 21:57 GMT How are you converting the 541150531 to ASCII? Maybe a silly question but I don't know the answer...
Mark Space - 20 May 2007 23:08 GMT > How are you converting the 541150531 to ASCII? Maybe a silly question > but I don't know the answer... Do you see it now? Patricia and Arne hashed it out pretty well.
541150531 = 20414D43 hex, which is the same as the sequence of bytes 20 41 4D 43 also in hex. In ASCII that's " " "A" "M" "C". Check an ASCII table at Wikipedia or something.
Short strings, fixed length, are often used as markers and "magic numbers" in file formats. Don't expect every string to need a length, just the "data" ones that might actually vary...
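The conversion described above can also be done in code instead of with an ASCII table. A small sketch follows; the class and method names are made up for illustration. It splits the int into its four bytes, most significant first (the order readInt() consumed them in), and decodes them as ASCII.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class MagicToAscii {
    /** Returns the four bytes of value, most significant first, decoded as ASCII. */
    public static String asAscii(int value) {
        // ByteBuffer is big-endian by default, matching DataInputStream.readInt().
        byte[] bytes = ByteBuffer.allocate(4).putInt(value).array();
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        int value = 541150531;
        System.out.println(Integer.toHexString(value)); // 20414d43
        System.out.println("[" + asAscii(value) + "]"); // [ AMC]
    }
}
```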
Knitter - 21 May 2007 00:41 GMT Thank you all for the help. I have learned something new :) and have been able to read the damn file. After I got past my initial problem I found how the file was actually made. It turns out that *every* string doesn't include the first string. So those 4 bytes were part of the initial string, which has a fixed size and no length prefix.
I wasn't seeing the conversion to hex, after Mark pin-pointed it, it's obvious :D
Thanks again, best regards,
Sergio
Arne Vajhøj - 20 May 2007 22:00 GMT >>>> 541150531 is " ADC". >>> [quoted text clipped - 10 lines] > 541150531 decimal is hex 20414D43. If it were " ADC" I would have > expected to see hex 20414443, decimal 541148227. Ooops.
I missed the D<>4 alias M<>D.
Arne
Arne Vajhøj - 20 May 2007 20:09 GMT > Isn't BinaryReader C#? I can't find that class on the Java API, maybe > I'm not seeing the correct package. > I'll look more closely into it, thanks. Sorry.
Yes.
DataInputStream in Java !
Arne
Arne Vajhøj - 20 May 2007 20:35 GMT >> Isn't BinaryReader C#? I can't find that class on the Java API, maybe >> I'm not seeing the correct package. [quoted text clipped - 5 lines] > > DataInputStream in Java ! Note that Java DataInputStream uses network order (big endian) while .NET BinaryReader uses Little Endian.
Arne
Knitter - 20 May 2007 20:46 GMT > Note that Java DataInputStream uses network order (big > endian) while .NET BinaryReader uses Little Endian. > > Arne .Net? I'm not using any .Net type. I'm trying to read Delphi types, from Delphi 7, not Delphi.Net. So I'm trying to read Pascal types in Java :)
Arne Vajhøj - 20 May 2007 20:52 GMT >> Note that Java DataInputStream uses network order (big >> endian) while .NET BinaryReader uses Little Endian. [quoted text clipped - 4 lines] > from Delphi 7, not Delphi.Net. So I'm trying to read Pascal types in > Java :) I know.
I brought the .NET in and just wanted to clarify that there is a small difference between .NET BinaryReader and Java DataInputStream.
My guess is that Delphi would save in little endian, but the docs should really say so.
Arne
Joshua Cranmer - 20 May 2007 19:46 GMT > Hi, > I know this has been asked a few zillion times but I couldn't find a [quoted text clipped - 9 lines] > the actually string. What I'm falling to understand is how to read the > file. Little-endian, big-endian, or what other composition of integer? There is a big difference between the various representations that can horribly break a program.
> I've been using BufferedInputStream created with a FileInputStream. > If I use the read(byte[]) method, that fills the passed array with the > array.lenght how can I transform that array of bytes into the integer > that I need? Assuming a big-endian format (where the integer 0x0a0b0c0d is stored as the bytes 0x0a, 0x0b, 0x0c, 0x0d), then java.io.DataInput can be used to parse the data.
If it is little-endian, or another less-standard format, then some magic will have to be used:
    byte[] temp = new byte[4];
    in.read(temp);
    return ((temp[0] & 0xff))
         | ((temp[1] & 0xff) << 8)
         | ((temp[2] & 0xff) << 16)
         | ((temp[3] & 0xff) << 24);
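The "& 0xff" masks in the snippet above are not decoration. Java bytes are signed, so any byte of 0x80 or above is sign-extended when it is promoted to int. A minimal sketch of the difference, with illustrative names that are not from the thread:

```java
public class SignExtensionDemo {
    /** Little-endian combine with each byte masked to the range 0..255. */
    public static int littleEndianMasked(byte[] b) {
        return (b[0] & 0xff)
             | ((b[1] & 0xff) << 8)
             | ((b[2] & 0xff) << 16)
             | ((b[3] & 0xff) << 24);
    }

    /** Same combine without masks: bytes >= 0x80 are sign-extended and corrupt the result. */
    public static int littleEndianUnmasked(byte[] b) {
        return b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
    }

    public static void main(String[] args) {
        // The value 200 stored little-endian; 0xC8 is -56 as a signed Java byte.
        byte[] length200 = {(byte) 0xC8, 0x00, 0x00, 0x00};
        System.out.println(littleEndianMasked(length200));   // 200
        System.out.println(littleEndianUnmasked(length200)); // -56
    }
}
```

Lengths below 128 in every byte happen to work without the masks, which is why this kind of bug can stay hidden until a longer string shows up.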
Gordon Beaton - 20 May 2007 19:46 GMT > I've been using BufferedInputStream created with a FileInputStream. > If I use the read(byte[]) method, that fills the passed array with > the array.lenght how can I transform that array of bytes into the > integer that I need? If you have an array of 4 bytes, you can convert it to an integer by combining the 4 values in the right order (which depends on the byte order in the file):
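// mask each byte with 0xff: Java bytes are signed
int n = (arr[0] & 0xff) | ((arr[1] & 0xff) << 8) | ((arr[2] & 0xff) << 16) | ((arr[3] & 0xff) << 24);
or
int n = (arr[3] & 0xff) | ((arr[2] & 0xff) << 8) | ((arr[1] & 0xff) << 16) | ((arr[0] & 0xff) << 24);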
DataInputStream has methods to do this for you (if your data order is big endian).
/gordon
--
Mark Space - 20 May 2007 21:10 GMT > If you have an array of 4 bytes, you can convert it to an integer by > combining the 4 values in the right order (which depends on the byte [quoted text clipped - 8 lines] > DataInputStream has methods to do this for you (if your data order is > big endian). Does the static method Integer.reverseBytes(int) do the same thing? It seems like it should, although I didn't check it. DataInputStream seems like the better choice for this case, just wanted to point out there was another method...
Arne Vajhøj - 20 May 2007 21:20 GMT >> If you have an array of 4 bytes, you can convert it to an integer by >> combining the 4 values in the right order (which depends on the byte [quoted text clipped - 13 lines] > like the better choice for this case, just wanted to point out there was > another method... big endian : DataInputStream readInt
little endian : DataInputStream readInt + Integer reverseBytes
would do nicely.
Arne
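The readInt plus reverseBytes combination might look like the sketch below. The class name and the helper readIntLE are hypothetical, and the in-memory stream only stands in for the real file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class LittleEndianReadInt {
    /** Reads four bytes as a big-endian int, then swaps them to get the little-endian value. */
    public static int readIntLE(DataInputStream in) throws IOException {
        return Integer.reverseBytes(in.readInt());
    }

    public static void main(String[] args) throws IOException {
        // 42 as a little-endian 32-bit integer.
        byte[] data = {0x2A, 0x00, 0x00, 0x00};
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            System.out.println(readIntLE(in)); // 42
        }
    }
}
```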
Mark Space - 20 May 2007 21:24 GMT > Does the static method Integer.reverseBytes(int) do the same thing? It > seems like it should, although I didn't check it. DataInputStream seems > like the better choice for this case, just wanted to point out there was > another method... Answer: yes it does, although my quick test seems to have been bit by a silent cast conversion. To the language designers: dear sweet Jesus and God in Heaven, why?
package integertest;

import java.lang.Integer;

public class Main {
    public static void main(String[] args) {
        System.out.println("128 reversed is " + Integer.reverseBytes(128));
        System.out.println("-1 reversed is " + Integer.reverseBytes(-1));
        System.out.println("-129 reversed is " + Integer.reverseBytes(-129));
        System.out.println("256 reversed is " + Integer.reverseBytes(256));
    }
}

compile:
run:
128 reversed is -2147483648
-1 reversed is -1
-129 reversed is 2147483647
256 reversed is 65536
BUILD SUCCESSFUL (total time: 0 seconds)
Mark Space - 20 May 2007 21:33 GMT > To the language designers: dear sweet Jesus and > God in Heaven, why?
> 256 reversed is 65536 Oops, math error on my part, I thought 256 reversed would result in a different bit pattern. Wailing and moaning at Sun can now cease. Move along, nothing to see here...
Arne Vajhøj - 20 May 2007 21:34 GMT > Answer: yes it does, although my quick test seems to have been bit by a > silent cast conversion. To the language designers: dear sweet Jesus and [quoted text clipped - 24 lines] > 256 reversed is 65536 > BUILD SUCCESSFUL (total time: 0 seconds) Which one is puzzling you ?
Arne
Nigel Wade - 21 May 2007 09:34 GMT > Hi, > I know this has been asked a few zillion times but I couldn't find a [quoted text clipped - 36 lines] > > Thanks. Use a ByteBuffer. After you have read a buffer of data into your app. you wrap a ByteBuffer around it:
ByteBuffer bb = ByteBuffer.wrap(buffer);
then you can read from the ByteBuffer using its methods, .getShort(), getInt(), getLong(), getFloat() etc.
If the data is little-endian rather than big endian then, prior to reading data from the ByteBuffer, set the order using bb.order(ByteOrder.LITTLE_ENDIAN).
So, to read a string you could do something like (I prefer to use DataInputStream as it has the readFully() method, removing the necessity to loop over a read() method) :
    FileInputStream in = new FileInputStream( file );
    DataInputStream dataIn = new DataInputStream( in );

    // read the string length
    byte[] lengthBytes = new byte[4];
    dataIn.readFully( lengthBytes );

    // wrap the byte array in a ByteBuffer
    ByteBuffer bb = ByteBuffer.wrap( lengthBytes );

    // if reading little endian data
    bb.order( ByteOrder.LITTLE_ENDIAN );

    int stringLength = bb.getInt();
    byte[] stringBytes = new byte[stringLength];
    dataIn.readFully( stringBytes );
You now have the string as a byte array. You now need to convert that to a Java String; how you do that depends on the string encoding.
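For that last step, a rough sketch of reading one length-prefixed string and decoding it with an explicit charset is shown below. Two things here are assumptions and not facts from the thread: that the data is little-endian, and that ISO-8859-1 is a suitable single-byte encoding (the format help only says 1 byte per char). The class and method names are hypothetical. The length check is there because a bogus length such as 541150531 would otherwise trigger a huge allocation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LengthPrefixedString {
    /** Reads a 4-byte length followed by that many bytes, decoded with the given charset. */
    public static String readString(ByteBuffer buf, Charset charset) {
        int length = buf.getInt();
        // Reject lengths that cannot be right before allocating anything.
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalArgumentException("Implausible string length: " + length);
        }
        byte[] bytes = new byte[length];
        buf.get(bytes);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "Movie" with a little-endian length prefix, followed by an empty string.
        byte[] data = {5, 0, 0, 0, 'M', 'o', 'v', 'i', 'e', 0, 0, 0, 0};
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN); // assumed byte order

        Charset charset = StandardCharsets.ISO_8859_1; // assumed encoding
        System.out.println("[" + readString(buf, charset) + "]"); // [Movie]
        System.out.println("[" + readString(buf, charset) + "]"); // []
    }
}
```

If the real file uses a different single-byte code page, only the Charset argument needs to change.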
 Signature Nigel Wade, System Administrator, Space Plasma Physics Group, University of Leicester, Leicester, LE1 7RH, UK E-mail : nmw@ion.le.ac.uk Phone : +44 (0)116 2523548, Fax : +44 (0)116 2523555