Java Forum / General / September 2007
Read binary data file
Windsor.Locks@gmail.com - 29 Aug 2007 20:52 GMT I am a C++ programmer, working on a java program. I need to read a binary file using Java.
Here is how I read it in C++,
struct SOME_DATA { unsigned long data1; unsigned short data2; unsigned short data3; unsigned long data4; };
struct SOME_DATA someData;
and read using
fread(&someData, 12, 1, inputFile);
Please give me some pointers, how do i read this using Java? Thanks. BTW, those are not the variable names I use in my program.
Joshua Cranmer - 29 Aug 2007 21:07 GMT > I am a C++ programmer, working on a java program. I need to read a > binary file using Java.
InputStream is = new FileInputStream("file/name.txt");
byte[] data = new byte[12];
is.read(data);
That reads up to 12 bytes of data into data; read() returns the number of bytes it actually read. Alternatively, you can grab byte-by-byte or use only part of the buffer. See the JavaDocs for java.io.InputStream for more information.
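A caveat on reading a fixed-size record this way: InputStream.read(byte[]) is allowed to return before the buffer is full, so a single call is not guaranteed to deliver all 12 bytes. A minimal sketch of one way around that, using DataInputStream.readFully. The class name RecordBytes and the method readRecord are made up for illustration, and the 12-byte record size is simply taken from the fread() call in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

final class RecordBytes {
    // Assumption: one record is exactly 12 bytes, as in the original fread() call.
    static final int RECORD_SIZE = 12;

    /** Reads exactly RECORD_SIZE bytes, or throws EOFException if the stream ends early. */
    static byte[] readRecord(InputStream in) throws IOException {
        byte[] data = new byte[RECORD_SIZE];
        // readFully keeps reading until the buffer is full, unlike read(byte[]).
        new DataInputStream(in).readFully(data);
        return data;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for the real file here.
        InputStream in = new ByteArrayInputStream(new byte[RECORD_SIZE]);
        System.out.println(readRecord(in).length); // prints 12
    }
}
```

For a real file, pass a FileInputStream (ideally wrapped in a BufferedInputStream) instead of the in-memory stream.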
 Signature Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth
shakah - 29 Aug 2007 21:37 GMT On Aug 29, 3:52 pm, Windsor.Lo...@gmail.com wrote:
> I am a C++ programmer, working on a java program. I need to read a > binary file using Java. [quoted text clipped - 18 lines] > Please give me some pointers, how do i read this using Java? Thanks. > BTW, those are not the variable names I use in my program. It's never a good idea portability-wise to write structs in binary format (e.g. how do you deal with packing, different CPU architectures, etc.?), but ignoring that for now you could naively do something like the following. Note that this only works on big-endian machines, and is probably unreliable there anyway.
jc@soyuz:~/tmp/binrw$ cat main.cpp
#include <stdio.h>

int main(int /*argc*/, char **argv) {
  struct SOME_DATA {
    unsigned long data1 ;
    unsigned short data2 ;
    unsigned short data3 ;
    unsigned long data4 ;
  } ;

  SOME_DATA someData = { 1, 2, 3, 4 } ;

  FILE *fh = fopen(argv[1], "wb") ;
  fwrite(&someData, sizeof(someData), 1, fh) ;
  fclose(fh) ;

  return 0 ;
}
jc@soyuz:~/tmp/binrw$ g++ -W -Wall -pedantic -o test main.cpp
jc@soyuz:~/tmp/binrw$ ./test test2.file
jc@soyuz:~/tmp/binrw$ cat test.java
public class test {
  public static void main(String [] args) throws java.io.IOException {
    java.io.DataInputStream dis = new java.io.DataInputStream(
      new java.io.FileInputStream( new java.io.File( args[0] ) ) ) ;
    System.out.println("data1: " + dis.readInt()) ;
    System.out.println("data2: " + dis.readShort()) ;
    System.out.println("data3: " + dis.readShort()) ;
    System.out.println("data4: " + dis.readInt()) ;
  }
}
jc@soyuz:~/tmp/binrw$ javac test.java
jc@soyuz:~/tmp/binrw$ java -classpath . test test2.file
data1: 1
data2: 2
data3: 3
data4: 4
For reference, duplicating the above on an Intel box yields:
jc@jc-ubuntu:~/tmp/binrw$ java test test.file
data1: 16777216
data2: 512
data3: 768
data4: 67108864
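The odd-looking numbers from the Intel run are the expected values with their bytes in reverse order: DataInputStream always reads big-endian, while the file was written little-endian. A small sketch that undoes the swap on those four values; the class name EndianDemo is invented for this example.

```java
import java.util.Arrays;

final class EndianDemo {
    /** Reverses the byte order of the four values printed by the Intel run. */
    static int[] unswapped() {
        return new int[] {
            Integer.reverseBytes(16777216),   // data1: 0x01000000 -> 0x00000001
            Short.reverseBytes((short) 512),  // data2: 0x0200 -> 0x0002
            Short.reverseBytes((short) 768),  // data3: 0x0300 -> 0x0003
            Integer.reverseBytes(67108864)    // data4: 0x04000000 -> 0x00000004
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(unswapped())); // prints [1, 2, 3, 4]
    }
}
```

Swapping after the fact works, but it is usually cleaner to read with the right byte order in the first place.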
Windsor.Locks@gmail.com - 29 Aug 2007 21:54 GMT > On Aug 29, 3:52 pm, Windsor.Lo...@gmail.com wrote: > [quoted text clipped - 26 lines] > something like the following. Note that this only works on big-endian > machines, and is probably unreliable there anyway. Thanks for your reply. I do not have any say in the file format or how the file is written. My requirement is read this file and get the data out of it. There is nothing more I can do.
Hunter Gratzner - 29 Aug 2007 22:13 GMT On Aug 29, 10:54 pm, Windsor.Lo...@gmail.com wrote:
> Thanks for your reply. I do not have any say in the file format or how > the file is written. My requirement is read this file and get the data > out of it. There is nothing more I can do. Then the one "defining" this data format has no f.cking clue. C/C++ structs have no well defined binary layout, except the order of elements. C/C++ integer data types have no well defined binary representation and no well defined size, except a minimum value range.
~kurt - 29 Aug 2007 23:54 GMT > On Aug 29, 10:54 pm, Windsor.Lo...@gmail.com wrote: >> Thanks for your reply. I do not have any say in the file format or how [quoted text clipped - 5 lines] > elements. C/C++ integer data types have no well defined binary > representation and no well defined size, except a minimum value range. And you are missing the point. There are many legacy systems out there that make plenty of assumptions, and have been working just fine for longer than Java has even existed.
Instead of telling us the one defining the data format has no clue (which you are wrong about), why don't you explain your solution to writing a binary file in C/C++, FORTRAN, or whatever, that will solve all the academic issues you have just brought up.
Reading binary files is almost always tricky, especially when you move from one platform, OS, or language to the next. There is no way to circumvent this. It is the price you pay to have the data in a binary format. Java does, at least, make it portable across platforms and OSs - but not languages. If you are reading a binary file created outside of Java, then you are going to need to create a custom reader for this data. What is really annoying is when you don't even know the endian or the size of the values (16 bit, 32 bit?) and need to experiment to get it right.
I've had to do this numerous times myself. The worst was for one application that was written in a version of FORTRAN that would put an arbitrary sized (arbitrary as far as I could tell) header after each record that was written (turning off this header was a compile time option, that I seem to remember would make accessing the file less efficient, or something). I wanted to read it directly into Matlab - not an easy task.
- Kurt
Lew - 30 Aug 2007 02:22 GMT Windsor.Locks@gmail.com wrote:
>>> I do not have any say in the file format or how >>> the file is written. My requirement is read this file and get the data <http://java.sun.com/javase/6/docs/api/java/nio/ByteBuffer.html> <http://java.sun.com/javase/6/docs/api/java/nio/ByteBuffer.html#order(java.nio.ByteOrder)>
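For what those two links point at, here is a rough sketch of decoding one record with ByteBuffer and an explicit byte order. It assumes the layout implied by the 12-byte fread(): a 4-byte unsigned long, two 2-byte unsigned shorts, another 4-byte unsigned long, and no padding. None of that is confirmed for the OP's actual file, and the class name SomeDataDecoder is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class SomeDataDecoder {
    /** Decodes one 12-byte record into {data1, data2, data3, data4}. */
    static long[] decode(byte[] record, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.wrap(record).order(order);
        long data1 = buf.getInt() & 0xFFFFFFFFL; // unsigned 32-bit value held in a long
        long data2 = buf.getShort() & 0xFFFF;    // unsigned 16-bit value
        long data3 = buf.getShort() & 0xFFFF;
        long data4 = buf.getInt() & 0xFFFFFFFFL;
        return new long[] { data1, data2, data3, data4 };
    }

    public static void main(String[] args) {
        // The bytes a little-endian writer would produce for { 1, 2, 3, 4 }.
        byte[] record = { 1, 0, 0, 0,  2, 0,  3, 0,  4, 0, 0, 0 };
        System.out.println(Arrays.toString(decode(record, ByteOrder.LITTLE_ENDIAN))); // prints [1, 2, 3, 4]
    }
}
```

Passing ByteOrder.BIG_ENDIAN instead handles a file written on a big-endian machine with the same field sizes.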
 Signature Lew
Hunter Gratzner - 30 Aug 2007 20:23 GMT > > On Aug 29, 10:54 pm, Windsor.Lo...@gmail.com wrote: > >> Thanks for your reply. I do not have any say in the file format or how [quoted text clipped - 7 lines] > > And you are missing the point. No, I don't. A C struct is not a suitable, unambiguous format specification, binary or otherwise. That's the whole point. Giving someone just a C struct and telling him to implement it in Java is a pointless stupid act. It indicates that the one giving this file format "definition" has no f.cking clue what he is doing.
> Instead of telling us the one defining the data format has no clue (which > you are wrong about), why don't you explain your solution to writing a binary > file in C/C++, FORTRAN, or whatever, that will solve all the academic issues > you have just brought up. I did that previously in this same thread, but you are apparently more interested in picking a fight.
> Reading binary files is almost always tricky, especially when you move from > one platform, OS, or language to the next. There is no way to circumvent > this. Sure it is. By having an unambiguous format specification. A C struct is not an unambiguous format specification.
> It is the price you pay to have the data in a binary format. No, it is the price to pay when some fuckwit thinks that writing C structs 1:1 to memory is a good idea.
There is no difference between a binary and a text format if you need to move between platforms. Either the format is unambiguously defined, then it's a straight forward job to implement it, or it isn't.
> What is really annoying is > when you don't even know the endian or the size of the values (16 bit, > 32 bit?) and need to experiment to get it right. And why do you then think a C struct is a good definition of a binary format?
Mike Schilling - 30 Aug 2007 23:38 GMT > C/C++ integer data types have no well defined binary > representation and no well defined size, except a minimum value range. And the presence or absence of between-field padding isn't always guaranteed. Still, if the files don't have to be cross-platform, reading and writing structs will work just fine. Note: the *application* can be portable across platforms, so long as the (for example) Solaris/Sparc version won't have to read files written by the Windows/Intel version.
~kurt - 31 Aug 2007 01:18 GMT > No, I don't. A C struct is not a suitable, unambiguous format > specification, binary or otherwise. That's the whole point. Giving > someone just a C struct and telling him to implement it in Java is a > pointless stupid act. It indicates that the one giving this file > format "definition" has no f.cking clue what he is doing. It is hardly pointless. Most of the time, there is no format specification because binary data is often not written with the intention of being used outside of the application that writes it. Only later does an outside user have a need for the data, and then one has to often reverse engineer a solution. A C struct at least gives you an idea as to what type of data is in the file. Knowing what platform it was written in helps out even more.
>> Instead of telling us the one defining the data format has no clue (which >> you are wrong about), why don't you explain your solution to writing a binary [quoted text clipped - 3 lines] > It did that previously in this same thread, but you are apparently > more interested in picking a fight. I'm put off by your attitude that what the OP has to work with is due to someone who has no clue. If you are saying a C structure makes a bad ICD, then I agree with you. But, binary files are often not written with portability in mind, and the implementation details exist only in the code that reads/writes the data. There is nothing wrong with that when the original intent of the data was for internal use only - and that is often the case. Then, seeing how the data is read into a C structure is invaluable.
The solution I saw you post was an example of how to read the data. I didn't see anything but bitching regarding the data source.
> No, it is the price to pay when some fuckwit thinks that writing C > structs 1:1 to memory is a good idea. It is often the only reasonable idea, depending on the original intent of the data. Like I said, I didn't see a better solution posted by you on how to do this. Creating unnecessary ICDs is a bad thing.
> And why do you then think a C struct is a good definition of a binary > format? It works as well as anything else for many uses. If you write a specification describing how many bytes a number is supposed to take up, and the endianness, and the data is only to be used internally, then you are creating extra work for yourself when you port the code to other platforms (of course, you want to call sizeof() when reading in the structure instead of hard-coding the size).
- Kurt
Charles - 31 Aug 2007 05:41 GMT > > No, I don't. A C struct is not a suitable, unambiguous format > > specification, binary or otherwise. That's the whole point. Giving [quoted text clipped - 45 lines] > > - Kurt Dear Friends (when did you guys become my friends?)
Let's review what the OP stated
A struct is given in C++
Data needs to read from a file in Java.
You have the following data types
unsigned long unsigned short
As previously stated by other posters the Endianness of the operating system should affect how the output file is encoded. I assume this to be true but have not verified it to be true.
We assume all unsigned longs and unsigned short will ALWAYS have the same bytesize.
The complete struct is given as
unsigned long data1; unsigned short data2; unsigned short data3; unsigned long data4;
Can we also assume that the data will always be sequenced as described in the STRUCT? I don't see any argument why the data will be out of sequence as defined in the STRUCT.
Does the input file get modified when it is transported from one operating system to another? I assume NO. This is not verified.
Are there equivalents of unsigned long and unsigned short in Java? Are they the same byte size? Do they encode the data the same?
Try to read in Java and verify with known data. If you don't know any of the data values this becomes a harder task.
Lew - 31 Aug 2007 12:21 GMT > Let's review what the OP stated > [quoted text clipped - 25 lines] > I don't see any argument why the data will be out of sequence as > defined in the STRUCT. But we do not know the padding, and the OP doesn't know what those sizes are, nor the endianness of their files. They don't even know in what format the floating-point values are stored: IEEE? We need all that information to craft a Java equivalent, and we don't have it. The OP doesn't have it, by their account.
> Does the input file get modified when it is transported from one > operating system to another? > I assume NO. This is not verified. But if endianness and padding matter, the fact that it is not modified will make it unreadable on the second system.
> Are there equivalents of unsigned long and unsigned short in Java? No.
> Are they the same byte size? We do not know. The OP hasn't given us enough information.
> Do they encode the data the same? We do not know. The OP hasn't given us enough information.
> Try to read in Java and verify with known data. If you don't know any > of the data values this becomes a harder task. It's already impossible based on the information given. How much harder can it get?
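On the "no unsigned types" answer: the usual workaround is to read the raw bits into a signed type and widen them into something large enough to hold the unsigned range. A short sketch of that idea; the class and method names are invented here, and the 8-byte case assumes the bytes are already in big-endian order.

```java
import java.math.BigInteger;

final class UnsignedValues {
    /** Interprets the 16 bits of a short as an unsigned value (0..65535). */
    static int unsignedShort(short raw) {
        return raw & 0xFFFF;
    }

    /** Interprets the 32 bits of an int as an unsigned value (0..4294967295). */
    static long unsignedInt(int raw) {
        return raw & 0xFFFFFFFFL;
    }

    /** Interprets 8 big-endian bytes as an unsigned 64-bit value. */
    static BigInteger unsignedLong(byte[] bigEndian8) {
        // Signum 1 tells BigInteger to treat the bytes as a non-negative magnitude.
        return new BigInteger(1, bigEndian8);
    }

    public static void main(String[] args) {
        System.out.println(unsignedShort((short) -1)); // prints 65535
        System.out.println(unsignedInt(-1));           // prints 4294967295
    }
}
```

Whether the C++ "unsigned long" in the file is 4 or 8 bytes still has to be established from the writing platform; the code above only covers what to do once that is known.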
Martin Gregorie - 31 Aug 2007 13:53 GMT > It's already impossible based on the information given. How much harder > can it get? If the OP *MUST* move binary data, at least do it in a platform and language-independent manner and use ASN.1 encoding.
 Signature martin@ | Martin Gregorie gregorie. | Essex, UK org |
~kurt - 01 Sep 2007 02:18 GMT > If the OP *MUST* move binary data, at least do it in a platform and > language-independent manner and use ASN.1 encoding. I understand Hunter's comments, and while I don't know much about ASN.1 encoding, what I am pointing out is that binary files are usually *not* intended to be used across systems. Every binary data file I have ever worked with was intended to be used either by the program that wrote it, or separate applications that used the same utility libraries as the application which wrote the data. There is nothing wrong with simply writing the C structure to a file, and reading it in the same way. In this case the code, and not some specification, drives the format of the data - and there is *nothing* wrong with this. The lack of a need to share the data outside of the application is what often drives the decision to use binary data in the first place (why not take advantage of the efficiency binary files have to offer).
Of course, every once in a while an outside user decides they want to use this data. Well, then they have a choice. Either generate it themselves, or spend a few hours writing something that can read it in - not a big price to pay.
- Kurt
Esmond Pitt - 01 Sep 2007 11:45 GMT > I understand Hunter's comments, and while I don't know much about > ASN.1 encoding, what I am pointing out is that binary files are usually > *not* intended to be used across systems. Except for all the ones that are, e.g. protocol dumps; databases; interpretive pseudo-code (e.g. .class files), ...
> Every binary data file I have > ever worked with was intended to be used either by the program that wrote > it, or separate applications that used the same utility libraries as the > application which wrote the data. Except for the ones that aren't: e.g. protocol dumps; databases; interpretive pseudo-code (e.g. .class files), ...
> There is nothing wrong with simply writing > the C structure to a file, and reading it in the same way. In this case > the code, and not some specification, drives the format of the data - and there > is *nothing* wrong with this. There is plenty wrong with this. The format of binary data written directly from a struct in memory depends on at least the following:
- the host hardware
- the compiler
- the compiler version
- the surrounding #pragmas
- the compiler options that were in effect when the binary that wrote the file was compiled
This is too many dependencies, on too many things that can't be controlled.
The only time writing a struct from memory to a file or a network can sanely be justified is when the target application is constructed with the same version of the same object file that wrote it. And this is not a guarantee that in general can be met.
Mike Schilling - 01 Sep 2007 16:55 GMT >> I understand Hunter's comments, and while I don't know much about >> ASN.1 encoding, what I am pointing out is that binary files are >> usually *not* intended to be used across systems. > > Except for all the ones that are, e.g. protocol dumps; databases; > interpretive pseudo-code (e.g. .class files), ... How often do database *files* get moved from one system to another? In my experience, they stay on the server where the DBMS engine is running.
Arne Vajhøj - 01 Sep 2007 22:59 GMT >>> I understand Hunter's comments, and while I don't know much about >>> ASN.1 encoding, what I am pointing out is that binary files are [quoted text clipped - 4 lines] > How often do database *files* get moved from one system to another? In my > experience, they stay on the server where the DBMS engine is running. It has been attempted occasionally.
It is usually not supported and often it does not work.
Arne
~kurt - 01 Sep 2007 19:12 GMT > The only time writing a struct from memory to a file or a network can Who is talking about writing data to a network?
> sanely be justified is when the target application is constructed with > the same version of the same object file that wrote it. And this is not > a guarantee that in general can be met. Uh, this is pretty much what I just said other than I see no need for the "guarantee" part - it is not necessary unless the *intent* is to distribute the data externally.
As I said, my gripe is in calling the originator of the OP's data clueless. That statement is simply clueless itself. Yes, if the original program had been written in Java, then maybe that statement would be true. But this is a C++ program. The data files are most likely "private", only to be used internally. Sure, if you port the code to another platform, the binary files between the two versions may not be compatible, but so what - that usually isn't a problem. The new code will create binary files that are compatible with itself. Creating some external specification that this binary data must meet would be stupid because then, if you did port the code, now you may have to modify it to be compatible with the original specification, and this may require more processing of the data. Suddenly, some specification is driving internal data, and robbing some degree of performance from the application.
Just because a bureaucrat comes along some time down the road and says "thou shalt write a Java program (not that Java is the best solution in this case, but because it is the 'in' thing to do) that will use Program X's internal data files" does not mean Program X was poorly designed.
- Kurt
Mike Schilling - 01 Sep 2007 19:39 GMT >> The only time writing a struct from memory to a file or a network can > [quoted text clipped - 23 lines] > internal data, and robbing some degree of performance from the > application. The danger is that a different compiler (or different version of the same compiler) would cause an incompatibility. The good news is that compiler vendors tend not to change struct layouts for that very reason. Still, this needs to be kept in mind and tested for whenever that sort of change is made.
Another point, not yet mentioned (or if it has been, I missed that post.) Any structured data that's saved persistently should contain a version number. If it never changes, you've added a small amount of overhead. When it does change, it's now straightforward to convert older versions and recognize new ones, which, without the explicit versioning, can be difficult or impossible.
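To make the version-number suggestion concrete, here is a small sketch of a file that starts with a signature and a version field, and a reader that branches on the version. Everything specific in it is made up for illustration: the MAGIC value, the two versions, and the idea that version 1 stored a 16-bit payload while version 2 stores a 32-bit one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class VersionedFile {
    static final int MAGIC = 0x534F4D45;     // hypothetical signature, ASCII "SOME"
    static final short CURRENT_VERSION = 2;  // hypothetical current format version

    /** Writes signature, version, then the payload in the current (version 2) layout. */
    static byte[] write(int payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(MAGIC);
        out.writeShort(CURRENT_VERSION);
        out.writeInt(payload);
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads the payload, converting older layouts and rejecting unknown ones. */
    static int read(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        if (in.readInt() != MAGIC) {
            throw new IOException("unrecognized file signature");
        }
        int version = in.readUnsignedShort();
        switch (version) {
            case 1:  return in.readUnsignedShort(); // assumed older layout: 16-bit payload
            case 2:  return in.readInt();           // current layout: 32-bit payload
            default: throw new IOException("unsupported version " + version);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(read(write(42))); // prints 42
    }
}
```

The cost is six bytes per file in this sketch; the benefit is that a reader can tell an old file from a corrupt one.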
Martin Gregorie - 01 Sep 2007 23:42 GMT > The danger is that a different compiler (or different version of the same > compiler) would cause an incompatibility. The good news is that compiler > vendors tend not to change struct layouts for that very reason. Still, this > needs to be kept in mind and tested for whenever that sort of change is > made. Actually, there's a more subtle way of failing that can bite an executable that reloads data that it wrote itself: there's not necessarily a guarantee that the chunks of data will be read back to the same virtual memory address that it was saved from so it had better not contain pointers that are expected to remain valid.
I've been there: I had a program that did lookups on a few hundred million phone numbers. It used a B-tree for in-memory lookups: the same lookup using a database wouldn't run faster than 700 lookups/second and we needed 3000, hence the B-tree which ran at 25,000/second. BUT startup took 40 minutes to populate the B-tree from the database, so I saved the B-tree by simply dumping its dataspace to files that were reloaded on startup. The B-tree grew continuously, so it was split over a number of multi-megabyte memory chunks: each was written to a separate file. Reloading these reduced startup time to under 5 minutes. However, the first iteration merely crashed because the OS (a Mach-based UNIX) didn't reload the chunks into the same places in my process's virtual memory, so the pointers were so much junk. FWIW the fix was to replace standard pointers with my own addressing scheme: this occupied the same space, but replaced pointers with structs containing two fields, chunkno:chunk_offset. This sidestepped the problem and ran acceptably fast.
I know this is somewhat OT for c.j.j.p but knowing about it may save somebody's hide one of these days.
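The chunkno:chunk_offset idea translates to any language: store a position-independent handle instead of a raw address, and resolve it against wherever the chunks happen to be loaded. A sketch in Java, purely as an illustration of the scheme and not a reconstruction of the original program. The 8-bit/24-bit split of a 32-bit handle and the name ChunkRef are assumptions.

```java
final class ChunkRef {
    // Assumed split of a 32-bit handle: high 8 bits = chunk number, low 24 bits = offset.
    static final int OFFSET_BITS = 24;
    static final int OFFSET_MASK = (1 << OFFSET_BITS) - 1;
    static final int MAX_CHUNK = 0xFF;

    /** Packs a chunk number and an offset within that chunk into one int. */
    static int pack(int chunkNo, int offset) {
        if (chunkNo < 0 || chunkNo > MAX_CHUNK || offset < 0 || offset > OFFSET_MASK) {
            throw new IllegalArgumentException("chunk or offset out of range");
        }
        return (chunkNo << OFFSET_BITS) | offset;
    }

    static int chunkNo(int ref) {
        return ref >>> OFFSET_BITS; // unsigned shift, so chunk 255 does not turn negative
    }

    static int offset(int ref) {
        return ref & OFFSET_MASK;
    }

    /** Looks up the byte a handle refers to, wherever the chunks were loaded. */
    static byte resolve(byte[][] chunks, int ref) {
        return chunks[chunkNo(ref)][offset(ref)];
    }

    public static void main(String[] args) {
        int ref = pack(3, 1000);
        System.out.println(chunkNo(ref) + ":" + offset(ref)); // prints 3:1000
    }
}
```

Because the handle never contains an absolute address, the saved chunks can be reloaded anywhere without being rewritten.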
Mike Schilling - 02 Sep 2007 00:39 GMT > I've been there: I had a program that did lookups on a few hundred > million phone numbers. It used a B-tree for in-memory lookups: the [quoted text clipped - 14 lines] > chunkno:chunk_offset. This sidestepped the problem and ran acceptably > fast. On some OS's you could have created a memory-mapped file at whatever address you provided, which lets you both use absolute addresses and avoid the startup overhead by letting the file page itself in. Yours is a nice "with simple tools" solution.
Gordon Beaton - 02 Sep 2007 07:44 GMT > On some OS's you could have created a memory-mapped file at whatever > address you provided, which lets you both use absolute addresses and > avoid the startup overhead by letting the file page itself in. Yours > is a nice "with simple tools" solution. There are many components that make up the address space of an application, and there is no guarantee that the same block of addresses will always be available to the application. A program that depends on that particular feature of mmap() is extremely fragile and can't be expected to work across upgrades of the software or any of the libraries it depends on. That might be ok for hobby projects, but I'd never ship such a beast to a customer.
/gordon
Mike Schilling - 02 Sep 2007 08:12 GMT >> On some OS's you could have created a memory-mapped file at whatever >> address you provided, which lets you both use absolute addresses and [quoted text clipped - 8 lines] > the libraries it depends on. That might be ok for hobby projects, but > I'd never ship such a beast to a customer. I'm not really familiar with mmap(); wouldn't it be possible to choose a starting address well out of the possible end address of the application proper? I was actually thinking of VMS, where the address could be in a part of virtual memory that isn't used by the application at all.
In any case, if it's possible to allocate enough contiguous virtual memory at some location, all that's needed is to adjust the stored addresses by the difference [1], and you can still page the file in as needed. If you're not sure of contiguous memory, you effectively have the OP's solution of (chunk, offset) pairs.
Though if you're doing this, it's more logical to store offsets to the start of the file rather than addresses.
Martin Gregorie - 01 Sep 2007 13:40 GMT > I understand Hunter's comments, and while I don't know much about > ASN.1 encoding, what I am pointing out is that binary files are usually > *not* intended to be used across systems. I think its use is quite industry-dependent: I've never seen it used in financial messaging (that's more likely to use SWIFT formats, which are tagged text) but it's common in the telecommunications industry.
Telcos (both fixed line and mobile) use a lot of binary data for control and accounting purposes, mainly because this minimizes message size and there's a LOT of stuff flying around controlling the network in real time and accounting for its use. Switches from large vendors, e.g. Ericsson, tend to use proprietary, flat message formats but if the data will be exchanged between different types of kit (e.g. roaming billing data) they tend to use ASN.1: CCITT likes it.
ASN.1 has a lot in common with XML in that it's a tagged field protocol, allows nesting, and uses a tag dictionary to associate meanings with tags. Compared with XML it's a LOT more compact (tags are one byte, fixed length fields don't have terminators, variable length fields are preceded by a one or two byte length) and it has a number of predefined field types as well as arrays. If you have the dictionary it's easy to interpret on the fly though, like XML, you can also use the dictionary to generate code to encode and decode ASN.1 records.
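The tag/length/value shape described above can be sketched in a few lines. This is a deliberately simplified reader, not a real ASN.1 decoder: it assumes one-byte tags and one-byte lengths below 128 only, and it does not handle nesting, multi-byte lengths, or a tag dictionary. The names SimpleTlv and Field are invented for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class SimpleTlv {
    /** One decoded field: the tag byte and the raw value bytes. */
    static final class Field {
        final int tag;
        final byte[] value;

        Field(int tag, byte[] value) {
            this.tag = tag;
            this.value = value;
        }
    }

    /** Splits a buffer of back-to-back tag/length/value fields. */
    static List<Field> decode(byte[] data) {
        List<Field> fields = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(data);
        while (buf.hasRemaining()) {
            int tag = buf.get() & 0xFF;
            int length = buf.get() & 0xFF;
            if (length > 0x7F) {
                // Simplification: longer length encodings are out of scope here.
                throw new IllegalArgumentException("multi-byte lengths not handled in this sketch");
            }
            byte[] value = new byte[length];
            buf.get(value); // throws BufferUnderflowException if the data is truncated
            fields.add(new Field(tag, value));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Two fields: tag 0x02 with one value byte, tag 0x04 with two value bytes.
        byte[] data = { 0x02, 0x01, 0x05, 0x04, 0x02, 0x41, 0x42 };
        for (Field f : decode(data)) {
            System.out.println("tag=" + f.tag + " length=" + f.value.length);
        }
    }
}
```

A reader like this can skip fields whose tags it does not recognize, which is what makes tagged formats tolerant of additions.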
> Every binary data file I have > ever worked with was intended to be used either by the program that wrote > it, or separate applications that used the same utility libraries as the > application which wrote the data. There's also a lot of binary data in large commercial systems. Formerly it was in large serial files, then flat indexed files, now it's probably in a database. A really good reason for using an RDBMS is that it not only hides implementation details (like endian conventions) from the application, but the interfaces (SQL, JDBC, ODBC, etc) typically provide field conversion facilities.
> There is nothing wrong with simply writing > the C structure to a file, and reading it in the same way. I'd probably use a CSV format any place where a database would be obvious overkill, but ymmv.
Using CSV rather than binary makes debugging easier and (said with his *NIX hat on) it allows the data to be handled by common scripted utilities like awk, perl and even shell scripts. Oh yeah, Java too :-)
Nigel Wade - 03 Sep 2007 10:11 GMT >> If the OP *MUST* move binary data, at least do it in a platform and >> language-independent manner and use ASN.1 encoding. [quoted text clipped - 5 lines] > it, or separate applications that used the same utility libraries as the > application which wrote the data. Pretty much all scientific data I have worked with over the past 25 years has been written in binary, and is intended to be read on just about any platform you'd care to use. The basic principle behind being able to do this is writing the binary data in a well structured form, in a reliable and portable way.
> There is nothing wrong with simply writing > the C structure to a file, and reading it in the same way. There is everything wrong with this. This is the fundamental problem. The amount of padding which is used internally within a struct is undefined by the language - it is entirely up to the compiler developer. If you write a struct in binary both the data *and the padding* will be output together, all intermingled. Further, since the amount of padding is at the discretion of the compiler writers they are free to change the amount they use in any release of their compiler. So you could quite easily find that an upgrade to the compiler causes your code, which you say is perfectly acceptable, to break even on the same hardware and OS.
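When a reader does have to cope with padding, it has to know exactly where the padding bytes are and step over them. A sketch with an invented layout, since the thread's struct may or may not contain any: a 1-byte tag, 3 bytes of compiler-inserted padding, then a 4-byte little-endian unsigned value. The offsets and the name PaddedRecord are assumptions for illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PaddedRecord {
    // Hypothetical on-disk layout, assumed for illustration only:
    //   offset 0: 1-byte tag
    //   offset 1: 3 bytes of padding (contents unspecified)
    //   offset 4: 4-byte little-endian unsigned value
    static final int TAG_OFFSET = 0;
    static final int VALUE_OFFSET = 4;

    /** Returns {tag, value}, ignoring whatever the padding bytes contain. */
    static long[] decode(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record).order(ByteOrder.LITTLE_ENDIAN);
        long tag = buf.get(TAG_OFFSET) & 0xFF;
        // Absolute reads by offset step over the padding without interpreting it.
        long value = buf.getInt(VALUE_OFFSET) & 0xFFFFFFFFL;
        return new long[] { tag, value };
    }

    public static void main(String[] args) {
        byte[] record = { 7, (byte) 0xCC, (byte) 0xCC, (byte) 0xCC, 0x2A, 0, 0, 0 };
        long[] fields = decode(record);
        System.out.println(fields[0] + " " + fields[1]); // prints 7 42
    }
}
```

If a different compiler or option set changes the padding, the offsets in the reader have to change with it, which is the fragility being described.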
> In this case > the code, and not some specification, drives the format of the data - and there > is *nothing* wrong with this. Yes there is. Code which writes unspecified data to a binary file is bad code. It will almost certainly break at some time in the future.
> The lack of a need to share the data outside of > the application is what often drives the decision to use binary data in the > first place (why not take advantage of the efficiency binary files have to > offer). But it is wise to know what is being written into your binary file so that you can reliably read it back in. Otherwise it's reverse GIGO, it's GOGI - garbage out, garbage in.
> Of course, every once in a while an outside user decides they want to use this > data. Well, then they have a choice. Either generate it themselves, or > spend a few hours writing something that can read it in - not a big price > to pay. But somewhat difficult if the original program's author didn't know what they were writing into their binary files. I
 Signature Nigel Wade, System Administrator, Space Plasma Physics Group, University of Leicester, Leicester, LE1 7RH, UK E-mail : nmw@ion.le.ac.uk Phone : +44 (0)116 2523548, Fax : +44 (0)116 2523555
Lew - 03 Sep 2007 15:12 GMT ~kurt wrote:
>> There is nothing wrong with simply writing >> the C structure to a file, and reading it in the same way.
> There is everything wrong with this. This is the fundamental problem. The amount > of padding which is used internally within a struct is undefined by the [quoted text clipped - 5 lines] > causes your code, which you say is perfectly acceptable, to break even on the > same hardware and OS. A point which has been made several times in this thread.
>> In this case >> the code, and not some specification, drives the format of the data - and > there >> is *nothing* wrong with this.
> Yes there is. Code which writes unspecified data to a binary file is bad code. > It will almost certainly break at some time in the future. Most emphatically.
>> The lack of a need to share the data outside of >> the application is what often drives the decision to use binary data in the >> first place (why not take advantage of the efficiency binary files have to >> offer).
> But it is wise to know what is being written into your binary file so that you > can reliably read it back in. Otherwise it's reverse GIGO, it's GOGI - garbage > out, garbage in. Another point which has been made several times in this thread, in various ways.
>> Of course, every once in a while an outside user decides they want to use this >> data. Well, then they have a choice. Either generate it themselves, or >> spend a few hours writing something that can read it in - not a big price >> to pay.
> But somewhat difficult if the original program's author didn't know what they > were writing into their binary files. I Which is why we keep advising the OP (who seems to have lost interest in their question) to determine exactly what format they're using, then to code to that specification. This point seems to have been lost repeatedly.
I would love for the OP to chime in and let us know that they've done this step. How 'bout it, Windsor.Locks? Any luck with that analysis? What did you find?
Nigel Wade - 03 Sep 2007 17:08 GMT > ~kurt wrote: >>> There is nothing wrong with simply writing [quoted text clipped - 11 lines] > > A point which has been made several times in this thread. I know.
But certain posters in the thread still seem to be lacking the necessary clue. So by continuing to hit them again and again with the same clue-stick, the message might eventually begin to sink in.
Maybe we need to introduce lines, write 1000 times (without using the cut-paste buffer): "I must not write C structs to binary files".
As to reading binary data, I prefer to use ByteBuffer to handle big-/little-endian issues. Although it might not be particularly efficient for reading large quantities of binary data it is convenient, reasonably transparent, and it's part of the standard API so should always be available.
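A minimal sketch of the ByteBuffer approach, assuming the record is 12 bytes with no padding (4 + 2 + 2 + 4) and that the byte order is known from the file's specification. The class name `ByteBufferRecord` and its `parse` method are made up for illustration; only `ByteBuffer.wrap`, `order`, `getInt` and `getShort` are standard API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteBufferRecord {
    /** Assumed record size: 4 + 2 + 2 + 4 bytes, no padding. */
    public static final int RECORD_SIZE = 12;

    public final long data1; // unsigned 32-bit value held in a long
    public final int data2;  // unsigned 16-bit value held in an int
    public final int data3;  // unsigned 16-bit value held in an int
    public final long data4; // unsigned 32-bit value held in a long

    public ByteBufferRecord(long data1, int data2, int data3, long data4) {
        this.data1 = data1;
        this.data2 = data2;
        this.data3 = data3;
        this.data4 = data4;
    }

    /** Decodes one record from the first 12 bytes of raw, using the given byte order. */
    public static ByteBufferRecord parse(byte[] raw, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.wrap(raw, 0, RECORD_SIZE).order(order);
        long d1 = buf.getInt() & 0xFFFFFFFFL; // mask to undo sign extension
        int d2 = buf.getShort() & 0xFFFF;
        int d3 = buf.getShort() & 0xFFFF;
        long d4 = buf.getInt() & 0xFFFFFFFFL;
        return new ByteBufferRecord(d1, d2, d3, d4);
    }

    public static void main(String[] args) {
        // The bytes a little-endian writer would produce for { 1, 2, 3, 4 }.
        byte[] raw = { 1, 0, 0, 0, 2, 0, 3, 0, 4, 0, 0, 0 };
        ByteBufferRecord r = parse(raw, ByteOrder.LITTLE_ENDIAN);
        System.out.println(r.data1 + " " + r.data2 + " " + r.data3 + " " + r.data4);
    }
}
```

Switching the second argument to `ByteOrder.BIG_ENDIAN` is the only change needed if the file turns out to be written MSB first.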
 Signature Nigel Wade, System Administrator, Space Plasma Physics Group, University of Leicester, Leicester, LE1 7RH, UK E-mail : nmw@ion.le.ac.uk Phone : +44 (0)116 2523548, Fax : +44 (0)116 2523555
Mike Schilling - 29 Aug 2007 21:45 GMT > I am a C++ programmer, working on a java program. I need to read a > binary file using Java. [quoted text clipped - 17 lines] > Please give me some pointers, how do i read this using Java? Thanks. > BTW, those are not the variable names I use in my program. Java doesn't allow you to read into (or write from) a structure this way. Say you create a Java class:
class SomeData {
    long data1;
    short data2;
    short data3;
    long data4;
}
Unlike in C or C++, there's really no defined order for the fields, and thus no way to issue one read that fills all of them. You need to read into each one individually. See java.io.DataInputStream for how to do this.
Hunter Gratzner - 29 Aug 2007 22:06 GMT On Aug 29, 9:52 pm, Windsor.Lo...@gmail.com wrote:
> I am a C++ programmer, working on a java program. I need to read a > binary file using Java. [quoted text clipped - 15 lines] > > fread(&someData, 12, 1, inputFile); This is already a stupid idea in C++, since there is no guarantee that sizeof(SOME_DATA) == 12. Since this is a Java group I'd like to recommend that you consult some C++ resource regarding struct alignment and padding, data type size, and (network) byte order.
In Java (assuming you have fixed your C++ problem), one would read this e.g. with a DataInputStream:
import java.io.DataInputStream;
import java.io.IOException;
import java.math.BigInteger;

/*
 * Read data using network byte-order, aka big-endian
 * byte-order (MSB first), and no padding/alignment
 * between the data.
 */
class Data {
    /*
     * Note, Java has no unsigned data types.
     * Therefore in this example I store the unsigned short
     * in a (signed) int, and the unsigned long in a BigInteger.
     * Typically, in a carefully designed application this
     * can be avoided, but I do it here to avoid discussion
     * of using signed data types to handle unsigned types.
     */
    private BigInteger data1; // data format: unsigned long, 32 bits
    private int data2;        // data format: unsigned short, 16 bits
    private int data3;        // data format: unsigned short, 16 bits
    private BigInteger data4; // data format: unsigned long, 32 bits

    public void read(DataInputStream in) throws IOException {
        byte ulong2big[] = new byte[5];
        ulong2big[0] = 0; // ensure MSB is always zero, so
                          // we get an unsigned interpretation
                          // of the following 4 byte data
                          // when converting the array to a
                          // BigInteger

        // Read four bytes and convert them to a BigInteger.
        // In carefully designed applications a
        // data1 = in.readInt() would do.
        // readFully throws EOFException on a short read.
        in.readFully(ulong2big, 1, 4);
        data1 = new BigInteger(ulong2big);

        // Read the unsigned short into an int
        data2 = in.readUnsignedShort();
        // in.skipBytes(...) in case padding needs to be skipped

        data3 = in.readUnsignedShort();
        // in.skipBytes(...) in case padding needs to be skipped

        // Read four bytes and convert them to a BigInteger.
        // In carefully designed applications a
        // data4 = in.readInt() would do.
        in.readFully(ulong2big, 1, 4);
        data4 = new BigInteger(ulong2big);
    }
}
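As an alternative to the BigInteger conversion, an unsigned 32-bit field can be held in a Java long by masking the result of readInt(). The following is only a sketch under the same assumptions as above (big-endian data, no padding); the class name `UnsignedRead` and the helper `readUnsignedInt` are hypothetical, not part of java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class UnsignedRead {
    /** Reads a big-endian 32-bit value and returns it as a non-negative long. */
    public static long readUnsignedInt(DataInputStream in) throws IOException {
        // readInt() yields a signed int; the mask removes the sign extension
        // that would otherwise occur when widening to long.
        return in.readInt() & 0xFFFFFFFFL;
    }

    public static void main(String[] args) throws IOException {
        // 0xFFFFFFFE followed by 0x0002, both big-endian.
        byte[] raw = { (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFE, 0, 2 };
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
        long data1 = readUnsignedInt(in);
        int data2 = in.readUnsignedShort();
        System.out.println(data1 + " " + data2);
    }
}
```

A long is wide enough for any 32-bit unsigned value, so arithmetic and comparisons work without BigInteger.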
Roedy Green - 29 Aug 2007 22:09 GMT >I am a C++ programmer, working on a java program. I need to read a >binary file using Java. see http://mindprod.com/applet/fileio.html
It will show you how to read big and little endian binary data.
 Signature Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com
Roedy Green - 29 Aug 2007 22:11 GMT On Wed, 29 Aug 2007 21:09:01 GMT, Roedy Green <see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted someone who said :
>see http://mindprod.com/applet/fileio.html
>
>It will show you how to read big and little endian binary data.

If you are trying to slew through records reading only a field or two per long record, try nio. See http://mindprod.com/jgloss/nio.html
 Signature Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com
Windsor.Locks@gmail.com - 30 Aug 2007 00:47 GMT On Aug 29, 2:52 pm, Windsor.Lo...@gmail.com wrote:
> I am a C++ programmer, working on a java program. I need to read a > binary file using Java. [quoted text clipped - 18 lines] > Please give me some pointers, how do i read this using Java? Thanks. > BTW, those are not the variable names I use in my program. Thank you for all who tried to help. I got it working and in the interest of future programmers here is how I did it.
Of course this is my crappy program with crappy variable names etc, which I am going to rewrite. Also, the arr2long function is from here
http://www.captain.at/howto-java-convert-binary-data.php
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class Convert {

    public static void main(String [] args) {

        int crap = 0, doublecrap = 0, counter = 0;

        try {
            String file = "/opt/workspace/blahblah/binary.file";
            FileInputStream fis = new FileInputStream(file);
            DataInputStream dis = new DataInputStream(fis);

            int numberBytes = 4;
            byte data1[] = new byte[numberBytes];
            byte data2 [] = new byte[2];
            byte data3 [] = new byte[2];
            byte data4 [] = new byte[numberBytes];

            while (true) {

                int retval = dis.read(data1);
                dis.read(data2);
                dis.read(data3);
                dis.read(data4);

                if(retval == -1)
                    break;

                long stuff = arr2long(data1, 0);
                long stuff1 = arr2long(data4, 0);
                System.out.println(stuff + " : " + stuff1);
                counter ++;

            }

            // fis.close();
        } catch (IOException ioex) {

        } finally {
            System.out.println("number of records read : " + counter);
        }
    }

    public static long arr2long (byte[] arr, int start) {
        int i = 0;
        int len = 4;
        int cnt = 0;
        byte[] tmp = new byte[len];
        for (i = start; i < (start + len); i++) {
            tmp[cnt] = arr[i];
            cnt++;
        }
        long accum = 0;
        i = 0;
        for ( int shiftBy = 0; shiftBy < 32; shiftBy += 8 ) {
            accum |= ( (long)( tmp[i] & 0xff ) ) << shiftBy;
            i++;
        }
        return accum;
    }
}
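One caveat about the loop above: InputStream.read(byte[]) is allowed to return fewer bytes than the array holds, and only the first return value is checked. A possible tightening, sketched here under the assumption that the records are 12 bytes, little-endian and unpadded, uses DataInputStream.readFully, which either fills the requested range or throws EOFException. The names `RecordLoop`, `le32`, `le16` and `readAll` are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class RecordLoop {
    static final int RECORD_SIZE = 12; // 4 + 2 + 2 + 4, assuming no padding

    /** Little-endian unsigned 32-bit value at b[off..off+3], widened to a long. */
    public static long le32(byte[] b, int off) {
        return (b[off] & 0xFFL)
             | (b[off + 1] & 0xFFL) << 8
             | (b[off + 2] & 0xFFL) << 16
             | (b[off + 3] & 0xFFL) << 24;
    }

    /** Little-endian unsigned 16-bit value at b[off..off+1], widened to an int. */
    public static int le16(byte[] b, int off) {
        return (b[off] & 0xFF) | (b[off + 1] & 0xFF) << 8;
    }

    /** Reads records until a clean end of stream; each entry is {data1, data2, data3, data4}. */
    public static List<long[]> readAll(InputStream source) throws IOException {
        DataInputStream in = new DataInputStream(source);
        List<long[]> records = new ArrayList<>();
        byte[] rec = new byte[RECORD_SIZE];
        while (true) {
            int first = in.read(); // -1 here means the stream ended on a record boundary
            if (first < 0) {
                break;
            }
            rec[0] = (byte) first;
            // Fills the remaining 11 bytes or throws EOFException on a truncated record.
            in.readFully(rec, 1, RECORD_SIZE - 1);
            records.add(new long[] { le32(rec, 0), le16(rec, 4), le16(rec, 6), le32(rec, 8) });
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the file contents: one record holding { 1, 2, 3, 4 }.
        byte[] raw = { 1, 0, 0, 0, 2, 0, 3, 0, 4, 0, 0, 0 };
        for (long[] r : readAll(new ByteArrayInputStream(raw))) {
            System.out.println(r[0] + " : " + r[3]);
        }
    }
}
```

For a real file, a FileInputStream (ideally wrapped in a BufferedInputStream and closed by the caller) would be passed in place of the ByteArrayInputStream.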
Lew - 30 Aug 2007 02:25 GMT >> Here is how I read it in C++, >> [quoted text clipped - 14 lines] >> >> Please give me some pointers, how do i read this using Java? Thanks. ...
> public class Convert { > > public static void main(String [] args) { > > int crap = 0, doublecrap = 0, counter = 0; etc.
> }
> }

java.nio.ByteOrder will help you if you use the java.nio package as Roedy suggested.
Please do not embed TABs in Usenet posts; it really fubars the alignment.
 Signature Lew
Roedy Green - 30 Aug 2007 04:15 GMT >Thank you for all who tried to help. I got it working and in the >interest of future programmers here is how I did it. You are trying to read little-endian data. It is a lot easier with LEDataInputStream.
float f = dis.readFloat();
double d = dis.readDouble();
int i = dis.readInt();
 Signature Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com
Windsor.Locks@gmail.com - 30 Aug 2007 15:26 GMT On Aug 29, 10:15 pm, Roedy Green <see_webs...@mindprod.com.invalid> wrote:
> On Wed, 29 Aug 2007 16:47:00 -0700, Windsor.Lo...@gmail.com wrote, > quoted or indirectly quoted someone who said : [quoted text clipped - 11 lines] > Roedy Green Canadian Mind Products > The Java Glossaryhttp://mindprod.com Well, that actually does not work. See the reply above by "shakah"
shakah - 30 Aug 2007 16:22 GMT On Aug 30, 10:26 am, Windsor.Lo...@gmail.com wrote:
> On Aug 29, 10:15 pm, Roedy Green <see_webs...@mindprod.com.invalid> > wrote: [quoted text clipped - 16 lines] > > Well, that actually does not work. See the reply above by "shakah" He's suggesting you use his "little-endian DataInputStream" class, where I'm guessing it would work: http://mindprod.com/jgloss/ledatinputstream.html
DRS.Usenet@sengsational.com - 31 Aug 2007 17:15 GMT I'm not sure if this is the same issue, but I'm trying to interpret numeric values out of a chunk of data as follows:
int   toBinary   theValue
124   1111100    3.8
63    111111     4
224   11100000   4.8
63    111111     4
63    111111     4
224   11100000   4.8
64    1000000    3.2
63    111111     4
244   11110100   5
124   1111100    3.8
I can read "int" out of my blob of data, and I ran toBinaryString on it just to visualize it. I manually typed "theValue" (that is what I KNOW the test data is). Can someone help me figure out what code to run in order to get "theValue"?
--Dale--
Roedy Green - 01 Sep 2007 05:01 GMT On Fri, 31 Aug 2007 09:15:55 -0700, "DRS.Usenet@sengsational.com" <DRS.Usenet@sengsational.com> wrote, quoted or indirectly quoted someone who said :
>int toBinary theValue >124 1111100 3.8 [quoted text clipped - 12 lines] >KNOW the test data is). Can someone help me figure out what code to >run in order to get "theValue"? If you get enough samples you can create a private static final double[] translate = new double[256]; to do the translation for you.
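A rough sketch of such a lookup table, filled only with the five distinct pairs from the sample data posted above; every other entry is left as NaN to mark it as unknown. The class name `TranslateTable` and the `lookup` helper are invented for the example, and the table says nothing about how the encoding actually works.

```java
import java.util.Arrays;

public class TranslateTable {
    /** Maps a raw byte value (0..255) to its known decoded value, NaN if unknown. */
    private static final double[] translate = new double[256];

    static {
        Arrays.fill(translate, Double.NaN); // nothing is known until a sample says so
        // Pairs taken from the posted sample data.
        translate[63] = 4.0;
        translate[64] = 3.2;
        translate[124] = 3.8;
        translate[224] = 4.8;
        translate[244] = 5.0;
    }

    /** Looks up the decoded value for the low 8 bits of raw. */
    public static double lookup(int raw) {
        return translate[raw & 0xFF];
    }

    public static void main(String[] args) {
        int[] samples = { 124, 63, 224, 64, 244, 0 };
        for (int s : samples) {
            System.out.println(s + " -> " + lookup(s));
        }
    }
}
```

Any raw value that still prints NaN needs another sample before it can be translated.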
In what context did you see this code? It looks like it might be some sort of sound encoding technique. You can read up the specs on the encoding.
see http://mindprod.com/jgloss/sound.html to help get you started.
It might also be some sort of Huffman encoding. See http://mindprod.com/jgloss/huffman.html
 Signature Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com