Java Forum / General / November 2007

Read a file multiple times

Federico - 17 Nov 2007 13:55 GMT
Hi, I have this code:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class Main {
   public static void main(String[] args) {
       try {
           FileWriter fstream = new FileWriter("out.txt");
           BufferedWriter out = new BufferedWriter(fstream);
           FileReader input = new FileReader("in1.txt");
           BufferedReader bufRead = new BufferedReader(input);
           FileReader input2 = new FileReader("in2.txt");
           BufferedReader bufRead2 = new BufferedReader(input2);
           String line;
           String line2;
           line = bufRead.readLine();
           line2 = bufRead2.readLine();
           bufRead2.mark(7000000);

           while (line != null) {
               while (line2 != null) {
                   out.write(line + line2 + "\n");
                   line2 = bufRead2.readLine();
               }

               line = bufRead.readLine();
               bufRead2.reset();
           }

           bufRead.close();
           bufRead2.close();
       } catch (ArrayIndexOutOfBoundsException e) {
           e.printStackTrace();
       } catch (IOException e) {
           e.printStackTrace();
       }
   }
}

Basically, I want to read the file in1.txt and, for each line found,
combine it with every line of in2.txt and write the results to out.txt.
For example:
in1.txt:

aaa
bbb
ccc
ddd

in2.txt:

eee
fff
ggg
hhh
iii

out.txt:

aaaeee
aaafff
aaaggg
aaahhh
aaaiii
bbbeee
bbbfff
...
dddiii

The idea is that for each line in in1.txt, the BufferedReader for
in2.txt is reset.
Obviously this works only for the first line of in1.txt.
I read the documentation for mark() and reset(), but I can't solve
the problem with these methods.
Do I have to use vectors?
Would it be better to use FileInputStream?

Please help me.

PS: I'm out of school and this is not a trick to get homework solved (I
hate that). I really tried to solve it myself, but I can't.

Cheers,
Federico.
Patricia Shanahan - 17 Nov 2007 14:21 GMT
> Hi, I have this code:
>
[quoted text clipped - 84 lines]
> Do I have to use vectors?
> Would it be better to use FileInputStream?

The choice depends on the file size, relative to the available memory.
The simplest solution is going to be to read one of the files into an
in-memory data structure, such as an ArrayList. Once you have done that,
you can read the other file a line at a time and output all pairs for
that line.
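
A minimal sketch of this in-memory approach, assuming in2.txt is the file
small enough to hold in a list and that Java 7 or later is available. The
class name CrossJoinInMemory and the method combine are made up for
illustration; the file names are the ones from the original post.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class CrossJoinInMemory {

    // Hypothetical helper: writes every line of 'streamed' combined with
    // every line of 'inMemory', one pair per output line.
    static void combine(Path streamed, Path inMemory, Path out) throws IOException {
        // Assumption: this file fits comfortably in memory.
        List<String> cached = Files.readAllLines(inMemory, StandardCharsets.UTF_8);

        try (BufferedReader reader = Files.newBufferedReader(streamed, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            // The other file is read only once, a line at a time.
            while ((line = reader.readLine()) != null) {
                for (String other : cached) {
                    writer.write(line + other);
                    writer.newLine();
                }
            }
        } // try-with-resources flushes and closes the writer
    }

    public static void main(String[] args) throws IOException {
        combine(Paths.get("in1.txt"), Paths.get("in2.txt"), Paths.get("out.txt"));
    }
}
```

Only the cached file has to fit in memory; the streamed file and the
output can be arbitrarily large.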

The fact that outputting all pairs seems reasonable to you suggests that
at least one of the files is reasonably small. For example, if the
smaller file contains a million lines, the list of pairs has at least
10**12 elements.

However, if neither file fits in memory, you are going to have to open
one of them as a RandomAccessFile. For each line of the other file, you
need to seek(0) in the RandomAccessFile and use readLine to advance
through it a line at a time.

Patricia
Federico - 17 Nov 2007 17:54 GMT
First, thanks for your response.
No, the file is relatively large and does not fit in memory :(
Yes, I had in mind opening it as a random access file.
If you have the time, can you show me a small general example of how
to do it?

Again thanks!

Federico.
Jim Korman - 18 Nov 2007 03:07 GMT
>First, thanks for your response.
>No, the file is relatively large and does not fit in memory :(
[quoted text clipped - 5 lines]
>
>Federico.

Federico, using a RandomAccessFile as Patricia mentioned is easy:

// Open for read only "r"
RandomAccessFile myFile = new RandomAccessFile("myfile.dat", "r");

// Read some data
String line = myFile.readLine();

// If you want to "mark" your current position
long markPos = myFile.getFilePointer();

// And to go back to that position
myFile.seek(markPos);
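
Applied to the in1.txt / in2.txt problem from the first post, the same
calls could be put together roughly like this. It is only a sketch: the
class name CrossJoinRandomAccess and the method combine are made up for
illustration, and it assumes plain ASCII text, because
RandomAccessFile.readLine() turns each byte into one character without
any charset decoding.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.RandomAccessFile;

public class CrossJoinRandomAccess {

    // Hypothetical helper: for each line of the outer file, rewinds the
    // inner file and writes the outer line combined with every inner line.
    static void combine(String outerName, String innerName, String outName)
            throws IOException {
        try (BufferedReader outer = new BufferedReader(new FileReader(outerName));
             RandomAccessFile inner = new RandomAccessFile(innerName, "r");
             BufferedWriter out = new BufferedWriter(new FileWriter(outName))) {
            String line;
            while ((line = outer.readLine()) != null) {
                inner.seek(0); // go back to the start of the inner file
                String line2;
                while ((line2 = inner.readLine()) != null) {
                    out.write(line + line2);
                    out.newLine();
                }
            }
        } // closing the writer here also flushes the buffered output
    }

    public static void main(String[] args) throws IOException {
        combine("in1.txt", "in2.txt", "out.txt");
    }
}
```

Note that RandomAccessFile does no buffering of its own, so readLine()
can be slow on a large file. Closing and reopening a BufferedReader on
in2.txt for each line of in1.txt is a different technique that gives the
same output and may well be faster.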

Jim
Mark Space - 20 Nov 2007 03:25 GMT
> First, thanks for your response.
> No, the file is relatively large and does not fit in memory :(

Yuck.

What kind of madness is this program?  Are you just testing or did
someone actually think this is a good idea and pay you for it?
Federico - 17 Nov 2007 17:57 GMT
Eheh... sorry for the *very* bad English in the first post! :)

Federico.
Martin Gregorie - 20 Nov 2007 14:28 GMT
There's a simple solution to this problem: use a relational database.
Load each list into a separate table and then do a join. The result is
exactly what you want.

Here's a test script I ran using Postgres:
==========================================
create table atab ( a char(3) );
create table btab ( b char(3) );
insert into atab ( a ) values ( 'aaa' );
insert into atab ( a ) values ( 'bbb' );
insert into atab ( a ) values ( 'ccc' );
insert into atab ( a ) values ( 'ddd' );
select a from atab;

insert into btab ( b ) values ( 'eee' );
insert into btab ( b ) values ( 'fff' );
insert into btab ( b ) values ( 'ggg' );
insert into btab ( b ) values ( 'hhh' );
insert into btab ( b ) values ( 'iii' );

select b from btab;

select a,b from atab, btab;

drop table atab;
drop table btab;

and here is the output from the three SELECT statements:
========================================================
  a
-----
 aaa
 bbb
 ccc
 ddd
(4 rows)

  b
-----
 eee
 fff
 ggg
 hhh
 iii
(5 rows)

  a  |  b
-----+-----
 aaa | eee
 aaa | fff
 aaa | ggg
 aaa | hhh
 aaa | iii
 bbb | eee
 bbb | fff
 bbb | ggg
 bbb | hhh
 bbb | iii
 ccc | eee
 ccc | fff
 ccc | ggg
 ccc | hhh
 ccc | iii
 ddd | eee
 ddd | fff
 ddd | ggg
 ddd | hhh
 ddd | iii
(20 rows)

...and the beauty of this approach is that the file sizes are only
limited by the disk space available for the database. It's fast too:

$ time psql -f dbtest.sql >dbtest.txt


real    0m0.088s
user    0m0.002s
sys     0m0.011s

This was run on a 256 MB, 866 MHz NetVista running Postgres 8.2.5 under
Linux (Fedora Core 7).

Obligatory Java content: If this is written in Java, you'd use JDBC to
insert the files into the tables and to write the file containing the
output. If you use the Derby database you can have an all-Java solution.
I ran the test under Postgres because that's what I have installed.

However, there may be faster approaches. It's likely that the database
has table loading utilities that will be at least as fast as anything
you can write (e.g. the Postgres COPY command). If the output must simply
go to a file, a script may handle this too: I'd probably just pipe the
SELECT output through gawk to adjust the line format and remove the
header and trailer lines.

martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
