Java Forum / General / February 2007
How to make each looping a concurrent thread to improve WHILE-loop performance?
www - 01 Feb 2007 13:40 GMT
Hi,
I have a while-loop which loops 360 times. Each looping takes 100ms, so in total it takes 36 seconds, which is very long.
while(true) //looping 360 times
{
    ....//code for preparation of the method calling in the end
    doIt(); //this method takes time. It inserts data into database
}
Right now, the flow is:
first looping -> second looping -> ..... -> 360th looping
I am wondering if I can make the loopings more or less concurrent, so that the next looping does not need to wait for the previous one to end:
first looping -> second looping -> ... 360th looping ->
Could you please give me some help? Thank you.
Patricia Shanahan - 01 Feb 2007 13:55 GMT
> Hi,
> [quoted text clipped - 21 lines]
>
> Could you please give me some help? Thank you.
Is most of doIt's time spent waiting for the database insert? If so, there may be potential, depending on the capabilities of the database.
You will need to use multiple threads to run the doIt calls. At the other extreme from using a single thread to do all the calls, you could start a new thread for each call. However, that will probably involve more thread start overhead than is needed.
I think you will get better control over resources if you use the new java.util.concurrent features. See the introduction to the API documentation for java.util.concurrent.ThreadPoolExecutor.
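A minimal sketch of the thread-pool idea, assuming doIt() is safe to call from several threads at once and that the database accepts concurrent inserts. The class name PooledLoop, the method runAll, and the pool size are made up for illustration; Executors.newFixedThreadPool is a convenience factory that returns a ThreadPoolExecutor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PooledLoop {
    /**
     * Submits `iterations` calls of `doIt` to a fixed pool of `poolSize`
     * threads and waits until every call has finished.
     * (Hypothetical helper; not part of the original poster's code.)
     */
    static void runAll(int iterations, int poolSize, Runnable doIt) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < iterations; i++) {
                // ...per-iteration preparation would go here...
                pending.add(pool.submit(doIt)); // queued; runs when a worker is free
            }
            for (Future<?> f : pending) {
                f.get(); // blocks; a task failure surfaces as ExecutionException
            }
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }
}
```

Something like PooledLoop.runAll(360, 8, () -> doIt()) would replace the loop. Whether 8 threads is a sensible number depends on the database and is worth measuring.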
Patricia
buggy - 01 Feb 2007 21:48 GMT
> doIt(); //this method takes time. It inserts data into database
Have doIt() store the information to be inserted into the database in a list.
After the loop has completed, create an SQL prepared statement, then loop through the saved list, filling the values into the prepared statement.
This will let the database engine compile the insert statement once, rather than 360 times.
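A rough sketch of that suggestion using plain JDBC, assuming the saved rows are simple strings. The table name, column names, and the class DeferredInsert are hypothetical, and whether the statement really is compiled only once depends on the driver and the database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class DeferredInsert {
    // Hypothetical table and columns, for illustration only.
    static final String INSERT_SQL =
            "INSERT INTO readings (id, payload) VALUES (?, ?)";

    /**
     * Inserts every saved row through a single PreparedStatement and
     * returns the total number of rows the driver reports as inserted.
     */
    static int insertAll(Connection conn, List<String> savedRows) throws SQLException {
        int inserted = 0;
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int id = 0;
            for (String row : savedRows) {
                ps.setInt(1, id++);      // fill in the placeholders...
                ps.setString(2, row);
                inserted += ps.executeUpdate(); // ...and reuse the same statement
            }
        }
        return inserted;
    }
}
```

The loop that used to call doIt() would only append to the list; insertAll(conn, list) then runs once after the loop.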
Lew - 01 Feb 2007 23:09 GMT
> doIt(); //this method takes time. It inserts data into database
> [quoted text clipped - 6 lines]
> This will let the database engine compile the insert statement once,
> rather than 360 times.
Keep an eye on transaction integrity with this approach if you are not auto-committing, because that could place all the inserts into one transaction. If you want them to commit individually, then you would need to attend to that. OTOH, this is a powerful idiom when you do want all-or-nothing for a transaction.
- Lew
Daniel Pitts - 01 Feb 2007 23:03 GMT
> Hi,
> [quoted text clipped - 22 lines]
>
> Could you please give me some help? Thank you.
First, look into using Batches instead of concurrency. If you find that you absolutely can't use batches, then look into java.util.concurrent.Executors <http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/Executors.html>. It will help you create a set of worker threads. This helps in two ways. One is that you don't create 360 threads (that could cause serious resource problems). The other is that you don't have to worry about queuing the work up yourself; it's built into the executors.
You may find that there are ways to speed up your database access.
Chris Uppal - 02 Feb 2007 12:08 GMT
> First, look into using Batches instead of concurrency.
The way you have capitalised "Batches" makes it sound as if there's a specific software package with that name which would help with this sort of problem. I haven't heard of one myself (and Google shows nothing obviously helpful); am I missing something interesting?
-- chris
Daniel Pitts - 02 Feb 2007 17:36 GMT
On Feb 2, 4:08 am, "Chris Uppal" <chris.up...@metagnostic.REMOVE-THIS.org> wrote:
> > First, look into using Batches instead of concurrency.
> [quoted text clipped - 4 lines]
>
> -- chris
Ah, sorry. I was struck by the RCM (Random Capitalisation Monster). When you are inserting or updating many rows in a database, you can often "batch" the process to improve throughput. Most database interfaces support batching.
Basically, the concept goes like this:
1. Start batch
2. Insert a bunch of rows
3. commit batch
4. --- All of the inserts get sent to the DB in one go.
This has the downside that you can't rely on side-effects of the inserts until after commit. Specifically, you can't get the auto-generated primary key for each insert.
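In JDBC terms, the batch steps above might look roughly like the sketch below. The table name, column name, and the class BatchedInsert are hypothetical. Whether the queued rows really travel to the server in a single round trip depends on the driver, so treat that part as an assumption.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BatchedInsert {
    // Hypothetical table and column, for illustration only.
    static final String INSERT_SQL = "INSERT INTO readings (payload) VALUES (?)";

    /** Queues all rows in one JDBC batch and commits them as one transaction. */
    static int insertBatch(Connection conn, List<String> rows) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);                // 1. start batch (one transaction)
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String row : rows) {
                ps.setString(1, row);
                ps.addBatch();                    // 2. queue a row; nothing executed yet
            }
            int[] counts = ps.executeBatch();     // hand the queued inserts to the driver
            conn.commit();                        // 3. commit batch
            return counts.length;
        } catch (SQLException e) {
            conn.rollback();                      // all-or-nothing on failure
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

For very large loads it is common to call executeBatch() every few hundred or thousand rows rather than once at the end. The right chunk size is something to measure, not something this sketch can tell you.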
Lew - 02 Feb 2007 18:29 GMT
> Basically, the concept goes like this:
> 1. Start batch
> 2. Insert a bunch of rows
> 3. commit batch
> 4. --- All of the inserts get sent to the DB in one go.
Pros:
- good use of connection and potentially of PreparedStatement to augment performance.
- the only way to maintain consistency across related modifications.
- one part of the transaction fails, the whole thing rolls back, if you're vigilant.
Cons:
- one part of the transaction fails, the whole thing rolls back, unless you're vigilant.
- ties up a thread until it's all over.
- ties up db resources (e.g., the connection) until it's all over.
> This has the downside that you can't rely on side-effects of the
> inserts until after commit. Specifically, you can't get the auto-
> generated primary key for each insert.
The use of auto-generated items as keys is controversial, and at best fraught with peril. This downside would not exist if one used real keys, i.e., columns that correspond to attributes of the model. Auto-generated values require special handling for data loads and unloads. Auto-generated values need to be kept hidden from the model domain.
There are apologists for the route of using only auto-generated values as keys. They feel the cited difficulties to be worth the effort.
There are those in the latter group who go beyond any justifiable use of auto-generated key values to assign single-column keys to multi-column-key (relationship) tables, those whose composite keys comprise only a concatenation of foreign-key references.
I used to use auto-generated keys all over the place. (Not in composite-key tables, however.) Now I'm in the natural-key (a.k.a., "real-key") camp.
- Lew
Chris Uppal - 05 Feb 2007 15:05 GMT
[me:]
> > The way you have capitalised "Batches" makes it sound as if there's a
> > specific software package with that name which would help with this
> > sort of problem. [...]
> Ah, sorry. I was struck by the RCM (Random Capitalisation Monster).
No problem.
But now I'm wondering if there's useful mileage in abstracting the batching pattern out into some sort of framework -- something like
interface BatchProcessor {
    void submitTask(Runnable action);
    void implementAbortBy(Runnable action);
    void implementCommitBy(Runnable action);
    void abort();
    void commit();
    ....
}
(with extensions for threading and the like). Probably overkill, or at least over-engineering something simple, but... it might make more sense if the BatchProcessor were specific to use in DB contexts, since there is a fair amount of common extra semantics to be managed in such cases.
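One possible reading of that interface, sketched as a simple single-threaded implementation. The semantics are guesses rather than anything specified in the post: tasks are only queued by submitTask, commit() runs them in order and then the commit hook, and a failing task triggers the abort hook. The class name QueueingBatchProcessor is invented, and the elided members of the interface are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Only the five methods named in the post; the elided part is not reconstructed.
interface BatchProcessor {
    void submitTask(Runnable action);
    void implementAbortBy(Runnable action);
    void implementCommitBy(Runnable action);
    void abort();
    void commit();
}

/** Hypothetical implementation: queue tasks, run them all at commit time. */
class QueueingBatchProcessor implements BatchProcessor {
    private final List<Runnable> tasks = new ArrayList<>();
    private Runnable onAbort = () -> {};   // no-op until configured
    private Runnable onCommit = () -> {};  // no-op until configured

    public void submitTask(Runnable action) { tasks.add(action); }
    public void implementAbortBy(Runnable action) { onAbort = action; }
    public void implementCommitBy(Runnable action) { onCommit = action; }

    /** Discards anything still queued, then runs the abort hook. */
    public void abort() {
        tasks.clear();
        onAbort.run();
    }

    /** Runs queued tasks in order, then the commit hook; aborts on failure. */
    public void commit() {
        try {
            for (Runnable task : tasks) {
                task.run();
            }
            onCommit.run();
        } catch (RuntimeException e) {
            onAbort.run();  // e.g. a rollback in a DB-specific version
            throw e;
        } finally {
            tasks.clear();  // the batch is finished either way
        }
    }
}
```

In a DB-specific version the commit and abort hooks would presumably wrap Connection.commit() and Connection.rollback().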
Hey ho.
-- chris
christopher@dailycrossword.com - 05 Feb 2007 09:09 GMT
There are 2 more alternatives. One is to save all the data as you loop and write it to the database once *without* using PreparedStatement, which still writes the data in order but only opens the database once. The other is to use connection pooling, which can maintain an open connection. The point here is that opening a database connection can be *very* slow. It should be easy to check and see if this is what is slowing you down.
> Hi,
> [quoted text clipped - 22 lines]
>
> Could you please give me some help? Thank you.
Patricia Shanahan - 05 Feb 2007 18:49 GMT
> there are 2 more alternatives -- one is to save all the data as you
> loop and write it to the database once*without* using
[quoted text clipped - 3 lines]
> connection can be *very* slow. it should be easy to check and see if
> this is what is slowing you down.
I now have a question that is very similar to this one.
I have some data I need to examine in many different ways. The main files, which represent one logical table, total a bit over 10GB, about 88 million lines of 123 bytes each.
I'm considering converting this to a MySQL database, and accessing it through Java.
What is the best way of inserting the 88 million rows in the main table? Do it in batches of some reasonable size?
Patricia
Alex Hunsley - 05 Feb 2007 19:50 GMT
>> there are 2 more alternatives -- one is to save all the data as you
>> loop and write it to the database once*without* using
[quoted text clipped - 15 lines]
> What is the best way of inserting the 88 million rows in the main
> table? Do it in batches of some reasonable size?
Yup, I've done something similar before. For loading a large database, it's worth spending some time benchmarking what an efficient 'load chunk' size is (for the method, or methods, you are using for your load).
> Patricia
Chris Uppal - 07 Feb 2007 17:10 GMT
> I have some data I need to examine in many different ways. The main
> files, which represent one logical table, total a bit over 10GB, about
[quoted text clipped - 5 lines]
> What is the best way of inserting the 88 million rows in the main
> table? Do it in batches of some reasonable size?
If you haven't already, then I suggest you look into "bulk load" or "bulk insert". Some links (for MySQL):
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
http://dev.mysql.com/doc/refman/5.1/en/insert-speed.html
(There's a comment from a "Nathan Huebner" near the bottom of the first page which describes how he loaded data with fixed size columns but without separators using LOAD DATA INFILE.)
Also consider "standard" tricks like turning off all indexing, triggers, referential integrity constraints, etc, while doing the insert.
Again, if you haven't already, then it's worth considering whether you require transactional integrity on the DB you're building. Presumably MySQL works faster for non-transactional table-types.
http://dev.mysql.com/doc/refman/5.1/en/storage-engine-compare-transactions.html
-- chris
Patricia Shanahan - 07 Feb 2007 17:52 GMT
>>I have some data I need to examine in many different ways. The main
>>files, which represent one logical table, total a bit over 10GB, about
[quoted text clipped - 14 lines]
> which describes how he loaded data with fixed size columns but without
> separators using LOAD DATA INFILE.)
Yup, I tracked down LOAD DATA INFILE after posting, and that seems to be the way to go. I've converted my text file to tab delimited columns, newline at end of row, and loaded up an extract that way.
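For anyone following along, the conversion step could be sketched like this. The column widths are made up, since the actual layout of the 123-byte lines is not given in the thread, and the class name FixedWidthToTsv is hypothetical. It also assumes the data itself contains no tabs. Tab-separated fields and newline-terminated rows are what LOAD DATA INFILE expects by default.

```java
class FixedWidthToTsv {
    /**
     * Splits one fixed-width line into columns of the given widths, trims
     * the padding from each column, and joins the columns with tabs.
     * A line shorter than the total width yields empty trailing columns.
     */
    static String toTabDelimited(String line, int[] widths) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int i = 0; i < widths.length; i++) {
            if (i > 0) {
                out.append('\t');
            }
            int start = Math.min(pos, line.length());
            int end = Math.min(pos + widths[i], line.length());
            out.append(line.substring(start, end).trim());
            pos += widths[i];
        }
        return out.toString();
    }
}
```

With hypothetical widths {4, 3, 1}, the line "AB  123X" comes out as "AB", "123", and "X" separated by tabs. Reading the input with a BufferedReader and writing each converted line followed by a newline gives a file in the shape described above.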
> Also consider "standard" tricks like turning off all indexing, triggers,
> referential integrity constraints, etc, while doing the insert.
[quoted text clipped - 3 lines]
> faster for non-transactional table-types.
> http://dev.mysql.com/doc/refman/5.1/en/storage-engine-compare-transactions.html
Thanks for the tips. I'm mining a fixed body of data. Once I get it loaded I don't plan to change the table contents, so I don't see any need at all for transactional integrity.
Patricia