Java Forum / General / February 2007


How can I run each loop iteration in a concurrent thread to improve while-loop performance?

www - 01 Feb 2007 13:40 GMT
Hi,

I have a while-loop which loops 360 times. Each iteration takes 100 ms, so
in total it takes 36 seconds, which is very long.

while(true) //looping 360 times
{
    ....//code for preparation of the method calling in the end

    doIt();  //this method takes time. It inserts data into database
}

Right now, the flow is:

first looping -> second looping -> ..... -> 360th looping

I am wondering whether I can make the iterations more or less concurrent,
so that the next iteration need not wait for the previous one to end:

first looping ->
second looping ->
...
360th looping ->

Could you please give me some help? Thank you.
Patricia Shanahan - 01 Feb 2007 13:55 GMT
> Hi,
>
[quoted text clipped - 21 lines]
>
> Could you please give me some help? Thank you.

Is most of doIt's time spent waiting for the database insert? If so,
there may be potential, depending on the capabilities of the database.

You will need to use multiple threads to run the doIt calls. At the
other extreme from using a single thread to do all the calls, you could
start a new thread for each call. However, that will probably involve
more thread start overhead than is needed.

I think you will get better control over resources if you use the new
java.util.concurrent features. See the API documentation introduction to
java.util.concurrent.ThreadPoolExecutor.

Patricia
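
A minimal sketch of the pooled approach suggested above, written for a current JDK. The names PooledLoop and runAll and the pool size of 8 are illustrative assumptions, not from the thread, and the sleeping task merely stands in for doIt(). It only makes sense if the 360 iterations are independent of one another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class PooledLoop {
    /** Runs the task `iterations` times on a fixed-size pool and waits for all runs. */
    static void runAll(int iterations, int poolSize, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < iterations; i++) {
                pending.add(pool.submit(task)); // queued until a worker is free
            }
            for (Future<?> f : pending) {
                f.get(); // blocks; surfaces any exception thrown by the task
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdown(); // workers exit once the queue drains
        }
    }

    public static void main(String[] args) {
        AtomicInteger done = new AtomicInteger();
        long start = System.nanoTime();
        runAll(360, 8, () -> {
            try {
                Thread.sleep(100); // stand-in for the 100 ms doIt() call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            done.incrementAndGet();
        });
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done.get() + " calls in " + ms + " ms");
    }
}
```

With real database work, each worker would typically want its own connection (for example from a connection pool), and the achievable speed-up depends on how much concurrent insert load the database can absorb.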
buggy - 01 Feb 2007 21:48 GMT
doIt();  //this method takes time. It inserts data into database

Have doIt() store the information to be inserted into the database in a
list.

After the loop has completed, create an SQL prepared statement then loop
through the saved list filling in the values into the prepared statement.

This will let the database engine compile the insert statement once,
rather than 360 times.
Lew - 01 Feb 2007 23:09 GMT
> doIt();  //this method takes time. It inserts data into database
>
[quoted text clipped - 6 lines]
> This will let the database engine compile the insert statement once,
> rather than 360 times.

Keep an eye on transaction integrity with this approach if you are not
auto-committing, because that could place all the inserts into one
transaction. If you want them to individually commit then you would need to
attend to that. OTOH, this is a powerful idiom when you do want all-or-nothing
for a transaction.

- Lew
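
A hedged sketch of the deferred-insert idea with the all-or-nothing behaviour described above. The class DeferredInsert, the method insertAll, and the table samples with its payload column are hypothetical names chosen for illustration; only the java.sql calls are real API.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class DeferredInsert {
    /** Inserts every saved value with one PreparedStatement inside one transaction. */
    static void insertAll(Connection conn, List<String> values) throws SQLException {
        boolean previous = conn.getAutoCommit();
        conn.setAutoCommit(false); // all-or-nothing from here on
        // Hypothetical table and column, for illustration only.
        String sql = "INSERT INTO samples (payload) VALUES (?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String value : values) {
                ps.setString(1, value);
                ps.executeUpdate(); // statement is prepared once, executed per row
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // undo the rows inserted before the failure
            throw e;
        } finally {
            conn.setAutoCommit(previous); // restore the caller's setting
        }
    }
}
```

If each row should commit individually instead, leaving auto-commit switched on and dropping the commit and rollback calls would be the corresponding variation.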
Daniel Pitts - 01 Feb 2007 23:03 GMT
> Hi,
>
[quoted text clipped - 22 lines]
>
> Could you please give me some help? Thank you.

First, look into using Batches instead of concurrency.
If you find that you absolutely can't use batches, then look into
java.util.concurrent.Executors
<http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/Executors.html>
It will help you create a set of worker threads. This helps in two ways.
One is that you don't create 360 threads (that could cause serious
resource problems). The other is that you don't have to worry about
queuing the work up yourself; it's built into the executors.

You may find that there are ways to speed up your database access.
Chris Uppal - 02 Feb 2007 12:08 GMT
> First, look into using Batches instead of concurrency.

The way you have capitalised "Batches" makes it sound as if there's a specific
software package with that name which would help with this sort of problem.  I
haven't heard of one myself (and Google shows nothing obviously helpful); am I
missing something interesting ?

   -- chris
Daniel Pitts - 02 Feb 2007 17:36 GMT
On Feb 2, 4:08 am, "Chris Uppal" <chris.up...@metagnostic.REMOVE-THIS.org> wrote:
> > First, look into using Batches instead of concurrency.
>
[quoted text clipped - 4 lines]
>
>     -- chris

Ah, sorry. I was struck by the RCM (Random Capitalisation Monster).
When you are inserting or updating many rows in a database, you can
often "batch" the process to improve throughput.  Most database
interfaces support batching.

Basically, the concept goes like this:
1. Start batch
2. Insert a bunch of rows
3. commit batch
4. --- All of the inserts get sent to the DB in one go.

This has the downside that you can't rely on side-effects of the
inserts until after commit. Specifically, you can't get the
auto-generated primary key for each insert.
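
In JDBC terms, the batch steps listed above might look roughly like the sketch below. BatchInsert, insertInBatches, and the samples table are hypothetical names; addBatch() and executeBatch() are the standard java.sql.PreparedStatement methods. Note that in JDBC executeBatch() sends the queued rows, while committing is a separate step governed by the connection's auto-commit setting, and whether the rows really travel in a single round trip depends on the driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BatchInsert {
    /** Queues rows with addBatch() and sends them in groups of `batchSize`. */
    static int insertInBatches(Connection conn, List<String> values, int batchSize)
            throws SQLException {
        // Hypothetical table and column, for illustration only.
        String sql = "INSERT INTO samples (payload) VALUES (?)";
        int flushes = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int queued = 0;
            for (String value : values) {
                ps.setString(1, value);
                ps.addBatch();           // queue the row on the client side
                if (++queued == batchSize) {
                    ps.executeBatch();   // send the queued rows to the database
                    queued = 0;
                    flushes++;
                }
            }
            if (queued > 0) {
                ps.executeBatch();       // send the final partial batch
                flushes++;
            }
        }
        return flushes; // number of executeBatch() calls made
    }
}
```

For the original 360-row loop a single batch would probably do; the batchSize parameter matters more for much larger loads.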
Lew - 02 Feb 2007 18:29 GMT
> Basically, the concept goes like this:
> 1. Start batch
> 2. Insert a bunch of rows
> 3. commit batch
> 4. --- All of the inserts get sent to the DB in one go.

Pros:

- good use of connection and potentially of PreparedStatement to augment
performance.
- the only way to maintain consistency across related modifications.
- one part of the transaction fails, the whole thing rolls back, if you're
vigilant.

Cons:

- one part of the transaction fails, the whole thing rolls back, unless you're
vigilant.
- ties up a thread until it's all over.
- ties up db resources (e.g., the connection) until it's all over.

> This has the downside that you can't rely on side-effects of the
> inserts until after commit. Specifically, you can't get the
> auto-generated primary key for each insert.

The use of auto-generated items as keys is controversial, and at best fraught
with peril. This downside would not exist if one used real keys, i.e., columns
that correspond to attributes of the model. Auto-generated values require
special handling for data loads and unloads. Auto-generated values need to be
kept hidden from the model domain.

There are apologists for the route of using only auto-generated values as
keys. They feel the cited difficulties to be worth the effort.

There are those in the latter group who go beyond any justifiable use of
auto-generated key values to assign single-column keys to multi-column-key
(relationship) tables, those whose composite keys comprise only a
concatenation of foreign-key references.

I used to use auto-generated keys all over the place. (Not in composite-key
tables, however.) Now I'm in the natural-key (a.k.a., "real-key") camp.

- Lew
Chris Uppal - 05 Feb 2007 15:05 GMT
[me:]
> > The way you have capitalised "Batches" makes it sound as if there's a
> > specific software package with that name which would help with this
> > sort of problem.
[...]
> Ah, sorry. I was struck by the RCM (Random Capitalisation Monster).

No problem.

But now I'm wondering if there's useful mileage in abstracting the batching
pattern out into some sort of framework -- something like

   interface BatchProcessor
   {
       void submitTask(Runnable action);
       void implementAbortBy(Runnable action);
       void implementCommitBy(Runnable action);
       void abort();
       void commit();
       ....
   }

(with extensions for threading and the like).  Probably overkill, or at least
over-engineering something simple, but...    it might make more sense if the
BatchProcessor were specific to use in DB contexts, since there is a fair
amount of common extra semantics to be managed in such cases.

Hey ho.

   -- chris
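
One possible reading of the interface sketched above, as a small in-memory implementation. The thread leaves the semantics open, so everything here is an assumption: SimpleBatchProcessor is a made-up name, tasks are simply held until commit(), and a task failure during commit triggers the registered abort action.

```java
import java.util.ArrayList;
import java.util.List;

interface BatchProcessor {
    void submitTask(Runnable action);
    void implementAbortBy(Runnable action);
    void implementCommitBy(Runnable action);
    void abort();
    void commit();
}

class SimpleBatchProcessor implements BatchProcessor {
    private final List<Runnable> tasks = new ArrayList<>();
    private Runnable onAbort = () -> { };   // default: nothing to undo
    private Runnable onCommit = () -> { };  // default: nothing to finalize

    public void submitTask(Runnable action) { tasks.add(action); }
    public void implementAbortBy(Runnable action) { onAbort = action; }
    public void implementCommitBy(Runnable action) { onCommit = action; }

    /** Discards the queued tasks and runs the abort action. */
    public void abort() {
        tasks.clear();
        onAbort.run();
    }

    /** Runs the queued tasks in order, then the commit action; aborts on failure. */
    public void commit() {
        try {
            for (Runnable task : tasks) {
                task.run();
            }
            onCommit.run();
        } catch (RuntimeException e) {
            onAbort.run();
            throw e;
        } finally {
            tasks.clear(); // the batch is finished either way
        }
    }

    public static void main(String[] args) {
        BatchProcessor batch = new SimpleBatchProcessor();
        batch.implementCommitBy(() -> System.out.println("committed"));
        batch.submitTask(() -> System.out.println("row 1"));
        batch.submitTask(() -> System.out.println("row 2"));
        batch.commit();
    }
}
```

In a database-specific version, the commit and abort actions would presumably wrap Connection.commit() and Connection.rollback().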
christopher@dailycrossword.com - 05 Feb 2007 09:09 GMT
There are two more alternatives. One is to save all the data as you
loop and write it to the database once *without* using
PreparedStatement, which still writes the data in order but only opens
the database once. The other is to use connection pooling, which can
maintain an open connection. The point here is that opening a database
connection can be *very* slow. It should be easy to check whether
this is what is slowing you down.

> Hi,
>
[quoted text clipped - 22 lines]
>
> Could you please give me some help? Thank you.
Patricia Shanahan - 05 Feb 2007 18:49 GMT
> there are 2 more alternatives -- one is to save all the data as you
> loop and write it to the database once *without* using
[quoted text clipped - 3 lines]
> connection can be *very* slow.  it should be easy to check and see if
> this is what is slowing you down.

I now have a question that is very similar to this one.

I have some data I need to examine in many different ways. The main
files, which represent one logical table, total a bit over 10GB, about
88 million lines of 123 bytes each.

I'm considering converting this to a MySQL database, and accessing it
through Java.

What is the best way of inserting the 88 million rows in the main
table? Do it in batches of some reasonable size?

Patricia
Alex Hunsley - 05 Feb 2007 19:50 GMT
>> there are 2 more alternatives -- one is to save all the data as you
>> loop and write it to the database once *without* using
[quoted text clipped - 15 lines]
> What is the best way of inserting the 88 million rows in the main
> table? Do it in batches of some reasonable size?

Yup, I've done something similar before.
For loading a large database, it's worth spending some time benchmarking
what an efficient 'load chunk' size is (for the method, or methods, you
are using for your load).

> Patricia
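
A rough sketch of the chunk-size benchmarking suggested above. ChunkBenchmark and timeLoad are hypothetical names, the candidate sizes are arbitrary, and the no-op loader is only a placeholder for whatever insert method is being measured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ChunkBenchmark {
    /** Feeds `rows` to `loader` in chunks of `chunkSize`; returns elapsed nanoseconds. */
    static <T> long timeLoad(List<T> rows, int chunkSize, Consumer<List<T>> loader) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        long start = System.nanoTime();
        for (int from = 0; from < rows.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, rows.size());
            loader.accept(rows.subList(from, to)); // a view of the list, not a copy
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<Integer> sample = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            sample.add(i);
        }
        for (int size : new int[] {100, 1_000, 10_000}) {
            // Replace the empty lambda with the real load of one chunk.
            long nanos = timeLoad(sample, size, chunk -> { });
            System.out.println("chunk " + size + ": " + nanos / 1_000 + " us");
        }
    }
}
```

Running it against a representative sample of the data, rather than all 88 million rows, is probably enough to compare sizes.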
Chris Uppal - 07 Feb 2007 17:10 GMT
> I have some data I need to examine in many different ways. The main
> files, which represent one logical table, total a bit over 10GB, about
[quoted text clipped - 5 lines]
> What is the best way of inserting the 88 million rows in the main
> table? Do it in batches of some reasonable size?

If you haven't already, then I suggest you look into "bulk load" or "bulk
insert".  Some links (for MySQL):
   http://dev.mysql.com/doc/refman/5.1/en/load-data.html
   http://dev.mysql.com/doc/refman/5.1/en/insert-speed.html

(There's a comment from a "Nathan Huebner" near the bottom of the first page
which describes how he loaded data with fixed size columns but without
separators using LOAD DATA INFILE.)

Also consider "standard" tricks like turning off all indexing, triggers,
referential integrity constraints, etc, while doing the insert.

Again, if you haven't already, then it's worth considering whether you require
transactional integrity on the DB you're building.  Presumably MySQL works
faster for non-transactional table-types.
http://dev.mysql.com/doc/refman/5.1/en/storage-engine-compare-transactions.html

   -- chris
Patricia Shanahan - 07 Feb 2007 17:52 GMT
>>I have some data I need to examine in many different ways. The main
>>files, which represent one logical table, total a bit over 10GB, about
[quoted text clipped - 14 lines]
> which describes how he loaded data with fixed size columns but without
> separators using LOAD DATA INFILE.)

Yup, I tracked down LOAD DATA INFILE after posting, and that seems to be
the way to go. I've converted my text file to tab delimited columns,
newline at end of row, and loaded up an extract that way.

> Also consider "standard" tricks like turning off all indexing, triggers,
> referential integrity constraints, etc, while doing the insert.
[quoted text clipped - 3 lines]
> faster for non-transactional table-types.
> http://dev.mysql.com/doc/refman/5.1/en/storage-engine-compare-transactions.html

Thanks for the tips. I'm mining a fixed body of data. Once I get it
loaded I don't plan to change the table contents, so I don't see any
need at all for transactional integrity.

Patricia
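
A small sketch of the kind of conversion described above, turning one fixed-width line into tab-delimited columns suitable for LOAD DATA INFILE. The thread does not give the actual column layout of the 123-byte lines, so the widths are a caller-supplied assumption, as are the names FixedWidthToTsv and toTabDelimited and the decision to trim padding.

```java
class FixedWidthToTsv {
    /** Splits one fixed-width line at the given column widths and joins with tabs. */
    static String toTabDelimited(String line, int[] widths) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int i = 0; i < widths.length; i++) {
            if (i > 0) {
                out.append('\t');
            }
            // Clamp both ends so a short line yields empty trailing columns.
            int start = Math.min(pos, line.length());
            int end = Math.min(pos + widths[i], line.length());
            out.append(line.substring(start, end).trim()); // drop padding spaces
            pos += widths[i];
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int[] widths = {4, 5}; // hypothetical layout: a 4-char code, then a 5-char number
        System.out.println(toTabDelimited("AB  12345", widths));
    }
}
```

Applied line by line to the input files with a buffered reader and writer, this would produce the tab-separated, newline-terminated rows that LOAD DATA INFILE reads by default.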

