> I am performing millions of inserts using PreparedStatements and
> executing them in batches. My question is simply how big should my
> batch be?
IMHO there's no general answer to this as this depends on the driver, the
JVM, amount of mem you have available, the database engine, resources on
the db server, DDL, indexes...
> Im assuming that attempting to batch millions of them at a time is
> bad from a GC point of view ... they are going to be taking up memory
> for long periods of time (until the batch "is full") causing the GC
> to lots of slow full (old-generation) collections.
There's a completely different set of problems: if your batch grows too
large, the database transaction handling might choke. The larger your
batches the more likely that you reach limits on the rollback segments
(Oracle) or run into locking issues.
> So has anyone got any research/experience that suggests an upperlimit
> on batch size? Or perhaps its better for the batch to be executed
> periodically rather than when a certain number of statements have
> been added?
This is a typical optimization task. I'm afraid you'll have to measure
your performance and find out the optimum for *your* application case. I
guess the range will be between 100 and several tenthousands - but this is
just a wild guess.
Kind regards
robert
Ellarco - 26 Jan 2006 13:52 GMT
>> I am performing millions of inserts using PreparedStatements and
>> executing them in batches. My question is simply how big should my
[quoted text clipped - 27 lines]
>
> robert
As I feared. I would just experiment with it but the executions are slow (min.
30 mins) so I was hoping someone would hazard a concrete guess.
Thanks for the prompt reply.
Roedy Green - 27 Jan 2006 00:41 GMT
>This is a typical optimization task. I'm afraid you'll have to measure
>your performance and find out the optimum for *your* application case. I
>guess the range
Are you familiar with the concept of binary search to home in on the
rough optimum?
see http://mindprod.com/jgloss/binarysearch.html
you do manually what binarysearch does to find a item in an ordered
table.

Signature
Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.
Robert Klemme - 27 Jan 2006 08:24 GMT
>> This is a typical optimization task. I'm afraid you'll have to
>> measure your performance and find out the optimum for *your*
[quoted text clipped - 7 lines]
> you do manually what binarysearch does to find a item in an ordered
> table.
And the nice thing: you can even automate it. :-)
robert
Ellarco - 27 Jan 2006 10:34 GMT
>> This is a typical optimization task. I'm afraid you'll have to measure
>> your performance and find out the optimum for *your* application case. I
[quoted text clipped - 7 lines]
> you do manually what binarysearch does to find a item in an ordered
> table.
Yeah, it did occur to me. What put me off when I started to consider a
scientific approach is the number of parameters there are that can be tweaked
... batch size, heap generation sizes, and a host of buffers I keep in my
program. If Im going to estimate a suitable value for one I should really
estimate a suitable value for all. Its all making my head swirl.
Suffice it to say that extremely large batches are not helpful as the GC becomes
too frequent. I have revised the size down substantially (O(10000)) and there is
some noticeable speed up and whats more it can run in a much smaller heap.
Thanks.