> Can anyone make a recommendation on how to design a Java system that
> will make SQL queries on a mySQL database that results in very large
> data sets? What can be done to improve performance? Is hybernate
> better than using straight JDBC?
Please define "very large".
My application stores e-mail messages in a database. Its standard search
query uses a join over tables storing an e-mail body, correspondents and a
subject line. There's a many:many relationship between correspondents and
messages with the intersection table saying whether the correspondent is
sender or recipient of each message. Various 'where' clauses that limit
the search scope (by correspondent, title and date range) are added
depending on the search terms the user provides. All prime keys and search
terms are indexed.
- specifying no search terms gets all possible rows in the result set.
That retrieved 83,000 rows in 14 secs.
- specifying a correspondent's name with a "like '%david%" clause
returned 7,000 rows in 2 seconds.
I'm using Postgres 7.3, its matching JDBC driver and Java 6.05. The
machine isn't big: 512 MB RAM, 866 MHz and running Fedora 8.
The result set is used to populate the model underlying a JTable. The
query that populates the table only returns three displayable columns of
v/l text (To/from NAME, subject, date), i.e. enough to form a displayable
row in the table, plus a soft dbk. The much larger underlying structures
are only retrieved when selected from the table to display the full
message: it takes around a second to get all the data from a join on the
row containing the dbk, populate a JFrame and display it. This join is
over the four tables and typically accesses 6-10 rows in total.
I'm not claimimg this idea (store a minimal list and allow selecting from
it to pull back full details of the selected item) is the best solution
but it suits my application, hasn't caused memory problems so far and
gives acceptable performance.

Signature
martin@ | Martin Gregorie
gregorie. |
org | Zappa fan & glider pilot
Lew - 23 Apr 2008 02:48 GMT
> I'm using Postgres 7.3, its matching JDBC driver and Java 6.05. The
> machine isn't big: 512 MB RAM, 866 MHz and running Fedora 8.
BTW, that's a rather old version of PG. Apparently there have been
significant improvements since then. On the Postgres boards they don't even
regard version 7.4 as sufficiently current.
Also, PG is fully capable of using much more horsepower if you make it
available. I routinely read about RAID arrays and multi-GB database servers
for PG. If one is truly doing large queries, they might find 512 MB a little
skinny for the need.
One more advice: if data integrity is worth something to you, either turn of
write-back caching, or use a battery-backed RAID controller. If you suffer a
crash before things are fully written to the drive from the cache it could
seriously fubar your data.

Signature
Lew
>Hi,
>
[quoted text clipped - 8 lines]
>
>Terry
The best way to improve performance is to have small result sets. Do you
really need large results? Can you partition your queries?
Try to minimize things going across the network. I don't like stored
procedures, but you may need to consider them.
I don't think Hinernate will help you much for this.
Assuming large is really large, and you are talking about the query result,
not the size of the database.
Eric
Terry, 22.04.2008 20:39:
> Hi,
>
[quoted text clipped - 4 lines]
> performance? Is hybernate better than using
> straight JDBC?
I find hibernate is lot more complicated to be used in such a situation.
Straight JDBC gives you a lot more control over what is beeing fetched.
The question is: what do you want to do with the data set?
The strategies for displaying large data sets to the end user are completely different than processing large data sets in the backend.
Check the driver documentation, maybe the driver is first caching the complete result set in memory before delivering it to the caller.
Thomas
Arne Vajhøj - 24 Apr 2008 02:50 GMT
> Check the driver documentation, maybe the driver is first caching the
> complete result set in memory before delivering it to the caller.
At least older MySQL drivers does just that. I have not looked at
the newer.
Arne