Java Forum / Databases / May 2005

large queries: optimal settings (sqlserver, jtds)

Mikee - 11 May 2005 15:31 GMT
Hi

I'm using java servlets with jdbc to query MS sqlserver.

Some of the queries can be large, e.g. querying a table of 1 billion
rows and returning resultsets of 1 million rows. The queries
are all read only looping through the resultset up to a maximum
of 5 million rows. Sometimes the query is performed using
joined tables (select * from a,b where a.ib=b.id). A full
table scan on our hardware takes 20-30 minutes.

I'm using the sqlserver supplied driver and more recently trying the
jtds one.

So what are the best settings I should use for my connection
and statement objects?

Specifics:
i) When doing a statement.execute(), does this only
return once the query has finished, or are results passed
to the client on the fly?

ii) selectMethod - this option only seems to apply to the
sqlserver supplied driver. Default is "direct" but the docs
say this caches the entire result set in memory, so for large
queries use "cursor". However using direct I don't seem
to witness more memory being used up than when using cursor.
Speed performance seems about the same if the setFetchSize is not
set too small.

iii) Is there a way of roughly knowing what setFetchSize to use?

iv) I want to use the statement.setQueryTimeout to halt long running
queries and put them into a queue. The queryTimeout only
seems to have an effect if the statement.execute has not
yet returned in the time set, but this goes back to i). If the execute
returns, has the query completely finished with all the results
stored somewhere, or is the query still running? Logic tells me the
former but I've run a query that takes several minutes but results
start appearing after 30 secs or so.

Thanks
 Mike
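
A rough sketch of the statement settings asked about in ii) to iv), using only standard java.sql calls. The class name, method names and the memory-budget heuristic are assumptions for illustration, not something either driver prescribes; setFetchSize is only a hint, so how far a driver honours it has to be measured.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper; names and heuristic are illustrative only.
class LargeQuerySettings {

    // Heuristic (an assumption, not a driver rule): fetch as many rows per
    // round trip as fit in a fixed memory budget, clamped to [minRows, maxRows].
    static int estimateFetchSize(long budgetBytes, int approxRowBytes,
                                 int minRows, int maxRows) {
        if (approxRowBytes <= 0) {
            throw new IllegalArgumentException("approxRowBytes must be positive");
        }
        long rows = budgetBytes / approxRowBytes;
        return (int) Math.max(minRows, Math.min(maxRows, rows));
    }

    // Creates a forward-only, read-only statement with a fetch-size hint
    // and a query timeout. The caller is responsible for closing it.
    static Statement newStreamingStatement(Connection con, int fetchSize,
                                           int timeoutSeconds) throws SQLException {
        Statement st = con.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                           ResultSet.CONCUR_READ_ONLY);
        st.setFetchSize(fetchSize);         // a hint; the driver may ignore it
        st.setQueryTimeout(timeoutSeconds); // in seconds; 0 means no limit
        return st;
    }
}
```

For example, with a 4 MB budget and rows of roughly 200 bytes, estimateFetchSize(4L * 1024 * 1024, 200, 100, 10000) is capped at 10000 rows. When the timeout expires the driver raises an SQLException, which the servlet could catch in order to queue the query for later.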
Joe Weinstein - 11 May 2005 17:26 GMT
> Hi
>
[quoted text clipped - 4 lines]
> are all read only looping through the resultset up to a maximum
> of 5 million rows.

Hi. I'd have to say that you should probably spend whatever resources
you have to redesign this. It's not right for a servlet to select millions
of rows of data out of the DBMS. Operate on raw data where it is, in the
DBMS. Build your saw mill where the trees are. Only bring out data that a
user is going to need to look at. I helped change a payroll application
from external processing, which pulled all the raw data out and processed
it, to running the same algorithms in DBMS stored procedures. This changed
the system from requiring a 16-CPU HP box and taking 8 hours to run,
to taking under 50 minutes.
Joe Weinstein at BEA

> Sometimes the query is performed using
> joined tables (select * from a,b where a.ib=b.id). A full
> table scan on our hardware takes 20-30 minutes.
[quoted text clipped - 31 lines]
> Thanks
>   Mike
Mikee - 11 May 2005 23:02 GMT
Hi Joe

> Hi. I'd have to say that you should probably spend whatever resources
> you have to redesign this. It's not right for a servlet to select millions
> of rows of data out of the DBMS. Operate on raw data where it is, in the
> DBMS.

Point taken. Though the database will contain a list of astronomical
objects. Allowing the users to mine for rare objects is handled fairly
easily in SQL but users will still want to extract relatively large
subsets of the data to do their own complex analysis which would be
very difficult/impossible to do on the DBMS side. Allowing users to
extract millions of rows but restrict themselves to a subset of the
hundreds of parameters available for each object reduces the data size
significantly, from terabytes to a few hundred megabytes.

Regards
 Mike
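
Restricting an extract to a user-chosen subset of parameters could look roughly like the sketch below. The table and column names are hypothetical, and checking the requested names against a fixed set of known columns is an assumed design choice (it keeps user input out of the SQL text); the real schema is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper for building a narrow SELECT instead of "select *".
class ColumnSubsetQuery {

    // Returns "SELECT c1, c2 FROM table" for the requested columns,
    // rejecting any name that is not in the allowed set.
    static String buildSelect(String table, Set<String> allowed,
                              List<String> requested) {
        List<String> cols = new ArrayList<>();
        for (String c : requested) {
            if (!allowed.contains(c)) {
                throw new IllegalArgumentException("Unknown column: " + c);
            }
            cols.add(c);
        }
        if (cols.isEmpty()) {
            throw new IllegalArgumentException("No columns requested");
        }
        return "SELECT " + String.join(", ", cols) + " FROM " + table;
    }
}
```

The table name is assumed to come from application code, never from the user, since it is concatenated as-is.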
Joe Weinstein - 12 May 2005 00:04 GMT
> Hi Joe
>
[quoted text clipped - 15 lines]
> subsets of the data to do their own complex analysis which would be
> very difficult/impossible to do on the DBMS side.

You may be correct, but maybe not, too. If you find (or become) a SQL
guru that can do pivots, and generated SQL from macros and stuff like
that, you might be astonished at the power of SQL.

> Allowing users to
> extract millions of rows but restrict themselves to a subset of the
[quoted text clipped - 4 lines]
> Regards
>   Mike

Good luck then. Yes it is good to pare down what users want. For bulk
downloads I might investigate the DBMS's facilities for dumping to a file format
rather than a direct servlet-to-DBMS transfer. In fact I wonder if the
raw data is non-volatile enough that the majority of it can reside in
simple OS files for rapid transfer, and have them updated periodically
from the DBMS?

Joe Weinstein at BEA
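
The idea of serving bulk downloads from periodically refreshed OS files might be sketched as follows. This assumes some separate job has already dumped the data to a file; the class name, method names and the maximum-age rule are illustrative assumptions only.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for serving a pre-generated export file.
class ExportFileCache {

    // True if the export is no older than maxAge at the given time.
    static boolean isFresh(Instant lastModified, Instant now, Duration maxAge) {
        return Duration.between(lastModified, now).compareTo(maxAge) <= 0;
    }

    // Copies the export file to the output stream (e.g. a servlet response)
    // and returns the number of bytes written.
    static long send(Path exportFile, OutputStream out) {
        try {
            return Files.copy(exportFile, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A servlet could call isFresh with the file's last-modified time and fall back to a live query, or trigger a new dump, only when the file is stale.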

