Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / Databases / March 2005

Tip: Looking for answers? Try searching our database.

JDBC and ResultSet problem

Thread view: 
Peter L Skoglund - 09 Mar 2005 09:41 GMT
Dear all,

When trying to output a rather large database (around 10M rows) to a
textfile I get a memory problem that I guess has to do with the
ResaultSet (see a part of the code below). Is there any way in java to
get data direcly from the db NOT using ResultSet, something like a
datareader or connect a stream directly to the db allowing to read
line by line or even smaller parts? In my case there is no way I can
extend the memory sapce.

Regards
Peter

/* ----------------------- BEGIN CODE PART ----------------------- */
Class.forName(driverName);
           System.out.println("Opening db connection");
           connection = DriverManager.getConnection(url, user,
passwd);
           stmt = connection.createStatement();
           //Execute query
           try {
           //TODO Find a better way to stream data, NOT using
ResultSet
               stmt.getResultSet();
               rs = stmt.executeQuery(query);
               System.out.println("ResultSet OK");
               //Write information to file
               FileOutputStream out;
               PrintStream p;
              
               try
               {
                   out = new FileOutputStream(fileName);
                   System.out.println("Printing data to file...");
                   p = new PrintStream(out);
                  
                   while (rs.next())
                   {
                       p.println(rs.getString(1));
                       myrowcounter +=1;                    
                       }
                   p.close();
                   System.out.println("File printed with " + myrowcounter + "
rows");
                   }
/* ------------------------ END CODE PART ------------------------ */
joeNOSPAM@BEA.com - 09 Mar 2005 17:06 GMT
Hi. All DBMSes offer APIs to get data from the DBMS. What
DBMS and driver are you using? You shouldn't be hitting any
memory issue. Typically the result set only buffers a small amount
in the client unless you ask for a scrollable result set.
Joe Weinstein at BEA

PS: remove your line with the getResultSet() call. It doesn't do
anything and you're ignoring the return (if any).
Peter L Skoglund - 09 Mar 2005 21:33 GMT
Strange, because when setting a limit to the sql statement to 100, it works
fine. Without the limitation I get "Exception in thread "main"
java.lang.OutOfMemoryError: Java heap space", and setting -Xmx to allocate
256M doesn't help (even tried 400M)

I'm using 'mysql-connector-java-3.1.7'

/Peter

> Hi. All DBMSes offer APIs to get data from the DBMS. What
> DBMS and driver are you using? You shouldn't be hitting any
[quoted text clipped - 4 lines]
> PS: remove your line with the getResultSet() call. It doesn't do
> anything and you're ignoring the return (if any).
joeNOSPAM@BEA.com - 09 Mar 2005 22:49 GMT
what limit is this you're setting? SetFetchSize() or setMaxRows()?
Joe
Peter L Skoglund - 10 Mar 2005 07:43 GMT
I'm just set the select statement to return first 100 rows by doing:
resultset = statement.executeQuery("select * from mytable limit 100");
/Peter

> what limit is this you're setting? SetFetchSize() or setMaxRows()?
> Joe
Silvio Bierman - 10 Mar 2005 01:52 GMT
Hello Peter,

I think you are in trouble because of the way MySQL handles resultsets.
Where most DBMSs use a streaming-like mechanism when returning (possibly
large) resultsets delivering the first rows fast without completely
evaluating the query until more rows are fetched MySQL seems to do a
complete resultset evaluation before returning the first row. Also, the
driver on the client side seems to have to collect the whole resultset
before delivering rows to the application code. Don't ask me why...

I have had these problems with MySQL also. Running on SQLServer, Oracle or
SapDB (now MaxDB) never resulted in any problems using exactly the same
application code (and database content).

Regards,

Silvio Bierman
Daniel Dittmar - 10 Mar 2005 09:29 GMT
> I think you are in trouble because of the way MySQL handles resultsets.
> Where most DBMSs use a streaming-like mechanism when returning (possibly
[quoted text clipped - 3 lines]
> driver on the client side seems to have to collect the whole resultset
> before delivering rows to the application code. Don't ask me why...

I think MySQL allowed/allows only one open cursor at a time. Reading all
the data into memory was probably the easiest way to simulate multiple
open cursors in the driver.

Daniel
Lee Fesperman - 10 Mar 2005 19:34 GMT
> > I think you are in trouble because of the way MySQL handles resultsets.
> > Where most DBMSs use a streaming-like mechanism when returning (possibly
[quoted text clipped - 7 lines]
> the data into memory was probably the easiest way to simulate multiple
> open cursors in the driver.

The MySQL protocol only allows this, but their current JDBC driver provides several
workarounds. Check the connection properties for com.mysql.jdbc.JdbcDriver.

Signature

Lee Fesperman, FFE Software, Inc. (http://www.firstsql.com)
==============================================================
* The Ultimate DBMS is here!
* FirstSQL/J Object/Relational DBMS  (http://www.firstsql.com)

Mark Matthews - 11 Mar 2005 18:13 GMT
>>>I think you are in trouble because of the way MySQL handles resultsets.
>>>Where most DBMSs use a streaming-like mechanism when returning (possibly
[quoted text clipped - 10 lines]
> The MySQL protocol only allows this, but their current JDBC driver provides several
> workarounds. Check the connection properties for com.mysql.jdbc.JdbcDriver.

Guys,

Lee's correct. Back a long time ago, I implemented the JDBC driver to
cover the 80%+ use case, which was small-to-medium sized result sets
with multiple open cursors (the multiple open cursors part is required
by the JDBC spec), so therefore the JDBC driver does buffer the complete
result set in memory by default.

For the use case just described, this is usually a better option because
it greatly increases concurrency at the database server, since clients
aren't holding onto locks while they process results. It also requires
potentially a lot less network traffic to do things this way (no
request/response for each batch of rows).

You can read a row-at-a-time with only one open statement if you do the
following non-standard (to JDBC) setup on your statement:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
             java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

(btw, this is in the docs that came with your driver or available at
http://dev.mysql.com/doc/connector/j/en/cj-implementation-notes.html)

Starting with Connector/J 3.2 and MySQL-5.0, Statement.setFetchSize(> 0)
on any statement will use fetch-n-rows-at-a-time functionality that is
in MySQL-5.0.

(btw, many JDBC drivers for SQL Server, including the ones shipped by MS
actually do something similar, if you ask for another result set and are
using cursors to fetch them rather than 'firehose' mode, they actually
open _another_connection_ to the database!)

    -Mark
ableofhighheart@gmail.com - 11 Mar 2005 20:00 GMT
Alternatively, you can try to increase the maximum Java heap size.

For example:

java -Xmx512m com.YourMain

> >>>I think you are in trouble because of the way MySQL handles resultsets.
> >>>Where most DBMSs use a streaming-like mechanism when returning (possibly
[quoted text clipped - 45 lines]
>
>     -Mark
Lee Fesperman - 11 Mar 2005 23:04 GMT
> >>I think MySQL allowed/allows only one open cursor at a time. Reading all
> >>the data into memory was probably the easiest way to simulate multiple
[quoted text clipped - 10 lines]
>
> ....

Thanks, I was just filling in until you showed up.

> (btw, many JDBC drivers for SQL Server, including the ones shipped by MS
> actually do something similar, if you ask for another result set and are
> using cursors to fetch them rather than 'firehose' mode, they actually
> open _another_connection_ to the database!)

Originally, the SQL Server protocol for both MS and Sybase only supported 'firehose'
mode. Our web language, Jive (available under GPL at http://www.firstsql.com/jive/),
supports nested queries by opening a new (internal) connection for the nested query just
for compatibility with this oddity in SQL Server's ODBC driver (Jive automatically
tracks open connections and closes them when no longer needed). Since it's GPL, you get
the source and can see for yourself this 'magic' ;^)

MS SQL Server now supports "server-side" cursors that doesn't use firehose mode. I don't
know if Sybase does yet.

Note: Firehose mode can be more efficient because it avoid the response overhead.
However, it normally sends each row as a separate packet which adds its own overhead.
The driver for FirstSQL/J (see my sig) uses the setFetchSize() hint in JDBC to batch
multiple rows in a packet to give a kind of 'best of both worlds' implementation. I'm
sure drivers for other databases provide similar functionality.

Signature

Lee Fesperman, FFE Software, Inc. (http://www.firstsql.com)
==============================================================
* The Ultimate DBMS is here!
* FirstSQL/J Object/Relational DBMS  (http://www.firstsql.com)

Silvio Bierman - 14 Mar 2005 09:21 GMT
Hello Lee,

Several MSSQL JDBC drivers employ the same scheme, they create multiple
connections transparantly and use those for the multiple rowsets. We where
extremely bitten by this behaviour once using the JTDS driver.

Where multiple statements on the same connection do not suffer from locking
issues caused by MSSQLs tendency to lock al SELECT-ed records statements on
seperate connections do. Also, transactions only work on single connections
so the begin-transaction/rollback/commit logic is broken by this also.

Luckily we found out about this quickly and where able to setup a different
driver that supported dynamic cursors fixing the problem. I think it is a
very dangerous default behaviour of a JDBC driver...

Regards,

Silvio Bierman

> > >>I think MySQL allowed/allows only one open cursor at a time. Reading all
> > >>the data into memory was probably the easiest way to simulate multiple
[quoted text clipped - 39 lines]
> * The Ultimate DBMS is here!
> * FirstSQL/J Object/Relational DBMS  (http://www.firstsql.com)
Alin Sinpalean - 14 Mar 2005 12:45 GMT
> Several MSSQL JDBC drivers employ the same scheme, they create multiple
> connections transparantly and use those for the multiple rowsets. We where
> extremely bitten by this behaviour once using the JTDS driver.

I need to make an observation here. jTDS now implements something
similar to the MySQL JDBC driver, only with a minor difference: the
ResultSet isn't cached into memory _unless_ another query is executed
on another Statement on the same Connection (data is also cached to
disk after a certain threshold is reached so you won't run out of
memory). This means that in the most common case, when you're
retrieving a ResultSet from the DBMS without doing anything else at the
same time on the same Connection, no caching is needed so memory
problems are avoided. And, of course, no extra connections have to be
created anymore, so no locking occurs and transaction semantics are
respected.

Also, when creating a scrollable/updateable ResultSet a server side
cursor is created and this ensures real concurrency for multiple
Statements (read a few rows here, a few rows there, controllable
through setFetchSize() ).

Alin,
The jTDS Project.
Silvio Bierman - 14 Mar 2005 15:09 GMT
Hello Alin,

I should have mentioned this problem occurred with an older version of the
jTDS driver. I have been following your advances and know that you have
solved that problem in the current versions.

I would like a configuration level property of specifying that I want server
side cursor, changing the code to scrollable resultsets is not acceptable
for me since I do not need them and don't know what other databases might do
when I specify such cursor attributes.

Is it not possible to change default read cursors to be server side by
specifying this in the connect-string. If so I would probably not hesitate
to switch back to the jTDS driver (using the MS-driver now).

Regards,

Silvio Bierman

> > Several MSSQL JDBC drivers employ the same scheme, they create
> multiple
[quoted text clipped - 21 lines]
> Alin,
> The jTDS Project.
Alin Sinpalean - 15 Mar 2005 11:23 GMT
> I should have mentioned this problem occurred with an older version of the
> jTDS driver. I have been following your advances and know that you have
> solved that problem in the current versions.

No problem. I just wanted to let everybody else know about this.

> I would like a configuration level property of specifying that I want server
> side cursor, changing the code to scrollable resultsets is not acceptable
[quoted text clipped - 4 lines]
> specifying this in the connect-string. If so I would probably not hesitate
> to switch back to the jTDS driver (using the MS-driver now).

It would be possible to have a parameter to specify whether forward
only server side cursors should be used instead of firehose cursors
(i.e. direct execution). The reason that hasn't been done before is
that we did not see a real need for it; why would anyone want to use a
connection to retrieve a ResultSet and do something else at the same
time? And why couldn't this be achieved using two connections? Or a
single connection but processing the ResultSet first and doing the
other operations afterwards?

[BTW, this behavior is fully supported by jTDS now, only it is done by
caching the ResultSet to memory/disk when another request is submitted
instead of creating a server side cursor. It's just a different
approach.]

There is one other request with a good explanation of why fast forward
only cursors would be the right choice for this on the jTDS Help forum:
https://sourceforge.net/forum/forum.php?thread_id=1246387&forum_id=104389

It still doesn't have a good explanation of why this would be needed,
other than "we have a very complicated application layer that needs
this", but if you really-really need this, please submit a RFE (request
for enhancement) for jTDS here:
https://sourceforge.net/tracker/?group_id=33291&atid=407765

Alin,
The jTDS Project.
Silvio Bierman - 15 Mar 2005 13:27 GMT
> > I should have mentioned this problem occurred with an older version
> of the
[quoted text clipped - 44 lines]
> Alin,
> The jTDS Project.

Hi Alin,

I have a very simple example. I need the N (unknown) largest customers
ordered by revenue (desc) up to a total revenue of K (application parameter)
which goes like:

SELECT CUSTOMER_NAME,CUSTOMER_REVENUE
FROM CUSTOMERS ORDER BY CUSTOMER_REVENUE DESC

where the application processes N rows out of a zillion until the sum of the
revenues read reaches K.

Caching all records on the client side will obviously work (except when
running inside an applet) but will occupy quite some bandwidth.

Regards,

Silvio Bierman
Alin Sinpalean - 16 Mar 2005 11:10 GMT
> I have a very simple example. I need the N (unknown) largest customers
> ordered by revenue (desc) up to a total revenue of K (application parameter)
[quoted text clipped - 8 lines]
> Caching all records on the client side will obviously work (except when
> running inside an applet) but will occupy quite some bandwidth.

Silvio,

Indeed in this case reading the whole ResultSet is not a choice. You
could, however take other approaches, such as limiting the number of
rows to a reasonable value, which may or may not work; you could
combine this with additional SELECTs if the limit is not reached:

SELECT CUSTOMER_NAME,CUSTOMER_REVENUE
FROM CUSTOMERS WHERE CUSTOMER_REVENUE <= <revenue_of_last_customer>
ORDER BY CUSTOMER_REVENUE DESC

Yet another solution would be to use a stored procedure to do all the
processing on the server.

If you consider portability to other DBMSs these are both better
solutions: Oracle doesn't support server side cursors at all as far as
I know; scrollable ResultSets are implemented through client side
cursors, where all the rows are cached _in memory_ by the driver. Also,
as you can see, even some SQL Server drivers do not support the
behavior you need (see the current implementation of jTDS).

One more thing you could take into consideration is that SQL Server
will have to sort all X million rows just so that you can read the
first few of them. This is not very efficient. From this point of view,
I think my first suggestion would be better, except that it would
require more work on your side.

Anyway, a RFE has been created for jTDS, requesting the possibility of
choosing between firehose cursors and fast forward only cursors and
this functionality will be implemented when time allows.

Mark Matthews wrote:
> I've learned not to underestimate the interesting ways in which users
> might use the functionality that is given to them ;).
>
> We have quite a few users of MySQL that do just this, interleaving
> various statements with others and processing more than one open
result
> set at a time or issuing updates based on data they work with from a
> currently-open result set.

Mark,

Thanks for your comments. While you are entirely correct, I don't want
to encourage such behavior from the users; there are cases in which
they have to do with legacy code and it would require great effort to
fix things on their side but that doesn't mean I have to provide them
with a solution. Your example of "thread-safeness" seems like a good
example of a "problem" I will never agree to fix on my side (don't get
me wrong, jTDS provides thread safe Statements, but that's very much
the limit -- I don't see the need for the new shiny feature of JDBC 3.0
of being able to keep a ResultSet open when moving on to another result
and that's not implemented).

In this particular case, however, it doesn't seem like too much of an
effort to implement fast forward only cursors and we will probably do
it, especially since it would ease life for some and make the
transition from the brain-dead MS driver to jTDS much easier for them.

Alin,
The jTDS Project.
Silvio Bierman - 17 Mar 2005 23:27 GMT
> > I have a very simple example. I need the N (unknown) largest
> customers
[quoted text clipped - 43 lines]
> choosing between firehose cursors and fast forward only cursors and
> this functionality will be implemented when time allows.

Hello Alin,

That I already understood and I am glad about that. I will keep following
jTDS progress.

About Oracle: it may not support scrollable cursurs (which for me are
totally useless anyway) but it does support chunked transfer of resultsets.
There is (or used to be) a special optimization hint/setting that favours
query plans that result in fast first rows.

Sorting is not neccesary if the column is indexed, which is exactly the
strategy SQL-server and Oracle utilize. This icw. chunked fetching is the
best solution for the use-case. Different solutions are abviously possible.

Regards,

Silvio Bierman

> Mark Matthews wrote:
> > I've learned not to underestimate the interesting ways in which users
[quoted text clipped - 26 lines]
> Alin,
> The jTDS Project.
Alin Sinpalean - 18 Mar 2005 16:28 GMT
> About Oracle: it may not support scrollable cursurs (which for me are
> totally useless anyway) but it does support chunked transfer of resultsets.
> There is (or used to be) a special optimization hint/setting that favours
> query plans that result in fast first rows.

Scrollable ResultSets aren't totally useless in desktop applications,
e.g. database managers. They are very much useless in server side apps,
although some JDBC users seem to prefer using a scrollable ResultSet
over a separate query to find out the number of rows in a table.

The fact that Oracle supports some optimization hint that favors fast
first rows does not imply that those rows will be returned in chunks
(it doesn't mean the opposite either). It _could_ simply mean that when
a client reads the resulting rows, the first rows will be available
immediately while following reads will probably block. Anyway, I
mentioned Oracle just as an example; MySQL is another one. And the fact
that Oracle implements scrollable result sets through client side
cursors could imply that it is not able to return result sets in chunks
between which you could run other queries. So I was implying that
forward only result sets could be implemented just like they are
currently in jTDS.

> Sorting is not neccesary if the column is indexed, which is exactly the
> strategy SQL-server and Oracle utilize. This icw. chunked fetching is the
> best solution for the use-case. Different solutions are abviously possible.

I don't necessarily agree that chunked fetching is definitely the best
solution for your problem. Depending on the situation (complex query,
missing index, whatever, change sensitivity) it could happen that the
overhead of a server side cursor is much higher that that of a firehose
cursor/direct select. The server could have to lock all selected rows
or keep a copy of rows altered while the cursor is open, depending on
transaction isolation settings. As I said before, it is definitely the
easiest way for you; not necessarily for the server.

Alin,
The jTDS Project.
Mark Matthews - 15 Mar 2005 16:06 GMT
[snip]
> It would be possible to have a parameter to specify whether forward
> only server side cursors should be used instead of firehose cursors
[quoted text clipped - 4 lines]
> single connection but processing the ResultSet first and doing the
> other operations afterwards?

Alin,

I've learned not to underestimate the interesting ways in which users
might use the functionality that is given to them ;).

We have quite a few users of MySQL that do just this, interleaving
various statements with others and processing more than one open result
set at a time or issuing updates based on data they work with from a
currently-open result set.

Many of these result sets are small, and using non-firehose cursors
would actually be a performance loss, so we by default spool result sets
into memory.

On the other side of the coin, there are users that read in the JDBC
specification that drivers should be thread-safe, so they assume that
means that it is a good idea to share a connection between multiple
threads because they believe it improves performance, which isn't true
for any database I'm aware of.

(just my two cents).

    -Mark
Lee Fesperman - 14 Mar 2005 19:00 GMT
> Hello Lee,
>
[quoted text clipped - 10 lines]
> driver that supported dynamic cursors fixing the problem. I think it is a
> very dangerous default behaviour of a JDBC driver...

Please don't top-post. It's just basic netiquette. (I'm too lazy to rearrange)

However, I do agree that it is dangerous behaviour with some very problematic side
effects. Alin's solution is a fairly reasonable workaround.

Note: Alin posted his solution in a reply to your article, but I replied to you just so
I could chide you about top-posting ;^)

> > > >>I think MySQL allowed/allows only one open cursor at a time. Reading
> > > >>all the data into memory was probably the easiest way to simulate multiple
[quoted text clipped - 26 lines]
> > and closes them when no longer needed). Since it's GPL, you get the source and
> > can see for yourself this 'magic' ;^)

Signature

Lee Fesperman, FFE Software, Inc. (http://www.firstsql.com)
==============================================================
* The Ultimate DBMS is here!
* FirstSQL/J Object/Relational DBMS  (http://www.firstsql.com)

Silvio Bierman - 15 Mar 2005 00:01 GMT
Thanks for the correction Lee,

I normally don't toppost but sometimes I forget...

Regards,

Silvio

> > Hello Lee,
> >
[quoted text clipped - 55 lines]
> * The Ultimate DBMS is here!
> * FirstSQL/J Object/Relational DBMS  (http://www.firstsql.com)
Gerd Nachtsheim - 15 Mar 2005 00:07 GMT
Silvio Bierman wrote, On 15.03.2005 00:01:
> Thanks for the correction Lee,
>
> I normally don't toppost but sometimes I forget...

...he said and topposted again ;^)

You may want to read <http://learn.to/quote>

SCNR

Gerd
Signature

Gerd Nachtsheim mailto:Gerd.Nachtsheim@t-online.de ICQ:#13126958

Silvio Bierman - 17 Mar 2005 23:29 GMT
> Silvio Bierman wrote, On 15.03.2005 00:01:
> > Thanks for the correction Lee,
[quoted text clipped - 10 lines]
> --
> Gerd Nachtsheim mailto:Gerd.Nachtsheim@t-online.de ICQ:#13126958

Gerd,

That second top-post was a joke...

Regards,

Silvio


Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.