Java Forum / Databases / March 2005
JDBC and ResultSet problem
Peter L Skoglund - 09 Mar 2005 09:41 GMT Dear all,
When trying to output a rather large database (around 10M rows) to a textfile I get a memory problem that I guess has to do with the ResaultSet (see a part of the code below). Is there any way in java to get data direcly from the db NOT using ResultSet, something like a datareader or connect a stream directly to the db allowing to read line by line or even smaller parts? In my case there is no way I can extend the memory sapce.
Regards Peter
/* ----------------------- BEGIN CODE PART ----------------------- */ Class.forName(driverName); System.out.println("Opening db connection"); connection = DriverManager.getConnection(url, user, passwd); stmt = connection.createStatement(); //Execute query try { //TODO Find a better way to stream data, NOT using ResultSet stmt.getResultSet(); rs = stmt.executeQuery(query); System.out.println("ResultSet OK"); //Write information to file FileOutputStream out; PrintStream p; try { out = new FileOutputStream(fileName); System.out.println("Printing data to file..."); p = new PrintStream(out); while (rs.next()) { p.println(rs.getString(1)); myrowcounter +=1; } p.close(); System.out.println("File printed with " + myrowcounter + " rows"); } /* ------------------------ END CODE PART ------------------------ */
joeNOSPAM@BEA.com - 09 Mar 2005 17:06 GMT Hi. All DBMSes offer APIs to get data from the DBMS. What DBMS and driver are you using? You shouldn't be hitting any memory issue. Typically the result set only buffers a small amount in the client unless you ask for a scrollable result set. Joe Weinstein at BEA
PS: remove your line with the getResultSet() call. It doesn't do anything and you're ignoring the return (if any).
Peter L Skoglund - 09 Mar 2005 21:33 GMT Strange, because when setting a limit to the sql statement to 100, it works fine. Without the limitation I get "Exception in thread "main" java.lang.OutOfMemoryError: Java heap space", and setting -Xmx to allocate 256M doesn't help (even tried 400M)
I'm using 'mysql-connector-java-3.1.7'
/Peter
> Hi. All DBMSes offer APIs to get data from the DBMS. What > DBMS and driver are you using? You shouldn't be hitting any [quoted text clipped - 4 lines] > PS: remove your line with the getResultSet() call. It doesn't do > anything and you're ignoring the return (if any). joeNOSPAM@BEA.com - 09 Mar 2005 22:49 GMT what limit is this you're setting? SetFetchSize() or setMaxRows()? Joe
Peter L Skoglund - 10 Mar 2005 07:43 GMT I'm just set the select statement to return first 100 rows by doing: resultset = statement.executeQuery("select * from mytable limit 100"); /Peter
> what limit is this you're setting? SetFetchSize() or setMaxRows()? > Joe Silvio Bierman - 10 Mar 2005 01:52 GMT Hello Peter,
I think you are in trouble because of the way MySQL handles resultsets. Where most DBMSs use a streaming-like mechanism when returning (possibly large) resultsets delivering the first rows fast without completely evaluating the query until more rows are fetched MySQL seems to do a complete resultset evaluation before returning the first row. Also, the driver on the client side seems to have to collect the whole resultset before delivering rows to the application code. Don't ask me why...
I have had these problems with MySQL also. Running on SQLServer, Oracle or SapDB (now MaxDB) never resulted in any problems using exactly the same application code (and database content).
Regards,
Silvio Bierman
Daniel Dittmar - 10 Mar 2005 09:29 GMT > I think you are in trouble because of the way MySQL handles resultsets. > Where most DBMSs use a streaming-like mechanism when returning (possibly [quoted text clipped - 3 lines] > driver on the client side seems to have to collect the whole resultset > before delivering rows to the application code. Don't ask me why... I think MySQL allowed/allows only one open cursor at a time. Reading all the data into memory was probably the easiest way to simulate multiple open cursors in the driver.
Daniel
Lee Fesperman - 10 Mar 2005 19:34 GMT > > I think you are in trouble because of the way MySQL handles resultsets. > > Where most DBMSs use a streaming-like mechanism when returning (possibly [quoted text clipped - 7 lines] > the data into memory was probably the easiest way to simulate multiple > open cursors in the driver. The MySQL protocol only allows this, but their current JDBC driver provides several workarounds. Check the connection properties for com.mysql.jdbc.JdbcDriver.
 Signature Lee Fesperman, FFE Software, Inc. (http://www.firstsql.com) ============================================================== * The Ultimate DBMS is here! * FirstSQL/J Object/Relational DBMS (http://www.firstsql.com)
Mark Matthews - 11 Mar 2005 18:13 GMT >>>I think you are in trouble because of the way MySQL handles resultsets. >>>Where most DBMSs use a streaming-like mechanism when returning (possibly [quoted text clipped - 10 lines] > The MySQL protocol only allows this, but their current JDBC driver provides several > workarounds. Check the connection properties for com.mysql.jdbc.JdbcDriver. Guys,
Lee's correct. Back a long time ago, I implemented the JDBC driver to cover the 80%+ use case, which was small-to-medium sized result sets with multiple open cursors (the multiple open cursors part is required by the JDBC spec), so therefore the JDBC driver does buffer the complete result set in memory by default.
For the use case just described, this is usually a better option because it greatly increases concurrency at the database server, since clients aren't holding onto locks while they process results. It also requires potentially a lot less network traffic to do things this way (no request/response for each batch of rows).
You can read a row-at-a-time with only one open statement if you do the following non-standard (to JDBC) setup on your statement:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY); stmt.setFetchSize(Integer.MIN_VALUE);
(btw, this is in the docs that came with your driver or available at http://dev.mysql.com/doc/connector/j/en/cj-implementation-notes.html)
Starting with Connector/J 3.2 and MySQL-5.0, Statement.setFetchSize(> 0) on any statement will use fetch-n-rows-at-a-time functionality that is in MySQL-5.0.
(btw, many JDBC drivers for SQL Server, including the ones shipped by MS actually do something similar, if you ask for another result set and are using cursors to fetch them rather than 'firehose' mode, they actually open _another_connection_ to the database!)
-Mark
ableofhighheart@gmail.com - 11 Mar 2005 20:00 GMT Alternatively, you can try to increase the maximum Java heap size.
For example:
java -Xmx512m com.YourMain
> >>>I think you are in trouble because of the way MySQL handles resultsets. > >>>Where most DBMSs use a streaming-like mechanism when returning (possibly [quoted text clipped - 45 lines] > > -Mark Lee Fesperman - 11 Mar 2005 23:04 GMT > >>I think MySQL allowed/allows only one open cursor at a time. Reading all > >>the data into memory was probably the easiest way to simulate multiple [quoted text clipped - 10 lines] > > .... Thanks, I was just filling in until you showed up.
> (btw, many JDBC drivers for SQL Server, including the ones shipped by MS > actually do something similar, if you ask for another result set and are > using cursors to fetch them rather than 'firehose' mode, they actually > open _another_connection_ to the database!) Originally, the SQL Server protocol for both MS and Sybase only supported 'firehose' mode. Our web language, Jive (available under GPL at http://www.firstsql.com/jive/), supports nested queries by opening a new (internal) connection for the nested query just for compatibility with this oddity in SQL Server's ODBC driver (Jive automatically tracks open connections and closes them when no longer needed). Since it's GPL, you get the source and can see for yourself this 'magic' ;^)
MS SQL Server now supports "server-side" cursors that doesn't use firehose mode. I don't know if Sybase does yet.
Note: Firehose mode can be more efficient because it avoid the response overhead. However, it normally sends each row as a separate packet which adds its own overhead. The driver for FirstSQL/J (see my sig) uses the setFetchSize() hint in JDBC to batch multiple rows in a packet to give a kind of 'best of both worlds' implementation. I'm sure drivers for other databases provide similar functionality.
 Signature Lee Fesperman, FFE Software, Inc. (http://www.firstsql.com) ============================================================== * The Ultimate DBMS is here! * FirstSQL/J Object/Relational DBMS (http://www.firstsql.com)
Silvio Bierman - 14 Mar 2005 09:21 GMT Hello Lee,
Several MSSQL JDBC drivers employ the same scheme, they create multiple connections transparantly and use those for the multiple rowsets. We where extremely bitten by this behaviour once using the JTDS driver.
Where multiple statements on the same connection do not suffer from locking issues caused by MSSQLs tendency to lock al SELECT-ed records statements on seperate connections do. Also, transactions only work on single connections so the begin-transaction/rollback/commit logic is broken by this also.
Luckily we found out about this quickly and where able to setup a different driver that supported dynamic cursors fixing the problem. I think it is a very dangerous default behaviour of a JDBC driver...
Regards,
Silvio Bierman
> > >>I think MySQL allowed/allows only one open cursor at a time. Reading all > > >>the data into memory was probably the easiest way to simulate multiple [quoted text clipped - 39 lines] > * The Ultimate DBMS is here! > * FirstSQL/J Object/Relational DBMS (http://www.firstsql.com) Alin Sinpalean - 14 Mar 2005 12:45 GMT > Several MSSQL JDBC drivers employ the same scheme, they create multiple > connections transparantly and use those for the multiple rowsets. We where > extremely bitten by this behaviour once using the JTDS driver. I need to make an observation here. jTDS now implements something similar to the MySQL JDBC driver, only with a minor difference: the ResultSet isn't cached into memory _unless_ another query is executed on another Statement on the same Connection (data is also cached to disk after a certain threshold is reached so you won't run out of memory). This means that in the most common case, when you're retrieving a ResultSet from the DBMS without doing anything else at the same time on the same Connection, no caching is needed so memory problems are avoided. And, of course, no extra connections have to be created anymore, so no locking occurs and transaction semantics are respected.
Also, when creating a scrollable/updateable ResultSet a server side cursor is created and this ensures real concurrency for multiple Statements (read a few rows here, a few rows there, controllable through setFetchSize() ).
Alin, The jTDS Project.
Silvio Bierman - 14 Mar 2005 15:09 GMT Hello Alin,
I should have mentioned this problem occurred with an older version of the jTDS driver. I have been following your advances and know that you have solved that problem in the current versions.
I would like a configuration level property of specifying that I want server side cursor, changing the code to scrollable resultsets is not acceptable for me since I do not need them and don't know what other databases might do when I specify such cursor attributes.
Is it not possible to change default read cursors to be server side by specifying this in the connect-string. If so I would probably not hesitate to switch back to the jTDS driver (using the MS-driver now).
Regards,
Silvio Bierman
> > Several MSSQL JDBC drivers employ the same scheme, they create > multiple [quoted text clipped - 21 lines] > Alin, > The jTDS Project. Alin Sinpalean - 15 Mar 2005 11:23 GMT > I should have mentioned this problem occurred with an older version of the > jTDS driver. I have been following your advances and know that you have > solved that problem in the current versions. No problem. I just wanted to let everybody else know about this.
> I would like a configuration level property of specifying that I want server > side cursor, changing the code to scrollable resultsets is not acceptable [quoted text clipped - 4 lines] > specifying this in the connect-string. If so I would probably not hesitate > to switch back to the jTDS driver (using the MS-driver now). It would be possible to have a parameter to specify whether forward only server side cursors should be used instead of firehose cursors (i.e. direct execution). The reason that hasn't been done before is that we did not see a real need for it; why would anyone want to use a connection to retrieve a ResultSet and do something else at the same time? And why couldn't this be achieved using two connections? Or a single connection but processing the ResultSet first and doing the other operations afterwards?
[BTW, this behavior is fully supported by jTDS now, only it is done by caching the ResultSet to memory/disk when another request is submitted instead of creating a server side cursor. It's just a different approach.]
There is one other request with a good explanation of why fast forward only cursors would be the right choice for this on the jTDS Help forum: https://sourceforge.net/forum/forum.php?thread_id=1246387&forum_id=104389
It still doesn't have a good explanation of why this would be needed, other than "we have a very complicated application layer that needs this", but if you really-really need this, please submit a RFE (request for enhancement) for jTDS here: https://sourceforge.net/tracker/?group_id=33291&atid=407765
Alin, The jTDS Project.
Silvio Bierman - 15 Mar 2005 13:27 GMT > > I should have mentioned this problem occurred with an older version > of the [quoted text clipped - 44 lines] > Alin, > The jTDS Project. Hi Alin,
I have a very simple example. I need the N (unknown) largest customers ordered by revenue (desc) up to a total revenue of K (application parameter) which goes like:
SELECT CUSTOMER_NAME,CUSTOMER_REVENUE FROM CUSTOMERS ORDER BY CUSTOMER_REVENUE DESC
where the application processes N rows out of a zillion until the sum of the revenues read reaches K.
Caching all records on the client side will obviously work (except when running inside an applet) but will occupy quite some bandwidth.
Regards,
Silvio Bierman
Alin Sinpalean - 16 Mar 2005 11:10 GMT > I have a very simple example. I need the N (unknown) largest customers > ordered by revenue (desc) up to a total revenue of K (application parameter) [quoted text clipped - 8 lines] > Caching all records on the client side will obviously work (except when > running inside an applet) but will occupy quite some bandwidth. Silvio,
Indeed in this case reading the whole ResultSet is not a choice. You could, however take other approaches, such as limiting the number of rows to a reasonable value, which may or may not work; you could combine this with additional SELECTs if the limit is not reached:
SELECT CUSTOMER_NAME,CUSTOMER_REVENUE FROM CUSTOMERS WHERE CUSTOMER_REVENUE <= <revenue_of_last_customer> ORDER BY CUSTOMER_REVENUE DESC
Yet another solution would be to use a stored procedure to do all the processing on the server.
If you consider portability to other DBMSs these are both better solutions: Oracle doesn't support server side cursors at all as far as I know; scrollable ResultSets are implemented through client side cursors, where all the rows are cached _in memory_ by the driver. Also, as you can see, even some SQL Server drivers do not support the behavior you need (see the current implementation of jTDS).
One more thing you could take into consideration is that SQL Server will have to sort all X million rows just so that you can read the first few of them. This is not very efficient. From this point of view, I think my first suggestion would be better, except that it would require more work on your side.
Anyway, a RFE has been created for jTDS, requesting the possibility of choosing between firehose cursors and fast forward only cursors and this functionality will be implemented when time allows.
Mark Matthews wrote:
> I've learned not to underestimate the interesting ways in which users > might use the functionality that is given to them ;). > > We have quite a few users of MySQL that do just this, interleaving > various statements with others and processing more than one open result
> set at a time or issuing updates based on data they work with from a > currently-open result set. Mark,
Thanks for your comments. While you are entirely correct, I don't want to encourage such behavior from the users; there are cases in which they have to do with legacy code and it would require great effort to fix things on their side but that doesn't mean I have to provide them with a solution. Your example of "thread-safeness" seems like a good example of a "problem" I will never agree to fix on my side (don't get me wrong, jTDS provides thread safe Statements, but that's very much the limit -- I don't see the need for the new shiny feature of JDBC 3.0 of being able to keep a ResultSet open when moving on to another result and that's not implemented).
In this particular case, however, it doesn't seem like too much of an effort to implement fast forward only cursors and we will probably do it, especially since it would ease life for some and make the transition from the brain-dead MS driver to jTDS much easier for them.
Alin, The jTDS Project.
Silvio Bierman - 17 Mar 2005 23:27 GMT > > I have a very simple example. I need the N (unknown) largest > customers [quoted text clipped - 43 lines] > choosing between firehose cursors and fast forward only cursors and > this functionality will be implemented when time allows. Hello Alin,
That I already understood and I am glad about that. I will keep following jTDS progress.
About Oracle: it may not support scrollable cursurs (which for me are totally useless anyway) but it does support chunked transfer of resultsets. There is (or used to be) a special optimization hint/setting that favours query plans that result in fast first rows.
Sorting is not neccesary if the column is indexed, which is exactly the strategy SQL-server and Oracle utilize. This icw. chunked fetching is the best solution for the use-case. Different solutions are abviously possible.
Regards,
Silvio Bierman
> Mark Matthews wrote: > > I've learned not to underestimate the interesting ways in which users [quoted text clipped - 26 lines] > Alin, > The jTDS Project. Alin Sinpalean - 18 Mar 2005 16:28 GMT > About Oracle: it may not support scrollable cursurs (which for me are > totally useless anyway) but it does support chunked transfer of resultsets. > There is (or used to be) a special optimization hint/setting that favours > query plans that result in fast first rows. Scrollable ResultSets aren't totally useless in desktop applications, e.g. database managers. They are very much useless in server side apps, although some JDBC users seem to prefer using a scrollable ResultSet over a separate query to find out the number of rows in a table.
The fact that Oracle supports some optimization hint that favors fast first rows does not imply that those rows will be returned in chunks (it doesn't mean the opposite either). It _could_ simply mean that when a client reads the resulting rows, the first rows will be available immediately while following reads will probably block. Anyway, I mentioned Oracle just as an example; MySQL is another one. And the fact that Oracle implements scrollable result sets through client side cursors could imply that it is not able to return result sets in chunks between which you could run other queries. So I was implying that forward only result sets could be implemented just like they are currently in jTDS.
> Sorting is not neccesary if the column is indexed, which is exactly the > strategy SQL-server and Oracle utilize. This icw. chunked fetching is the > best solution for the use-case. Different solutions are abviously possible. I don't necessarily agree that chunked fetching is definitely the best solution for your problem. Depending on the situation (complex query, missing index, whatever, change sensitivity) it could happen that the overhead of a server side cursor is much higher that that of a firehose cursor/direct select. The server could have to lock all selected rows or keep a copy of rows altered while the cursor is open, depending on transaction isolation settings. As I said before, it is definitely the easiest way for you; not necessarily for the server.
Alin, The jTDS Project.
Mark Matthews - 15 Mar 2005 16:06 GMT [snip]
> It would be possible to have a parameter to specify whether forward > only server side cursors should be used instead of firehose cursors [quoted text clipped - 4 lines] > single connection but processing the ResultSet first and doing the > other operations afterwards? Alin,
I've learned not to underestimate the interesting ways in which users might use the functionality that is given to them ;).
We have quite a few users of MySQL that do just this, interleaving various statements with others and processing more than one open result set at a time or issuing updates based on data they work with from a currently-open result set.
Many of these result sets are small, and using non-firehose cursors would actually be a performance loss, so we by default spool result sets into memory.
On the other side of the coin, there are users that read in the JDBC specification that drivers should be thread-safe, so they assume that means that it is a good idea to share a connection between multiple threads because they believe it improves performance, which isn't true for any database I'm aware of.
(just my two cents).
-Mark
Lee Fesperman - 14 Mar 2005 19:00 GMT > Hello Lee, > [quoted text clipped - 10 lines] > driver that supported dynamic cursors fixing the problem. I think it is a > very dangerous default behaviour of a JDBC driver... Please don't top-post. It's just basic netiquette. (I'm too lazy to rearrange)
However, I do agree that it is dangerous behaviour with some very problematic side effects. Alin's solution is a fairly reasonable workaround.
Note: Alin posted his solution in a reply to your article, but I replied to you just so I could chide you about top-posting ;^)
> > > >>I think MySQL allowed/allows only one open cursor at a time. Reading > > > >>all the data into memory was probably the easiest way to simulate multiple [quoted text clipped - 26 lines] > > and closes them when no longer needed). Since it's GPL, you get the source and > > can see for yourself this 'magic' ;^)
 Signature Lee Fesperman, FFE Software, Inc. (http://www.firstsql.com) ============================================================== * The Ultimate DBMS is here! * FirstSQL/J Object/Relational DBMS (http://www.firstsql.com)
Silvio Bierman - 15 Mar 2005 00:01 GMT Thanks for the correction Lee,
I normally don't toppost but sometimes I forget...
Regards,
Silvio
> > Hello Lee, > > [quoted text clipped - 55 lines] > * The Ultimate DBMS is here! > * FirstSQL/J Object/Relational DBMS (http://www.firstsql.com) Gerd Nachtsheim - 15 Mar 2005 00:07 GMT Silvio Bierman wrote, On 15.03.2005 00:01:
> Thanks for the correction Lee, > > I normally don't toppost but sometimes I forget... ...he said and topposted again ;^)
You may want to read <http://learn.to/quote>
SCNR
Gerd
 Signature Gerd Nachtsheim mailto:Gerd.Nachtsheim@t-online.de ICQ:#13126958
Silvio Bierman - 17 Mar 2005 23:29 GMT > Silvio Bierman wrote, On 15.03.2005 00:01: > > Thanks for the correction Lee, [quoted text clipped - 10 lines] > -- > Gerd Nachtsheim mailto:Gerd.Nachtsheim@t-online.de ICQ:#13126958 Gerd,
That second top-post was a joke...
Regards,
Silvio
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|