Java Forum / Databases / December 2007
MySQL ResultSet - count rows?
Christoph - 10 Dec 2007 08:34 GMT Hello,
I am writing the data of a ResultSet object to an array. To declare the array length, I need to know the number of rows before copying them - is there a way to read the row count instantly, without having to iterate anything?
PHP has mysql_count_rows, but I'm not sure what that does internally - it may just iterate over the data too.
-- Chris
Dyreatnews@sun.com - 10 Dec 2007 09:34 GMT > Hello, > > I am writing the data of a ResultSet object to an array. To declare > the array length, I need to know the number of rows before copying > them - is there a way to read the row count instantly, without having > to iterate anything? No. You could execute 'SELECT COUNT(*) ...' first but then you essentially traverse all the rows twice.
(There could be non-standard MySQL extensions that lets you do this, I don't know)
> PHP has mysql_count_rows, but I'm not sure what that does internally - > it may just iterate over the data too.
 Signature dt
Questions about Derby/Java DB? Please visit http://db.apache.org/derby/derby_mail.html
Christoph - 10 Dec 2007 11:47 GMT On Dec 10, 10:34 am, Dyreatn...@sun.com wrote:
> > Hello, > [quoted text clipped - 16 lines] > > Questions about Derby/Java DB? Please visithttp://db.apache.org/derby/derby_mail.html Okay, I'm working around it by using
while(result.next()) rowcount++;
At least incrementing the pointer is not a very expensive action. Even when I get a few thousand results back (which I will) the counting will take far less time than the copying and instance creation that follows.
Ruud de Koter - 12 Dec 2007 22:12 GMT >> Hello, >> [quoted text clipped - 5 lines] > No. You could execute 'SELECT COUNT(*) ...' first but then you > essentially traverse all the rows twice. One should realize that in doing so you are not determining the number of rows at the moment of retrieval, but the number of rows a few moments before the actual retrieval. Whether this is an important distinction depends with the volatility of the table and the use you will make if this number. On any system used by more than one user/process, these numbers *will* be different, sooner or later.
In short: this is not the way to go.
Regards,
Ruud de Koter.
> (There could be non-standard MySQL extensions that lets you do this, I > don't know) > >> PHP has mysql_count_rows, but I'm not sure what that does internally - >> it may just iterate over the data too. Lew - 13 Dec 2007 02:56 GMT Dyreatnews@sun.com wrote:
>> No. You could execute 'SELECT COUNT(*) ...' first but then you >> essentially traverse all the rows twice.
> One should realize that in doing so you are not determining the number > of rows at the moment of retrieval, but the number of rows a few moments [quoted text clipped - 4 lines] > > In short: this is not the way to go. You could wrap the SELECT COUNT(*) and the row SELECT in a transaction, thus guaranteeing at least a consistent view if not a current one.
 Signature Lew
Ruud de Koter - 13 Dec 2007 08:57 GMT > Dyreatnews@sun.com wrote: >>> No. You could execute 'SELECT COUNT(*) ...' first but then you [quoted text clipped - 11 lines] > You could wrap the SELECT COUNT(*) and the row SELECT in a transaction, > thus guaranteeing at least a consistent view if not a current one. True. I may have gotten carried away as I was writing. What matters is that the double-select solution has its issues. One should be aware of those and decide whether these are acceptable, can be mitigated (as you propose), or are prohibitive.
Regards,
Ruud.
Dyreatnews@sun.com - 13 Dec 2007 09:15 GMT > Dyreatnews@sun.com wrote: >>> No. You could execute 'SELECT COUNT(*) ...' first but then you [quoted text clipped - 13 lines] > transaction, thus guaranteeing at least a consistent view if not a > current one. Ruud is making a very good point. I've just been told that the the SELECT COUNT(*) approach will only be guaranteed to work if you execute both SELECTs in a single transaction AND use SERIALIZABLE isolation level. Even at REPEATABLE READ you may see new rows added since the previous select, (but all rows counted will have the same value throughout the transaction).
 Signature dt
Questions about Derby/Java DB? Please visit http://db.apache.org/derby/derby_mail.html
Lew - 13 Dec 2007 15:12 GMT > Ruud is making a very good point. I've just been told that the the > SELECT COUNT(*) approach will only be guaranteed to work if you execute > both SELECTs in a single transaction AND use SERIALIZABLE isolation > level. Even at REPEATABLE READ you may see new rows added since the > previous select, (but all rows counted will have the same value > throughout the transaction). You won't "see" those new rows added until after the transaction completes.
Even at "read uncommitted" isolation you likely will be fine. It depends on whether you need to see rows added during the transaction or not, i.e., on whether you want the values to represent the db state at the end of your transaction, or if it's sufficient to see the state at the beginning of the transaction.
After all, rows could be added a nanosecond after your transaction, too, and you wouldn't know it.
There's nothing wrong with seeing the db state as of transaction start, in most cases. There will *always* be latency in your queries. The point of the transaction is that it gives you a consistent view of the data, and that's almost always sufficient.
Bear in mind that not all configurations of MySQL tables support transactions.
 Signature Lew
Dyreatnews@sun.com - 13 Dec 2007 19:02 GMT >> Ruud is making a very good point. I've just been told that the the >> SELECT COUNT(*) approach will only be guaranteed to work if you execute [quoted text clipped - 4 lines] > > You won't "see" those new rows added until after the transaction completes. I don't know how it works in MySQL, but in Java DB/Derby, you will if the isolation level is REPEATABLE READ or lower. The following example will print 'T has 3 rows' and then list all 6 rows if c1 executes with REPEATABLE READ. With SERIALIZABLE, the second connection gets a lock timeout when trying to insert the last 3 rows.
Connection c1 = DriverManager.getConnection(args[1]); Statement s1 = c1.createStatement(); try { s1.execute("drop table T"); } catch (SQLException e) {} s1.execute("create table T(i int unique not null, j int)"); s1.execute("insert into T values (0, 0), (1, 1), (2, 2)");
c1.setAutoCommit(false); c1.setTransactionIsolation(args[2].equals("serializable")? Connection.TRANSACTION_SERIALIZABLE:Connection.TRANSACTION_REPEATABLE_READ); ResultSet count = s1.executeQuery("select count(*) from T"); count.next(); System.out.println("T has "+count.getInt(1)+" rows");
Connection c2 = DriverManager.getConnection(args[1]); Statement s2 = c2.createStatement(); s2.execute("insert into T values (5, 5), (6, 6), (7, 7)"); ResultSet rows = s1.executeQuery("select * from T"); while (rows.next()) { System.out.println(rows.getInt(1)+", "+rows.getInt(2)); }
> Even at "read uncommitted" isolation you likely will be fine. It > depends on whether you need to see rows added during the transaction [quoted text clipped - 11 lines] > > Bear in mind that not all configurations of MySQL tables support transactions.
 Signature dt
Questions about Derby/Java DB? Please visit http://db.apache.org/derby/derby_mail.html
Arne Vajhøj - 13 Dec 2007 20:49 GMT >> Ruud is making a very good point. I've just been told that the the >> SELECT COUNT(*) approach will only be guaranteed to work if you execute [quoted text clipped - 4 lines] > > You won't "see" those new rows added until after the transaction completes. You will.
user 1 - SELECT COUNT(*) user 2 - INSERT user 1 - SELECT
Only SERIALIZABLE will prevent this.
Arne
Ruud de Koter - 13 Dec 2007 21:28 GMT >>> Ruud is making a very good point. I've just been told that the the >>> SELECT COUNT(*) approach will only be guaranteed to work if you execute [quoted text clipped - 15 lines] > > Arne To make this decision a real dilemma, using serializable transactions will have 'interesting' consequences for resource contention, hence timeouts and (percieved) performance in any multi-user environment. This really is something to consider carefully.
Re-reading the original post, I saw that this is about writing a ResultSet's data into an Array. It 'd be much wiser to use some subtype of List in this case, and just avoid knowing the number of rows in advance. I think one of the other posters already suggested this. Doing so would keep one away from this minefield.
Ruud.
Lew - 14 Dec 2007 00:32 GMT >>> Ruud is making a very good point. I've just been told that the the >>> SELECT COUNT(*) approach will only be guaranteed to work if you execute [quoted text clipped - 13 lines] > > Only SERIALIZABLE will prevent this. I stand corrected. Thanks, Dyreatnews and Arne.
 Signature Lew
Sabine Dinis Blochberger - 10 Dec 2007 09:45 GMT > Hello, > [quoted text clipped - 5 lines] > PHP has mysql_count_rows, but I'm not sure what that does internally - > it may just iterate over the data too. There is a SQL statement to the effect: select count(*) from tablename Look it up to see if it can be optimized.
 Signature Sabine Dinis Blochberger
Op3racional www.op3racional.eu
Christoph - 10 Dec 2007 13:14 GMT On Dec 10, 10:45 am, Sabine Dinis Blochberger <no.s...@here.invalid> wrote:
> > Hello, > [quoted text clipped - 12 lines] > > Op3racionalwww.op3racional.eu Thanks. Right now I'm looping over result.next, but I'll try sending a COUNT() query to compare speeds.
Dyreatnews@sun.com - 10 Dec 2007 19:21 GMT > On Dec 10, 10:45 am, Sabine Dinis Blochberger <no.s...@here.invalid> > wrote: [quoted text clipped - 17 lines] > Thanks. Right now I'm looping over result.next, but I'll try sending a > COUNT() query to compare speeds. Not entirely sure, but I'm guessing that Sabine (as did I) assumed that you were using a TYPE_FORWARD_ONLY result set. Then you _have_ to traverse the rows twice, either by first calling count(*) and then actually executing your query, or looping over the first rs and count rows and then loop over a second rs (resulting from a new execute) to fill in your array. (The count(*) approach transmits less data across the network, but on the other hand you need to compile two queries. If you do this often and use prepared statements, the compilation cost may not matter. You'll have to test :).
On the other hand; if you're using a scrollable result set, you can loop over it to find the size, allocate your array, rewind and fill it. But then you typically materialize the entire result set so you would need enough memory for _both_ the result set and your array (unless your driver does some tricks and only materializes parts of the scrollable result set, but then you would get additional roundtrips to the server "behind your back").
But why can't you use an ArrayList rather than an array? Typically that would result in ~ log2(n) reallocation+copy operations. If you have a heuristic which estimates how many rows will be returned, and set the initial capacity accordingly, you can get even fewer. Just my $.02.
 Signature dt
Questions about Derby/Java DB? Please visit http://db.apache.org/derby/derby_mail.html
Arne Vajhøj - 11 Dec 2007 00:05 GMT > On the other hand; if you're using a scrollable result set, you can loop > over it to find the size, You don't even need to loop.
Statement stmt = con.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE); ResultSet rs = stmt.executeQuery("SELECT * FROM T1"); rs.last(); int n = rs.getRow(); rs.first();
But I would never use that construct.
Arne
Lew - 11 Dec 2007 01:41 GMT >> On the other hand; if you're using a scrollable result set, you can loop >> over it to find the size, [quoted text clipped - 10 lines] > > But I would never use that construct. Arne: What are the dangers of that construct?
I'm guessing that it's the overhead of going first / last / first on large datasets.
 Signature Lew
Arne Vajhøj - 12 Dec 2007 00:09 GMT >>> On the other hand; if you're using a scrollable result set, you can loop >>> over it to find the size, [quoted text clipped - 15 lines] > I'm guessing that it's the overhead of going first / last / first on > large datasets. Most (maybe all) drivers will implement that scrollable feature by reading all data into memory.
medium data size => bad performance
large and huge data => OutOfMemoryError
Arne
Guido Weber - 10 Dec 2007 17:09 GMT Hi Christoph
as others already pointed out, there is SELECT COUNT (*). The advantage of that is that the counting can be done in the DB instead of sending all records from the DBto your app. But, as you want to read all records anyway, why not store them in a Java collection (say a List), and use .toArray() to get an array out of the collection? That way the data has to be fetched only once, and you need only one DB call.
HTH Guido
Christoph schrieb:
> Hello, > [quoted text clipped - 8 lines] > -- > Chris Arne Vajhøj - 11 Dec 2007 01:21 GMT > I am writing the data of a ResultSet object to an array. To declare > the array length, I need to know the number of rows before copying [quoted text clipped - 3 lines] > PHP has mysql_count_rows, but I'm not sure what that does internally - > it may just iterate over the data too. If you read the fine print in the PHP manual:
http://www.php.net/manual/en/function.mysql-num-rows.php
#Note: If you use mysql_unbuffered_query(), mysql_num_rows() will not #return the correct value until all the rows in the result set have been #retrieved.
You will see that it only return the the right number if you either read all rows to memory with mysql_query or with mysql_unbuffered_query have read all rows.
So in that sense PHP and JDBC are identical.
BTW, I agree completely with all those that have suggested forgetting about the number and store the data in an ArrayList.
Arne
tzvika.barenholz@gmail.com - 11 Dec 2007 14:45 GMT > Hello, > [quoted text clipped - 8 lines] > -- > Chris How about initing a CachedRowSet object from your result set? (http:// java.sun.com/j2se/1.5.0/docs/api/javax/sql/rowset/CachedRowSet.html)
you could then avail yourself of the size() method - and with just a single pass.
T
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|