Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / Databases / December 2007

Tip: Looking for answers? Try searching our database.

MySQL ResultSet - count rows?

Thread view: 
Christoph - 10 Dec 2007 08:34 GMT
Hello,

I am writing the data of a ResultSet object to an array. To declare
the array length, I need to know the number of rows before copying
them - is there a way to read the row count instantly, without having
to iterate anything?

PHP has mysql_count_rows, but I'm not sure what that does internally -
it may just iterate over the data too.

--
Chris
Dyreatnews@sun.com - 10 Dec 2007 09:34 GMT
> Hello,
>
> I am writing the data of a ResultSet object to an array. To declare
> the array length, I need to know the number of rows before copying
> them - is there a way to read the row count instantly, without having
> to iterate anything?

No. You could execute 'SELECT COUNT(*) ...' first but then you
essentially traverse all the rows twice.

(There could be non-standard MySQL extensions that lets you do this, I
don't know)

> PHP has mysql_count_rows, but I'm not sure what that does internally -
> it may just iterate over the data too.

Signature

dt

Questions about Derby/Java DB? Please visit
http://db.apache.org/derby/derby_mail.html

Christoph - 10 Dec 2007 11:47 GMT
On Dec 10, 10:34 am, Dyreatn...@sun.com wrote:
> > Hello,
>
[quoted text clipped - 16 lines]
>
> Questions about Derby/Java DB? Please visithttp://db.apache.org/derby/derby_mail.html

Okay, I'm working around it by using

while(result.next()) rowcount++;

At least incrementing the pointer is not a very expensive action. Even
when I get a few thousand results back (which I will) the counting
will take far less time than the copying and instance creation that
follows.
Ruud de Koter - 12 Dec 2007 22:12 GMT
>> Hello,
>>
[quoted text clipped - 5 lines]
> No. You could execute 'SELECT COUNT(*) ...' first but then you
> essentially traverse all the rows twice.

One should realize that in doing so you are not determining the number
of rows at the moment of retrieval, but the number of rows a few moments
before the actual retrieval. Whether this is an important distinction
depends with the volatility of the table and the use you will make if
this number. On any system used by more than one user/process, these
numbers *will* be different, sooner or later.

In short: this is not the way to go.

Regards,

Ruud de Koter.

> (There could be non-standard MySQL extensions that lets you do this, I
> don't know)
>
>> PHP has mysql_count_rows, but I'm not sure what that does internally -
>> it may just iterate over the data too.
Lew - 13 Dec 2007 02:56 GMT
Dyreatnews@sun.com wrote:
>> No. You could execute 'SELECT COUNT(*) ...' first but then you
>> essentially traverse all the rows twice.

> One should realize that in doing so you are not determining the number
> of rows at the moment of retrieval, but the number of rows a few moments
[quoted text clipped - 4 lines]
>
> In short: this is not the way to go.

You could wrap the SELECT COUNT(*) and the row SELECT in a transaction, thus
guaranteeing at least a consistent view if not a current one.

Signature

Lew

Ruud de Koter - 13 Dec 2007 08:57 GMT
> Dyreatnews@sun.com wrote:
>>> No. You could execute 'SELECT COUNT(*) ...' first but then you
[quoted text clipped - 11 lines]
> You could wrap the SELECT COUNT(*) and the row SELECT in a transaction,
> thus guaranteeing at least a consistent view if not a current one.

True. I may have gotten carried away as I was writing. What matters is
that the double-select solution has its issues. One should be aware of
those and decide whether these are acceptable, can be mitigated (as you
propose), or are prohibitive.

Regards,

Ruud.
Dyreatnews@sun.com - 13 Dec 2007 09:15 GMT
> Dyreatnews@sun.com wrote:
>>> No. You could execute 'SELECT COUNT(*) ...' first but then you
[quoted text clipped - 13 lines]
> transaction, thus guaranteeing at least a consistent view if not a
> current one.

Ruud is making a very good point. I've just been told that the the
SELECT COUNT(*) approach will only be guaranteed to work if you execute
both SELECTs in a single transaction AND use SERIALIZABLE isolation
level. Even at REPEATABLE READ you may see new rows added since the
previous select, (but all rows counted will have the same value
throughout the transaction).

Signature

dt

Questions about Derby/Java DB? Please visit
http://db.apache.org/derby/derby_mail.html

Lew - 13 Dec 2007 15:12 GMT
> Ruud is making a very good point. I've just been told that the the
> SELECT COUNT(*) approach will only be guaranteed to work if you execute
> both SELECTs in a single transaction AND use SERIALIZABLE isolation
> level. Even at REPEATABLE READ you may see new rows added since the
> previous select, (but all rows counted will have the same value
> throughout the transaction).

You won't "see" those new rows added until after the transaction completes.

Even at "read uncommitted" isolation you likely will be fine.  It depends on
whether you need to see rows added during the transaction or not, i.e., on
whether you want the values to represent the db state at the end of your
transaction, or if it's sufficient to see the state at the beginning of the
transaction.

After all, rows could be added a nanosecond after your transaction, too, and
you wouldn't know it.

There's nothing wrong with seeing the db state as of transaction start, in
most cases.  There will *always* be latency in your queries.  The point of the
transaction is that it gives you a consistent view of the data, and that's
almost always sufficient.

Bear in mind that not all configurations of MySQL tables support transactions.

Signature

Lew

Dyreatnews@sun.com - 13 Dec 2007 19:02 GMT
>> Ruud is making a very good point. I've just been told that the the
>> SELECT COUNT(*) approach will only be guaranteed to work if you execute
[quoted text clipped - 4 lines]
>
> You won't "see" those new rows added until after the transaction completes.

I don't know how it works in MySQL, but in Java DB/Derby, you will if
the isolation level is REPEATABLE READ or lower. The following
example will print 'T has 3 rows' and then list all 6 rows if c1
executes with REPEATABLE READ. With SERIALIZABLE, the second connection
gets a lock timeout when trying to insert the last 3 rows.

Connection c1 = DriverManager.getConnection(args[1]);
Statement s1 = c1.createStatement();
try { s1.execute("drop table T"); } catch (SQLException e) {}
s1.execute("create table T(i int unique not null, j int)");
s1.execute("insert into T values (0, 0), (1, 1), (2, 2)");

c1.setAutoCommit(false);
c1.setTransactionIsolation(args[2].equals("serializable")?
Connection.TRANSACTION_SERIALIZABLE:Connection.TRANSACTION_REPEATABLE_READ);
ResultSet count = s1.executeQuery("select count(*) from T");
count.next();
System.out.println("T has "+count.getInt(1)+" rows");

Connection c2 = DriverManager.getConnection(args[1]);
Statement s2 = c2.createStatement();
s2.execute("insert into T values (5, 5), (6, 6), (7, 7)");
       
ResultSet rows = s1.executeQuery("select * from T");
while (rows.next()) {
  System.out.println(rows.getInt(1)+", "+rows.getInt(2));
}

> Even at "read uncommitted" isolation you likely will be fine.  It
> depends on whether you need to see rows added during the transaction
[quoted text clipped - 11 lines]
>
> Bear in mind that not all configurations of MySQL tables support transactions.

Signature

dt

Questions about Derby/Java DB? Please visit
http://db.apache.org/derby/derby_mail.html

Arne Vajhøj - 13 Dec 2007 20:49 GMT
>> Ruud is making a very good point. I've just been told that the the
>> SELECT COUNT(*) approach will only be guaranteed to work if you execute
[quoted text clipped - 4 lines]
>
> You won't "see" those new rows added until after the transaction completes.

You will.

user 1 - SELECT COUNT(*)
user 2 - INSERT
user 1 - SELECT

Only SERIALIZABLE will prevent this.

Arne
Ruud de Koter - 13 Dec 2007 21:28 GMT
>>> Ruud is making a very good point. I've just been told that the the
>>> SELECT COUNT(*) approach will only be guaranteed to work if you execute
[quoted text clipped - 15 lines]
>
> Arne

To make this decision a real dilemma, using serializable transactions
will have 'interesting' consequences for resource contention, hence
timeouts and (percieved) performance in any multi-user environment. This
really is something to consider carefully.

Re-reading the original post, I saw that this is about writing a
ResultSet's data into an Array. It 'd be much wiser to use some subtype
of List in this case, and just avoid knowing the number of rows in
advance. I think one of the other posters already suggested this. Doing
so would keep one away from this minefield.

Ruud.
Lew - 14 Dec 2007 00:32 GMT
>>> Ruud is making a very good point. I've just been told that the the
>>> SELECT COUNT(*) approach will only be guaranteed to work if you execute
[quoted text clipped - 13 lines]
>
> Only SERIALIZABLE will prevent this.

I stand corrected.  Thanks, Dyreatnews and Arne.

Signature

Lew

Sabine Dinis Blochberger - 10 Dec 2007 09:45 GMT
> Hello,
>
[quoted text clipped - 5 lines]
> PHP has mysql_count_rows, but I'm not sure what that does internally -
> it may just iterate over the data too.

There is a SQL statement to the effect: select count(*) from tablename
Look it up to see if it can be optimized.
Signature

Sabine Dinis Blochberger

Op3racional
www.op3racional.eu

Christoph - 10 Dec 2007 13:14 GMT
On Dec 10, 10:45 am, Sabine Dinis Blochberger <no.s...@here.invalid>
wrote:
> > Hello,
>
[quoted text clipped - 12 lines]
>
> Op3racionalwww.op3racional.eu

Thanks. Right now I'm looping over result.next, but I'll try sending a
COUNT() query to compare speeds.
Dyreatnews@sun.com - 10 Dec 2007 19:21 GMT
> On Dec 10, 10:45 am, Sabine Dinis Blochberger <no.s...@here.invalid>
> wrote:
[quoted text clipped - 17 lines]
> Thanks. Right now I'm looping over result.next, but I'll try sending a
> COUNT() query to compare speeds.

Not entirely sure, but I'm guessing that Sabine (as did I) assumed that
you were using a TYPE_FORWARD_ONLY result set. Then you _have_ to
traverse the rows twice, either by first calling count(*) and then
actually executing your query, or looping over the first rs and count
rows and then loop over a second rs (resulting from a new execute) to
fill in your array. (The count(*) approach transmits less data across
the network, but on the other hand you need to compile two
queries. If you do this often and use prepared statements, the
compilation cost may not matter. You'll have to test :).

On the other hand; if you're using a scrollable result set, you can loop
over it to find the size, allocate your array, rewind and fill it. But
then you typically materialize the entire result set so you would need
enough memory for _both_ the result set and your array (unless your
driver does some tricks and only materializes parts of the scrollable
result set, but then you would get additional roundtrips to the server
"behind your back").

But why can't you use an ArrayList rather than an array? Typically that
would result in ~ log2(n) reallocation+copy operations. If you have a
heuristic which estimates how many rows will be returned, and set the
initial capacity accordingly, you can get even fewer. Just my $.02.

Signature

dt

Questions about Derby/Java DB? Please visit
http://db.apache.org/derby/derby_mail.html

Arne Vajhøj - 11 Dec 2007 00:05 GMT
> On the other hand; if you're using a scrollable result set, you can loop
> over it to find the size,

You don't even need to loop.

         Statement stmt =
con.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,
ResultSet.CONCUR_UPDATABLE);
         ResultSet rs = stmt.executeQuery("SELECT * FROM T1");
         rs.last();
         int n = rs.getRow();
         rs.first();

But I would never use that construct.

Arne
Lew - 11 Dec 2007 01:41 GMT
>> On the other hand; if you're using a scrollable result set, you can loop
>> over it to find the size,
[quoted text clipped - 10 lines]
>
> But I would never use that construct.

Arne: What are the dangers of that construct?

I'm guessing that it's the overhead of going first / last / first on large
datasets.

Signature

Lew

Arne Vajhøj - 12 Dec 2007 00:09 GMT
>>> On the other hand; if you're using a scrollable result set, you can loop
>>> over it to find the size,
[quoted text clipped - 15 lines]
> I'm guessing that it's the overhead of going first / last / first on
> large datasets.

Most (maybe all) drivers will implement that scrollable
feature by reading all data into memory.

medium data size => bad performance

large and huge data => OutOfMemoryError

Arne
Guido Weber - 10 Dec 2007 17:09 GMT
Hi Christoph

as others already pointed out, there is SELECT COUNT (*). The advantage
of that is that the counting can be done in the DB instead of sending
all records from the DBto your app.
But, as you want to read all records anyway, why not store them in a
Java collection (say a List), and use .toArray() to get an array out
of the collection? That way the data has to be fetched only once, and
you need only one DB call.

HTH
  Guido

Christoph schrieb:
> Hello,
>
[quoted text clipped - 8 lines]
> --
> Chris
Arne Vajhøj - 11 Dec 2007 01:21 GMT
> I am writing the data of a ResultSet object to an array. To declare
> the array length, I need to know the number of rows before copying
[quoted text clipped - 3 lines]
> PHP has mysql_count_rows, but I'm not sure what that does internally -
> it may just iterate over the data too.

If you read the fine print in the PHP manual:

http://www.php.net/manual/en/function.mysql-num-rows.php

#Note: If you use mysql_unbuffered_query(), mysql_num_rows() will not
#return the correct value until all the rows in the result set have been
#retrieved.

You will see that it only return the the right number if you
either read all rows to memory with mysql_query or with
mysql_unbuffered_query have read all rows.

So in that sense PHP and JDBC are identical.

BTW, I agree completely with all those that have suggested
forgetting about the number and store the data in an
ArrayList.

Arne
tzvika.barenholz@gmail.com - 11 Dec 2007 14:45 GMT
> Hello,
>
[quoted text clipped - 8 lines]
> --
> Chris

How about initing a CachedRowSet object from your result set? (http://
java.sun.com/j2se/1.5.0/docs/api/javax/sql/rowset/CachedRowSet.html)

you could then avail yourself of the size() method - and with just a
single pass.

T


Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.