...
...
<snip>
> I have found that version numbers in a database record greatly complicate the
> db - is a version number *really* part of the data model?
Yes it can be, it is the number of times the record was updated.
> Also, databases have caches, at least all the products I've considered using
> for real applications do. Are you likely to outperform the database cache? By
> enough to matter?
It is not the goal to outperform the database cache. It is the goal to
outperform the total of query+object creation+mapping from resultset to
object+network latency to transfer the object.
When you have 1000 users pulling objects the relieved database,
application server and network resources can be considerable.
> Caching is useful for things like result sets, where you don't want to go back
> to the database to retrieve data the *client* knows it hasn't changed.
[quoted text clipped - 4 lines]
> clients' changes, or can it work for a while with data from a few
> (milli)seconds ago?
No, must be the latest updates.
> Stay away from version information in a database record. How will you know if
> a client's version is stale without doing a database query anyway? Is it
It would be very fast to issue select version_nbr from your_table where
primary_key = ?
> really worth great complications in update, insert and delete logic? I
> predict it won't solve your (perceived) problem anyway.
[quoted text clipped - 5 lines]
>
> - Lew
To further illustrate my point. Take this real world example
table patient (patient_id, last_name, first_name, sex, dob,... etc for
40 fields).
table appointment (id, appt_date,... etc for 30 fields).
table orders (order_id, order_name, ... etc for 40 fields).
table results (result_id, order_id, result_name, etc. for 20 fields).
I pull up a patient record and it pulls in:
1 patient rows
8 appointment rows
80 order rows
800 result rows
All row data are mapped to an object which adds significant business
methods to the data.
To pull in that much data may take non trivial resources and at least
2-5 seconds. The user can then browse, filter, sort, group and do many
operations on the local workstation. All operations are fast since the
object data is held in memory.
However the user is waiting for a new set of results. Every 10 minutes
they click a refresh button. To pull in the same data is a wasteful
use of resources. If you have 100 users simulataneously pressing
refresh then you have a significant performance bottleneck. You would
need a very fast database server and many application servers to scale
and keep the same 2-5 seconds. At some point you would probably need
to cluster the database. This becomes very expensive and starts to
push the cost of the hardware so that many organizations cannot afford
it.
But take a step back and look at the problem. For 98/100 of the
patients the data did not change. Lets suppose that each order object
has a version stamp which is updated every time anything on the order
or underlying tables for the results changes. A simple, single query
of
select version_nbr from orders where order_in (...) would identify
whether the local PC objects are still fresh or whether the object
needs to be extracted.
Now this would *never* work with a legacy system, there are just too
many possibilities that someone has written a query/batch job or
something that will update the database without updating the version
nbr. But with a new system you can enforce it from the get go.
I agree dont tune till you need to, dont fix it until you have to, but
I know this problem is coming.
This is undoubtably less applicable to simple applications which use
resultsets directly but that is not where I am coming from. The
complexity of the domain is significant and needs the DTO architecture,
fat objects etc. There is a performance cost to that of course, which
is what I am looking to offset.
Chris Uppal - 14 Nov 2006 13:26 GMT
> But take a step back and look at the problem. For 98/100 of the
> patients the data did not change. Lets suppose that each order object
[quoted text clipped - 5 lines]
> whether the local PC objects are still fresh or whether the object
> needs to be extracted.
I can't comment on your intuition about how much caching is necessary, nor on
how you support/implement that at the data level (you may well be right about
both, I have no means to judge). But it seems that you are considering moving
this out of the implementation domain and into the data-modelling domain. I.e.
your data model is specificially designed to allow applications to understand
the history of changes to values. If so, then I suggest dropping the use of
the word "cache" altogether. That suggests a low-level implementation detail;
one that you'd expect to be handled automatically (if tune-ably) in the bowels
of the data access layer (not unlike the caching that happens inside each
result-set). But since versioning is part of your data model, it should be
visible at the application level -- just as much as, say, patient names are --
and your code for handling it should reside at that level too.
You applications may /use/ the versioning information to implement local
caches, but that is no business of the database, nor of any (putative) O-R
tools you may use. Similarly, it would be orthogonal to any data-access
Patterns you might happen to fancy using.
So you are looking for patterns (or Patterns) for handling versioned data. If
you think of it like that, and phrase it like that, then you may find more
information than you would if you just talk about "caching" (which is bound to
attract just the kind of discussion we have just had here).
I hope that makes some sort of sense.
-- chris