Responding to Timasmith...
> The application needs to solve a particular problem in the most
> efficient manner possible. That often requires abstractions and
> relationships that are tailored to the problem in hand rather than the
> database.
Why couldn't the database schema be tailored to the problem in hand?
> For example, one may have to minimize searches or at least
> the scope of searches.
Why?
> The application needs to manage behavior as well as data and ensure
> proper synchronization between them.
Doesn't an RDB have behavior?
> The application is likely to be abstracted at a different level of
> abstraction than the database. For example, the database is likely to
> abstract a telephone number as a simple domain but a telemarketing
> program may need to understand area and country codes.
What stops you from having two different columns or using substring?
> The notions of identity and delegation are often quite different between
> an application view and a database view. For example, an Account table
[quoted text clipped - 4 lines]
> Control classes can be instantiated for any number of different tables
> while the Account table has been split up into multiple classes.
Are you talking about some kind of generic user interface here? Normally
a GUI is tailored to the business problem (or use case). If the
database schema is tailored to the business problem, what is the
problem with a hard-wired mapping between the schema and the GUI?
> one
> wants to solve the problem in hand first and then worry about how the
> problem solution talks to the persistence mechanisms.
Do you claim that an RDB is only about persistence? Have you been
reading comp.object over the last few weeks?
> If the persistence access is properly encapsulated in a subsystem
> dedicated to talking to persistence, one can provide a subsystem
> interface that is tailored to the problem solution's needs ("Save this
> pile of data I call 'X'" and "Give me back the pile of data I saved as
> 'X'"). Then it is not difficult to recast those requests in terms of
> queries; one just needs a mapping between 'X' and tables/tuples.
To do this, you obviously do not need an RDB. Queries are overkill in
this scenario; just use a simple directory service or a low-level file
index system.
> In addition, if one abstracts the subsystem to capture the invariants of
> the RDM, one has quite generic objects like Table, Tuple, Attribute
[quoted text clipped - 3 lines]
> directly from the RDB schema). So one only needs a mapping of identity
> between the interface messages' data packets and the generic instances.
Do you have any web links to detailed examples of this?
> If one captures RDM invariants in the persistence access subsystem, then
> the subsystem becomes reusable across applications and databases.
Before, you said that the interface to the "persistence access
subsystem" should be "tailored to the specific problem at hand" for the
current application. How can such an interface be reusable across
different applications?
I assume that you already know that if you want your application to be
reusable across different SQL databases, there are much more efficient
methods? The only reason for encapsulation is if you want your
application to be reusable across different database paradigms
(relational, hierarchical, network, etc).
> By
> tying business abstractions to the database schema you are /forcing/ the
> solution to be touched when just the database changes.
How can the database schema change if the business abstractions are not
changed?
> The problem solution should be
> indifferent to whether the data is stored in an RDB, and OODB, flat
> files, or on clay tablets.
Why? There is a high cost associated with this. If you have an
implementation using flat files, why would you also make an
implementation using an RDB?
> If the application is properly partitioned, then the problem solution
> talks to a generic interface that reflects its needs for data.
This is a good description of SQL.
> So long
> as the requirements for the problem solution do not change, that
> interface does not change -- regardless of what happens to the database
> schema.
The database schema doesn't have to change unless the requirements for
the problem solution change.
> If the solution requirements change, then there may be different data
> needs and that would be reflected in changes to the persistence
> subsystem interface. Then one would have modify the subsystem and/or
> the database. But that is exactly the way it should be because the
> database exists to serve the needs of the application, not the other way
> around.
Indeed.
> This is one of the problems of trying to map 1:1. Database tables in a
> subclassing relationship can and are instantiated separately but in the
[quoted text clipped - 12 lines]
> [Individual] [Corporate]
> + limit + division
The corresponding tables would look like this:
customer(customerid, name, address)
individual_customer(customerid, limit)
corporate_customer(customerid, division)
> When the solution side needs to instantiate a corporate customer it asks
> the persistence access for one by name and gets back {address,
> division}. The solution then routinely instantiates a single instance
> of [Corporate]. All the drudge work of forming a join query across two
> tables is handled by the persistence subsystem.
select address, division
from corporate_customer cc
join customer c on cc.customerid=c.customerid
where name=?
How can it possibly be simpler?
> Of course there is no free lunch.
Indeed.
> The price of isolating persistence
> access is that one must encode and decode messages on each side of the
> interface. So there is a trade-off between later maintenance effort and
> current performance.
Did someone prove that later maintenance would be easier using a
persistence subsystem?
> For CRUD/USER applications that usually isn't
> worthwhile because the application doesn't process the data in any
> significant fashion; it just passes it in a pipeline between the DB and
> the UI and performance is limited by DB access.
But aren't there a lot of applications that are 70% CRUD/USER and 30%
more complex features? How do we know which approach to use when we
start designing an application?
In your opinion, can we use a framework as described in this thread, or
can we only use RAD IDEs? Do you have some pointers to existing RAD IDE
products (apart from MS Access)?
> But for larger, more complex applications one typically accesses the
> data multiple times in various ways.
Different SELECT statements?
Fredrik Bertilsson
http://moonbird.sourceforge.net
H. S. Lahman - 13 Jun 2006 21:19 GMT
Responding to Frebe73...
We have been around on these issues before. There has been far too much
of this stuff on comp.object recently so I am not going to feed the DBMS
trolls by responding.
*************
There is nothing wrong with me that could
not be cured by a capful of Drano.
H. S. Lahman
hsl@pathfindermda.com
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH
frebe73@gmail.com - 14 Jun 2006 05:59 GMT
> We have been around on these issues before. There has been far too much
> of this stuff on comp.object recently so I am not going to feed the DBMS
> trolls by responding.
Yes, it is of course much easier to make a lot of claims without having
to support them. You prefer writing about how to use an RDBMS without
having to debate with people who actually know how to use an
RDBMS.
Fredrik Bertilsson
http://moonbird.sourceforge.net
Nice reply and you have prompted food for thought. How about this:
Currently I have an object - say an OrderDataModel with fields matching
database columns. However I do extend OrderDataModel with OrderModel
and all business logic takes place on OrderModel. Any business logic
contained by the OrderModel is there - not on the DataModel which is
recreated with each data model update. Adding fields will not really
affect anything - it is only if I break up a table that impact is felt.
However, if my real business logic (the algorithms such as depreciation,
which have little to do with persistence) is in completely separate
objects which can be tested on their own and have no direct ties to the
data model - surely that solves the majority of the issues you have
described?
H. S. Lahman - 13 Jun 2006 22:47 GMT
Responding to Timasmith...
> Currently I have an object - say an OrderDataModel with fields matching
> database columns. However I do extend OrderDataModel with OrderModel
> and all business logic takes place on OrderModel. Any business logic
> contained by the OrderModel is there - not on the DataModel which is
> recreated with each data model update. Adding fields will not really
> affect anything - it is only if I break up a table that impact is felt.
Why do you need OrderDataModel at all? There is nothing to prevent
OrderModel from having both knowledge (presumably initialized from RDB
fields) and behavior (to solve the problem in hand).
My concern is with how the OrderModel gets instantiated. To do that one
needs data. For a new instance it may come from the UI. But more
commonly it will come from the DB. It is the mechanics of getting that
data that I am concerned about. For example, you might have a code
fragment that looked like:
DBAccess.getOrderInfo (orderNo, &customer, &itemCount, ...);
myOrderModel = new OrderModel (orderNo, customer, itemCount, ...);
where DBAccess is a GoF Facade pattern class that acts as interface to
the persistence access subsystem.
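A rough Java rendering of the fragment above, offered only as a sketch. Java has no out-parameters, so here getOrderInfo is assumed to return a small OrderInfo value instead of filling in &customer and &itemCount. InMemoryDBAccess is a hypothetical stand-in for whatever actually talks to the data store, and isBulkOrder is made-up behavior added purely for illustration (records need Java 16 or later).

```java
import java.util.HashMap;
import java.util.Map;

// Data packet crossing the subsystem interface (hypothetical shape).
record OrderInfo(int orderNo, String customer, int itemCount) {}

// Facade: the only view of persistence that the solution code has.
interface DBAccess {
    OrderInfo getOrderInfo(int orderNo);
}

// Stand-in implementation; a real one would form queries against the RDB.
class InMemoryDBAccess implements DBAccess {
    private final Map<Integer, OrderInfo> store = new HashMap<>();

    void put(OrderInfo info) {
        store.put(info.orderNo(), info);
    }

    @Override
    public OrderInfo getOrderInfo(int orderNo) {
        OrderInfo info = store.get(orderNo);
        if (info == null) {
            throw new IllegalArgumentException("No such order: " + orderNo);
        }
        return info;
    }
}

// Business object: knowledge plus behavior, unaware of where the data came from.
class OrderModel {
    private final int orderNo;
    private final String customer;
    private final int itemCount;

    OrderModel(int orderNo, String customer, int itemCount) {
        this.orderNo = orderNo;
        this.customer = customer;
        this.itemCount = itemCount;
    }

    int orderNo() { return orderNo; }
    String customer() { return customer; }

    // Illustrative behavior only; the threshold is made up.
    boolean isBulkOrder() { return itemCount >= 100; }
}

class OrderDemo {
    static OrderModel load(DBAccess db, int orderNo) {
        OrderInfo info = db.getOrderInfo(orderNo);            // (A) get the data
        return new OrderModel(info.orderNo(), info.customer(),
                              info.itemCount());              // (B) instantiate
    }

    public static void main(String[] args) {
        InMemoryDBAccess db = new InMemoryDBAccess();
        db.put(new OrderInfo(42, "ACME", 120));
        OrderModel order = load(db, 42);
        System.out.println(order.customer() + " bulk=" + order.isBulkOrder());
    }
}
```

Swapping InMemoryDBAccess for an implementation backed by a real database would leave OrderDemo.load and OrderModel untouched.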
At this point it doesn't matter if OrderModel in the application maps
1:1 to an OrderModel table in the RDB or not. All the boilerplate of
query formation, dataset extraction, etc. has been delegated away
through the DBAccess interface. You don't even care if the data is in an
RDB, much less whether the legacy schema is well-formed.
At this level of solution abstraction you are only interested in what is
critical to solving this part of the problem in hand. Specifically: (A)
get some needed data from the data store and (B) instantiate an
OrderModel. The code fragment is very well focused on those two things
without any distracting detail about legacy schemas and their access
mechanisms.
> However if my real business logic, the algorithms such as depreciation
> which has little to do with persistence as in completly separate
> objects which can be tested on their own and have no direct ties to the
> data model - surely that solves the majority of the issues you have
> described?
There are some practical reasons why testing will be easier with my
solution. That is, building a test harness infrastructure for just
DBAccess Facade will be easier than providing such infrastructure for a
bunch of OrderDataModel objects (or their creators).
However, the big reason is related to maintenance when the schema
changes. Where are those changes? In my solution they are all in the
DB access subsystem. The interface to that subsystem does not change
and the solution logic itself is not touched in any way. Therefore,
once testing can demonstrate that the DBAccess subsystem gets the same
data from the new DB for the interface (which you can demonstrate with a
regression test of the subsystem), you can be absolutely sure that the
solution business logic still works correctly.
However, if you have to make changes to the OrderDataModel objects (or
their creators) to reflect the new DB schema, you cannot have that
certainty. That's because you are mucking with object implementations
/in/ the solution subsystem but you are testing it at the system or
subsystem level. No matter how isolated they are supposed to be from
the business objects like DataModel, you cannot guarantee that those
changes don't break the solution logic. If you were very careful (and
somebody looked over the maintainer's shoulder for any interim
maintenance) the probability of breaking the business logic will be low.
But it will still be non-zero. [I can give you some marvelous
examples of what-could-possibly-go-wrong? situations that did. B-)]
Why would the DB schema change without requirements changes to the
business logic? This is actually fairly common for enterprise data
where requirements changes for some client applications require schema
changes but those requirements aren't relevant to other applications
using the same data. However, in your case I would guess that is quite
likely.
You indicated the legacy DB is ill-formed. If so, I would expect there
to be mounting pressure to fix it or replace it with a well-formed DB.
No matter how you solve your current problem, an ill-formed DB is going
to lead to problems later. Fixing those in a kludged solution is going
to require disproportionate effort. The more problems that show up and
the more resources it takes to fix them, the more pressure there will be
to replace it.
<aside>
Note that there is an added bonus for my solution here. Once you have
the interface in place to talk to the database it becomes much easier to
replace the database. That's because (A) all the changes are in one
place and (B) the changes are isolated from the business logic. In
fact, what I have suggested is part of a technique for piecemeal legacy
replacement in general. That is, the subsystems don't have to be
DBAccess; they can be any part of the legacy code. Basically the steps are:
(1) Design the new system's ideal partitioning into subsystems. This is
where you want to be with the final replacement application.
(2) Select a subsystem, design it, implement it, and test it.
(3) Insert the interface for the new subsystem into the legacy code. It
accesses the legacy functionality. This simply isolates the legacy code
that will be replaced by the new subsystem.
(4) Regression test the legacy code. This ensures that the new
interface works properly.
(5) Excise the legacy code whose requirements are addressed by the new
subsystem.
(6) Rebuild the application with the new subsystem. This should be
trivial because the legacy code already accesses the new interface from (3).
(7) Test the whole application. This should mostly Just Work at this point.
The third step is where all the project risk resides and it will be the
toughest one because it will inevitably require legacy surgery.
However, it is pretty well focused because one really isn't changing
anything; the legacy code is still solving the problem and one is just
isolating it. That is pretty much the problem I think you are facing
with the legacy database. You need to find a way to isolate the legacy
database from the business logic. But once you do that properly,
replacing the database becomes relatively simple.
</aside>

timasmith@hotmail.com - 14 Jun 2006 03:47 GMT
Hmm, I might be having an epiphany... so here is a case in point:
I had a FormControl object which extended FormDataControl, which is
regenerated from the datamodel and populated by generated database
code.
Among the fields inherited by FormControl were Application, View and
Type, as represented in the database table. A module or two used
FormControl to dynamically generate controls on a form.
Later on I moved Application, View and Type into their own table,
called ApplicationView, and then added ApplicationViewId to the
FormControl table as a foreign key. All good normalizing for a greater
purpose - but I broke (and quickly fixed) the control generation.
Your point is that if I had a FormControl object with fields
appropriate to its function, I could change the mapping from one to two
tables and retest the subsystem minimizing the impact.
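A sketch of that idea in Java, under stated assumptions: the FormControlAccess interface, the row types and the label field are all hypothetical, and plain maps stand in for the real tables. Only FormControl, ApplicationView and the Application, View and Type fields come from the description above.

```java
import java.util.Map;

// What the control-generation module needs, independent of table layout.
record FormControl(int id, String label, String application, String view, String type) {}

interface FormControlAccess {
    FormControl getFormControl(int id);
}

// Before normalization: everything lives in the FormControl table.
record FormControlRowV1(String label, String application, String view, String type) {}

class SingleTableAccess implements FormControlAccess {
    private final Map<Integer, FormControlRowV1> formControl;

    SingleTableAccess(Map<Integer, FormControlRowV1> formControl) {
        this.formControl = formControl;
    }

    @Override
    public FormControl getFormControl(int id) {
        FormControlRowV1 r = formControl.get(id);
        return new FormControl(id, r.label(), r.application(), r.view(), r.type());
    }
}

// After normalization: FormControl holds a foreign key into ApplicationView.
record FormControlRowV2(String label, int applicationViewId) {}
record ApplicationViewRow(String application, String view, String type) {}

class TwoTableAccess implements FormControlAccess {
    private final Map<Integer, FormControlRowV2> formControl;
    private final Map<Integer, ApplicationViewRow> applicationView;

    TwoTableAccess(Map<Integer, FormControlRowV2> formControl,
                   Map<Integer, ApplicationViewRow> applicationView) {
        this.formControl = formControl;
        this.applicationView = applicationView;
    }

    @Override
    public FormControl getFormControl(int id) {
        FormControlRowV2 r = formControl.get(id);
        // Stands in for the join on ApplicationViewId.
        ApplicationViewRow av = applicationView.get(r.applicationViewId());
        return new FormControl(id, r.label(), av.application(), av.view(), av.type());
    }
}

class FormControlDemo {
    public static void main(String[] args) {
        FormControlAccess before = new SingleTableAccess(
            Map.of(1, new FormControlRowV1("Name", "Orders", "Edit", "TextBox")));
        FormControlAccess after = new TwoTableAccess(
            Map.of(1, new FormControlRowV2("Name", 10)),
            Map.of(10, new ApplicationViewRow("Orders", "Edit", "TextBox")));
        // Subsystem-level regression check: same data through the same interface.
        System.out.println(before.getFormControl(1).equals(after.getFormControl(1)));
    }
}
```

The comparison in main is the whole retest: if both mappings hand back equal FormControl values, the module that generates controls has nothing new to see.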
As a complex system grows and grows, being able to retest only a
subsystem provides ever greater value. Certainly I couldn't foresee the
need to normalize the table - it grew as I added fields due to
unforeseen requirements.
I guess I fear resorting to hand-crafted mapping code, but point well
taken that it is often easier than rewriting (or even just
testing) business logic.
Now moving on to a suggestion that meets both sides...
So my OrderDataModel does not actually know anything about persistence;
an OrderDataAdapter (also auto-generated) has the generated select,
insert and update and all the mapping from db result sets to the data
model. So in that respect I am covered - I could change the database
quite easily (actually I support several databases).
The reason I use OrderModel is, for one, to isolate the business logic
from the data fields for ease of readability and, from a practical
standpoint, because I recreate the source file OrderDataModel.java.
In my problem above with normalizing a table into two, *IF* I had
changed my code generator to continue to recreate FormDataControl but
from a query which combined the two tables, then it is win-win - my
FormControl does not change and I can continue to regenerate the source
files.
Perhaps it only works with simple normalization.
I guess my other realization is that in my client-side code I probably
have far too many dependencies on my OrderModel, FormControl etc. I
should replace as many as possible with interfaces within the local
package - increasing the cohesiveness of a package and reducing
external dependencies.
In some ways that also meets your goal - I can recreate the datamodels
and, as long as they continue to implement the appropriate interfaces,
there is no need to retest those packages.
Tim
H. S. Lahman - 14 Jun 2006 18:52 GMT
Responding to Timasmith...
> I had an FormControl object which extended FormDataControl which is
> regenerated from the datamodel and populated by generated database
[quoted text clipped - 12 lines]
> appropriate to its function, I could change the mapping from one to two
> tables and retest the subsystem minimizing the impact.
Exactly. Even better, you should not even need to retest the solution
logic if it is encapsulated in its own subsystem. Since it isn't
touched, all you need to demonstrate is that the DB access subsystem
interface still provides the same data values. [Prudence would suggest
a system regression test is in order just in case the analysis that the
interface didn't need to change was flawed. B-)]
> As a complex system grows and grows being able to retest a subsystem
> only provides greater value. Certainly I couldnt forsee the need to
> normalize the table - it grew as I added fields due to unforseen
> requirements.
Quite so. Aside from the DB access issues, partitioning an application
into subsystems encapsulated by pure message-based data transfer
interfaces is a very good idea in general. By separating concerns and
encapsulating them one can perform fairly intense functional testing of
the individual subsystems. In effect one can do large scale unit tests.
Better yet, it can be done in complete isolation from any other parts
of the application because the interfaces are simple. (If I didn't
mention it before, my blog has a category on Application Partitioning.)
> I guess I fear resorting to hand crafting mapping code but point well
> taken about that it is often easier than rewriting (or even just
> testing) business logic.
It's really not that difficult. You are going to have to hand craft SQL
queries and whatnot anyway. The encode/decode of message data packets
is somewhat tedious but it is pretty mundane and pretty much the same
everywhere so one is unlikely to screw up other than typos.
How much effort you put into making the subsystem reusable is another
question. At one extreme one can just hard-code the SQL queries in the
DB access subsystem just like they would have been sprinkled in the
solution code. You just isolate them around Facade interface method
implementations. [It is not uncommon for the DB access "subsystem" to
consist solely of the Facade interface class for simple data stores.
The "subsystem" exists in the Facade class' implementation. Your
compromise solution below is getting very close to that already.]
At the other extreme you provide generic identity mapping through
external configuration data so that you can reuse the subsystem for
other applications. That's not so difficult to code but it requires
more design effort to get it right and there will be more infrastructure
code. You can also get exotic with read/write caching and whatnot.
Whether that is worthwhile is a basic development trade-off between
today and tomorrow.
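A minimal sketch of the generic end of that spectrum, assuming a made-up configuration format in which each interface field name maps to a table.column pair. The IdentityMapping class, the property keys and the shape of the generated SQL are all illustrative assumptions, and the sketch only handles fields that live in a single table.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class IdentityMapping {
    private final Properties config = new Properties();

    // configText uses java.util.Properties syntax:
    //   <Packet>.<field>=<table>.<column>
    static IdentityMapping fromText(String configText) {
        IdentityMapping mapping = new IdentityMapping();
        try {
            mapping.config.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return mapping;
    }

    // Resolves an interface-level field name to its "table.column" location.
    String columnFor(String packet, String field) {
        String key = packet + "." + field;
        String column = config.getProperty(key);
        if (column == null) {
            throw new IllegalArgumentException("Unmapped field: " + key);
        }
        return column;
    }

    // Builds a parameterized SELECT; assumes every field is in the key's table.
    String selectFor(String packet, List<String> fields, String keyField) {
        List<String> columns = new ArrayList<>();
        for (String field : fields) {
            columns.add(columnFor(packet, field));
        }
        String keyColumn = columnFor(packet, keyField);
        String table = keyColumn.substring(0, keyColumn.indexOf('.'));
        return "SELECT " + String.join(", ", columns)
             + " FROM " + table + " WHERE " + keyColumn + " = ?";
    }
}

class MappingDemo {
    public static void main(String[] args) {
        // In practice this text would come from an external file.
        IdentityMapping mapping = IdentityMapping.fromText(
            "Order.orderNo=orders.order_no\n"
          + "Order.customer=orders.customer_name\n"
          + "Order.itemCount=orders.item_count\n");
        System.out.println(
            mapping.selectFor("Order", List.of("customer", "itemCount"), "orderNo"));
    }
}
```

With this arrangement a renamed column means editing the configuration text rather than the subsystem code, which is the trade-off described above: more design effort and infrastructure now in exchange for reuse later.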
> Now moving on to a suggestion that meets both sides...
>
[quoted text clipped - 3 lines]
> model. So in that respect I am covered - I could change the database
> quite easily (actually I support several databases).
If you are not encapsulating these in a persistence access subsystem you
still have a test problem. The objects actually doing the DB access may
be logically separate from the solution classes _in theory_, but you
can't be sure. That's because the testing scope still includes both
solution objects and DB access objects. So when you modify the DB
access objects you still need to test the solution logic thoroughly
because you can't be sure you didn't break the solution logic despite
all the good intentions. That problem just gets worse as someone
performs additional maintenance subsequently (e.g., a maintainer
modifying solution functionality may put it in one of your pristine DB
access objects).
If you put the DB access in a separate subsystem, then you don't have
that problem because the solution logic is no longer within the
necessary test scope. All you need to demonstrate is that the DB access
subsystem provides the same data through the interface as it provided
prior to changing its guts. [The caveat about prudence above still
applies, though.]
Another benefit is that the scope of maintenance is better isolated.
That doesn't sound like a big deal but a lot of problems stemming from
maintenance can be traced to making changes in the wrong place. When
the schema changes there is no doubt where the change needs to go if you
have a DB access subsystem (i.e., no debate about putting it in
OrderDataModel or OrderDataAdapter). In addition, there is no
distracting code in the DB access subsystem because all it does is DB
access so it is easier to focus on getting the changes right.
<Hot Button>
Testing can get one to 5-Sigma reliability. But to achieve 6-Sigma and
beyond requires militant defect prevention. One needs to eliminate
/opportunities/ for inserting defects. Isolating DB access (or any
other significantly complex functionality) in a subsystem tends to
foster better focus for both design and maintenance. That translates
into less likelihood of getting things wrong because the context is
simpler. IOW, it is harder to break code that isn't visible in the
scope of change than it is to break code that is visible.
</Hot Button>
> The reason I use OrderModel for one to isolate the business logic from
> the data fields for ease of readability and from a practical standpoint
[quoted text clipped - 5 lines]
> FormControl does not change and I can continue to regenerate the source
> files.
If I understand your concern here, I would argue that encapsulation in a
subsystem would provide even better readability and maintainability.
The separation of concerns is better because it eliminates distractions
when dealing with either the business logic or the DB access logic.
That is, /all/ the objects are business objects or /all/ the objects are
DB objects, depending on which subsystem one is in.
I think maintainability would improve because of better isolation and
decoupling. The decoupling through subsystem interfaces is much
stronger because the objects in each subsystem do not even know that the
objects in the other subsystem even exist.
As far as the files are concerned, the code has to go somewhere so you
are going to have .java files lying around everywhere. However, a
separate subsystem also offers some advantages for deployment, source
control, and configuration management -- such as separately deliverable
DLLs.
> Perhaps it only works with simple normalization.
The subsystem approach works for any changes on the DB side. You can
switch to clay tablets using CODASYL if you want. Which segues to
another advantage. Some changes are more extensive than others.
Suppose you decide that to enhance DB performance you need to provide
read-ahead caching of large joins. How difficult is that going to be to
do with an object like OrderDataAdapter? You are probably going to need
some major surgery and a few more objects -- all right in the middle of
your business logic. My point is that once one has encapsulated DB access in a
subsystem, one has open-ended enhancement capability independent of the
business logic.
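To make that concrete, here is a hedged sketch of read-ahead caching added entirely behind a subsystem interface. OrderStore, BulkOrderSource, loadRange and the window size are hypothetical names chosen for illustration, and the sketch ignores cache invalidation on writes.

```java
import java.util.HashMap;
import java.util.Map;

// The interface the business logic already uses (hypothetical).
interface OrderStore {
    String getCustomer(int orderNo);
}

// Hypothetical backend that can fetch a contiguous range in one round trip.
interface BulkOrderSource {
    Map<Integer, String> loadRange(int fromOrderNo, int count);
}

// Read-ahead cache: on a miss, fetch the requested order plus the next few.
class ReadAheadOrderStore implements OrderStore {
    private final BulkOrderSource source;
    private final int window;
    private final Map<Integer, String> cache = new HashMap<>();

    ReadAheadOrderStore(BulkOrderSource source, int window) {
        this.source = source;
        this.window = window;
    }

    @Override
    public String getCustomer(int orderNo) {
        if (!cache.containsKey(orderNo)) {
            cache.putAll(source.loadRange(orderNo, window));
        }
        return cache.get(orderNo);
    }
}

class ReadAheadDemo {
    public static void main(String[] args) {
        int[] roundTrips = {0};
        // Fake source that counts how often it is hit.
        BulkOrderSource source = (from, count) -> {
            roundTrips[0]++;
            Map<Integer, String> rows = new HashMap<>();
            for (int i = from; i < from + count; i++) {
                rows.put(i, "customer-" + i);
            }
            return rows;
        };
        OrderStore store = new ReadAheadOrderStore(source, 3);
        for (int orderNo = 1; orderNo <= 4; orderNo++) {
            System.out.println(store.getCustomer(orderNo));
        }
        System.out.println("round trips: " + roundTrips[0]);
    }
}
```

Callers keep using OrderStore exactly as before; the extra objects the caching needs live only inside the DB access subsystem.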