Monday, February 20, 2012

Orphans

Hi,
In simple terms, in SQL2K, I have 2 header tables A and B and a junction
table AB. All of keys are defined with the Cascade Delete constraint. Table
A has a one-to-many relationship with table B. My problem is that if a row
in table A is deleted, the rows that reference it in AB are deleted (which
is good) but table B is untouched. I need to delete all rows in B that are
referenced in the AB row that is deleted in the cascade operation. If those
rows are not deleted, they are left orphaned with no purpose in life except
to take up room on the disk.
I can manually code this but I thought that there might be a chance that I
am missing something a lot simpler. Thanks for any ideas.
SteveSteve Prescott wrote:
> Hi,
> In simple terms, in SQL2K, I have 2 header tables A and B and a junction
> table AB. All of keys are defined with the Cascade Delete constraint. Tabl
e
> A has a one-to-many relationship with table B. My problem is that if a row
> in table A is deleted, the rows that reference it in AB are deleted (which
> is good) but table B is untouched. I need to delete all rows in B that are
> referenced in the AB row that is deleted in the cascade operation. If thos
e
> rows are not deleted, they are left orphaned with no purpose in life excep
t
> to take up room on the disk.
If there is no direct relationship from A->B, but instead it goes
through AB it isn't going to happen. This is because AB references A and
B. B does not reference AB.
This may help you visualize. A->AB, B->AB. When you cascade the
delete/update it only follows the direction of the arrow.
Aaron Weiker
http://aaronweiker.com/
http://www.sqlprogrammer.org/|||On Tue, 8 Feb 2005 19:06:38 -0600, Steve Prescott wrote:

>Hi,
>In simple terms, in SQL2K, I have 2 header tables A and B and a junction
>table AB. All of keys are defined with the Cascade Delete constraint. Table
>A has a one-to-many relationship with table B.
Hi Steve,
A junction table is not needed for a one-to-many relationship. If you
really need a one-to-many relationship, then drop the junction table and
instead, insert a foreign key to A in table B.
If you really need the junciton table, then your relationship is
many-to-many, not one-to-many.

> My problem is that if a row
>in table A is deleted, the rows that reference it in AB are deleted (which
>is good) but table B is untouched. I need to delete all rows in B that are
>referenced in the AB row that is deleted in the cascade operation. If those
>rows are not deleted, they are left orphaned with no purpose in life except
>to take up room on the disk.
If you really need a one-to-many relationship, then changing the design as
indicated above will take care of this - rows from B will be deleted when
a row from A is deleted.
If you actually have a many-to-many relationship, then you should rethink
what you are doing. If a business terminates a project, the rows that
store the assignment of employees to that project should be removed as
well (that's the junction table) - but you wouldn't want to remove the
rows holding the personnel data for those employees, right?
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Thanks Aaron and Hugo,
When I called the A/B relationship one to many I meant that any row in B
that has a reference to A in the AB table is exclusive to that row in A. A
needs to be able to own many B's and every B that it owns, it owns
exclusively so that if A is deleted, every B that it owns needs to be
deleted also. The best example I can think of would be that A is a table of
customers and B is a table of orders where if you delete a customer, the
orders have to be deleted as well.
The actual problem is quite a bit more complicted. The reason I designed it
that way is because a B can also be owned by a C, D, E or F, etc. One B
cannot be owned by more than one owner even if the owners are of different
types.
Given that the Cascade Delete constraint is not fully functional for the
need, I was wondering what the best practices would be for deletion of child
rows in this situation. I am working on my own solution now but would be
happy to throw it out for a better one.
Thanks again.
Steve|||On Wed, 9 Feb 2005 17:21:18 -0600, Steve Prescott wrote:
(snip)
>The actual problem is quite a bit more complicted. The reason I designed it
>that way is because a B can also be owned by a C, D, E or F, etc. One B
>cannot be owned by more than one owner even if the owners are of different
>types.
>Given that the Cascade Delete constraint is not fully functional for the
>need, I was wondering what the best practices would be for deletion of chil
d
>rows in this situation. I am working on my own solution now but would be
>happy to throw it out for a better one.
Hi Steve,
What do you mean with "not fully functional"?
The correct (IMO) design in your case, if I'm not misunderstanding you, is
CREATE TABLE B (.....,
....,
A_Key int NULL, -- int is assumption
C_Key int NULL, -- int is assumption
D_Key int NULL, -- int is assumption
E_Key int NULL, -- int is assumption
PRIMARY KEY (...),
FOREIGN KEY (A_Key) REFERENCES A
ON UPDATE CASCADE ON DELETE CASCASE,
FOREIGN KEY (C_Key) REFERENCES C
ON UPDATE CASCADE ON DELETE CASCASE,
FOREIGN KEY (D_Key) REFERENCES D
ON UPDATE CASCADE ON DELETE CASCASE,
FOREIGN KEY (E_Key) REFERENCES E
ON UPDATE CASCADE ON DELETE CASCASE,
CHECK (CASE WHEN A_Key IS NULL THEN 1 ELSE 0 END
+ CASE WHEN C_Key IS NULL THEN 1 ELSE 0 END
+ CASE WHEN D_Key IS NULL THEN 1 ELSE 0 END
+ CASE WHEN E_Key IS NULL THEN 1 ELSE 0 END = 1)
)
Though I should probably add that I'm intrigued - what is the actual
business problem here? One "thing" that can be owned by different "things"
but yet stays the same is not too common. I don't think I ever encoutered
anything like this (and if I did, I've forgotten).
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Hugo,
The business problem is church contributions. It goes like this:
A Batch is a money counting session
A Batch can own one or more Churches
A Church owns People and Families
A Person or Family can own one or more Contributions.
The junction table in question references a Batch, a Church, a Person (or
Family) and a Contribution. Thus the Contribution ('B' in the preceeding
emails) can belong to a Person or a Family.
The simplified AB junction table I referred to actually has a primary key
made up of Batch_ID, Church_ID, Person_ID (or Family_ID) and
Contribution_ID.
This arrangement allows the money counters to count up contributions for
more than one Church at one sitting. They can count Contributions for both
People and Families and refer to the data later by referencing the Batch.
Contributions that go into one Batch are not visible to other Batches even
though the same Church, Person or Family might be in the other Batch.
I am grateful for your code example. I'm not sure if I can re-design the DB
at this point. My design is not exactly a cripple but it could have been
done better. This issue of deleting orphans has brought that to light to me.
Steve|||On Wed, 9 Feb 2005 19:07:20 -0600, Steve Prescott wrote:

>The business problem is church contributions. It goes like this:
> A Batch is a money counting session
> A Batch can own one or more Churches
> A Church owns People and Families
> A Person or Family can own one or more Contributions.
>The junction table in question references a Batch, a Church, a Person (or
>Family) and a Contribution. Thus the Contribution ('B' in the preceeding
>emails) can belong to a Person or a Family.
Hi Steve,
This is partly enlightening, but some new questions arise as well. What
does "A Church owns People and Families" mean? (It sounds more like a sect
than like church <shudder> )
I assume that "owning" refers to a one to many relationship, but that
would mean that each person or family can only donate to one church. Is
that the case? Or can a person / family donate to church 1 and church 2
today, and also donate to church 1 and church 3 during the next batch?
For now, I'll assume the following - let me know if I'm wrong:
* Each contribution is from either one person or one family
* Each person and each familiy is allowed to make multiple contributions
* Each contribution is for one church
* A church may receive more than one contribution
* The same person or family can contribute to different churches
* Each church can receive contributions from multiple persons and families
* Each contribution is counted during one batch
* A batch can consist of more than one contribution
* A batch can consist of contributions of different persons and families
* A person or family can donate contributions during more than one batch
* A batch can consist of contributions for different churches
* A church can receive contributions during more than one batch

>The simplified AB junction table I referred to actually has a primary key
>made up of Batch_ID, Church_ID, Person_ID (or Family_ID) and
>Contribution_ID.
If all the above assumptions are completely correct, then this is quite a
good design (but read on for one possible improvement). But if I made
incorrect assumptions (e.g. if the "owning" implies that a person or
family is not allowed to contribute to more than one church), then this
design holds some redundancy. Building on the example: if Peter may only
donate to the Holy Church of Petersburg, it suffices to store that a
donation is from Peter - explicitly storing that this donation is for the
Holy Church of Petersburg would bhe redundant. There might still be good
reasons to choose it, but you should be awarre of the implications and the
increased possibility of data corruption.
(snip)
>I am grateful for your code example. I'm not sure if I can re-design the DB
>at this point. My design is not exactly a cripple but it could have been
>done better. This issue of deleting orphans has brought that to light to me.[/color
]
Well, if you can re-design, my first advice would be to replace the two
entities Family and Person with one entity: Contributor. My guess is that
the families table and the persons table have a lot of columns in common.
All these shared columns should be moved to the Contributors table; the
columns that are exclusive for either persons (like date of birth) or
families (like number of family members) can either go in seperate
subtables, or in mutually exclusive NULLable columns in the Contributors
table.
Based on my current information (and the above assumptions), my proposal
for the design would be:
BATCHES (Batch_ID (PK), other columns)
CHURCHES (Church_ID (PK), other columns)
CONTRIBUTORS (Contrib_ID (PK), other columns)
PERSONS (Contrib_ID (PK and FK to Contributors), other columns)
FAMILIES (Contrib_ID (PK and FK to Contributors), other columns)
DONATIONS (Batch_ID (FK to Batches), Church_ID (FK to Churches),
Contrib_ID (FK to Contributions), Amount, other columns)
For donations, the PK would be (Batch_ID, Church_ID, Contrib_ID)
For all tables, I'd examine if a surrogate key is really needed. I prefer
to use natuaral keys where possible.
The Persons and Families tables are optional - if their number of columns
is limited, I'd simply keep those columns in the Contributors table.
If you really need to be sure that no family will ever be entered in the
persons table and vice versa, you can use Joe Celko's trick:
CREATE TABLE Contributors (Contrib_ID int NOT NULL,
Type char(1) NOT NULL,
-- other columns
PRIMARY KEY (Contrib_ID),
CONSTRAINT Celko1 UNIQUE (Contrib_ID, Type),
CONSTRAINT Celko2 CHECK (Type IN ('F', 'P'))
)
CREATE TABLE Persons (Contrib_ID int NOT NULL,
Type char(1) NOT NULL,
-- other columns
PRIMARY KEY (Contrib_ID),
FOREIGN KEY (Contrib_ID, Type)
REFERENCES Contributors (Contrib_ID, Type)
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT OnlyPersons CHECK (Type = 'P')
)
CREATE TABLE Families (Contrib_ID int NOT NULL,
Type char(1) NOT NULL,
-- other columns
PRIMARY KEY (Contrib_ID),
FOREIGN KEY (Contrib_ID, Type)
REFERENCES Contributors (Contrib_ID, Type)
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT OnlyFamilies CHECK (Type = 'F')
)
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Hugo,
All of the questions and assumptions you pose are correct. The Batch knows
nothing about the church affiliation of a Person or Family. One Person or
Family can make multiple Contributions to the same Church or multiple
Churches in the same Batch.
The Families and Persons tables have no columns in common. Actually, the
Families table has just 3 columns (Name, Salutation and Anniversay) outside
of the primary key and 'DateCreated', et. al. All info regarding people in
a Family is in the Persons table or the FamilyPerson junction table.
Regarding the fabled 'AB' junction table, that actually represents 2
junction tables: BatchChurchPersonContribution and
BatchChurchFamilyContribution. I left that out before for the sake of
simplicity. I'm not too clear about the Donations table you suggest, but I
think that the issues surrounding that suggestion are addressed by the fact
that there are the 2 tables I reference above. A Family gives strictly as a
Family and the same is true for a Person. A Person in a Family can make a
Contribution to the same Church in the same Batch and the Family he belongs
to can do that as well and there is no knowledge on the part of the Batch as
to the Family/Person relationship.
I hesitate to bring this up but there is one serious redundancy issue that I
took on knowing that it is bad practice but I could not think of another
solution. There is a ChurchPersonContribution table that references the same
Church, Person and Contribution keys that are in the
BatchChurchPersonContribution table. Rows are created in both tables each
time the user saves. I did this because it is necessary to be able to
reference all of the Contributions of a Person through the Church. This way,
Contributions of a Person can be fetched from the DB without reference to
the Batch. The user can take a 'Church Centric' view of Contributions. The
intracacies of doing this are coded into the Data layer and it works good.
Deleting a Batch or a Contribution from a Batch works well and the
corresponding Contribution from the ChurchPersonContribution table works in
sync thanks to Cascade Delete. Is this a big Achilles Heel or is this kind
of thing regarded in the trade as acceptable but not desireable? Except for
help from Delete Cascade in the case of deletions, I am relying entirely on
Visual Basic code at the data layer to keep the DB uncorrupted.
Thanks for all of your help.
Steve|||On Thu, 10 Feb 2005 11:04:14 -0600, Steve Prescott wrote:

>All of the questions and assumptions you pose are correct. The Batch knows
>nothing about the church affiliation of a Person or Family. One Person or
>Family can make multiple Contributions to the same Church or multiple
>Churches in the same Batch.
Hi Steve,
Okay - thanks for confirming. I suggest you to re-design your database
along the lines I gave in my previous message.
I forgot to add the effects of that design on cascading deletes: if you
delete a batch, all contributions that belong to that batch will be
deleted by the cascading delete option, but the churches that received the
contributions and the contributors that paid them will be retained. Along
the same lines, if you delete a church all contributions to that church
will be deleted, but not the contributors paying them, nor the batches
containing those contributions. I'd say that this is a Good Thing, as
there's a good chance that these batches also hold contributions for other
churches, so you *should* not delete the complete batch.

>The Families and Persons tables have no columns in common. Actually, the
>Families table has just 3 columns (Name, Salutation and Anniversay) outside
>of the primary key and 'DateCreated', et. al. All info regarding people in
>a Family is in the Persons table or the FamilyPerson junction table.
In that case, I might still go for the contributors table (to keep the
foreign keys in the contributions table clear), or I might use two
mutually exclusive foreign keys in the contributions table (PersonID and
FamilyID, both as foreign key, both nullable and with a CHECK constraint
to ensure that exactly one is NULL in each row). You'll need to add a
surrogate primary key to the contributions table; the natural key
(BatchID, ChurchID, FamilyID and PersonID) has NULLable columns, it can
still be declared as a UNIQUE constraint, but not as PRIMARY KEY.

>Regarding the fabled 'AB' junction table, that actually represents 2
>junction tables: BatchChurchPersonContribution and
>BatchChurchFamilyContribution. I left that out before for the sake of
>simplicity. I'm not too clear about the Donations table you suggest, but I
>think that the issues surrounding that suggestion are addressed by the fact
>that there are the 2 tables I reference above.
The name Donations was ill chosen - I should have called it Contributions.
If you have a seperate table for contributions AND one (or two, that is
indeed not the core issue) junction tables BatchChurchPersonContribution
and BatchChurchFamilyContribution, then this is a seriuos flaw in your
design. Each contribution is part of one batch, goes to one church and is
paid by one family or person, so BatchID, ChurchID, PersonID and FamilyID
are functionally dependant on a contribution. They should be stored in the
table that holds all the information about individual contributions.
If you have a table Contributions and a junction table that relates a
contribution to a church, a family (or person) and a batch, then there is
nothing that would stop a user from entering than one single contribution
is part of three batches, paid by five families and four persons and in
favor of seventeen different churches.
And since this is allowed by your design, it makes sense that deleting a
batch would not remove the contributions in that batch, since they might
be part of another batch as well.
(snip)
>I hesitate to bring this up but there is one serious redundancy issue that
I
>took on knowing that it is bad practice but I could not think of another
>solution. There is a ChurchPersonContribution table that references the sam
e
>Church, Person and Contribution keys that are in the
>BatchChurchPersonContribution table.
There's nothing wrong with **CONTROLLED** redundancy -- unless it is
needless. In this case, it is needless. Based on my proposed design, the
ChurchPersonContribution table can be replaced by a view (and I'm quite
positive it can also be replaced by a view in your situation, it's just
that I still don't feel secorer about all details of your current
implementation).
CREATE VIEW ChurchPersonContribution
AS
SELECT ChurchID, PersonID, Amount, DatePaid, ...
FROM Contributions
go

> Except for
>help from Delete Cascade in the case of deletions, I am relying entirely on
>Visual Basic code at the data layer to keep the DB uncorrupted.
Ouch!!! That means that if you one day have to replace your front-end,
you'll have to redo all your integrity issues. That also means you could
easily mess up if you're using straight SQL queries to do some maintenance
work or trouble-shooting.
ALWAYS use constraints to guard against data corruption. If the business
rules are too complex for constraints, use triggers. Triggers and
constraints are the only methods that can't easily be circumvented.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)

No comments:

Post a Comment