In the original exchange (
http://www.dbdebunk.com/page/page/1396086.htm
), Date says:
"the operation DELETE SSP WHERE S# = S#('S1') AND P# = P#('P1') ;
is inherently unsafe, since we presumably don't know, in general,
whether there are any other shipments for supplier S1."
Any time Date says something, I think those of us laymen must pay
attention, so let me try and reconcile his comment with my own
conclusion. I don't argue with the adverb "inherently" because I came
to the same conclusion as far as how I expect most people probably
interpret this DELETE statement is concerned. I think I came to this
conclusion from a different angle, though, because in Parts One and Two,
I tried to be careful to keep any notion of language notions such as
assignment or of base or virtual relvars out of the 'algebraic' argument
and talk only of projections of results. (One might say I've implicitly
included virtual relations in that a projection is virtual as far as
tuples are concerned, but my argument is based more on triples than
tuples and a projection on some chosen set of attributes loses no
triples that those attributes have values for.)
Then he says: "But I never claimed--at least, I don't think I ever
claimed!--that the rule for deleting from a join was based purely on
logical inference."
In Parts One and Two I tried to argue that if D (the relation implied by
the WHERE clause) has ANY triple where S# = 'S1', the A-algebra and the
Tutorial D MINUS definition imply that R'{S#} will have no such triple,
similarly for R'{P#} where P# = 'P1'. From the equations in Parts One
and Two, I think we can also see that a D triple where P# = 'P1', would
cause an R'{S#} triple where S# = 'S2' to be 'lost' in the result as
well as any R'{P#} triple where P# = 'P2' and therefore lost from the
join R' = A <AND> B, and similarly from SSP = S <AND> SP. I think this
has nothing to do with any design principle or any implementation rule
that says to 'delete from both sides'. If what I've argued so far makes
sense, I think the 'rule' "delete from both sides" is purely a logical
inference. It is a direct consequence of the MINUS definition applied
with the A-algebra. (I know that it can be dangerous to disagree with
Date, but that's where my approach leads me, apologies to him if I've
made a mis-step.)
Whether people see this consequence as "unsafe" is psychological as far
as I'm concerned. In other words, my conclusion is that there is an
algebraic interpretation of this DELETE statement that leads logically
to results that most people probably don't expect. In this case, the
language keyword "AND" is present and yet the triples that are omitted
from the result involve a notion of "OR".
To my mind, the issue here is really a language issue, not a model
issue. I think it all depends on the definition we choose for MINUS and
when we choose a definition we need to consider what will result when
the algebraic operands in the definition don't have the same headings.
Let me try a different definition that tries to reflect the intention of
deleting only those tuples that have both triples standing for S# = 'S1'
and P# = 'P1'. To keep it as simple as possible, I'll consider only the
situation where the D header is a subset of the R (view) header. What I
think we want in D is a header that is equal to R's header and includes
only the tuples that have both of these triples.
If any relation 'r' is given an additional attribute 'a' that has all
possible values from the domain or type of 'a', we can say that every
tuple in 'r' is implied by the projection r+{Hr} of r{Hr,a}. If we
imagine some relation 'X' that has the 'a' attribute, we can express all
possible 'a' values algebraically as X <OR> <NOT> X. Let me apply this
to D by augmenting it with all possible triples containing the HR
attributes that aren't in D and call it D+:
D+ = ((R <OR> <NOT> R) <AND> D)
The complement of D+ will have tuples that have both triples standing
for, say, S# = 'S2' and P# = 'P1', or, say, for both of S# = 'S1' and P#
= 'P2' and for both of S# = 'S2' and P# = 'P2' and so on. If such pairs
of triples are in tuples of R, they will remain in the result R', so my
definition might be:
R MINUS D is semantically equivalent to the A expression R <AND> <NOT>
D+, ie., R <AND> <NOT> ((R <OR> <NOT> R) <AND> D)
Putting that condition in the form of an equation:
R' = R <AND> <NOT> D+
= R <AND> <NOT> ((R <OR> <NOT> R) <AND> D)
Now let me substitute A <AND> B for R:
R' = A <AND> B <AND> <NOT> (( (A <AND> B) <OR> <NOT> (A <AND> B) )
<AND> D )
The parentheses here are important, but otherwise I'm still assuming the
precedence and associativity I assumed in Part One. Still we can
distribute the long expression on the right that stands for the
complement of D+ over A and B:
R' = (A <AND> <NOT> (( (A <AND> B) <OR> <NOT> (A <AND> B) ) <AND> D ))
<AND>
(B <AND> <NOT> (( (A <AND> B) <OR> <NOT> (A <AND> B) ) <AND> D ))
I believe my previous arguments allow me to take the expressions in the
first and third lines above and add appropriate projections, eg;
R' = (A <AND> <NOT> (( (A <AND> B) <OR> <NOT> (A <AND> B) ) <AND> D ))
{HA}
<AND>
(B <AND> <NOT> (( (A <AND> B) <OR> <NOT> (A <AND> B) ) <AND> D
)) {HA}
so that the first line can be re-written as
A'{HA} = (A <AND> <NOT> (( (A <AND> B) <OR> <NOT> (A <AND> B) ) <AND>
D )) {HA}
= (A <AND> <NOT> D+) {HA}
and similarly for the third line.
This equation is hard to test manually, but I can try a couple of simple
examples.
Assume attributes named a,b,c,d having the domain {1,2} and relations A
and B with these values:
A:
a c
1 1
1 2
2 1
B:
a b d
1 1 2
1 2 1
A <AND> B: /* ie. the value of R */
a b c d
1 1 1 1
1 1 1 2
1 2 2 1
D:
a b
1 1
(A <AND> B) <OR> <NOT> (A <AND> B)
a b c d
1 1 1 1
1 1 1 2
1 1 2 1
1 1 2 2
1 2 1 1
1 2 1 2
1 2 2 1
1 2 2 2
2 1 1 1
2 1 1 2
2 1 2 1
2 1 2 2
2 2 1 1
2 2 1 2
2 2 2 1
2 2 2 2
D+:
a b c d
1 1 1 1
1 1 1 2
1 1 2 1
1 1 2 2
<NOT> D+:
a b c d
1 2 1 1
1 2 1 2
1 2 2 1
1 2 2 2
2 1 1 1
2 1 1 2
2 1 2 1
2 1 2 2
2 2 1 1
2 2 1 2
2 2 2 1
2 2 2 2
R':
a b c d
1 2 2 1
Note that only the tuples where a = 1 AND b = 1 have been 'deleted', not
the one where a = 1 OR b = 1, this gives a similar view result to the TD
example DELETE statement's result.
R'{HA} = (A <AND> <NOT> D+) {HA}
a c
1 2
2 1
Note that NO tuples in A have been 'dropped', which is not similar to
the example DELETE statement's result. I think this might make sense to
most people on the casual grounds that A never had any tuple with both
triples standing for a = 1 AND b = 1.
R'{HB} = (B <AND> <NOT> D+) {HB}
a b d
1 2 1
Note that only one tuple in B has been dropped, the one that has the
same two triples that D has.
If I were to use this different definition of MINUS, if only for the
case where HD is a proper subset of HA union HB, I'd want some kind of
formal exposition that allowed me to predict all possible results,
including ones for empty heading. As for cases where HD overlaps with
HA and HB in other ways or doesn't overlap at all, that would be even
more desireable. Not to mention an implementation that allows easier
validation of expectations compared to results.
In the meantime, let me show one other simpler example involving a
different sort of heading overlap, the degenerate Cartesian Product,
which I think might be important as far as the Principle of
Interchangeability ('POI') is concerned. Suppose relation X has
attribute x with domain (1,2} and Y has attribute x with domain {1,2}, eg.,
X:
x
1
2
Y:
y
1
R = X <AND> Y (Cartesian Product):
x y
1 1
2 1
D:
x y
1 1
D+ = (R <OR> <NOT> R) <AND> D:
x y
1 1
<NOT> D+:
x y
1 2
2 1
2 2
R'{HR} = R{HR} <AND> <NOT> D+:
x y
2 1
In the result for R', the tuple that has both of the D triples, ie. x =
1 and y = 1, has been dropped, which seems compatible with the POI.
X'{HX} = (X <AND> <NOT> D+) {HX}:
x
1
2
But the X' result has dropped no tuples.
Y'{HY} = (Y <AND> <NOT> D+) {HY}:
y
1
If we produce R' from X' <AND> Y', R' = R!
Neither has the Y' result dropped any tuples and we have a situation
where R = X <AND> Y, but R' is NOT equal to X' <AND> Y', in other words
DELETE R WHERE X = 1 AND Y = 1 does nothing! I think the casual
explanation for this result is sort of patent - neither X nor Y can ever
have a tuple with two triples. Somewhere Date mentions to the effect
that the POI means there should be "no arbitrary or unnecessary"
differences between base and virtual relvars. From a purely (I hope)
algebraic standpoint, it looks to me that the different MINUS definition
must imply a difference. Personally, I'd say it is a necessary
difference, not an arbitrary one.
This is why I think McGoveran might be right if he is saying that there
are times when a user must be aware of virtual relvars in order to
appreciate dbms results.