Is there a relational algebra equivalent of the SQL expression NOT IN?
For example if I have the relation:
A1  |  A2
----------
x   |  y
a   |  b
y   |  x
I want to remove all tuples in the relation for which A1 is in A2. In SQL I might query:
SELECT
    *
FROM
    R
WHERE
    R.A1 NOT IN
        (
        SELECT
            A2
        FROM
            R
        )
/
What is really stumping me is how to subquery inside the relational algebra selection operator, is this possible?:
σsome subquery hereR
Relational Algebra is procedural query language, which takes Relation as input and generate relation as output. Relational algebra mainly provides theoretical foundation for relational databases and SQL. Projection is used to project required column data from a relation.
Relational algebra includes six comparison operators (=, <>, <, >, <=, >=). These are proposition-forming operators on terms. For example, x <> 0 asserts that x is not equal to zero. It also includes three logical operators (and, or, not).
The query "SELECT * FROM R, S WHERE R.B = S.B;" is equivalent to "σR.B = S.B(R X S)".
The natural join is arguably one of the most important operators since it is the relational counterpart of the logical AND operator.
In relational algebra, you can do this using a carthesian product. Something like:
R - ρa1,a2(πa11,a21(σA11 = A22(ρa11,a21(R) x ρa12, a22(R))))
That gives you the rows that were matched. Subtract this from R to find the rows that where not matched.
The opening question is sending us down the wrong thinking. It should be:
Is there a relational algebra equivalent of the SQL expression R WHERE ... [NOT] IN S?
(That is, the answer is some operation between two relations, not some sort of filter.)
The answer is Yes, it is (Natural) JOIN aka the bowtie operator ⋈.
To see why, let's first tidy up the SQL solution given. As shown, it's looking for attribute A1 NOT IN a relation with single attribute A2. That's really a mis-match in attribute names. SQL also allows NOT inside the where condition. This SQL makes the logical structure clearer:
SELECT * FROM R
WHERE NOT (A1 IN (SELECT A2 AS A1 FROM R) )
Now we can see a projection and a rename. (The surrounding NOT we can implement as set MINUS, as per the first answer.) So the equivalent RA is:
R - (R ⋈ ρA1⁄A2(πA2(R)))
For interest, the Tutorial D is:
R MINUS (R JOIN (R {A2} RENAME A2 AS A1))
In the way the question is put, there's a hangover from SQL thinking. SQL's WHERE forces you into row-level 'mode'. This is contra Codd's rule 7 requiring set-at-a-time operators.
In general, SQL's WHERE and RA's σ with their row-level filters can be more succinctly implemented as (Natural) JOIN with set-at-a-time logic. (For example, this is what Date & Darwen do in their A algebra.)
A direct answer to a more general question:
SELECT
    *
FROM
    R
WHERE
    R.A1 NOT IN
        (
        SELECT
            A2
        FROM
            S
        );
The answer is:
R-R "bowtie" [R.A1=S.A2] ("pi" [A2] S )
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With