I am looking for a query to help with a cleanup of an old database that doesn't have many relationships where it should. I don't need it to be perfect, just to help guide me in beginning cleanup and starting to enforce data integrity.
I am assuming that all tables have the proper primary key and that a column in a not-yet-related table has the same name.
Theoretically speaking I could have three tables, one which has a composite key (I wouldn't choose to design the db like this, but am limited in the cleanup of it and these types of composite/primary/foreign keys are common):
Case.CaseId (PK)
Workstep.WorkstepId (PK)
Workstep.CaseId (PK,FK)
WorkQueue.CaseId (isn't related TO Case.CaseId but should be)
What I would like to be able to do is run a query and come up with results that give me something like table name, column name, and foreign key of the table that isn't related but should be, e.g.:
TABLE NAME, COLUMN NAME, SHOULD BE RELATED TO PRIMARY KEY
WorkQueue, CaseId, Case.CaseId
See the SQL I am using below but it is returning any primary key, even ones that are both a primary key but also are part of a foreign key. Using my example again and the SQL below, instead of returning 1 row I get 2:
TABLE NAME, COLUMN NAME, SHOULD BE RELATED TO PRIMARY KEY
WorkQueue, CaseId, Workstep.CaseId (I don't want this row since it is also related to the 'real' primary key, Case.CaseId)
WorkQueue, CaseId, Case.CaseId
SELECT
SubqueryAllPotentialForeignKeys.TABLE_NAME
,SubqueryAllPotentialForeignKeys.COLUMN_NAME
,(PrimaryKeys.TABLE_NAME + '.' + PrimaryKeys.COLUMN_NAME) as 'Possible Primary Key'
--all potential foreign keys (column name matches another column name but there is no reference from this column anywhere else)
FROM
(
SELECT
INFORMATION_SCHEMA.COLUMNS.TABLE_NAME
,INFORMATION_SCHEMA.COLUMNS.COLUMN_NAME
FROM
INFORMATION_SCHEMA.COLUMNS
WHERE
--only get columns that are in multiple tables
INFORMATION_SCHEMA.COLUMNS.COLUMN_NAME IN
(
SELECT COLUMN_NAME FROM
(SELECT COLUMN_NAME, COUNT(COLUMN_NAME) AS ColNameCount FROM INFORMATION_SCHEMA.COLUMNS GROUP BY COLUMN_NAME) AS SubQueryColumns
WHERE ColNameCount > 1
)
--only get the table.column if not part of a foreign or primary key
EXCEPT
(
SELECT TABLE_NAME, COLUMN_NAME FROM INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE
)
) AS SubqueryAllPotentialForeignKeys
LEFT JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE AS PrimaryKeys ON
SubqueryAllPotentialForeignKeys.COLUMN_NAME = PrimaryKeys.COLUMN_NAME
--when finding possible keys for our columns that don't have references, limit to primary keys
WHERE
PrimaryKeys.CONSTRAINT_NAME LIKE '%PK_%'
ORDER BY TABLE_NAME, COLUMN_NAME
This may not be the most beautiful thing in the world but works pretty darn well:
SELECT * FROM
(
SELECT
SubqueryAllPotentialForeignKeys.TABLE_NAME
,SubqueryAllPotentialForeignKeys.COLUMN_NAME
,(PrimaryKeys.TABLE_NAME + '.' + PrimaryKeys.COLUMN_NAME) as 'Possible Primary Key'
--all potential foreign keys (column name matches another column name but there is no reference from this column anywhere else)
FROM
(
SELECT
INFORMATION_SCHEMA.COLUMNS.TABLE_NAME
,INFORMATION_SCHEMA.COLUMNS.COLUMN_NAME
FROM
INFORMATION_SCHEMA.COLUMNS
WHERE
--only get columns that are in multiple tables
INFORMATION_SCHEMA.COLUMNS.COLUMN_NAME IN
(
SELECT COLUMN_NAME FROM
(SELECT COLUMN_NAME, COUNT(COLUMN_NAME) AS ColNameCount FROM INFORMATION_SCHEMA.COLUMNS GROUP BY COLUMN_NAME) AS SubQueryColumns
WHERE ColNameCount > 1
)
--only get the table.column if not part of a foreign or primary key
EXCEPT
(
SELECT TABLE_NAME, COLUMN_NAME FROM INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE
)
) AS SubqueryAllPotentialForeignKeys
LEFT JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE AS PrimaryKeys ON
SubqueryAllPotentialForeignKeys.COLUMN_NAME = PrimaryKeys.COLUMN_NAME
--when finding possible keys for our columns that don't have references, limit to primary keys
WHERE
PrimaryKeys.CONSTRAINT_NAME LIKE '%PK_%'
) AS Subquery
--exclude all keys that are primary but also foreign
WHERE [Possible Primary Key] NOT IN
(
SELECT (TABLE_NAME + '.' + COLUMN_NAME)
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE CONSTRAINT_NAME LIKE 'FK_%'
)
ORDER BY TABLE_NAME, COLUMN_NAME
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With