Context and objective
I am trying to psuedo-anonymize some product numbers in my data table. See the sample code below. The product numbers are 10 numerics and may or may not be unique for the table.
Since there are potentially other tables that I would like to link to, I would like to use a non-random way of pseudo-anonymizing the data.
System is SQLite 3.10.1. However, any type of DBMS with SQL will be fine.
My constraints are to:
Actions I have taken
I was literally going to go through every possible digit and update it as below. However, this feels like a very inefficient way to do it.
UPDATE test
SET pseudo_num = replace(pseudo_num, '0', 'B');
UPDATE test
SET pseudo_num = replace(pseudo_num, '1', 'T');
UPDATE test
SET pseudo_num = replace(pseudo_num, '2', 'A');
UPDATE test
SET pseudo_num = replace(pseudo_num, '3', 'A');
UPDATE test
SET pseudo_num = replace(pseudo_num, '4', 'D');
UPDATE test
SET pseudo_num = replace(pseudo_num, '5', '3');
UPDATE test
SET pseudo_num = replace(pseudo_num, '6', '2');
UPDATE test
SET pseudo_num = replace(pseudo_num, '7', '4');
UPDATE test
SET pseudo_num = replace(pseudo_num, '8', 'X');
UPDATE test
SET pseudo_num = replace(pseudo_num, '9', 'L');
Questions
Sample code to create data table
CREATE TABLE test (
prod_num varchar(14),
owner varchar(255) default NULL,
prod_date varchar(255)
);
INSERT INTO test (prod_num,owner,prod_date) VALUES ("260619275","Kieran","Feb 10, 2018"),("316556232","Steven","Jan 6, 2020"),("625302534","Oliver","Feb 10, 2018"),("811424845","Jeremy","Apr 12, 2018"),("060961216","Quinlan","Jul 19, 2019"),("713794360","Stuart","Nov 1, 2019"),("553381666","George","Jan 8, 2019"),("978519361","Macon","Nov 26, 2018"),("352718969","Raphael","Jul 21, 2019"),("803299478","Byron","Nov 26, 2019");
INSERT INTO test (prod_num,owner,prod_date) VALUES ("696124452","Dalton","Jul 17, 2018"),("892088485","Keane","Jul 9, 2018"),("817054190","Dillon","Apr 23, 2018"),("500170097","Fitzgerald","Feb 11, 2019"),("663252252","Thomas","Apr 10, 2018"),("061983557","Alan","May 12, 2018"),("492057435","Jarrod","Apr 16, 2018"),("837802495","Shad","Mar 22, 2019"),("725698187","Mark","Jul 22, 2018"),("153352349","Akeem","Feb 19, 2018");
ALTER TABLE test
ADD pseudo_num NVARCHAR(20);
UPDATE test
SET pseudo_num = prod_num;
You can use a hash (or encryption) function to convert the product numbers to strings with chars and numbers in same length. The same product numbers also get the same hash / value:
Example on TSQL:
-- preview (old and new prod_num)
SELECT prod_num, RIGHT(CONVERT(VARCHAR(32), HASHBYTES('SHA1', prod_num), 2), LEN(prod_num))
FROM test;
-- the UPDATE
UPDATE test SET pseudo_num = RIGHT(CONVERT(VARCHAR(32), HASHBYTES('SHA1', prod_num), 2), LEN(prod_num));
demo on dbfiddle.uk
Example on MySQL:
-- preview (old and new prod_num)
SELECT prod_num, UPPER(RIGHT(MD5(prod_num), LENGTH(prod_num)))
FROM test;
-- the UPDATE
UPDATE test SET pseudo_num = UPPER(RIGHT(MD5(prod_num), LENGTH(prod_num)));
demo on dbfiddle.uk
Example on Oracle:
-- preview (old and new prod_num)
SELECT prod_num, SUBSTR(STANDARD_HASH(prod_num, 'MD5'), LENGTH(prod_num) * -1) pseudo_prod_num
FROM test;
-- the UPDATE
UPDATE test SET pseudo_num = SUBSTR(STANDARD_HASH(prod_num, 'MD5'), LENGTH(prod_num) * -1);
demo on dbfiddle.uk
Example PostgreSQL:
-- preview (old and new prod_num)
SELECT prod_num, UPPER(RIGHT(MD5(prod_num), LENGTH(prod_num)))
FROM test;
-- the UPDATE
UPDATE test SET pseudo_num = UPPER(RIGHT(MD5(prod_num), LENGTH(prod_num)));
demo on dbfiddle.uk
You may try using a join here to do the replacements. If you don't have a formal table which contains the mapping from the old to new pseduo_num, then we can try using a CTE.
WITH map AS (
SELECT '0' AS pseudo_num, 'B' AS output UNION ALL
SELECT '1', 'T' UNION ALL
SELECT '2', 'A' UNION ALL
SELECT '3', 'A' UNION ALL
SELECT '4', 'D' UNION ALL
SELECT '5', '3' UNION ALL
SELECT '6', '2' UNION ALL
SELECT '7', '4' UNION ALL
SELECT '8', 'X' UNION ALL
SELECT '9', 'L'
),
cte AS (
SELECT t.pseudo_num, m.output
FROM test t
INNER JOIN map m
ON t.pseudo_num = m.psuedo_num
)
UPDATE cte
SET pseudo_num = output;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With