Scenario:
Question:
Is anyone familiar with a way to encrypt/obfuscate a column of data purely within a Presto SQL query that can be decrypted/de-obfuscated in a deterministic way later on?
I know that I could easily post-process the results of the query and encrypt/obfuscate it myself, but I would like to leverage the distributed execution model of Presto if at all possible.
The level of encryption/obfuscation does not need to be impenetrable— just something a bit more complex than base64 encoding it (and preferably with a simple secret).
After a bit more research, I stumbled across the XOR cipher which seems possible to implement entirely in a Presto DB query.
I've been able to test it briefly with the following proof-of-concept:
WITH
private_data AS (
SELECT 'some private string' as private
),
encrypted_data AS (
SELECT
zip_with(
regexp_extract_all(private, '.'),
regexp_extract_all(substr('a27e6f329c03461688d6866203aasdljfasaslksa7982k3lkjsd987fok2jlkj0sdf9c59c', 1, length(private)), '.'),
(x, y) ->
bitwise_xor(codepoint(cast(x as varchar(1))), codepoint(cast(y as varchar(1))))
) as encrypted_data
FROM private_data
),
decrypted_data AS (
SELECT
array_join(
zip_with(
encrypted_data,
regexp_extract_all(substr('a27e6f329c03461688d6866203aasdljfasaslksa7982k3lkjsd987fok2jlkj0sdf9c59c', 1, cardinality(encrypted_data)), '.'),
(x, y) ->
chr(bitwise_xor(x, codepoint(cast(y as varchar(1)))))
),
''
) as decrypted_string
FROM encrypted_data
)
SELECT
*
FROM private_data, encrypted_data, decrypted_data
It seems to work, although I would love to simplify it more. Can anyone see ways to optimize it? (For example: casting from a varchar with a length of one to a varchar(1) seems ridiculous, but it complains if I don't. Also regexp_extract_all is the only way I can find to convert a varchar into an array.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With