Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ideas to encrypt/obfuscate results in a Presto query?

Scenario:

  1. I have a Presto table that I will be querying and sending results to various semi-trusted parties.
  2. These semi-trusted parties will analyze the data and return results back to me.
  3. Some of the data in this table is "semi-private"— nothing that could cause real harm if it were discovered, but private nonetheless (like the name of a device).
  4. The signature of this "semi-private" data will be important for GROUP BY clauses, but the actual data itself is not important to the semi-trusted parties.
  5. When this analyzed data is returned back to me, I'll need to be able to decrypt/de-obfuscate this "semi-private" data in order to act on it.

Question:

Is anyone familiar with a way to encrypt/obfuscate a column of data purely within a Presto SQL query that can be decrypted/de-obfuscated in a deterministic way later on?

I know that I could easily post-process the results of the query and encrypt/obfuscate it myself, but I would like to leverage the distributed execution model of Presto if at all possible.

The level of encryption/obfuscation does not need to be impenetrable— just something a bit more complex than base64 encoding it (and preferably with a simple secret).

like image 739
Jim Heising Avatar asked Jan 26 '26 12:01

Jim Heising


1 Answers

After a bit more research, I stumbled across the XOR cipher which seems possible to implement entirely in a Presto DB query.

I've been able to test it briefly with the following proof-of-concept:

WITH

private_data AS (
  SELECT 'some private string' as private
),

encrypted_data AS (
  SELECT
  zip_with(
    regexp_extract_all(private, '.'),
    regexp_extract_all(substr('a27e6f329c03461688d6866203aasdljfasaslksa7982k3lkjsd987fok2jlkj0sdf9c59c', 1, length(private)), '.'),
    (x, y) -> 
      bitwise_xor(codepoint(cast(x as varchar(1))), codepoint(cast(y as varchar(1))))
  ) as encrypted_data
  FROM private_data
),

decrypted_data AS (
  SELECT
  array_join(
    zip_with(
      encrypted_data,
      regexp_extract_all(substr('a27e6f329c03461688d6866203aasdljfasaslksa7982k3lkjsd987fok2jlkj0sdf9c59c', 1, cardinality(encrypted_data)), '.'),
      (x, y) -> 
        chr(bitwise_xor(x, codepoint(cast(y as varchar(1)))))
    ),
    ''
  ) as decrypted_string
  FROM encrypted_data
)

SELECT
*
FROM private_data, encrypted_data, decrypted_data

It seems to work, although I would love to simplify it more. Can anyone see ways to optimize it? (For example: casting from a varchar with a length of one to a varchar(1) seems ridiculous, but it complains if I don't. Also regexp_extract_all is the only way I can find to convert a varchar into an array.)

like image 118
Jim Heising Avatar answered Jan 29 '26 02:01

Jim Heising



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!