
Use the content of a binary column as a string in a PySpark DataFrame

I created a DataFrame from an existing table in which two columns are stored as binary. For further processing I need the binary content as a string.

Example:

+--------------------+--------------------+
|              DB_KEY|          PARENT_KEY|
+--------------------+--------------------+
|[00 50 56 88 0A]    |[00 50 56 88 12]    |
+--------------------+--------------------+

Schema:
root
 |-- DB_KEY: binary (nullable = true)
 |-- PARENT_KEY: binary (nullable = true)

The binary content should be usable as a string, like this:

DB_KEY = "005056880A"
PARENT_KEY = "0050568812"

Can you give me any hints on how to do this?

Marcus asked Oct 31 '25 06:10



1 Answer

Have you tried applying hex() to your binary fields? It returns the bytes as an uppercase hexadecimal string:

scala> val df = spark.sql("select unhex('005056880A') as db_key")
df: org.apache.spark.sql.DataFrame = [db_key: binary]

scala> df.withColumn("db_key_string", hex($"db_key")).show(false)
+----------------+-------------+
|db_key          |db_key_string|
+----------------+-------------+
|[00 50 56 88 0A]|005056880A   |
+----------------+-------------+
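
Since the question is about PySpark: the same function should be available there as pyspark.sql.functions.hex. Below is a minimal sketch, not a tested solution for your table. The helper names add_hex_columns and bytes_to_hex are hypothetical (they are not part of Spark), and bytes_to_hex is only a plain-Python illustration of the string I would expect hex() to produce for each value.

```python
def bytes_to_hex(value):
    """Plain-Python illustration of the per-value result:
    raw bytes -> uppercase hexadecimal string (None stays None)."""
    if value is None:
        return None
    return value.hex().upper()


def add_hex_columns(df, columns):
    """Replace each named binary column of a PySpark DataFrame with its
    hex-string representation. Hypothetical helper, not a Spark API."""
    # Imported here so the pure-Python helper above works without Spark installed.
    from pyspark.sql import functions as F

    for name in columns:
        # withColumn with an existing name overwrites that column.
        df = df.withColumn(name, F.hex(F.col(name)))
    return df


# Per-value illustration using the bytes from the question.
print(bytes_to_hex(bytes.fromhex("005056880A")))  # 005056880A
```

With the DataFrame from the question this would be called as df = add_hex_columns(df, ["DB_KEY", "PARENT_KEY"]). If you want to keep the original binary columns, pass a new column name to withColumn instead of overwriting.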
mazaneicha answered Nov 02 '25 12:11
