Spark redis connector to write data into specific index of the redis

I'm trying to read data from Cassandra and write it to a specific Redis database, let's say Redis DB 5.

I need to write all the data into Redis DB index 5 in hash format.

    val spark = SparkSession.builder()
      .appName("redis-df")
      .master("local[*]")
      .config("spark.redis.host", "localhost")
      .config("spark.redis.port", "6379")
      .config("spark.redis.db", 5)
      .config("spark.cassandra.connection.host", "localhost")
      .getOrCreate()

    import spark.implicits._

    val someDF = Seq(
      (8, "bat"),
      (64, "mouse"),
      (-27, "horse")
    ).toDF("number", "word")

    someDF.write
      .format("org.apache.spark.sql.redis")
      .option("keys.pattern", "*")
      // .option("table", "person") // Is it mandatory?
      .save()

Can I save data into Redis without a table name? I just want to save all the data into Redis DB index 5 without specifying a table name; is that possible? I have gone through the spark-redis connector documentation and don't see any example covering this. Doc link: https://github.com/RedisLabs/spark-redis/blob/master/doc/dataframe.md#writing

I'm currently using this version of the spark-redis connector:

    <dependency>
        <groupId>com.redislabs</groupId>
        <artifactId>spark-redis_2.11</artifactId>
        <version>2.5.0</version>
    </dependency>

Has anyone faced this issue? Is there any workaround?

This is the error I get if I do not set the table name in the config:

FAILED

  java.lang.IllegalArgumentException: Option 'table' is not set.
  at org.apache.spark.sql.redis.RedisSourceRelation$$anonfun$tableName$1.apply(RedisSourceRelation.scala:208)
  at org.apache.spark.sql.redis.RedisSourceRelation$$anonfun$tableName$1.apply(RedisSourceRelation.scala:208)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.redis.RedisSourceRelation.tableName(RedisSourceRelation.scala:208)
  at org.apache.spark.sql.redis.RedisSourceRelation.saveSchema(RedisSourceRelation.scala:245)
  at org.apache.spark.sql.redis.RedisSourceRelation.insert(RedisSourceRelation.scala:121)
  at org.apache.spark.sql.redis.DefaultSource.createRelation(DefaultSource.scala:30)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
Asked by Tulasi on Oct 24 '25.

1 Answer

The `table` option is mandatory. The idea is that you specify a table name so that the dataframe can later be read back from Redis using that same name. In your case, another option is to convert the dataframe to a key/value RDD and use `sc.toRedisKV(rdd)`, which does not require a table name.
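A minimal sketch of the RDD approach, assuming the same spark-redis 2.5.0 dependency and the sample dataframe from the question (the key/value layout chosen here is illustrative, not prescribed by the library):

```scala
import org.apache.spark.sql.SparkSession
import com.redislabs.provider.redis._ // brings sc.toRedisKV / sc.toRedisHASH into scope

object RedisKvExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("redis-rdd")
      .master("local[*]")
      .config("spark.redis.host", "localhost")
      .config("spark.redis.port", "6379")
      .config("spark.redis.db", 5) // target Redis DB index 5
      .getOrCreate()

    import spark.implicits._

    val someDF = Seq(
      (8, "bat"),
      (64, "mouse"),
      (-27, "horse")
    ).toDF("number", "word")

    // Convert each row to a (key, value) pair of strings.
    val kvRdd = someDF.rdd.map(row => (row.get(0).toString, row.get(1).toString))

    // Writes plain string keys into DB 5 -- no 'table' option involved.
    spark.sparkContext.toRedisKV(kvRdd)

    spark.stop()
  }
}
```

If you specifically need the data stored as a hash, `sc.toRedisHASH(kvRdd, "someHash")` writes all the pairs as fields of a single hash named `someHash` (a name you choose) instead of as individual string keys.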

Answered by fe2s on Oct 26 '25.


