Spark redis connector to write data into specific index of the redis

I'm trying to read data from Cassandra and write it to a specific Redis database, let's say Redis DB 5.

I need to write all the data into Redis DB index 5 in hash format.

    val spark = SparkSession.builder()
      .appName("redis-df")
      .master("local[*]")
      .config("spark.redis.host", "localhost")
      .config("spark.redis.port", "6379")
      .config("spark.redis.db", 5)
      .config("spark.cassandra.connection.host", "localhost")
      .getOrCreate()

    import spark.implicits._

    val someDF = Seq(
      (8, "bat"),
      (64, "mouse"),
      (-27, "horse")
    ).toDF("number", "word")

    someDF.write
      .format("org.apache.spark.sql.redis")
      .option("keys.pattern", "*")
      // .option("table", "person") // Is it mandatory?
      .save()

Can I save data into Redis without a table name? I just want to save all the data into Redis DB index 5 without specifying a table name; is that possible? I have gone through the spark-redis connector documentation and don't see any example covering this. Doc link: https://github.com/RedisLabs/spark-redis/blob/master/doc/dataframe.md#writing

I'm currently using this version of the spark-redis connector:

    <dependency>
        <groupId>com.redislabs</groupId>
        <artifactId>spark-redis_2.11</artifactId>
        <version>2.5.0</version>
    </dependency>

Has anyone faced this issue? Is there any workaround?

This is the error I get if I do not set the table name in the config:

FAILED

  java.lang.IllegalArgumentException: Option 'table' is not set.
  at org.apache.spark.sql.redis.RedisSourceRelation$$anonfun$tableName$1.apply(RedisSourceRelation.scala:208)
  at org.apache.spark.sql.redis.RedisSourceRelation$$anonfun$tableName$1.apply(RedisSourceRelation.scala:208)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.redis.RedisSourceRelation.tableName(RedisSourceRelation.scala:208)
  at org.apache.spark.sql.redis.RedisSourceRelation.saveSchema(RedisSourceRelation.scala:245)
  at org.apache.spark.sql.redis.RedisSourceRelation.insert(RedisSourceRelation.scala:121)
  at org.apache.spark.sql.redis.DefaultSource.createRelation(DefaultSource.scala:30)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
Asked by Tulasi on Oct 24 '25.

1 Answer

The `table` option is mandatory. The idea is that you specify a table name so that the dataframe can later be read back from Redis using that same name. In your case, another option is to convert the dataframe to a key/value RDD and use `sc.toRedisKV(rdd)`, which does not require a table name.
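A minimal sketch of the RDD approach, assuming the same spark-redis 2.5.0 dependency and the sample dataframe from the question (the key/value layout chosen here is illustrative, not prescribed by the library):

```scala
import org.apache.spark.sql.SparkSession
import com.redislabs.provider.redis._ // brings sc.toRedisKV / sc.toRedisHASH into scope

object RedisKvExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("redis-rdd")
      .master("local[*]")
      .config("spark.redis.host", "localhost")
      .config("spark.redis.port", "6379")
      .config("spark.redis.db", 5) // target Redis DB index 5
      .getOrCreate()

    import spark.implicits._

    val someDF = Seq(
      (8, "bat"),
      (64, "mouse"),
      (-27, "horse")
    ).toDF("number", "word")

    // Convert each row to a (key, value) pair of strings.
    val kvRdd = someDF.rdd.map(row => (row.get(0).toString, row.get(1).toString))

    // Writes plain string keys into DB 5 -- no 'table' option involved.
    spark.sparkContext.toRedisKV(kvRdd)

    spark.stop()
  }
}
```

If you specifically need the data stored as a hash, `sc.toRedisHASH(kvRdd, "someHash")` writes all the pairs as fields of a single hash named `someHash` (a name you choose) instead of as individual string keys.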

Answered by fe2s on Oct 26 '25.


