I'm currently working on a project where we are considering switching to Redis as a database. The nature of our data is extremely simple and seems suitable for Redis. Having no experience with Redis, I ran a very small benchmark to compare it to PostgreSQL in terms of insert performance (which is important for us).
I created an .sql file with 200000 INSERT statements into a simple schema (address [key], timestamp, value). The insert took about 6 seconds.
For Redis, each of the 200000 records is inserted by:
HSET data:address timestamp <VALUE>
HSET data:address value <VALUE>
Dumping everything into Redis with time redis-cli < insert_data.redis takes 16 seconds.
I realize that this 'benchmark' is very basic, but am I missing something on my side that lets PostgreSQL come out on top? I can't really imagine that Redis is actually slower on inserts.
This result is logical. To understand the results of a benchmark, it is important to understand the operations triggered on the system.
Both the Redis and PostgreSQL clients work synchronously with their respective servers. For each statement, they send a query and wait for the reply before processing the next statement.
At this volume, a lot of things happen in memory (even with PostgreSQL). Furthermore, you have no concurrency here. So the cost of the operations is not dominated by I/O or indexing, but by the roundtrips exchanged between the client and the server.
Now, how many roundtrips does each test generate?
With PostgreSQL, you have one statement per record, resulting in 200000 roundtrips. With Redis, you have two statements per record, resulting in 400000 roundtrips. Furthermore, the Redis roundtrips systematically include the keywords of your schema (data, timestamp, value), and the address is sent twice per record. So a lot more data is exchanged by the Redis test.
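As a rough check of this reasoning, the two timings from the question can be divided by the number of roundtrips. This is only a back-of-the-envelope sketch: it assumes exactly one roundtrip per statement and ignores parsing cost and payload size.

```python
# Figures taken from the question; everything else is simple arithmetic.
RECORDS = 200_000
PG_SECONDS = 6       # measured PostgreSQL insert time
REDIS_SECONDS = 16   # measured redis-cli insert time

pg_roundtrips = RECORDS * 1      # one INSERT per record
redis_roundtrips = RECORDS * 2   # two HSET commands per record

# Average cost of one roundtrip, in microseconds.
pg_us_per_roundtrip = PG_SECONDS * 1_000_000 / pg_roundtrips
redis_us_per_roundtrip = REDIS_SECONDS * 1_000_000 / redis_roundtrips

print(f"PostgreSQL: {pg_roundtrips} roundtrips, {pg_us_per_roundtrip:.0f} us each")
print(f"Redis:      {redis_roundtrips} roundtrips, {redis_us_per_roundtrip:.0f} us each")
```

Under these assumptions the per-roundtrip cost is of the same order for both systems (about 30 versus 40 microseconds), so most of the gap comes from Redis doing twice as many roundtrips.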
You may also have differences in the way the input file is parsed by the client software.
To improve your result a bit with redis-cli, you could use the HMSET command to send only one statement per record (on Redis 4.0 and later, HSET itself accepts several field/value pairs).
HSET data:address timestamp <VALUE>
HSET data:address value <VALUE>
becomes:
HMSET data:address timestamp <VALUE> value <VALUE>
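If the input file already exists, the rewrite above could be done mechanically. The following is a minimal sketch; the function name merge_hset_lines is made up for illustration, and it assumes each line has the exact form "HSET key field value" with no spaces inside the value.

```python
def merge_hset_lines(lines):
    """Merge 'HSET key field value' lines into one HMSET line per key.

    Naive by design: lines are split on whitespace, so values that
    contain spaces are not supported. Lines of another shape are skipped.
    """
    fields_by_key = {}  # key -> flat list [field1, value1, field2, value2, ...]
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or parts[0].upper() != "HSET":
            continue
        _, key, field, value = parts
        fields_by_key.setdefault(key, []).extend([field, value])
    # Dicts keep insertion order, so keys come out in first-seen order.
    return [f"HMSET {key} {' '.join(pairs)}" for key, pairs in fields_by_key.items()]
```

For example, passing the two HSET lines of one record returns a single HMSET line for that key, which halves the number of roundtrips.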
But the real gain here would be to use pipelining. Unfortunately, you cannot use it from redis-cli, except by relying on the --pipe option. For this option, you have to generate the actual Redis protocol instead of textual commands. That's why your test with "cat data.txt | redis-cli --pipe" cannot work. Generating Redis protocol from simple shell commands is not convenient.
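For reference, here is a minimal sketch of generating the Redis protocol from a small script instead of the shell. In this format each command is an array of bulk strings: "*" plus the argument count, then for every argument "$" plus its length in bytes and the argument itself, all terminated by CRLF. The function names to_resp and records_to_resp are made up for illustration.

```python
def to_resp(*args):
    """Encode one Redis command as an array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode("utf-8")
        # The length is counted in bytes, not in characters.
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def records_to_resp(records):
    """Encode (address, timestamp, value) tuples as one HMSET per record."""
    return b"".join(
        to_resp("HMSET", f"data:{address}", "timestamp", timestamp, "value", value)
        for address, timestamp, value in records
    )
```

The resulting bytes would be written to a file or to standard output in binary mode (for instance through sys.stdout.buffer) and then fed to redis-cli --pipe.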
For such a benchmark, I would highly recommend using your own client program rather than redis-cli. Even something written in Python, Ruby, or JavaScript will give interesting performance, provided pipelining is used.
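A pipelined client could look roughly like the sketch below. It is written against the interface of the redis-py client (a pipeline() method returning an object with hset() and execute()), but the function name insert_pipelined and the batch size of 1000 are arbitrary choices for illustration, not tuned values.

```python
def insert_pipelined(client, records, batch_size=1000):
    """Insert (address, timestamp, value) tuples, one roundtrip per batch.

    `client` is assumed to behave like redis.Redis from redis-py.
    Returns the number of records sent.
    """
    # transaction=False: plain pipelining, no MULTI/EXEC wrapper.
    pipe = client.pipeline(transaction=False)
    pending = 0
    total = 0
    for address, timestamp, value in records:
        # Commands are only queued locally here; nothing is sent yet.
        pipe.hset(f"data:{address}", mapping={"timestamp": timestamp, "value": value})
        pending += 1
        if pending >= batch_size:
            pipe.execute()  # send the whole batch and read all replies
            total += pending
            pending = 0
    if pending:
        pipe.execute()  # flush the last, partial batch
        total += pending
    return total
```

With redis-py installed and a server running locally, it might be called as insert_pipelined(redis.Redis(), records). With 200000 records and batches of 1000, the client performs about 200 roundtrips instead of 400000.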