 

Improve INSERT-per-second performance of SQLite

People also ask

How can I improve my insert performance?

To optimize insert speed, combine many small operations into a single large operation. Ideally, you make a single connection, send the data for many new rows at once, and delay all index updates and consistency checking until the very end.

How many writes per second can SQLite do?

Actually, SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second. Transaction speed is limited by the rotational speed of your disk drive.

Does SQLite support bulk insert?

SQLite doesn't have any special way to bulk insert data. To get optimal performance when inserting or updating data, ensure that you do the following: use a transaction, and reuse the same parameterized command.
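
A minimal sketch of that pattern with the C API; the kv table, the column names and the helper are made-up illustrations, and error checking is omitted:

    #include <sqlite3.h>

    /* Insert n rows using one transaction and one reused prepared statement. */
    static void bulk_insert(sqlite3 *db, const char **keys,
                            const char **values, int n)
    {
        sqlite3_stmt *stmt;
        sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
        sqlite3_prepare_v2(db, "INSERT INTO kv (key, value) VALUES (?, ?)",
                           -1, &stmt, NULL);
        for (int i = 0; i < n; i++) {
            sqlite3_bind_text(stmt, 1, keys[i], -1, SQLITE_TRANSIENT);
            sqlite3_bind_text(stmt, 2, values[i], -1, SQLITE_TRANSIENT);
            sqlite3_step(stmt);   /* run the INSERT */
            sqlite3_reset(stmt);  /* make the statement executable again */
        }
        sqlite3_finalize(stmt);
        sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
    }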

Why is SQLite so slow?

The SQLite docs explain why this is so slow: transaction speed is limited by disk drive speed because (by default) SQLite actually waits until the data really is safely stored on the disk surface before the transaction is complete. That way, if you suddenly lose power or if your OS crashes, your data is still safe.


Several tips:

  1. Put inserts/updates in a transaction.
  2. For older versions of SQLite, consider a less paranoid journal mode (PRAGMA journal_mode). There is MEMORY, and then there is OFF, which can significantly increase insert speed if you're not too worried about the database possibly getting corrupted if the OS crashes. If only your application crashes, the data should be fine. Note that in newer versions, the OFF/MEMORY settings are not safe for application-level crashes. (See the PRAGMA sketch after this list.)
  3. Playing with page sizes makes a difference as well (PRAGMA page_size). Larger page sizes can make reads and writes a bit faster, since larger pages are held in memory. Note that more memory will then be used for your database.
  4. If you have indices, consider calling CREATE INDEX after doing all your inserts. This is significantly faster than creating the index and then doing your inserts.
  5. You have to be quite careful if you have concurrent access to SQLite, as the whole database is locked when writes are done, and although multiple readers are possible, writes will be locked out. This has been improved somewhat with the addition of a WAL in newer SQLite versions.
  6. Take advantage of saving space...smaller databases go faster. For instance, if you have key value pairs, try making the key an INTEGER PRIMARY KEY if possible, which will replace the implied unique row number column in the table.
  7. If you are using multiple threads, you can try using the shared page cache, which will allow loaded pages to be shared between threads, which can avoid expensive I/O calls.
  8. Don't use !feof(file) to control your read loop.
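
To make tips 2-5 concrete, here is a rough sketch of that setup with the C API. The pragma values, the kv table and the index name are illustrative assumptions, not recommendations; benchmark them against your own workload:

    /* Tip 2: a less durable rollback journal (faster, riskier on OS crash). */
    sqlite3_exec(db, "PRAGMA journal_mode = MEMORY", NULL, NULL, NULL);
    /* Tip 5: or switch to WAL instead, so readers don't block the writer. */
    /* sqlite3_exec(db, "PRAGMA journal_mode = WAL", NULL, NULL, NULL); */

    /* Tip 3: larger pages; only takes effect on a new database or after VACUUM. */
    sqlite3_exec(db, "PRAGMA page_size = 8192", NULL, NULL, NULL);

    /* ... perform all the inserts ... */

    /* Tip 4: build the index after the bulk load, not before it. */
    sqlite3_exec(db, "CREATE INDEX idx_kv_value ON kv (value)", NULL, NULL, NULL);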

I've also asked similar questions here and here.


Try using SQLITE_STATIC instead of SQLITE_TRANSIENT for those inserts.

SQLITE_TRANSIENT will cause SQLite to copy the string data before returning.

SQLITE_STATIC tells it that the memory address you gave it will be valid until the query has been performed (which in this loop is always the case). This will save you several allocate, copy and deallocate operations per loop. Possibly a large improvement.
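
As a minimal sketch of the loop in question (stmt and file are assumed to be the prepared statement and input file from the original test code):

    char buf[256];  /* refilled on every pass, but its address stays valid */
    while (fgets(buf, sizeof buf, file)) {
        /* SQLITE_STATIC: SQLite may reference buf directly instead of
           copying it, since buf outlives the sqlite3_step() call. */
        sqlite3_bind_text(stmt, 1, buf, -1, SQLITE_STATIC);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);
    }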


Avoid sqlite3_clear_bindings(stmt).

The code in the test sets the bindings every time through the loop, which should be enough.

The C API intro from the SQLite docs says:

Prior to calling sqlite3_step() for the first time or immediately after sqlite3_reset(), the application can invoke the sqlite3_bind() interfaces to attach values to the parameters. Each call to sqlite3_bind() overrides prior bindings on the same parameter.

There is nothing in the docs for sqlite3_clear_bindings saying you must call it in addition to simply setting the bindings.

More detail: Avoid sqlite3_clear_bindings().
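
In other words, a rebind-every-iteration loop like this sketch needs no sqlite3_clear_bindings() call at all (stmt is an already-prepared two-parameter INSERT; the ids and names arrays are hypothetical):

    for (int i = 0; i < n; i++) {
        /* Each bind call overrides the previous iteration's binding. */
        sqlite3_bind_int(stmt, 1, ids[i]);
        sqlite3_bind_text(stmt, 2, names[i], -1, SQLITE_STATIC);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);  /* resets execution state, keeps the bindings */
    }

sqlite3_clear_bindings() only matters when you want parameters you are not going to rebind to fall back to NULL.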


On bulk inserts

Inspired by this post and by the Stack Overflow question that led me here -- Is it possible to insert multiple rows at a time in an SQLite database? -- I've posted my first Git repository:

https://github.com/rdpoor/CreateOrUpdate

which bulk loads an array of ActiveRecords into MySQL, SQLite or PostgreSQL databases. It includes an option to ignore existing records, overwrite them or raise an error. My rudimentary benchmarks show a 10x speed improvement compared to sequential writes -- YMMV.

I'm using it in production code where I frequently need to import large datasets, and I'm pretty happy with it.


Bulk imports seem to perform best if you can chunk your INSERT/UPDATE statements. A value of 10,000 or so has worked well for me on a table with only a few rows; YMMV...
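
A sketch of that chunking with the C API; CHUNK, the statement and the rows array are illustrative:

    #define CHUNK 10000

    sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
    for (int i = 0; i < n; i++) {
        sqlite3_bind_text(stmt, 1, rows[i], -1, SQLITE_STATIC);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);
        if ((i + 1) % CHUNK == 0) {  /* commit every CHUNK rows */
            sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
            sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
        }
    }
    sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);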


If you care only about reading, a somewhat faster (but possibly stale) approach is to read from multiple connections on multiple threads (one connection per thread).

First, find the number of items in the table:

SELECT COUNT(*) FROM table

then read in pages (LIMIT/OFFSET):

SELECT * FROM table ORDER BY _ROWID_ LIMIT <limit> OFFSET <offset>

where <limit> and <offset> are calculated per-thread, like this:

int limit = (count + n_threads - 1)/n_threads;

for each thread:

int offset = thread_index * limit;

For our small (200 MB) database this gave a 50-75% speed-up (SQLite 3.8.0.2, 64-bit, on Windows 7). Our tables are heavily non-normalized (1,000-1,500 columns, roughly 100,000 or more rows).
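
Assembled into a sketch (each thread must open its own connection; count comes from the COUNT(*) query above, and the names are illustrative):

    int limit = (count + n_threads - 1) / n_threads;  /* rows per thread, rounded up */

    /* Inside thread number thread_index (0-based), on that thread's own connection: */
    int offset = thread_index * limit;
    char sql[128];
    snprintf(sql, sizeof sql,
             "SELECT * FROM table ORDER BY _ROWID_ LIMIT %d OFFSET %d",
             limit, offset);
    /* prepare sql and step through the result rows here */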

Too many or too few threads won't help; you need to benchmark and profile for yourself.

Also, for us, SHAREDCACHE made the performance slower, so I manually set PRIVATECACHE (because it was enabled globally for us).
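
If shared cache was enabled globally (e.g. with sqlite3_enable_shared_cache(1)), a single connection can opt back out at open time. A sketch, with a hypothetical filename:

    sqlite3 *db;
    /* SQLITE_OPEN_PRIVATECACHE overrides the global shared-cache
       setting for this one connection. */
    sqlite3_open_v2("data.db", &db,
                    SQLITE_OPEN_READWRITE | SQLITE_OPEN_PRIVATECACHE,
                    NULL);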