To optimize insert speed, combine many small operations into a single large operation. Ideally, you make a single connection, send the data for many new rows at once, and delay all index updates and consistency checking until the very end.
Actually, SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second. Transaction speed is limited by the rotational speed of your disk drive.
SQLite doesn't have any special way to bulk insert data. To get optimal performance when inserting or updating data, ensure that you do the following:
- Use a transaction.
- Reuse the same parameterized command (as sketched below).
The SQLite docs explain why this is so slow: Transaction speed is limited by disk drive speed because (by default) SQLite actually waits until the data really is safely stored on the disk surface before the transaction is complete. That way, if you suddenly lose power or if your OS crashes, your data is still safe.
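A minimal sketch of that pattern with the SQLite C API follows; the table t, its name column and the bulk_insert helper are made-up placeholders, and error checking is omitted for brevity.

#include <sqlite3.h>

/* One transaction, one prepared statement reused for every row. */
void bulk_insert(sqlite3 *db, const char **names, int n)
{
    sqlite3_stmt *stmt;

    sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL);
    sqlite3_prepare_v2(db, "INSERT INTO t (name) VALUES (?)", -1, &stmt, NULL);

    for (int i = 0; i < n; i++) {
        sqlite3_bind_text(stmt, 1, names[i], -1, SQLITE_TRANSIENT); /* copy the value in */
        sqlite3_step(stmt);   /* run the insert */
        sqlite3_reset(stmt);  /* make the statement reusable for the next row */
    }

    sqlite3_finalize(stmt);
    sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
}

Because everything happens inside one transaction, the disk is only synced once at COMMIT rather than once per row.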
Several tips:

- Consider a less paranoid journal mode (pragma journal_mode). There is NORMAL, and then there is OFF, which can significantly increase insert speed if you're not too worried about the database possibly getting corrupted if the OS crashes. If your application crashes, the data should be fine. Note that in newer versions, the OFF/MEMORY settings are not safe for application-level crashes.
- Playing with page sizes makes a difference as well (PRAGMA page_size). Having larger page sizes can make reads and writes go a bit faster, as larger pages are held in memory. Note that more memory will be used for your database.
- If you have indices, consider calling CREATE INDEX after doing all your inserts. This is significantly faster than creating the index and then doing your inserts (see the sketch after this list).
- Use INTEGER PRIMARY KEY if possible, which will replace the implied unique row number column in the table.
- Don't use !feof(file)!

I've also asked similar questions here and here.
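A rough sketch of the pragma and deferred-index tips, again with the C API; the pragma values, table and index names are only examples, and journal_mode = OFF trades crash-safety for speed as described above.

#include <sqlite3.h>

void tune_for_bulk_load(sqlite3 *db)
{
    /* No rollback journal: fast, but the database can be corrupted if the OS crashes. */
    sqlite3_exec(db, "PRAGMA journal_mode = OFF;", NULL, NULL, NULL);
    /* Larger pages; only takes effect before the database has content (or after VACUUM). */
    sqlite3_exec(db, "PRAGMA page_size = 8192;", NULL, NULL, NULL);
}

void index_after_load(sqlite3 *db)
{
    /* Build the index once, after all inserts, instead of maintaining it on every row. */
    sqlite3_exec(db, "CREATE INDEX idx_t_name ON t(name);", NULL, NULL, NULL);
}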
Try using SQLITE_STATIC instead of SQLITE_TRANSIENT for those inserts. SQLITE_TRANSIENT will cause SQLite to copy the string data before returning. SQLITE_STATIC tells it that the memory address you gave it will be valid until the query has been performed (which in this loop is always the case). This will save you several allocate, copy and deallocate operations per loop. Possibly a large improvement.
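A hedged sketch of the two destructor arguments; stmt and buffer are assumed to come from the surrounding insert loop, and buffer must stay valid and unmodified until sqlite3_step() has run when SQLITE_STATIC is used.

#include <sqlite3.h>

static void bind_name(sqlite3_stmt *stmt, const char *buffer)
{
    /* SQLITE_TRANSIENT: SQLite copies the string before returning. */
    /* sqlite3_bind_text(stmt, 1, buffer, -1, SQLITE_TRANSIENT); */

    /* SQLITE_STATIC: no copy; SQLite reads from the caller's buffer directly. */
    sqlite3_bind_text(stmt, 1, buffer, -1, SQLITE_STATIC);
}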
Avoid sqlite3_clear_bindings(stmt). The code in the test sets the bindings every time through, which should be enough.

The C API intro from the SQLite docs says:

Prior to calling sqlite3_step() for the first time or immediately after sqlite3_reset(), the application can invoke the sqlite3_bind() interfaces to attach values to the parameters. Each call to sqlite3_bind() overrides prior bindings on the same parameter.

There is nothing in the docs for sqlite3_clear_bindings saying you must call it in addition to simply setting the bindings.
More detail: Avoid_sqlite3_clear_bindings()
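For illustration, a minimal rebind/step/reset cycle without sqlite3_clear_bindings(); the statement is assumed to have a single text parameter, as in the test code discussed above.

#include <sqlite3.h>

static void insert_rows(sqlite3_stmt *stmt, const char **rows, int n)
{
    for (int i = 0; i < n; i++) {
        /* Rebinding overrides the previous value for parameter 1 ... */
        sqlite3_bind_text(stmt, 1, rows[i], -1, SQLITE_STATIC);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);
        /* ... so sqlite3_clear_bindings(stmt) is not needed here. */
    }
}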
Inspired by this post and by the Stack Overflow question that led me here -- Is it possible to insert multiple rows at a time in an SQLite database? -- I've posted my first Git repository:
https://github.com/rdpoor/CreateOrUpdate
which bulk loads an array of ActiveRecords into MySQL, SQLite or PostgreSQL databases. It includes an option to ignore existing records, overwrite them or raise an error. My rudimentary benchmarks show a 10x speed improvement compared to sequential writes -- YMMV.
I'm using it in production code where I frequently need to import large datasets, and I'm pretty happy with it.
Bulk imports seem to perform best if you can chunk your INSERT/UPDATE statements. A value of 10,000 or so has worked well for me on a table with only a few rows, YMMV...
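A sketch of chunked commits along those lines; the 10,000 figure is just the value mentioned above, and the helper name and single-parameter statement are placeholders.

#include <sqlite3.h>

void chunked_insert(sqlite3 *db, sqlite3_stmt *stmt,
                    const char **rows, int n, int chunk)
{
    sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
    for (int i = 0; i < n; i++) {
        sqlite3_bind_text(stmt, 1, rows[i], -1, SQLITE_STATIC);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);

        if ((i + 1) % chunk == 0) {                          /* finish this batch... */
            sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
            sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);     /* ...and start the next */
        }
    }
    sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
}

Called as chunked_insert(db, stmt, rows, n, 10000), this commits every ten thousand rows, so a crash loses at most one batch instead of the whole load.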
If you care only about reading, a somewhat faster (but potentially stale-data) approach is to read from multiple connections in multiple threads (one connection per thread).
First, find the total number of items in the table:
SELECT COUNT(*) FROM table
then read in pages (LIMIT/OFFSET):
SELECT * FROM table ORDER BY _ROWID_ LIMIT <limit> OFFSET <offset>
where <limit> and <offset> are calculated per-thread, like this:
int limit = (count + n_threads - 1) / n_threads;   /* rows per thread, rounded up */
for (int thread_index = 0; thread_index < n_threads; thread_index++) {
    int offset = thread_index * limit;
    /* each thread reads the page [offset, offset + limit) */
}
For our small (200 MB) database this gave a 50-75% speed-up (SQLite 3.8.0.2, 64-bit, on Windows 7). Our tables are heavily non-normalized (1,000-1,500 columns, roughly 100,000 or more rows).
Too many or too few threads won't help; you need to benchmark and profile it yourself.
Also, for us, SHAREDCACHE made performance slower, so I manually put PRIVATECACHE (because it was enabled globally for us).
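For reference, a sketch of the connection-per-thread paged read described above, using pthreads; the table name t, the 16-thread cap and the missing error handling are simplifications for illustration, not the poster's actual code.

#include <pthread.h>
#include <sqlite3.h>

typedef struct {
    const char *db_path;
    int limit;    /* rows per thread */
    int offset;   /* first row for this thread */
} page_job;

static void *read_page(void *arg)
{
    page_job *job = (page_job *)arg;
    sqlite3 *db;
    sqlite3_stmt *stmt;

    /* Each thread gets its own read-only connection with a private page cache. */
    sqlite3_open_v2(job->db_path, &db,
                    SQLITE_OPEN_READONLY | SQLITE_OPEN_PRIVATECACHE, NULL);
    sqlite3_prepare_v2(db,
        "SELECT * FROM t ORDER BY _ROWID_ LIMIT ? OFFSET ?", -1, &stmt, NULL);
    sqlite3_bind_int(stmt, 1, job->limit);
    sqlite3_bind_int(stmt, 2, job->offset);

    while (sqlite3_step(stmt) == SQLITE_ROW) {
        /* process one row */
    }

    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return NULL;
}

void read_in_parallel(const char *db_path, int count, int n_threads)
{
    pthread_t tids[16];                                /* assumes n_threads <= 16 */
    page_job jobs[16];
    int limit = (count + n_threads - 1) / n_threads;   /* rows per thread, rounded up */

    for (int i = 0; i < n_threads; i++) {
        jobs[i] = (page_job){ db_path, limit, i * limit };
        pthread_create(&tids[i], NULL, read_page, &jobs[i]);
    }
    for (int i = 0; i < n_threads; i++)
        pthread_join(tids[i], NULL);
}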