
Fast batch executions in PostgreSQL

Tags: postgresql, qt

I have a lot of data and want to insert it into the database in as little time as possible. I ran some tests. I created a table in PostgreSQL using the script below:

CREATE TABLE test_table
(
  id serial NOT NULL,
  item integer NOT NULL,
  count integer NOT NULL,
  CONSTRAINT test_table_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE test_table OWNER TO postgres;

I wrote test code that generates 1000 random values and inserts them into test_table in two different ways. First, using QSqlQuery::exec():

int insert() {
QSqlDatabase db = QSqlDatabase::addDatabase("QPSQL");

db.setHostName("127.0.0.1");
db.setDatabaseName("TestDB");
db.setUserName("postgres");
db.setPassword("1234");

if (!db.open()) {
    qDebug() << "can not open DB";
    return -1;
}

QString queryString = QString("INSERT INTO test_table (item, count)"
        " VALUES (:item, :count)");

QSqlQuery query;
query.prepare(queryString);

QDateTime start = QDateTime::currentDateTime();

for (int i = 0; i < 1000; i++) {

    query.bindValue(":item", qrand());
    query.bindValue(":count", qrand());

    if (!query.exec()) {
        qDebug() << query.lastQuery();
        qDebug() << query.lastError();
    }

} //end of for i

QDateTime end = QDateTime::currentDateTime();
int diff = start.msecsTo(end);
return diff;
}

Second, using QSqlQuery::execBatch():

int batchInsert() {
QSqlDatabase db = QSqlDatabase::addDatabase("QPSQL");

db.setHostName("127.0.0.1");
db.setDatabaseName("TestDB");
db.setUserName("postgres");
db.setPassword("1234");

if (!db.open()) {
    qDebug() << "can not open DB";
    return -1;
}

QString queryString = QString("INSERT INTO test_table (item, count)"
        " VALUES (:item, :count)");

QSqlQuery query;
query.prepare(queryString);

QVariantList itemList;
QVariantList CountList;

QDateTime start = QDateTime::currentDateTime();

for (int i = 0; i < 1000; i++) {

    itemList.append(qrand());
    CountList.append(qrand());

} //end of for i

query.addBindValue(itemList);
query.addBindValue(CountList);

if (!query.execBatch())
    qDebug() << query.lastError();

QDateTime end = QDateTime::currentDateTime();
int diff = start.msecsTo(end);
return diff;
}

I found that there is no difference between them:

int main() {
qDebug() << insert() << batchInsert();
return 0;
}

Result:

14270 14663 (milliseconds)

How can I improve it?

The documentation at http://doc.qt.io/qt-5/qsqlquery.html#execBatch says:

If the database doesn't support batch executions, the driver will simulate it using conventional exec() calls.

I'm not sure whether my DBMS supports batch execution. How can I test it?

esmaeil mollaahmadi asked Oct 25 '25 06:10


2 Answers

I'm not sure what the Qt driver does, but PostgreSQL supports running multiple statements in one transaction. Just do it manually instead of relying on the driver's built-in feature.

First, open a transaction:

BEGIN TRANSACTION;

Then run one insert statement for every iteration of the loop:

INSERT HERE;

Once the loop has finished for all 1000 records, issue this on the same connection:

COMMIT TRANSACTION;

Also, 1000 rows is not much to test with. You might want to try 100,000 or more to make sure the Qt batch really isn't helping.
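The statement order described above can be sketched in a driver-independent way. This is only an illustration: ExecFn and insertInOneTransaction are hypothetical names, the callback stands in for whatever runs one statement on the open connection, and the ROLLBACK on failure is an addition that the answer does not mention.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for whatever executes one SQL statement on the open
// connection (for example, a lambda wrapping QSqlQuery::exec()).
// It returns true on success.
using ExecFn = std::function<bool(const std::string&)>;

// Sends BEGIN, one INSERT per row, then COMMIT, all through the same callback.
// Assumption: on the first failed INSERT it issues ROLLBACK and returns false.
bool insertInOneTransaction(const ExecFn& exec,
                            const std::vector<std::pair<int, int>>& rows) {
    assert(static_cast<bool>(exec));  // the callback must be set
    if (!exec("BEGIN TRANSACTION")) {
        return false;
    }
    for (const auto& row : rows) {
        // Only integers are concatenated here, so no quoting is needed.
        const std::string sql =
            "INSERT INTO test_table (item, count) VALUES (" +
            std::to_string(row.first) + ", " +
            std::to_string(row.second) + ")";
        if (!exec(sql)) {
            exec("ROLLBACK");
            return false;
        }
    }
    return exec("COMMIT TRANSACTION");
}
```

With Qt, the callback could wrap QSqlQuery::exec() on the connection used in the question. Alternatively, QSqlDatabase::transaction() and QSqlDatabase::commit() could take the place of the explicit BEGIN and COMMIT statements.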

Kuberchaun answered Oct 26 '25 20:10


By issuing 1000 insert statements, you make 1000 round trips to the database. This takes quite some time (network and scheduling latency), so try to reduce the number of insert statements!

Let's say you want to run:

insert into test_table(item, count) values (1000, 10);
insert into test_table(item, count) values (1001, 20);
insert into test_table(item, count) values (1002, 30);

Transform them into a single statement, and it will need less than half of the time:

insert into test_table(item, count) values (1000, 10), (1001, 20), (1002, 30);
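As a rough sketch of how such a multi-row statement could be assembled on the client side, the helper below concatenates integer pairs into one INSERT. The name buildMultiRowInsert is hypothetical, and the approach assumes integer-only data; anything textual should be passed as bound parameters rather than concatenated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Builds one multi-row INSERT for test_table from two equally long columns.
// Assumption: values are plain integers, so no quoting or escaping is needed.
std::string buildMultiRowInsert(const std::vector<int>& items,
                                const std::vector<int>& counts) {
    assert(items.size() == counts.size() && !items.empty());
    std::string sql = "INSERT INTO test_table (item, count) VALUES ";
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i > 0) {
            sql += ", ";
        }
        sql += "(" + std::to_string(items[i]) + ", " +
               std::to_string(counts[i]) + ")";
    }
    return sql;
}
```

The resulting string is sent in a single call, so all rows travel in one round trip. For very large inputs it may be worth splitting the rows into several such statements, since one huge statement text has its own cost.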

In PostgreSQL, there is another way to write it:

insert into test_table(item, count) values (
  unnest(array[1000, 1001, 1002]),
  unnest(array[10, 20, 30]));

My reason for presenting the second way is that you can pass the entire content of a big array in a single parameter (tested in C# with the database driver "Npgsql"):

insert into test_table(item, count) values (unnest(:items), unnest(:counts));
  • items is a query parameter with the value int[]{1000, 1001, 1002}
  • counts is a query parameter with the value int[]{10, 20, 30}
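If the driver cannot bind an integer array directly, one possible workaround is to pass each array as text in PostgreSQL's array literal form, such as {1000,1001,1002}, and cast it on the server. The helper below is a minimal sketch of that conversion; toPgIntArrayLiteral is a hypothetical name, and whether the Qt PostgreSQL driver accepts the bound text this way is an assumption that has not been tested here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Formats integers as a PostgreSQL array literal, e.g. {1000,1001,1002}.
// An empty input yields {}, the literal for an empty array.
std::string toPgIntArrayLiteral(const std::vector<int>& values) {
    std::string out = "{";
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i > 0) {
            out += ",";
        }
        out += std::to_string(values[i]);
    }
    out += "}";
    return out;
}
```

Under that assumption, the statement could read insert into test_table(item, count) values (unnest(CAST(:items AS int[])), unnest(CAST(:counts AS int[]))); with both literals bound as strings.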

Today I cut the running time of 10,000 inserts in C# from 80 s to 550 ms with this technique. It's easy. Furthermore, there is no hassle with transactions, as a single statement is never split across multiple transactions.

I hope this works with the Qt PostgreSQL driver, too. On the server side, you need PostgreSQL >= 8.4, as older versions do not provide unnest (but there may be workarounds).

hagello answered Oct 26 '25 20:10