Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL - How can we determine the optimal number of bulk insert statements for inserting a large number of rows in a table?

I am creating dummy MySQL table data for testing purposes. The minimum size of a table is 4 million rows.

From the MySQL documentation -

If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.

I have a Java program that generates a sql script to do bulk inserts. It is invoked as follows-

java CreateData -rows 4000000 -stepSize 10000 > testdata.sql
where 
rows     = total number of rows to insert in the database
stepSize = size of each bulk insert statement. 
Thus if we have 4 million rows and a stepSize of 10000,  
then we have 400 bulk insert statements. 

I tried doing a single bulk insert with 4 million rows, but as expected, that gave an error -

$ time mysql -uroot < testdata.sql
ERROR 2006 (HY000) at line 2: MySQL server has gone away

So I tried experimenting by steadily increasing the stepsize, when I finally got the error -

ERROR 1153 (08S01) at line 1150017: Got a packet bigger 
than 'max_allowed_packet' bytes

Until then, with an increase in the stepsize, I observed a slight decrease in the total time to execute. Some sample values -

stepSize 2000    
real    3m37.564s
user    0m7.608s
sys     0m0.180s


stepSize 24000
real    1m48.399s
user    0m4.788s
sys     0m0.168s


stepSize 32000
real    1m49.490s
user    0m4.544s
sys     0m0.072s


stepSize 160000
real    1m43.927s
user    0m3.800s
sys     0m0.108s


stepSize 220000

real    1m33.025s
user    0m3.772s
sys     0m0.120s

stepSize 230000
ERROR 1153 (08S01) at line 1150017: 
Got a packet bigger than 'max_allowed_packet' bytes

I don't know how much more the speed would have increased if I had increased the max_allowed_packet variable. So, the question is, how can I determine the optimal step size for the bulk insert?

like image 910
Wes Avatar asked Nov 27 '25 04:11

Wes


1 Answers

Take for example the code for the mysqldump tool. It produces INSERT statements that insert multiple rows. How does it decide the optimal number of rows, given that rows can be of variable length?

It appends rows one by one, while the length of the total INSERT statement with rows of data is less than net_buffer_length. This is another MySQL configuration variable that affects the largest single statement you can submit to the server.

In other words, if the length of the INSERT statement + the length of the next row of data > net_buffer_length, then finish the current statement and then start a new statement using the next row of data.

like image 105
Bill Karwin Avatar answered Nov 29 '25 21:11

Bill Karwin