I am creating dummy MySQL table data for testing purposes. The minimum size of a table is 4 million rows.
From the MySQL documentation -
If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.
I have a Java program that generates a sql script to do bulk inserts. It is invoked as follows-
java CreateData -rows 4000000 -stepSize 10000 > testdata.sql
where
rows = total number of rows to insert in the database
stepSize = size of each bulk insert statement.
Thus if we have 4 million rows and a stepSize of 10000,
then we have 400 bulk insert statements.
I tried doing a single bulk insert with 4 million rows, but as expected, that gave an error -
$ time mysql -uroot < testdata.sql
ERROR 2006 (HY000) at line 2: MySQL server has gone away
So I tried experimenting by steadily increasing the stepsize, when I finally got the error -
ERROR 1153 (08S01) at line 1150017: Got a packet bigger
than 'max_allowed_packet' bytes
Until then, with an increase in the stepsize, I observed a slight decrease in the total time to execute. Some sample values -
stepSize 2000
real 3m37.564s
user 0m7.608s
sys 0m0.180s
stepSize 24000
real 1m48.399s
user 0m4.788s
sys 0m0.168s
stepSize 32000
real 1m49.490s
user 0m4.544s
sys 0m0.072s
stepSize 160000
real 1m43.927s
user 0m3.800s
sys 0m0.108s
stepSize 220000
real 1m33.025s
user 0m3.772s
sys 0m0.120s
stepSize 230000
ERROR 1153 (08S01) at line 1150017:
Got a packet bigger than 'max_allowed_packet' bytes
I don't know how much more the speed would have increased if I had increased the max_allowed_packet variable. So, the question is, how can I determine the optimal step size for the bulk insert?
Take for example the code for the mysqldump tool. It produces INSERT statements that insert multiple rows. How does it decide the optimal number of rows, given that rows can be of variable length?
It appends rows one by one, while the length of the total INSERT statement with rows of data is less than net_buffer_length. This is another MySQL configuration variable that affects the largest single statement you can submit to the server.
In other words, if the length of the INSERT statement + the length of the next row of data > net_buffer_length, then finish the current statement and then start a new statement using the next row of data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With