
How does batch size affect bulk insert performance?

Tags: c#, .net, sybase

I am doing a bulk insert into a Sybase database by grouping insert queries and sending them to the database in batches, where the batch size is configurable. The code looks somewhat like this:

public static void InsertModelValueInBulk(DataSet modelValueData, int clsaId)
{
    int batchSize = Convert.ToInt32(ConfigurationManager.AppSettings["BatchSize"].ToString());
    IList<string> queryBuffer = new List<string>();
    using (var connection = GetAseConnection())
    {
        connection.Open();
        var tran = connection.BeginTransaction();
        try
        {
            for (int i = 0; i < modelValueData.Tables[0].Rows.Count; i++)
            {
                var row = modelValueData.Tables[0].Rows[i];
                // fyeStr and value are derived from the row elsewhere (not shown here)
                var insertItem = string.Format(@"select '{0}',{1},{2},{3},'{4}','{5}','{6}',{7}",
                    row["ModelValueID"], Convert.ToInt32(row["StockModelID"]), Convert.ToInt32(row["ModelItemID"]),
                    fyeStr, row["Period"], value, row["UpdatedUser"], clsaId);
                queryBuffer.Add(insertItem);

                // Flush the buffer as one INSERT ... SELECT ... UNION statement
                // once it reaches the configured batch size.
                if (queryBuffer.Count % batchSize == 0 && queryBuffer.Count > 0)
                {
                    var finalQuery = @"INSERT INTO InsertTable (ModelValueID, StockModelID, ModelItemID, FYE, Period, Value, UpdatedUser, id) "
                        + String.Join(" union ", queryBuffer.ToArray<string>());

                    using (var cmd = new AseCommand(finalQuery, connection, tran))
                    {
                        cmd.ExecuteNonQuery();
                    }
                    queryBuffer.Clear();
                }
            }
            tran.Commit();
        }
        catch
        {
            tran.Rollback();
            throw;
        }
        finally
        {
            tran.Dispose();
        }
    }
}

Using this, the time taken to insert 20,000 rows versus batch size forms a J curve. Sample measurements look roughly like this:

batch size 10 => 30 sec, 50 => 20 sec, 100 => 10 sec, 200 => 20 sec, 500 => 30 sec, 1000 => 1 min.

I would like to understand the reason behind this J curve. Is it something to do with app server memory, some database server setting, or something else? What makes 100 the optimum, and can it be tweaked further?

asked Nov 23 '25 by Brijesh Mishra


2 Answers

A bulk insert locks the table for the duration of each batch. Locks have a basic overhead, so small batches won't benefit nearly as much, but they do let other operations happen against the table in between batches.

So larger batches are good, up to a point. Because it's a transaction, the data is not committed until the current batch is complete, which means writing to the log file. Really large batches will cause the log to grow, which is IO-intensive, and it also increases contention as more of your log is in use.

Something along those lines.
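For illustration only (this is not something the answer spells out), one way to act on the log-growth point is to commit each batch in its own transaction, trading whole-load atomicity for a bounded amount of uncommitted work; batches and BuildBatchInsert are hypothetical names:

    // Hedged sketch: one transaction per batch keeps the uncommitted work (and
    // therefore the active portion of the log) bounded, but the load is no
    // longer all-or-nothing.
    foreach (var batch in batches)                     // hypothetical list of row batches
    {
        using (var tran = connection.BeginTransaction())
        using (var cmd = connection.CreateCommand())
        {
            cmd.Transaction = tran;
            cmd.CommandText = BuildBatchInsert(batch); // hypothetical helper that builds the INSERT ... SELECT ... UNION text
            cmd.ExecuteNonQuery();
            tran.Commit();                             // rows so far are durable; a later failure only loses later batches
        }
    }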

edit: Two other things:

  1. Use parameterized inputs.

  2. If you don't do #1, "union" forces a duplicate-elimination (DISTINCT) step over the combined rows; use "union all" instead. A sketch of both changes follows.
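Here is a minimal sketch of both suggestions, kept to the generic ADO.NET surface (CreateCommand/CreateParameter) so it does not assume driver-specific APIs. The @-style placeholder names, the batchRows variable, and the AddParam helper are illustrative assumptions, not part of the original code, and whether the ASE driver wants @-named or ?-positional placeholders depends on the driver and its settings:

    // Hedged sketch: one parameterized SELECT per row, joined with "union all"
    // so the server skips the duplicate-elimination step that plain "union" implies.
    // batchRows is assumed to hold the DataRows of the current batch.
    var selects = new List<string>();
    using (var cmd = connection.CreateCommand())
    {
        cmd.Transaction = tran;
        for (int i = 0; i < batchRows.Count; i++)
        {
            selects.Add(string.Format(
                "select @mv{0}, @sm{0}, @mi{0}, @fye{0}, @per{0}, @val{0}, @usr{0}, @id{0}", i));
            AddParam(cmd, "@mv" + i, batchRows[i]["ModelValueID"]);
            AddParam(cmd, "@sm" + i, Convert.ToInt32(batchRows[i]["StockModelID"]));
            // ... the remaining columns follow the same pattern ...
        }
        cmd.CommandText =
            "INSERT INTO InsertTable (ModelValueID, StockModelID, ModelItemID, FYE, Period, Value, UpdatedUser, id) "
            + string.Join(" union all ", selects);
        cmd.ExecuteNonQuery();
    }

    // Small helper built only on the generic IDbCommand API.
    void AddParam(IDbCommand cmd, string name, object value)
    {
        var p = cmd.CreateParameter();
        p.ParameterName = name;
        p.Value = value;
        cmd.Parameters.Add(p);
    }

Note that every row adds several parameters, and drivers commonly cap how many parameters a single command may carry, so the batch size may need to stay modest with this approach.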

answered Nov 26 '25 by Bengie


I see quite a few issues with your existing code. For example, on your Commit I would not assume that commits will always be successful.

  1. I would wrap all code that has the potential to fail or explode in a try/catch: the Commit, the Rollback, and cmd.ExecuteNonQuery().

  2. I would look at your select statement; personally I would create a stored procedure, and if you can't do that, I would at least make the select string a const.

  3. I would name my transactions personally, but that's up to you.

  4. Does this line have the potential of changing during every method call?

    int batchSize = Convert.ToInt32(ConfigurationManager.AppSettings["BatchSize"].ToString());

    If not, I would make it a static field read once rather than re-reading the config every time you enter the method (see the sketch after this list).

  5. Try to refactor your code; it's starting to look a bit confusing to follow.
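A minimal sketch of points 2 and 4; the field and constant names here are illustrative, not from the original code:

    // Read the batch size once (point 4) and keep the fixed part of the insert
    // statement as a const (point 2) instead of rebuilding both on every call.
    private static readonly int BatchSize =
        Convert.ToInt32(ConfigurationManager.AppSettings["BatchSize"]);

    private const string InsertPrefix =
        "INSERT INTO InsertTable (ModelValueID, StockModelID, ModelItemID, FYE, Period, Value, UpdatedUser, id) ";

    public static void InsertModelValueInBulk(DataSet modelValueData, int clsaId)
    {
        // ... use BatchSize and InsertPrefix here ...
    }

If the configured value really can change while the application is running, keep reading it per call; the static field only pays off when it is effectively constant.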
answered Nov 26 '25 by MethodMan


