I am doing a bulk insert into a Sybase database by grouping insert queries and sending them to the database in batches, where the batch size is configurable. The code looks somewhat like this:
public static void InsertModelValueInBulk(DataSet modelValueData, int clsaId)
{
    int batchSize = Convert.ToInt32(ConfigurationManager.AppSettings["BatchSize"].ToString());
    IList<string> queryBuffer = new List<string>();
    using (var connection = GetAseConnection())
    {
        connection.Open();
        var tran = connection.BeginTransaction();
        try
        {
            for (int i = 0; i < modelValueData.Tables[0].Rows.Count; i++)
            {
                var row = modelValueData.Tables[0].Rows[i];
                // fyeStr and value are derived from the row elsewhere; omitted here for brevity
                var insertItem = string.Format(@"select '{0}',{1},{2},{3},'{4}','{5}','{6}',{7}",
                    row["ModelValueID"], Convert.ToInt32(row["StockModelID"]), Convert.ToInt32(row["ModelItemID"]),
                    fyeStr, row["Period"], value, row["UpdatedUser"], clsaId);
                queryBuffer.Add(insertItem);
                // flush a batch once batchSize rows have been buffered
                if (queryBuffer.Count % batchSize == 0 && queryBuffer.Count > 0)
                {
                    var finalQuery = @"INSERT INTO InsertTable (ModelValueID, StockModelID, ModelItemID, FYE, Period, Value, UpdatedUser, id)
" + String.Join(" union ", queryBuffer.ToArray());
                    using (var cmd = new AseCommand(finalQuery, connection, tran))
                    {
                        cmd.ExecuteNonQuery();
                    }
                    queryBuffer.Clear();
                }
            }
            // note: any rows still left in queryBuffer (a final partial batch) are not flushed in this snippet
            tran.Commit();
        }
        catch
        {
            tran.Rollback();
            throw;
        }
        finally
        {
            tran.Dispose();
        }
    }
}
Using this approach, the time taken to insert 20,000 rows plotted against batch size forms a J curve. Sample data is somewhat like:

batch size 10 => operation completes in 30 sec
batch size 50 => 20 sec
batch size 100 => 10 sec
batch size 200 => 20 sec
batch size 500 => 30 sec
batch size 1000 => 1 min
I would like to understand the reason behind this J curve. Is it something to do with app server memory, some database server setting, or something else? What makes 100 the optimum, and can this be tweaked further?
A bulk insert locks the table for the duration of the batch. Locks have a basic overhead, so small batches won't benefit nearly as much, but they do let other operations happen against the table in between batches.
Larger batches are therefore good, up to a point. Because it's a transaction, the data is not committed until the current batch is complete, which means writing to the log file. Really large batches cause the log to grow, which is IO intensive, and they also increase contention because more of your log is in use.
Something along those lines.
Edit: two other things. 1) Use parameterized inputs. 2) If you don't do #1, note that UNION performs a DISTINCT; use UNION ALL instead. A sketch of both follows.
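A minimal sketch of both suggestions, written against the generic ADO.NET interfaces so it does not depend on a specific AseParameter overload. The "@name" placeholder style is an assumption (some ASE driver versions expect "?" positional markers), and the column list is taken from the question.

// Parameterized single-row insert (placeholder syntax is an assumption; verify against your driver).
IDbCommand cmd = connection.CreateCommand();
cmd.Transaction = tran;
cmd.CommandText =
    "INSERT INTO InsertTable (ModelValueID, StockModelID, ModelItemID, FYE, Period, Value, UpdatedUser, id) " +
    "VALUES (@modelValueId, @stockModelId, @modelItemId, @fye, @period, @value, @updatedUser, @id)";
var p = cmd.CreateParameter();
p.ParameterName = "@modelValueId";
p.Value = row["ModelValueID"];
cmd.Parameters.Add(p);
// ...create and add the remaining parameters the same way...
cmd.ExecuteNonQuery();

// If you keep the string-built batches, at least switch UNION to UNION ALL so the
// server skips the implicit DISTINCT over every batch:
var finalQuery = @"INSERT INTO InsertTable (ModelValueID, StockModelID, ModelItemID, FYE, Period, Value, UpdatedUser, id)
" + String.Join(" union all ", queryBuffer.ToArray());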
I see quite a few issues with your existing code. For example, on your Commit I would not assume that commits will always be successful.
I would wrap all code that has the potential to fail or explode (Commit, Rollback, cmd.ExecuteNonQuery) in a try/catch.
I would look at the select statement; personally I would create a stored procedure, and if you can't do that, I would at least make the select string a const.
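For example, a minimal sketch of the "make it a const" option, reusing the exact format string from the question (the name RowSelectTemplate is just illustrative):

// Hoist the per-row select template out of the loop into a const.
private const string RowSelectTemplate = "select '{0}',{1},{2},{3},'{4}','{5}','{6}',{7}";

// ...inside the loop...
var insertItem = string.Format(RowSelectTemplate,
    row["ModelValueID"], Convert.ToInt32(row["StockModelID"]), Convert.ToInt32(row["ModelItemID"]),
    fyeStr, row["Period"], value, row["UpdatedUser"], clsaId);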
I would name my transactions, personally, but that's up to you.
Does this line have the potential to change on every method call?
int batchSize = Convert.ToInt32(ConfigurationManager.AppSettings["BatchSize"].ToString());
If not, I would read it once into a static field rather than re-reading the configuration every time the method is called. A sketch follows.
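A small sketch of that suggestion; the field name is illustrative, and Lazy<int> simply keeps the one-time configuration read thread-safe:

// Read the configured batch size once and reuse it across calls.
private static readonly Lazy<int> CachedBatchSize = new Lazy<int>(
    () => Convert.ToInt32(ConfigurationManager.AppSettings["BatchSize"]));

// ...inside InsertModelValueInBulk...
int batchSize = CachedBatchSize.Value;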