sqlalchemy bulk_insert_mappings generates a large number of insert batches, is this avoidable?

I have a large number of objects that need to be inserted into an Oracle database via sqlalchemy.

Using individual inserts took quite a while to execute. After searching around, it became clear that there are more efficient bulk insert methods: bulk_insert_mappings, bulk_save_objects, etc. These methods perform better than individual inserts, but I've noticed that they seem to batch the inserts into groups of about 8. Is it possible to increase this batch size to something around 1000?

The performance documentation I've been using is: http://docs.sqlalchemy.org/en/latest/faq/performance.html

The render_nulls flag doesn't seem to have an effect. I'm using the cx_Oracle driver, if that helps.

For extra reference:

session.bulk_insert_mappings(MY_OBJECT, my_object_dicts_for_insert)

Produces about len(my_object_dicts_for_insert)/8 statements as measured by the "after_cursor_execute" event. Is there a way to tune this behavior, either from sqlalchemy or the database?
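Roughly, this is how I'm counting the statements (a minimal sketch; engine and the counter name are placeholders for my actual setup):

# Minimal sketch of the statement counting (illustrative only): `engine` is
# assumed to be the Engine created elsewhere with create_engine().
from sqlalchemy import event

statement_count = 0

@event.listens_for(engine, "after_cursor_execute")
def count_statements(conn, cursor, statement, parameters, context, executemany):
    # Fires once per emitted statement, even when executemany batches many rows.
    global statement_count
    statement_count += 1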

Thanks!

1 Answer

Found the issue: it was in my understanding of the render_nulls flag.

From the description, I thought that if a list of dictionaries with different keys was provided then they would be padded with null values to allow for a single insert statement.

This is not how it works: you must make sure that your list of dictionaries has identical keys, padding unused fields with None yourself. Then, when render_nulls is set to True, NULL will be inserted into the columns whose value is None, allowing for a single insert statement.

Examples:

# This results in 2 inserts
my_objects = [{'a': 1, 'b': 2}, {'a': 1}]
session.bulk_insert_mappings(MY_OBJECT, my_objects, render_nulls=True)

# This results in 2 inserts
my_objects = [{'a': 1, 'b': 2}, {'a': 1}]
session.bulk_insert_mappings(MY_OBJECT, my_objects, render_nulls=False)

# This results in 2 inserts
my_objects = [{'a': 1, 'b': 2}, {'a': 1, 'b': None}]
session.bulk_insert_mappings(MY_OBJECT, my_objects, render_nulls=False)

# This results in 1 insert - BETTER PERFORMANCE
my_objects = [{'a': 1, 'b': 2}, {'a': 1, 'b': None}]
session.bulk_insert_mappings(MY_OBJECT, my_objects, render_nulls=True)

This results in considerable performance gains; in my case, performance improved by about 10x.

Note that there are possible side effects when using render_nulls; read the documentation here: http://docs.sqlalchemy.org/en/latest/orm/session_api.html#sqlalchemy.orm.session.Session.bulk_insert_mappings
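If your dictionaries naturally arrive with different key sets, a small helper can pad them to a common shape before the bulk call. A minimal sketch (pad_for_bulk_insert is just an illustrative helper, not part of SQLAlchemy):

# Hypothetical helper: pads every mapping to the union of all keys, filling
# missing keys with None so a single insert statement can be rendered.
def pad_for_bulk_insert(mappings):
    all_keys = set().union(*(m.keys() for m in mappings))
    return [{key: m.get(key) for key in all_keys} for m in mappings]

my_objects = [{'a': 1, 'b': 2}, {'a': 1}]
padded = pad_for_bulk_insert(my_objects)  # second dict becomes {'a': 1, 'b': None}
session.bulk_insert_mappings(MY_OBJECT, padded, render_nulls=True)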

If you are unable to use the render_nulls flag, can't pad your dicts with None, or don't want the extra bandwidth of sending null values, you can still reduce the number of insert statements by grouping your list of dictionaries by their key sets.

Examples:

# This will result in 2 inserts - BETTER PERFORMANCE
my_objects = [{'a': 1, 'b': 2}, {'a': 2, 'b': 3}, {'a': 4}]
session.bulk_insert_mappings(MY_OBJECT, my_objects)

# This will result in 3 inserts
my_objects = [{'a': 1, 'b': 2}, {'a': 4}, {'a': 2, 'b': 3}]
session.bulk_insert_mappings(MY_OBJECT, my_objects)
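For larger lists, sorting by key set makes the grouping automatic, so dictionaries with matching keys end up contiguous and can be batched together. A minimal sketch (the sort key shown is just one way to do it):

# Sort so that dicts with identical key sets are adjacent; contiguous mappings
# with the same keys can then be sent as a single executemany batch.
my_objects = [{'a': 1, 'b': 2}, {'a': 4}, {'a': 2, 'b': 3}]
my_objects.sort(key=lambda mapping: sorted(mapping.keys()))
# Now [{'a': 4}, {'a': 1, 'b': 2}, {'a': 2, 'b': 3}] -> 2 inserts instead of 3
session.bulk_insert_mappings(MY_OBJECT, my_objects)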