OS: macOS Catalina 10.15.1
Python version: Python 3.7.1
I'm using Firestore as my database for a personal project with the Python SDK. I'm currently trying to optimize my backend and I've noticed that writes to Firestore are quite slow. Take the example piece of code below:
import firebase_admin
from firebase_admin import credentials
from firebase_admin import firestore
import time
cred = credentials.Certificate("./path/to/adminsdk.json")
firebase_admin.initialize_app(cred)
db = firestore.client()
test_data = {f"test_field_{i}":f"test_value_{i}" for i in range(20)}
now = time.time()
db.collection(u'latency_test_collection').document(u'latency_test_document').set(test_data)
print(f"Total time: {time.time()-now}")
The above code takes >300ms to run, which seems quite slow, especially when I have multiple writes of much larger size than the above example. I've checked my internet connection, and regardless of the connection the performance hovers around this value. Is this performance expected for Firestore writes, or is there a way I could be optimizing my code for this?
Generally speaking this is not a good benchmark, since you only write one document. In theory the latency of writing this single document could be 3000 ms and the next one could be 1 ms. Develop a test that writes multiple documents and take the average time of those writes. Also keep in mind that writing consecutive documents will degrade performance if the document IDs are adjacent; that is why you should pick a random document ID or a hash of some sort.
import time
import uuid

from google.cloud import firestore_v1

count = 20
data = [{f"test_field_{i}": f"test_value_{i}"} for i in range(count)]

now = time.time()
db = firestore_v1.Client()
coll = db.collection(u'latency_test_collection')
for record in data:
    # Random document IDs avoid writing to adjacent keys
    coll.document(uuid.uuid4().hex).set(record)
print(f"Average time: {(time.time() - now) / count}")
But keep in mind that when writing a large number of single records/documents to Firestore, you are still limited by the latency of the Firestore API. There are two ways to overcome this. The first is writing documents asynchronously, so that multiple writes are in flight at once; this can be costly, since you pay for every API call to Firestore (a sketch of that approach follows after the batch example below). The other (preferred) way of writing multiple records/documents is to use batch operations, as shown below. Keep in mind that the maximum batch size for writes is 500 at the time of writing.
import time
import uuid

from google.cloud import firestore_v1

count = 20
data = [{f"test_field_{i}": f"test_value_{i}"} for i in range(count)]

now = time.time()
db = firestore_v1.Client()
coll = db.collection(u'latency_test_collection')
batch = db.batch()
for idx, record in enumerate(data):
    doc_ref = coll.document(uuid.uuid4().hex)
    batch.set(doc_ref, record)
    # Max batch size is 500, so flush every 500 writes
    if (idx + 1) % 500 == 0:
        batch.commit()
# Commit any remaining writes
if (idx + 1) % 500 != 0:
    batch.commit()
print(f"Average time: {(time.time() - now) / count}")
Like @Nebulastic said, batches are much more efficient than one-by-one operations. I just ran a test from my laptop in Europe against a Firestore database located in us-west2 (Los Angeles). Here are the actual results from one-by-one deletions and batch deletions.
$ python firestore_test.py
Creating 10 documents
Wrote 10 documents in 1.80 seconds.
Deleting documents one by one
Deleted 10 documents in 7.97 seconds.
###
Creating 10 documents
Wrote 10 documents in 0.92 seconds.
Deleting documents in batch
Deleted 10 documents in 1.71 seconds.
###
Creating 2000 documents
Wrote 2000 documents in 6.27 seconds.
Deleting documents in batch
Deleted 2000 documents in 9.80 seconds.
Here's the test code:
from time import time
from uuid import uuid4
from google.cloud import firestore
DB = firestore.Client()
def generate_user_data(entries=10):
    print('Creating {} documents'.format(entries))
    now = time()
    batch = DB.batch()
    for counter in range(entries):
        # Each transaction or batch of writes can write to a maximum of 500 documents.
        # https://cloud.google.com/firestore/quotas#writes_and_transactions
        if counter % 500 == 0 and counter > 0:
            batch.commit()
            batch = DB.batch()
        user_id = str(uuid4())
        data = {
            "some_data": str(uuid4()),
            "expires_at": int(now)
        }
        user_ref = DB.collection(u'users').document(user_id)
        batch.set(user_ref, data)
    batch.commit()
    print('Wrote {} documents in {:.2f} seconds.'.format(entries, time() - now))

def delete_one_by_one():
    print('Deleting documents one by one')
    now = time()
    docs = DB.collection(u'users').where(u'expires_at', u'<=', int(now)).stream()
    counter = 0
    for doc in docs:
        doc.reference.delete()
        counter = counter + 1
    print('Deleted {} documents in {:.2f} seconds.'.format(counter, time() - now))

def delete_in_batch():
    print('Deleting documents in batch')
    now = time()
    docs = DB.collection(u'users').where(u'expires_at', u'<=', int(now)).stream()
    batch = DB.batch()
    counter = 0
    for doc in docs:
        counter = counter + 1
        if counter % 500 == 0:
            batch.commit()
        batch.delete(doc.reference)
    batch.commit()
    print('Deleted {} documents in {:.2f} seconds.'.format(counter, time() - now))
generate_user_data(10)
delete_one_by_one()
print('###')
generate_user_data(10)
delete_in_batch()
print('###')
generate_user_data(2000)
delete_in_batch()