I would like to perform a small operation on all entities of a specific kind and write them back to the datastore. I currently have 20,000 entities of this kind, but I would like a solution that scales to any number of entities.
What are my options?
To update an existing entity, modify the attributes of the Entity object, then pass it to the DatastoreService.put() method. The object data overwrites the existing entity. The entire object is sent to Datastore with every call to put().
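To make the overwrite semantics concrete, here is a plain-Python simulation. `FakeDatastore` is a hypothetical dict-backed stand-in, not the Datastore client API. It shows the read-modify-write pattern and that a put replaces the stored entity as a whole, so any property missing from the written object is gone afterwards.

```python
import copy


class FakeDatastore:
    """Hypothetical in-memory stand-in used only to illustrate put() semantics."""

    def __init__(self):
        self._rows = {}

    def put(self, key, entity):
        # The whole object is stored, replacing any previous version of the entity.
        self._rows[key] = copy.deepcopy(entity)

    def get(self, key):
        return copy.deepcopy(self._rows[key])


store = FakeDatastore()
store.put("Employee:1", {"name": "Ana", "title": "Engineer"})

# Read-modify-write: fetch, change one attribute, put the whole object back.
emp = store.get("Employee:1")
emp["title"] = "Senior Engineer"
store.put("Employee:1", emp)
after_update = store.get("Employee:1")

# Putting an object that omits a property drops that property from the stored entity.
store.put("Employee:1", {"name": "Ana"})
after_partial = store.get("Employee:1")
```

This is why a bulk update should always start from the fetched entity rather than from a freshly built object containing only the changed fields.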
An index is defined on a list of properties of a given entity kind, with a corresponding order (ascending or descending) for each property. For use with ancestor queries, the index may also optionally include an entity's ancestors. An index table contains a column for every property named in the index's definition.
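As a rough mental model, an index table can be pictured as a sorted list of rows, one column per indexed property plus the entity key. The sketch below assumes each entity is a dict of single-valued properties and ignores the ancestor option and multi-valued properties; `build_index` is a hypothetical helper, not part of any Datastore API. It also reflects the rule that an entity only appears in an index if it has a value for every property in the index definition.

```python
def build_index(entities, definition):
    """entities: key -> properties dict; definition: list of (property, "asc" | "desc")."""
    rows = []
    for key, props in entities.items():
        # Entities lacking any indexed property get no row in this index.
        if all(prop in props for prop, _ in definition):
            rows.append(tuple(props[prop] for prop, _ in definition) + (key,))
    # Break ties by key, then apply column orders from last to first;
    # Python's sort is stable (also with reverse=True), so earlier columns take priority.
    rows.sort(key=lambda row: row[-1])
    for i in reversed(range(len(definition))):
        rows.sort(key=lambda row: row[i], reverse=(definition[i][1] == "desc"))
    return rows


entities = {
    "Person:1": {"lastName": "Smith", "height": 72},
    "Person:2": {"lastName": "Jones", "height": 65},
    "Person:3": {"lastName": "Smith", "height": 68},
    "Person:4": {"lastName": "Lee"},  # no height, so absent from the index below
}
index = build_index(entities, [("lastName", "asc"), ("height", "desc")])
```

Here `index` holds Jones first, then the two Smith rows ordered by descending height, and no row for `Person:4`.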
The Cloud Datastore Administration API v1beta1 is now deprecated. The Cloud Datastore Admin backup feature is being phased out in favor of the managed export and import for Cloud Datastore. Please migrate to the managed export and import functionality at your earliest convenience.
Data objects in Firestore in Datastore mode are known as entities. An entity has one or more named properties, each of which can have one or more values. Entities of the same kind do not need to have the same properties, and an entity's values for a given property do not all need to be of the same data type.
Use a mapper. It is part of the MapReduce framework, but you only need the first component, map, because you don't need the shuffle/reduce steps if you are simply mutating datastore entities.
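The idea of a map-only job can be sketched in plain Python: split the keys into shards, apply a per-entity function to every entity, and write each result straight back, with no shuffle or reduce phase. This is an illustration of the concept only; `split_into_shards`, `run_map_only` and the dict-based store are hypothetical and do not reflect the MapReduce library's actual API, and threads stand in for the parallel workers the framework would use.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_shards(keys, shard_count):
    """Split a sorted key list into contiguous ranges, roughly like key-range sharding."""
    if not keys:
        return []
    size = -(-len(keys) // shard_count)  # ceiling division
    return [keys[i:i + size] for i in range(0, len(keys), size)]


def run_map_only(store, map_fn, shard_count=4):
    """Apply map_fn to every entity and write the result back; no shuffle or reduce."""
    shards = split_into_shards(sorted(store), shard_count)

    def process(shard):
        for key in shard:
            # Each entity is handled independently, so shards need no coordination.
            store[key] = map_fn(dict(store[key]))
        return len(shard)

    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        return sum(pool.map(process, shards))


def mark_updated(entity):
    entity["updated"] = True
    return entity


store = {f"Model:{i:05d}": {"n": i} for i in range(10)}
count = run_map_only(store, mark_updated)
```

Because the map step touches each entity in isolation, the work parallelizes across shards and scales with the number of entities rather than being bound to a single request.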
Daniel is correct, but if you don't want to deal with the mapper, which requires adding another library to your app, you can do it with Task Queues, or even more simply with the deferred library that has been included since SDK 1.2.3.
20,000 entities is not that many, and I assume this task won't be performed on a regular basis (though even if it is, this approach is feasible).
Here is an example using NDB and the deferred library (you can easily do the same with DB, but consider switching to NDB anyway if you are not already using it). It's a pretty straightforward approach, although it doesn't do much to handle timeouts:
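```python
import logging
import webapp2
from google.appengine.ext import deferred, ndb

def update_model(limit=1000):
  # Model is your own ndb.Model subclass; page through it with a query cursor.
  more_cursor = None
  more = True
  while more:
    model_dbs, more_cursor, more = Model.query().fetch_page(limit, start_cursor=more_cursor)
    for model_db in model_dbs:
      model_db.updated = True
    ndb.put_multi(model_dbs)  # write the whole batch back in one call
    logging.info('### %d entities were updated' % len(model_dbs))

class UpdateModelHandler(webapp2.RequestHandler):
  def get(self):
    # Run update_model in the background on the task queue named 'queue'.
    deferred.defer(update_model, _queue='queue')
    self.response.headers['Content-Type'] = 'text/html'
    self.response.out.write('The task has been started!')
```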
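The loop in `update_model` runs inside a single task, so a large enough kind can run past the task's deadline. One common remedy is to process one page per task and have each task enqueue its successor with the next cursor (on App Engine, for example, by calling `deferred.defer` again from inside the task and passing the cursor as an argument). The sketch below simulates that chaining in plain Python so it can run anywhere; the in-memory store, the integer-offset cursor, and the names `fetch_page`, `update_batch` and `drain` are hypothetical stand-ins, not the NDB or deferred APIs.

```python
from collections import deque


def fetch_page(store, limit, cursor):
    """Hypothetical stand-in for Model.query().fetch_page(); cursor is an int offset."""
    keys = sorted(store)
    start = cursor or 0
    page = keys[start:start + limit]
    next_cursor = start + len(page)
    return page, next_cursor, next_cursor < len(keys)


def update_batch(store, queue, limit, cursor=None):
    """Process exactly one page, then enqueue a follow-up task if more pages remain."""
    keys, next_cursor, more = fetch_page(store, limit, cursor)
    for key in keys:
        store[key]["updated"] = True
    if more:
        # Each task stays short; the next page is handled by a fresh task.
        queue.append((update_batch, (store, queue, limit), {"cursor": next_cursor}))
    return len(keys)


def drain(queue):
    """Simulated task queue worker: run tasks until none are left."""
    runs = []
    while queue:
        fn, args, kwargs = queue.popleft()
        runs.append(fn(*args, **kwargs))
    return runs


store = {f"Model:{i:05d}": {"updated": False} for i in range(2500)}
queue = deque()
queue.append((update_batch, (store, queue, 1000), {}))
runs = drain(queue)
```

With 2,500 entities and a page size of 1,000, this runs three short tasks that update 1,000, 1,000 and 500 entities. Since no single task handles more than one page, the same pattern keeps working as the kind grows.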