I have to add one new property to my existing NDB class:
class AppList(ndb.Model):
    ...
    ignore = ndb.BooleanProperty(default=False)  # new property
Then I will use it like below:
entries = AppList.query()
entries = entries.filter(AppList.counter > 5)
entries = entries.filter(AppList.ignore == False)
entries = entries.fetch()
I cannot use AppList.ignore != True to catch records added earlier (which don't have the ignore property), so I have to assign False to all records of my AppList kind. What is the most efficient way to do it? Currently this kind contains about 4,000 entries.
Upd. I've decided to use the following ugly code (didn't manage to apply cursors), it runs as a cron job. But don't I update the same 100 records each time?
entities = AppList.query()
# entities = entities.filter(AppList.ignore != False)
entities = entities.fetch(100)
while entities:
    for entity in entities:
        entity.ignore = False
        entity.put()
    entities = AppList.query()
    # entities = entities.filter(AppList.ignore != False)
    entities = entities.fetch(100)
Don't forget that there is a MapReduce library that is used in these cases. But I think the best method is to use all these suggestions together.
Now, you need to get() and put() 4000 entities and the question is how to reduce the "costs" of this operation.
I'm just curious to know what bool(entity.ignore) returns for the old records. If a missing property reads as False, you can adjust your code to treat it as False and postpone the operation. Whenever you put() an entity for some other reason, the ignore property is written as False thanks to the default argument. For the rest of the entities, you can run a script like this (via remote_api):
def iter_entities(cursor=None):
    entries = AppList.query()
    res, cur, more = entries.fetch_page(100, start_cursor=cursor)
    # hasattr(ent, 'ignore') is always True once the model declares the
    # property, so it cannot single out old records; re-put the whole page.
    # put() stores ignore=False through the default argument.
    ndb.put_multi(res)
    if more:
        iter_entities(cur)  # a taskqueue is better
Your updated code will only ever update the first 100 entities, because each unfiltered fetch(100) returns the same records. Try using a cursor:
https://developers.google.com/appengine/docs/python/ndb/queries#cursors
If you can't use a cursor, use an offset and increase it by 100 on every loop, or fetch all the entries at once with fetch() (the cursor approach is the better one).
Also, instead of putting the entities one by one, use ndb.put_multi(list of entities to put); this is faster than putting them one by one.
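The cursor plus put_multi advice can be sketched as a plain loop. This is only an illustration, not tested against a live datastore: the function name backfill_ignore and its parameters are hypothetical, and the two callables are assumed to behave like Query.fetch_page (returning results, next cursor, and a "more" flag) and ndb.put_multi.

```python
def backfill_ignore(fetch_page, put_multi, page_size=100):
    """Set ignore=False on every entity, one page at a time.

    fetch_page and put_multi are passed in (hypothetical parameters) so the
    loop can be tried without the App Engine SDK. fetch_page is assumed to
    accept (page_size, start_cursor=...) and return (results, cursor, more).
    """
    cursor = None
    total = 0
    while True:
        # Each call resumes where the previous page ended, so no entity
        # is fetched twice (unlike repeating an unfiltered fetch(100)).
        entities, cursor, more = fetch_page(page_size, start_cursor=cursor)
        for entity in entities:
            entity.ignore = False
        if entities:
            # One batched write per page instead of one put() per entity.
            put_multi(entities)
        total += len(entities)
        if not more:
            break
    return total
```

With NDB this would presumably be called as backfill_ignore(AppList.query().fetch_page, ndb.put_multi). For about 4,000 entities that is roughly 40 pages; for much larger kinds, handing each page to a task queue, as suggested in the other answer, avoids request deadlines.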