When storing and retrieving a datastore entity that contains a list of tuples, what is the most efficient way of storing this list?
When I have encountered this problem, the tuples could be anything from key-value pairs, to a datetime and sample results, to (x, y) coordinates.
The number of tuples is variable and ranges from 1 to a few hundred.
The entity containing these tuples needs to be fetched quickly and cheaply, and the tuple values do not need to be indexed.
I have had this problem a few times, and have solved it a number of different ways.
Method 1:
Convert the tuple values to a string and concatenate them together with some delimiter.
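def PutEntity(entity, tuples):
  # Convert every value to str first; str.join() raises TypeError on non-strings.
  entity.tuples = ['_'.join(str(value) for value in pair) for pair in tuples]
  entity.put()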
Advantages: Results are easily readable in the Datastore Viewer; everything is fetched in one get.
Disadvantages: Potential precision loss; the programmer has to serialize and deserialize; more bytes are required to store data in string format.
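To make the round trip and its drawbacks concrete, here is a minimal sketch in plain Python with no datastore involved. The helper names `encode_pairs` and `decode_pairs` are hypothetical and not part of any App Engine API.

```python
def encode_pairs(pairs, delimiter="_"):
    """Join each tuple's values into one delimited string."""
    return [delimiter.join(str(value) for value in pair) for pair in pairs]


def decode_pairs(encoded, delimiter="_"):
    """Split each stored string back into a tuple of strings."""
    return [tuple(item.split(delimiter)) for item in encoded]


points = [(1.5, 2), (3, 4.25)]
stored = encode_pairs(points)
restored = decode_pairs(stored)

print(stored)    # ['1.5_2', '3_4.25']
print(restored)  # [('1.5', '2'), ('3', '4.25')]
```

Every value comes back as a string, so the caller has to convert types again, and a value that itself contains the delimiter is split into too many fields.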
Method 2:
Store each position of the tuple in its own list property, and zip / unzip the lists when writing and reading.
def PutEntity(entity, tuples):
  # Parallel lists: keys[i] and values[i] together form the i-th tuple.
  entity.keys = [pair[0] for pair in tuples]
  entity.values = [pair[1] for pair in tuples]
  entity.put()
Advantages: No loss of precision; the data is confusing but still viewable in the Datastore Viewer; types can be enforced; everything is fetched in one get.
Disadvantage: The programmer needs to zip / unzip the tuples or carefully maintain order in the lists.
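The zip / unzip step can be sketched in plain Python as follows. The helper names `split_pairs` and `join_pairs` are hypothetical; the length check is one assumed way to catch lists that have drifted out of sync.

```python
def split_pairs(pairs):
    """Unzip a list of 2-tuples into two parallel lists."""
    keys = [pair[0] for pair in pairs]
    values = [pair[1] for pair in pairs]
    return keys, values


def join_pairs(keys, values):
    """Zip two parallel lists back into a list of 2-tuples."""
    if len(keys) != len(values):
        # zip() would silently truncate to the shorter list, hiding the bug.
        raise ValueError("parallel lists have different lengths")
    return list(zip(keys, values))


samples = [("a", 1.5), ("b", 2.0)]
keys, values = split_pairs(samples)
print(join_pairs(keys, values) == samples)  # True
```

Because the two lists are stored as separate properties, any code that appends to one and not the other corrupts the pairing, which is the ordering risk noted above.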
Method 3:
Serialize the list of tuples in some manner (JSON, pickle, protocol buffers) and store it in a blob or text property.
Advantages: Usable with objects, including more complex ones; less risk of a bug mismatching tuple values.
Disadvantages: Blob store access requires an additional fetch?; the data cannot be viewed in the Datastore Viewer.
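As one illustration of this method, the sketch below serializes the list with the standard `json` module; the resulting string is what would go into a text or blob property. The helper names `dumps_tuples` and `loads_tuples` are hypothetical.

```python
import json


def dumps_tuples(pairs):
    """Serialize a list of tuples to a compact JSON string."""
    return json.dumps(pairs, separators=(",", ":"))


def loads_tuples(text):
    """Parse the JSON string and turn each inner list back into a tuple."""
    return [tuple(item) for item in json.loads(text)]


coords = [(1, 2), (3.5, -4.0)]
text = dumps_tuples(coords)

print(text)                          # [[1,2],[3.5,-4.0]]
print(loads_tuples(text) == coords)  # True
```

JSON has no tuple type, so tuples are written as arrays and must be converted back on load. Values such as datetimes need their own encoding before they can be passed to `json.dumps`.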
Method 4:
Store the tuples in another entity and keep a list of the keys.
Advantages: More obvious architecture.  If the entity is a view, we no longer need to keep two copies of the tuple data.
Disadvantages: Two fetches are required, one for the entity and its key list and one for the tuples.
I am wondering if anyone knows which one performs the best and if there is a way I haven't thought about?
Thanks, Jim
Lists and tuples are both built-in Python sequence types. The key difference is that a list is mutable, so its elements can be added, removed, or replaced, whereas a tuple is immutable and cannot be changed after it is created. Creating a tuple is typically faster than creating a list, and a tuple uses somewhat less memory, because a list keeps its elements in a separately allocated, resizable array while a tuple stores them in a single fixed-size block.
To store a single value (a singleton) in a tuple, you must include a trailing comma when assigning the value to a variable. If you don't include the comma, Python does not store the value as a tuple; the parentheses are treated as ordinary grouping. For example, assign a single string to a variable named day with a trailing comma, then use the type() function to display the variable's type.
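A short illustration of the trailing comma. The string "Monday" and the variable name `not_a_tuple` are assumed here purely for the example.

```python
day = ("Monday",)          # trailing comma: a one-element tuple
not_a_tuple = ("Monday")   # no comma: just a parenthesized string

print(type(day))           # <class 'tuple'>
print(type(not_a_tuple))   # <class 'str'>
```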
I use Method 3. Blobstore may require an extra fetch, but db.BlobProperty does not. For objects where it is important that they come out of storage exactly as they were put in, I use PickleProperty (which can be found in tipfy and some other utility libraries).
For objects where I just need the state stored, I wrote a JsonProperty that works similarly to PickleProperty (but uses simplejson, obviously).
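The difference between the two approaches can be seen without any datastore code. The sketch below is not the PickleProperty or JsonProperty implementation; it only compares what the standard `pickle` and `json` modules return after a round trip, using made-up sample data.

```python
import json
import pickle
from datetime import datetime

# Hypothetical sample data: (timestamp, reading) tuples.
samples = [
    (datetime(2011, 1, 1, 12, 0), 0.25),
    (datetime(2011, 1, 2, 12, 0), 0.5),
]

# pickle restores the original types, including datetime and tuple.
restored_exact = pickle.loads(pickle.dumps(samples))
print(restored_exact == samples)  # True

# json cannot encode datetime directly, so timestamps go in as ISO strings.
as_json = json.dumps([(stamp.isoformat(), value) for stamp, value in samples])
from_json = json.loads(as_json)
print(from_json[0])  # ['2011-01-01T12:00:00', 0.25]
```

With JSON the state survives but the types do not: tuples come back as lists and timestamps as strings, which is fine when only the values matter and the caller knows how to rebuild them.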
For me getting all data in a single fetch, and being idiot-proof, is more important than cpu performance (in App Engine). According to the Google I/O talk on AppStats, a trip to the datastore is almost always going to be more expensive than a bit of local parsing.