How do I prevent duplicate values in Google Cloud Datastore?

Is there any mechanism available to prevent duplicate data in certain fields of my entities? Something similar to a UNIQUE constraint in SQL.

Failing that, what techniques do people normally use to prevent duplicate values?

Alexander Trauzzi asked Oct 26 '25 10:10


2 Answers

The only way to do the equivalent of a UNIQUE constraint in SQL will not scale very well in a NoSQL storage system like Cloud Datastore. This is mainly because it requires a read before every write, and a transaction surrounding the two operations.

If that's not an issue (i.e., you don't write values very often), the process might look something like:

  1. Begin a serializable transaction
  2. Query across all Kinds for a match of property = value
  3. If the query has matches, abort the transaction
  4. If there are no matches, insert new entity with property = value
  5. Commit the transaction

Using gcloud-python, this might look like...

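from gcloud import datastore

client = datastore.Client()

with client.transaction(serializable=True) as t:
    q = client.query(kind='MyKind')
    q.add_filter('property', '=', 'value')

    # fetch() returns an iterator, which is always truthy, so materialize it.
    if list(q.fetch(limit=1)):
        # Raising inside the with block rolls the transaction back.
        raise ValueError('property = value already exists')

    # client.key() fills in the project; properties are set like dict items.
    entity = datastore.Entity(key=client.key('MyKind'))
    entity['property'] = 'value'
    t.put(entity)
    # Leaving the with block without an exception commits the transaction.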

Note: The serializable flag on Transactions is relatively new in gcloud-python. See https://github.com/GoogleCloudPlatform/gcloud-python/pull/1205/files for details.


The "right way" to do this is to design your data such that the key is your measure of "uniqueness", but without knowing more about what you're trying to do, I can't say much else.
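As a rough sketch of what "the key is your measure of uniqueness" could look like: the value that must be unique becomes the key name, so a lookup by key inside a transaction is enough to detect a duplicate. The function name `insert_if_absent` and the `entity_cls` parameter are made up for illustration. `client` is assumed to behave like a Datastore client (`transaction()`, `key()`, `get()`), and `entity_cls` like `datastore.Entity`.

```python
def insert_if_absent(client, entity_cls, kind, unique_value, properties):
    """Insert an entity keyed by `unique_value`, unless that key is taken.

    Returns the new entity, or None if an entity with that key exists.
    `client` and `entity_cls` are assumptions: anything shaped like a
    Datastore client and datastore.Entity should work.
    """
    # The unique value is the key name, so two entities can never share it.
    key = client.key(kind, unique_value)

    with client.transaction() as txn:
        # Lookup by key; compare with None because an entity that has
        # no properties is an empty (falsy) mapping.
        if client.get(key) is not None:
            # Nothing was put, so the commit on exit writes nothing.
            return None

        entity = entity_cls(key=key)
        entity.update(properties)
        txn.put(entity)
    return entity
```

With the real library this would presumably be called as `insert_if_absent(client, datastore.Entity, 'MyKind', 'value', {...})`. Whether it fits depends on whether the unique value can serve as the entity's identity, which is the caveat in the paragraph above.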

JJ Geewax answered Oct 29 '25 09:10


The approach given above will not work in the datastore because you cannot do a query across arbitrary entities inside a transaction. If you try, an exception will be thrown.

However, you can do it by using a new kind for each unique field and doing a "get" (lookup by key) within the transaction.

For example, say you have a Person kind and you want to ensure that Person.email is unique. You then also need a second kind, e.g. UniquePersonEmail. It does not need to be referenced by anything; it is just there to ensure uniqueness.

  1. start transaction
  2. get UniquePersonEmail with id = theNewAccountEmail
  3. if exists abort
  4. put UniquePersonEmail with id = theNewAccountEmail
  5. put Person with all the other details including the email
  6. commit the transaction

So you end up doing one read and two writes to create your account.
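The steps above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: `create_person`, `DuplicateValueError`, and the `entity_cls` parameter are hypothetical names. `client` is assumed to behave like a Datastore client (`transaction()`, `key()`, `get()`), and `entity_cls` like `datastore.Entity`. Both are passed in so the sketch does not depend on a particular library version.

```python
class DuplicateValueError(Exception):
    """Raised when the unique value is already claimed (hypothetical name)."""


def create_person(client, entity_cls, email, details):
    """Create a Person only if no other Person has claimed `email`.

    `client` and `entity_cls` are assumptions: anything shaped like a
    Datastore client and datastore.Entity should work.
    """
    with client.transaction() as txn:
        # Steps 2-3: look up the marker entity by key; abort if it exists.
        marker_key = client.key('UniquePersonEmail', email)
        if client.get(marker_key) is not None:
            # Raising inside the with block makes the transaction roll back.
            raise DuplicateValueError(email)

        # Step 4: claim the email by writing the (empty) marker entity.
        txn.put(entity_cls(key=marker_key))

        # Step 5: write the Person itself under an incomplete key,
        # which the datastore completes with an allocated ID.
        person = entity_cls(key=client.key('Person'))
        person.update(details)
        person['email'] = email
        txn.put(person)
    # Step 6: leaving the with block without an exception commits.
    return person
```

The marker lookup is compared with `None` rather than tested for truthiness, because the marker entity has no properties and would otherwise look "missing". Changing or deleting a Person's email would need the matching UniquePersonEmail entity updated in the same transaction.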

Dev Vercer answered Oct 29 '25 08:10