Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Data Migrating document for Couchbase (i.e changing existing field type)?

I am coming from object relation database background, I understand Couchbase is schema-less, but data migration will still happen as the application develop.

In SQL we have management tool to alter table, or I can write migration script with SQL to do migration from version 1 table to version 2 table.

But in document, say we have json Document UserProfile:

UserProfile
{
  "Owner": "Rich guy!",
  "Car": "Cool car"
}

We might want to add a last visit field there, allow user have multiple car, so the new updated document will become follows:

UserProfile
{
  "Owner": "Rich guy!",
  "Car": ["Cool car", "Another car"],
  "LastVisit": "2015-09-29"
}

But for easier maintenance, I want all other UserProfile documents to follow the same format, having "Car" field as an array.

From my experience in SQL, I could write migration script which support migrating different version of table. Migrate from version 1 table to version 2...N table.

So how can I should I write such migration code? I will have to really just writing an app (executable) using Couchbase SDK to migrate all the documents each time?

What will be the good way for doing migration like this?

like image 589
King Chan Avatar asked Oct 22 '25 21:10

King Chan


2 Answers

Essentially, your problem breaks down into two parts:

  1. Finding all the documents that need to be updated.
  2. Retrieving and updating said documents.

You can do this in one of two ways: using a view that gives you the document ids, or using a DCP stream to get all the documents from the bucket. The view only gives you the ids of the documents, so you basically iterate over all the ids, and then retrieve, update and store each one using regular key-value methods. The DCP protocol, on the other hand, gives you the actual documents.

The advantage of using a view is that it's very simple to implement, works with any language SDK, and it lets you write your own logic around the process to make it more robust and safe. The disadvantage is having to build a view just for this, and also that if the data keeps changing, you must retrieve the ENTIRE view result at once, because if you try to page over the view with offsets, the ordering of results can change, thus giving you an inconsistent snapshot of the data.

The advantage of using DCP to stream all documents is that you're guaranteed to get a consistent snapshot of your data even if it's constantly changing, and also that you get the whole document directly as part of the stream, so you don't need to retrieve it separately - just update and store back to the database. The disadvantage is that it's currently only implemented in the Java SDK and is considered an experimental feature. See this blog for a simple implementation.

The third - and most convenient for an SQL user - way to do this is through the N1QL query language that's introduced in Couchbase 4. It has the same data manipulation commands as you would expect in SQL, so you could basically issue a command along the lines of UPDATE myBucket SET prop = {'field': 'value'} WHERE condition = 'something'. The advantage of this is pretty clear: it both finds and updates the documents all at once, without writing a single line of program code. The disadvantage is that the DML commands are considered "beta" in the 4.0 release of Couchbase, and that if the data set is too large, then it might not actually work due to timing out at some point. And of course, that fact that you need Couchbase 4.0 in the first place.

like image 159
David Ostrovsky Avatar answered Oct 25 '25 19:10

David Ostrovsky


I don't know of any official tool currently to help with data model migrations, but there are some helpful code snippets depending on the SDK you use (see e.g. bulk updates in java).

For now you will have to write your own script. The basic process is as follow:

  • Make sure all your documents have a model_version attribute that you increment after each migration.
  • Before a migration update your application code so it can handle both the old and the new model_version, and so that new documents are written in the new model.
  • Write a script that iterate through all the old model documents in your bucket(you need a view that emits the document key), make the update you want, increment model_version and save the document back.

In a high concurrency environment it's important to have good error handling and monitoring, you could have for example a view that counts how many documents are in each model_version.

like image 22
Julian Go Avatar answered Oct 25 '25 18:10

Julian Go