How can we do a one time data migration of one document collection that requires a lookup to another collection in RavenDB?
I think we should do it with a patch operation. But how can we do the lookup?
In detail:
We have 2 collections in our RavenDB. One contains business cases with an address. The other contains additional data to addresses. So when a new business case is created, we look up the address and add the additional data to the business case document. But we need to update all cases that have been created before we had the additional data. Each collection contains about 100'000 documents and we cannot load them into memory.
Initially the collections look something like that:
{
"Id" = "BusinessCase/1",
"Address" = "street 1, city"
"AdditionalData" = null
}
{
"Id" = "AddressData/1",
"Address" = "street 1, city"
"AdditionalData" = {...}
}
After the data migration we expect the business cases to be enriched with additional data by a matching address.
{
"Id" = "BusinessCase/1",
"Address" = "street 1, city"
"AdditionalData" = {...}
}
I would expect a patch script to look something like that, but I have no idea how to load the matching address data document.
from BusinessCases as b
load AddressDatas as a where a.Address = b.Address // something like this
update {
b.AdditionalData = a. AdditionalData;
}
Here is one way you can solve this:
Define the following Multi-Map Index:
public class BusinessCaseWithAddressData : AbstractMultiMapIndexCreationTask<BusinessCaseWithAddressData.IndexEntry>
{
public class IndexEntry
{
public string Address { get; set; }
public string BusinessCaseId { get; set; }
public AdditionalData AdditionalData { get; set; }
}
public BusinessCaseWithAddressData()
{
// Map for BusinessCases collection
AddMap<BusinessCase>(businessCases =>
from bc in businessCases
select new IndexEntry
{
Address = bc.Address,
BusinessCaseId = bc.Id,
AdditionalData = null // Initially no additional data
});
// Map for AddressDatas collection
AddMap<AddressData>(addressDatas =>
from ad in addressDatas
select new IndexEntry
{
Address = ad.Address,
BusinessCaseId = null, // Not relevant for this map
AdditionalData = ad.AdditionalData
});
// Reduce phase to join the data
Reduce = results =>
from result in results
group result by result.Address into g
select new IndexEntry
{
Address = g.Key,
BusinessCaseId = g.Select(x => x.BusinessCaseId).FirstOrDefault(x => x != null),
AdditionalData = g.Select(x => x.AdditionalData).FirstOrDefault(x => x != null)
};
}
}
Note that Patch operations cannot be executed directly on Map-Reduce indexes in RavenDB. Instead, you need to execute the Patch operation on the underlying collections by querying them using the multi-map index for filtering. Here’s how:
using (var session = store.OpenSession())
{
// Query the multi-map index to fetch results
var results = session.Query<BusinessCaseWithAddressData.IndexEntry, BusinessCaseWithAddressData>()
.Where(x => x.BusinessCaseId != null && x.AdditionalData != null)
.ToList();
foreach (var item in results)
{
// Load the BusinessCase document
var businessCase = session.Load<BusinessCase>(item.BusinessCaseId);
if (businessCase != null)
{
// Update the AdditionalData field
businessCase.AdditionalData = item.AdditionalData;
// Mark the document as modified
session.Store(businessCase);
}
}
// Save changes in batches
session.SaveChanges();
}
Classes used in this solution:
public class BusinessCase
{
public string Id { get; set; }
public string Address { get; set; }
public AdditionalData AdditionalData { get; set; }
}
public class AddressData
{
public string Id { get; set; }
public string Address { get; set; }
public AdditionalData AdditionalData { get; set; }
}
public class AdditionalData
{
public string someData { get; set; }
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With