Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

One-time combination of 2 large collections in RavenDB

Tags:

ravendb

How can we do a one time data migration of one document collection that requires a lookup to another collection in RavenDB?

I think we should do it with a patch operation. But how can we do the lookup?

In detail:

We have 2 collections in our RavenDB. One contains business cases with an address. The other contains additional data to addresses. So when a new business case is created, we look up the address and add the additional data to the business case document. But we need to update all cases that have been created before we had the additional data. Each collection contains about 100'000 documents and we cannot load them into memory.

Initially the collections look something like that:

{
  "Id" = "BusinessCase/1",
  "Address" = "street 1, city"
  "AdditionalData" = null
}
{
  "Id" = "AddressData/1",
  "Address" = "street 1, city"
  "AdditionalData" = {...}
}

After the data migration we expect the business cases to be enriched with additional data by a matching address.

{
  "Id" = "BusinessCase/1",
  "Address" = "street 1, city"
  "AdditionalData" = {...}
}

I would expect a patch script to look something like that, but I have no idea how to load the matching address data document.

from BusinessCases as b
load AddressDatas as a where a.Address = b.Address // something like this
update {
  b.AdditionalData = a. AdditionalData;
}
like image 778
Heiner Avatar asked Jan 18 '26 08:01

Heiner


1 Answers

Here is one way you can solve this:
Define the following Multi-Map Index:

    public class BusinessCaseWithAddressData : AbstractMultiMapIndexCreationTask<BusinessCaseWithAddressData.IndexEntry>
    {
        public class IndexEntry
        {
            public string Address { get; set; }
            public string BusinessCaseId { get; set; }
            public AdditionalData AdditionalData { get; set; }
        }

        public BusinessCaseWithAddressData()
        {
            // Map for BusinessCases collection
            AddMap<BusinessCase>(businessCases => 
                from bc in businessCases
                select new IndexEntry
                {
                    Address = bc.Address,
                    BusinessCaseId = bc.Id,
                    AdditionalData = null // Initially no additional data
                });

            // Map for AddressDatas collection
            AddMap<AddressData>(addressDatas => 
                from ad in addressDatas
                select new IndexEntry
                {
                    Address = ad.Address,
                    BusinessCaseId = null, // Not relevant for this map
                    AdditionalData = ad.AdditionalData
                });

            // Reduce phase to join the data
            Reduce = results => 
                from result in results
                group result by result.Address into g
                select new IndexEntry
                {
                    Address = g.Key,
                    BusinessCaseId = g.Select(x => x.BusinessCaseId).FirstOrDefault(x => x != null),
                    AdditionalData = g.Select(x => x.AdditionalData).FirstOrDefault(x => x != null)
                };
        }
    }

Note that Patch operations cannot be executed directly on Map-Reduce indexes in RavenDB. Instead, you need to execute the Patch operation on the underlying collections by querying them using the multi-map index for filtering. Here’s how:

            using (var session = store.OpenSession())
            {
                // Query the multi-map index to fetch results
                var results = session.Query<BusinessCaseWithAddressData.IndexEntry, BusinessCaseWithAddressData>()
                    .Where(x => x.BusinessCaseId != null && x.AdditionalData != null)
                    .ToList();

                foreach (var item in results)
                {
                    // Load the BusinessCase document
                    var businessCase = session.Load<BusinessCase>(item.BusinessCaseId);

                    if (businessCase != null)
                    {
                        // Update the AdditionalData field
                        businessCase.AdditionalData = item.AdditionalData;

                        // Mark the document as modified
                        session.Store(businessCase);
                    }
                }

                // Save changes in batches
                session.SaveChanges();
            }

Classes used in this solution:

    public class BusinessCase
    {
        public string Id { get; set; }
        public string Address { get; set; }
        public AdditionalData AdditionalData { get; set; }
    }
    
    public class AddressData
    {
        public string Id { get; set; }
        public string Address { get; set; }
        public AdditionalData AdditionalData { get; set; }
    }
    
    public class AdditionalData
    {
        public string someData { get; set; }
    }
like image 184
Danielle Avatar answered Jan 21 '26 06:01

Danielle