What is the best database structure for this scenario?

Question

I have a database that holds real estate MLS (Multiple Listing Service) data. Currently, a single table holds all the listing attributes (price, address, sqft, etc.). There are several property types (residential, commercial, rental, income, land, etc.); they share the majority of their attributes, but each type has a few that are unique to it.

My concern is that there are more than 250 shared attributes, which seems like too many fields for a single table. I could break them out into an EAV (Entity-Attribute-Value) format, but I've read many bad things about that, and it would make queries a real pain, since any of the 250 fields could be searched on. If I went that route, I'd literally have to pull all the data out of the EAV table, group it by listing ID, merge it on the application side, and then run my query against the in-memory object collection. That does not seem very efficient either.

I am looking for some ideas or recommendations on which way to proceed. Perhaps the 250+ field table is the only way to proceed.

Just as a note, I'm using SQL Server 2012, .NET 4.5 with Entity Framework 5 and C#, and the data is passed to an ASP.NET web application via a WCF service.

Thanks in advance.

Danny Varod · Accepted Answer

Let's consider the pros and cons of the alternatives:

One table for all listings + attributes:

Very wide table - the model, the schema definitions and the table data are all hard to view
One query with no joins is enough to retrieve all the data for the listing(s)
Requires schema + model change for each new attribute.
Efficient if you always load all the attributes and most items have values for most of the attributes.
Example LINQ query according to attributes:

context.Listings.Where(l => l.PricePerMonthInUsd < 10e3 && l.SquareMeters >= 200)
    .ToList();

One table for all listings, one table for attribute types and one for (listing IDs + attribute IDs +) values (EAV):

Listing table is narrow
Efficient if data is very sparse (most attributes don't have values for most items)
Requires fetching the attribute data from the values table - either one additional query, or one join, which wastes bandwidth because the basic listing columns are repeated for every attribute value row
Does not require schema + model changes for new attributes
If you want type-safe access to attributes from code, you'll need custom code generation based on the attribute types table
Example LINQ query according to attributes:

// IDs of listings that match both attribute filters, intersected in one query.
var listingIds = context.AttributeValues.Where(v =>
                    v.AttributeTypeId == PricePerMonthInUsdId && v.Value < 10e3)
                .Select(v => v.ListingId)
                .Intersect(context.AttributeValues.Where(v =>
                    v.AttributeTypeId == SquareMetersId && v.Value >= 200)
                .Select(v => v.ListingId)).ToList();

or (compare the performance of both variants on the actual DB):

// First filter: IDs of listings that match the price condition.
var listingIds = context.AttributeValues.Where(v =>
                    v.AttributeTypeId == PricePerMonthInUsdId && v.Value < 10e3)
                .Select(v => v.ListingId).ToList();

// Second filter: narrow those IDs down by the area condition.
listingIds = context.AttributeValues.Where(v =>
                listingIds.Contains(v.ListingId)
                && v.AttributeTypeId == SquareMetersId
                && v.Value >= 200)
            .Select(v => v.ListingId).ToList();

and then:

var listings = context.Listings.Where(l => listingIds.Contains(l.ListingId)).ToList();
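The EAV queries above do not show the entity shapes they rely on. The following is a minimal sketch, assuming a hypothetical `AttributeValue` class with a numeric `Value` column and made-up attribute type IDs; it runs the intersect variant against an in-memory list (LINQ to Objects) rather than an EF context, so it only illustrates the logic, not the generated SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical EAV row; the names are illustrative, not taken from a real schema.
public class AttributeValue
{
    public int ListingId { get; set; }
    public int AttributeTypeId { get; set; }
    public decimal Value { get; set; } // assumption: numeric attributes only
}

public static class EavDemo
{
    // Assumed IDs from the attribute types table.
    public const int PricePerMonthInUsdId = 1;
    public const int SquareMetersId = 2;

    // Returns IDs of listings with price < 10000 and area >= 200.
    public static List<int> FindListingIds(IEnumerable<AttributeValue> values)
    {
        var cheap = values
            .Where(v => v.AttributeTypeId == PricePerMonthInUsdId && v.Value < 10000m)
            .Select(v => v.ListingId);
        var large = values
            .Where(v => v.AttributeTypeId == SquareMetersId && v.Value >= 200m)
            .Select(v => v.ListingId);
        return cheap.Intersect(large).OrderBy(id => id).ToList();
    }

    public static List<AttributeValue> SampleData()
    {
        return new List<AttributeValue>
        {
            new AttributeValue { ListingId = 1, AttributeTypeId = PricePerMonthInUsdId, Value = 8000m },
            new AttributeValue { ListingId = 1, AttributeTypeId = SquareMetersId, Value = 250m },
            new AttributeValue { ListingId = 2, AttributeTypeId = PricePerMonthInUsdId, Value = 8000m },
            new AttributeValue { ListingId = 2, AttributeTypeId = SquareMetersId, Value = 100m },
            new AttributeValue { ListingId = 3, AttributeTypeId = PricePerMonthInUsdId, Value = 20000m },
            new AttributeValue { ListingId = 3, AttributeTypeId = SquareMetersId, Value = 300m },
            // Listing 4 has no price row at all, so it can never match the price filter.
            new AttributeValue { ListingId = 4, AttributeTypeId = SquareMetersId, Value = 400m }
        };
    }
}
```

With the sample rows, only listing 1 satisfies both conditions. Note that a listing without a row for an attribute simply drops out of any filter on that attribute, which is the EAV counterpart of a NULL column.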

Compromise option - one table for all listings and one table per group of attributes including values (assuming you can divide attributes into groups):

Multiple medium width tables
Efficient if data is sparse per group (e.g. garden related attributes are all null for listings without gardens, so you don't add a row to the garden related table for them)
Requires one query with multiple joins (no bandwidth is wasted in the join, since each group table is 1:0..1 with the listing table, not 1:many)
Requires schema + model changes for new attributes
Makes viewing the schema/model simpler - if you can divide the attributes into groups of 10, you'll have 25 tables with 11 columns each (10 attributes plus the listing ID) instead of 250 more columns on the listing table
LINQ query is somewhere between the above two examples.
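To make the compromise option concrete, here is a minimal sketch of such a query. The `Listing` and `GardenAttributes` classes, their properties and the thresholds are all hypothetical; the group table is modelled as an optional navigation property that is null when the listing has no row in it. The sketch runs against an in-memory list; against an EF context the same predicate would be expected to translate into a join, but that is an assumption to verify on the actual model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical group table holding garden-related attributes (1:0..1 with Listing).
public class GardenAttributes
{
    public int ListingId { get; set; }
    public int GardenSquareMeters { get; set; }
    public bool HasIrrigation { get; set; }
}

// Hypothetical narrow listing table with only the common attributes.
public class Listing
{
    public int ListingId { get; set; }
    public decimal PricePerMonthInUsd { get; set; }
    public int SquareMeters { get; set; }
    public GardenAttributes Garden { get; set; } // null when there is no garden row
}

public static class GroupedAttributesDemo
{
    // Filters on one common attribute and one attribute from the garden group.
    public static List<int> FindListingIds(IEnumerable<Listing> listings)
    {
        return listings
            .Where(l => l.PricePerMonthInUsd < 10000m
                        && l.Garden != null
                        && l.Garden.GardenSquareMeters >= 50)
            .Select(l => l.ListingId)
            .ToList();
    }

    public static List<Listing> SampleData()
    {
        return new List<Listing>
        {
            new Listing { ListingId = 1, PricePerMonthInUsd = 8000m, SquareMeters = 220,
                Garden = new GardenAttributes { ListingId = 1, GardenSquareMeters = 80 } },
            new Listing { ListingId = 2, PricePerMonthInUsd = 9000m, SquareMeters = 150 },
            new Listing { ListingId = 3, PricePerMonthInUsd = 15000m, SquareMeters = 400,
                Garden = new GardenAttributes { ListingId = 3, GardenSquareMeters = 200 } },
            new Listing { ListingId = 4, PricePerMonthInUsd = 7000m, SquareMeters = 90,
                Garden = new GardenAttributes { ListingId = 4, GardenSquareMeters = 20 } }
        };
    }
}
```

The query stays almost as readable as the single-table version; the only extra cost is the null check for listings that have no row in the group table.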

Weigh the pros and cons against your own statistics (how sparse the data is) and your requirements and maintainability plan (e.g. how often are attribute types added or changed?), then decide.
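One way to put a number on sparseness is to measure, per attribute, the fraction of listings that actually have a value. The sketch below is only an illustration: `ListingRow`, its nullable properties and the `FillRate` helper are hypothetical names, and it works on rows already loaded into memory (on a large table you would more likely compute the same ratio in the database).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row with nullable columns; null means "no value for this listing".
public class ListingRow
{
    public int ListingId { get; set; }
    public decimal? PricePerMonthInUsd { get; set; }
    public int? GardenSquareMeters { get; set; }
}

public static class SparsenessDemo
{
    // Fraction of rows (0.0 to 1.0) for which the selected attribute is not null.
    public static double FillRate<T>(ICollection<T> rows, Func<T, object> selector)
    {
        if (rows.Count == 0) return 0.0;
        return rows.Count(r => selector(r) != null) / (double)rows.Count;
    }

    public static List<ListingRow> SampleData()
    {
        return new List<ListingRow>
        {
            new ListingRow { ListingId = 1, PricePerMonthInUsd = 8000m, GardenSquareMeters = 80 },
            new ListingRow { ListingId = 2, PricePerMonthInUsd = 9000m },
            new ListingRow { ListingId = 3, PricePerMonthInUsd = 15000m },
            new ListingRow { ListingId = 4, PricePerMonthInUsd = 7000m }
        };
    }
}
```

For example, `SparsenessDemo.FillRate(rows, r => r.GardenSquareMeters)` gives the share of listings with a garden size. Attributes that are filled for nearly every listing fit the wide table well, while a group of attributes that is mostly empty is a candidate for its own table or for EAV.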

Tags: database, sql-server, entity-framework, database-design, domain-model

Ricketts
