Assuming that a table contains sufficient information to warrant an index seek, at what cardinality will SQL Server (or PostgreSQL) opt for an index scan?
The reason I ask this is I previously posted a question (link) in which two queries performed at the same speed, yet one didn't attempt to use the index on the processed columns. After SQL Server suggested I put a covering index that included the columns being queried (it suggested this for both queries), I started looking for reasons as to why it would make such a strange suggestion.
I experimented with making the indexes covering and composite, but both executed in the same time (we're talking 3 million rows).
Finally I concluded it was because of the ultra-high cardinality of the data. Every row is unique. I am deducing this caused SQL server to choose an index scan. However, the query stated "WHERE Col1 > ? AND Col2 < ?", so this is a little confusing.
My questions are:
I have attached the execution plan.

An index scan or table scan is when SQL Server has to scan the data or index pages to find the appropriate records. A scan is the opposite of a seek, where a seek uses the index to pinpoint the records that are needed to satisfy the query.
The Index Seek operator uses the structure of a nonclustered index to efficiently find either single rows (singleton seek) or specific subsets of rows (range seek). (When SQL Server needs to read single rows or small subsets from a clustered index, it uses a different operator: Clustered Index Seek).
The number of rows in the table. The number of unique values for a set of columns for leading columns in an index key, also known as cardinality. Leading columns refers to the first column, or the first and second column, or the first, second, and third column of an index (and so on).
3) index scan is faster than a table scan because they look at sorted data and query optimizers know when to stop and look for another range. 4) index seek is the fastest way to retrieve data and it comes into the picture when your search criterion is very specific.
In terms of SQL Server, this has been referred to as the tipping point, of which Kimberley's blog post is a good read on it. http://www.sqlskills.com/BLOGS/KIMBERLY/category/The-Tipping-Point.aspx
The tipping point is a guideline of 25%-33% of the total number of pages within the table, expressed as rows, e.g. 10k data pages would give a tipping point of 2500-3333 rows. As guidelines go this is pretty good, and as good as you will get - remember the query plan engine is a black box, and whilst it will give you a query plan, it only says what it decided, not why.
In terms of tipping a covering index though, that is not actually very easy, even with 100% of the data being selected a covering index will still seek over scan in the majority of cases.
That makes sense, if you consider that the cost optimizer doesn't assign any real cost to the index pages hierarchy, any only costs up the access to the leaf pages of the index. At that point, scanning or seeking 100% of a covering index is costed the same.
I found from my own experimentation (http://sqlfascination.com/2009/11/07/can-a-covering-nc-index-be-tipped ) using a between clause would cause it to scan, but other where clauses would not - from what I could tell it was purely down to the route through the query engine.
In PostgreSQL, this is usually not a good question to ask because the actual plan selection is more complicated. It depends on table size, memory settings, and other parts of the query. You will usually get a plain index scan only if you are selecting very few rows. Above that, you will get a bitmap index scan up to say 40% selectivity in simple experiments.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With