Understanding SQL Index: The Key to Faster Query Execution

Exploring SQL Index: Basics, Types, Benefits, and Query Optimization

9 min read · Aug 14, 2023

Magnifying Glass On Book — Image From Pexels

Indexes are a crucial topic in SQL, and a large one to cover and understand. In this blog, I’ll give you a brief introduction: what an index is and why we need it, the types of indexes, and how indexes help optimize query performance, with some examples. I’ll keep this blog simple and beginner-friendly and won’t go into advanced topics. So let's begin.

Indexes are the silent partners in SQL, working behind the scenes to make queries sing.

What is an Index in SQL and why do we need it?

An index is a data structure that enhances the speed of data retrieval operations.

Confusing, right? Let’s make it simple with the example of a library.

Let’s say you are in a huge library and you want to find a book on a specific topic like “Machine Learning”. With no help, you’d have to go through every single book in the library, which would take a long time.

A library has a catalog that lists all the books and what they’re about. This catalog acts as an index: by using it, you can find the right book much faster. You just look up “Machine Learning” in the catalog, and it tells you which books are related to that topic. Finding a specific book is now much easier than searching through the entire library.

In a similar way, an index works as a catalog for a database table. When a table holds a lot of information, finding a specific piece of it can take a long time. Creating an index on a table column is like making a catalog for that column.

For example, say you have a table of user information that stores each user’s name, age, address, etc. You want to find the user named “Mike”, and you don’t have an index on the “name” column. The database would then have to go through all the names one by one to find “Mike.” But if we create an index on the “name” column, it’s like making a catalog of all the names in alphabetical order. Searching for “Mike” becomes easy because the names are already sorted.

Figure: Table with and without Index
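As a minimal sketch of the “Mike” example in SQL Server syntax: the `users` table, its column types, and the index name `IX_users_name` are all hypothetical and only there for illustration.

```sql
-- Hypothetical table for illustration; names and types are assumptions
CREATE TABLE users (
    id      INT           NOT NULL,
    name    NVARCHAR(100) NOT NULL,
    age     INT           NULL,
    address NVARCHAR(200) NULL
);

-- The "catalog" for the name column: entries are kept sorted by name
CREATE INDEX IX_users_name ON users (name);

-- A lookup the engine can answer through the index
-- instead of checking every row one by one
SELECT id, name, age, address
FROM users
WHERE name = N'Mike';
```

On a very small table the optimizer may still choose to scan, so the benefit mostly shows up as the table grows.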

So now you have a basic understanding of what an index is and why we need it. Let’s jump to the types of indexes.

Types of Indexes

There are several types of indexes. I won’t go too deep, but I’ll give a brief description of each, with syntax for Microsoft SQL Server.

Clustered Index
A clustered index determines the physical order of records in the table. A table can have only one clustered index, and it is often created on the primary key column.
When a table has a clustered index, the data rows are stored on disk in the order of the clustered index key.

CREATE CLUSTERED INDEX index_name
ON table_name (column1, column2, ...);

Non-Clustered Index
A non-clustered index is a separate data structure from the table. It contains a sorted copy of the indexed columns, along with a pointer back to the corresponding row in the table.
It allows quick access to data based on the indexed columns without affecting the physical order of the data.
We can create multiple non-clustered indexes on a table.

CREATE NONCLUSTERED INDEX index_name
ON table_name (column1, column2, ...);

Figure: Clustered and Non-Clustered Index

Unique Index
A unique index enforces the uniqueness of values in the indexed column. It is often used on columns representing primary keys or other unique identifiers.
It ensures that no two rows in the table can have the same value in the indexed column or, for a multi-column index, the same combination of values.

CREATE UNIQUE INDEX index_name
ON table_name (column1, column2, ...);
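To see the uniqueness rule in action, here is a small sketch. The `accounts` table, its columns, and the index name are hypothetical, chosen only to keep the example self-contained.

```sql
-- Hypothetical table for illustration
CREATE TABLE accounts (
    id    INT           NOT NULL,
    email NVARCHAR(255) NOT NULL
);

CREATE UNIQUE INDEX UX_accounts_email ON accounts (email);

-- Succeeds: this email is not in the table yet
INSERT INTO accounts (id, email) VALUES (1, N'mike@example.com');

-- Fails with a duplicate-key error: the unique index rejects the repeated email
INSERT INTO accounts (id, email) VALUES (2, N'mike@example.com');
```

Note that creating a unique index on a table that already contains duplicate values in that column fails as well, so existing data has to be cleaned up first.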

Filtered Index / Partial Index
An index created on a subset of a table's rows, namely the rows that satisfy a specified condition, is called a Filtered Index.
SQL Server uses the term Filtered Index, while some other databases, such as PostgreSQL and SQLite, call the same idea a Partial Index.

CREATE INDEX index_name
ON table_name (column1, column2, ...)
WHERE filter_condition;
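As a concrete sketch, suppose only a small share of rows is queried regularly. The `orders` table, its columns, and the `'open'` status value below are assumptions made up for this example.

```sql
-- Hypothetical table for illustration
CREATE TABLE orders (
    id          INT         NOT NULL,
    customer_id INT         NOT NULL,
    status      VARCHAR(20) NOT NULL,
    created_at  DATETIME2   NOT NULL
);

-- Index only the rows that are still open; closed orders are left out,
-- which keeps the index small and cheaper to maintain
CREATE NONCLUSTERED INDEX IX_orders_open
ON orders (customer_id, created_at)
WHERE status = 'open';

-- A query whose condition matches the filter can use the smaller index
SELECT customer_id, created_at
FROM orders
WHERE status = 'open'
  AND customer_id = 42;
```

Whether the optimizer actually picks the filtered index depends on the query matching the filter condition, so it is worth checking the execution plan.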

Covering Index
When an index includes all the columns required to fulfill a query, it is called a Covering Index.
For example, if we create a non-clustered index on 3 columns and a query retrieves only those columns, that non-clustered index is a covering index for the query.
A covering index eliminates the need to access the table.
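A rough sketch of the idea follows. The `employees` table, its columns, and the value `'Smith'` are hypothetical and used only for illustration.

```sql
-- Hypothetical table for illustration
CREATE TABLE employees (
    id         INT            NOT NULL,
    first_name NVARCHAR(50)   NOT NULL,
    last_name  NVARCHAR(50)   NOT NULL,
    job_title  NVARCHAR(100)  NOT NULL,
    salary     DECIMAL(10, 2) NOT NULL
);

CREATE NONCLUSTERED INDEX IX_employees_name_title
ON employees (last_name, first_name, job_title);

-- Covered: every column the query touches is stored in the index
SELECT last_name, first_name, job_title
FROM employees
WHERE last_name = N'Smith';

-- Not covered: salary is not in the index, so the engine must also
-- visit the table (or scan it) to fetch that column
SELECT last_name, first_name, salary
FROM employees
WHERE last_name = N'Smith';
```

So "covering" is a property of an index relative to a particular query, not a separate kind of index you create with special syntax.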

Column Store Index
This index stores data column by column rather than row by row. It is mainly used for querying large data warehouse tables.

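-- The column list is required for NONCLUSTERED and omitted for CLUSTERED
CREATE [ CLUSTERED | NONCLUSTERED ] COLUMNSTORE INDEX index_name
ON table_name [ (column1, column2, ...) ];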

Spatial Index
A spatial index allows you to index a spatial column. A spatial column is a table column that contains data of a spatial data type, such as geometry or geography.

CREATE SPATIAL INDEX index_name
ON table_name (geometry_column)
USING GEOMETRY_AUTO_GRID
WITH (BOUNDING_BOX = (xmin, ymin, xmax, ymax));

How does the Index improve query performance?

Now comes the main part: query optimization using indexes. An index helps optimize a query by improving the speed and efficiency of data retrieval from a table.

Here, I use Microsoft SQL Server and SSMS (SQL Server Management Studio) for query execution. I have used a table of people data with 500,000 rows.
(Download Sample CSV)

First, turn on the Actual Execution Plan (press Ctrl + M) to see the execution plan of each query, so that we can get an idea of how the data is retrieved.

Figure: Execution Plan in SSMS

A query execution plan defines the sequence in which the source tables are accessed and the methods used to extract data from each table.

Fast Data Retrieval

Indexes help in fast data retrieval by allowing the database engine to quickly find specific rows that match the search criteria of a query. They serve as a road map for the engine, guiding it to the relevant data while decreasing the need for time-consuming full table scans.

Avoid Table Scan

A table scan reads the entire table to find data that matches a specific condition. While table scans are sometimes necessary, they are inefficient and slow, especially on large datasets. They decrease performance and increase CPU utilization, disk I/O, and memory usage.

Let’s take an example to understand it:

We have a people table with columns Id, First_Name, Last_Name, Sex, Email, Date_of_Birth, and Job_Title.

Without an Index on Id:

SELECT * FROM people 
WHERE Id = 1430;

Figure: Execution Plan with Table Scan

The plan uses a Table Scan, which means the database may have to scan the entire people table to find a specific row.

With Index (Clustered Index on Id):

CREATE CLUSTERED INDEX PK_people ON people(Id);

Figure: Execution Plan with Index Seek

After running the same query again, we can see that it uses an Index Seek, which is significantly better than the Table Scan. The database no longer has to scan through the entire table; it can directly access records through the index that we have created.

The subtree cost represents the estimated cost of a plan. The lower the subtree cost, the less computational effort that step is expected to require.

In our case, the subtree cost is 7.51773 when there is no index and 0.0032831 when there is an index, which is roughly 2,300 times lower.
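If you prefer to read the estimate as text instead of hovering over the graphical plan, one option is `SET SHOWPLAN_ALL`. This is a different route from the one used in this blog, and the sketch below only shows the general shape. With this setting on, SQL Server returns the estimated plan rows without executing the query, and the `TotalSubtreeCost` column holds the estimate.

```sql
-- Return the estimated plan as rows instead of running the query
SET SHOWPLAN_ALL ON;
GO

SELECT * FROM people
WHERE Id = 1430;
GO

-- Switch back to normal execution
SET SHOWPLAN_ALL OFF;
GO
```

The `GO` separators matter here because `SET SHOWPLAN_ALL` has to be the only statement in its batch.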

Reduce Disk I/O Operations

Disk I/O is the process of reading data from or writing data to a storage disk. It has a direct impact on query performance: excessive or inefficient disk activity can cause slow execution of queries.

Properly designed indexes can reduce the need for full table scans and minimize disk I/O.

Let’s see with an example how an index can reduce disk I/O operations.

Execute this query and check the I/O statistics:

(First press F4 or go to View -> Properties Window to open Properties)

SELECT Id, First_Name, Last_Name, Email 
FROM people 
WHERE First_Name LIKE 'Ki%';

To check the I/O statistics:

Figure: Actual I/O Cost before Index

Here, under Actual I/O Statistics, we can see that Actual Logical Reads is 9552.

Now create the following index:

CREATE INDEX IX_first_name ON people (First_Name) 
INCLUDE (Id, Last_Name, Email);

Now run the same query and check I/O statistics:

Figure: Actual I/O Cost after Index

We can see that Actual Logical Reads decreases to 46, which shows that an index can reduce I/O reads.
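Another way to see logical reads, as an alternative to the Properties window used above, is `SET STATISTICS IO`. A minimal sketch using the same query:

```sql
-- Report I/O activity for each statement in the Messages tab
SET STATISTICS IO ON;

SELECT Id, First_Name, Last_Name, Email
FROM people
WHERE First_Name LIKE 'Ki%';

SET STATISTICS IO OFF;
```

The Messages tab in SSMS then shows a line for the people table that includes its scan count and logical reads, so you can compare the numbers before and after creating the index.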

Sorting, Grouping, Joining

Indexes on sorting columns (e.g., ORDER BY) enable the engine to retrieve data in the desired order. Instead of performing a separate sorting step, the engine can efficiently retrieve data by utilizing the index’s sorted structure.

Indexes on columns used in grouping operations (e.g., GROUP BY) help the engine in grouping together similar data. Aggregation and summary calculations speed up as the engine can scan the index in the order of the grouped values.

Indexes help in joining operations by allowing the database engine to quickly locate matching rows between tables, reducing the need for full table scans and improving query performance.
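The three cases above could look like the following sketch. The index names are my own, the `purchases` table is hypothetical, and the example assumes the people columns have index-friendly types (for instance, not `VARCHAR(MAX)`). Whether the optimizer actually uses each index depends on the data and the query.

```sql
-- ORDER BY: the index already keeps Date_of_Birth values in sorted order
CREATE INDEX IX_people_dob ON people (Date_of_Birth);

SELECT TOP (10) Date_of_Birth
FROM people
ORDER BY Date_of_Birth;

-- GROUP BY: equal Job_Title values sit next to each other in the index
CREATE INDEX IX_people_job_title ON people (Job_Title);

SELECT Job_Title, COUNT(*) AS people_count
FROM people
GROUP BY Job_Title;

-- JOIN: hypothetical child table that references people by Id
CREATE TABLE purchases (
    purchase_id INT            NOT NULL,
    person_id   INT            NOT NULL,
    amount      DECIMAL(10, 2) NOT NULL
);

CREATE INDEX IX_purchases_person_id ON purchases (person_id);

SELECT p.Id, p.First_Name, pu.purchase_id, pu.amount
FROM people AS p
JOIN purchases AS pu ON pu.person_id = p.Id
WHERE p.Id = 1430;
```

Checking the execution plan for a missing Sort operator, or for a seek on the joined table, is a quick way to confirm that an index is helping.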

Be Careful

When working with indexes, it’s important to be cautious and make informed decisions to avoid potential pitfalls and ensure optimal database performance.

Be aware of the following important factors:

Don’t Create too many Indexes: Over-indexing a table can increase storage needs, slow data modification operations, and make index maintenance more difficult.
Create indexes only if they are necessary for enhancing query performance.
Maintain Indexes: Indexes should be regularly monitored and maintained to avoid fragmentation, which over time can reduce performance. When indexes fragment, think about rebuilding or reorganizing them.
Index Statistics: Keep index statistics up to date so that the query optimizer can make accurate decisions about execution plans.
Unused Index: Regularly review the usage of indexes. If an index is not being used by any queries, consider removing it.
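The checks in this list can be sketched with SQL Server's built-in views and commands. The example below assumes the people table and the `IX_first_name` index created earlier; treat it as a starting point, not a complete maintenance routine.

```sql
-- Fragmentation of each index on the people table
SELECT i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'people'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id = ps.index_id;

-- Lighter-weight defragmentation
ALTER INDEX IX_first_name ON people REORGANIZE;

-- Full rebuild of the index
ALTER INDEX IX_first_name ON people REBUILD;

-- Refresh the statistics the optimizer relies on
UPDATE STATISTICS people;

-- How often each index has been read (seeks, scans, lookups) versus written
SELECT i.name AS index_name,
       us.user_seeks,
       us.user_scans,
       us.user_lookups,
       us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
  ON us.object_id = i.object_id
 AND us.index_id = i.index_id
 AND us.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'people');
```

An index with many updates but no seeks, scans, or lookups is a candidate for review. Keep in mind that the usage counters reset when the SQL Server service restarts.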

When to avoid indexes?

While indexes are useful tools for enhancing query performance, there are some situations where you might want to avoid them or use them sparingly:

Frequently Updated Table: Indexes can cause a significant amount of overhead in tables that frequently receive insert, update, or delete operations. Index maintenance is required for each modification operation, which may slow write performance.
Small Tables: Such tables don’t require an index because the database engine can scan them quickly.
Columns with Low Cardinality: Indexing columns with very few distinct values might not provide significant performance gains and will actually increase storage and maintenance costs.
Batch Data Loads: It’s frequently more efficient to drop indexes before loading large amounts of data in bulk and to recreate them afterward. Index maintenance during data loading can cause a delay.
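For the batch load case, a rough sketch of the drop, load, and recreate pattern is shown below. The file path is hypothetical, and the `BULK INSERT` options assume a comma-separated file with a header row, so adjust them to your data.

```sql
-- 1. Drop the non-clustered index before the load
DROP INDEX IX_first_name ON people;

-- 2. Load the data in bulk (hypothetical path and file layout)
BULK INSERT people
FROM 'C:\data\people.csv'
WITH (
    FIRSTROW = 2,            -- skip the header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);

-- 3. Recreate the index once, after all rows are in place
CREATE INDEX IX_first_name ON people (First_Name)
INCLUDE (Id, Last_Name, Email);
```

Building the index once at the end avoids updating it for every inserted row, but it is worth measuring on your own data because the gain depends on the size of the load.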

Conclusion

In a nutshell, indexes quietly revolutionize how we interact with databases and are the unsung heroes of the SQL world. They serve as guides, making it easier to quickly and effectively sort through enormous amounts of data. Indexes give our applications a new level of responsiveness by enhancing sorting, grouping, and join operations, as well as query performance.