home › Adviсe › Composite sql index. Indexing in the database. Does the number of duplicates in a key column matter?

Composite sql index. Indexing in the database. Does the number of duplicates in a key column matter?

This material will discuss such database objects Microsoft SQL Server How indices You will learn what indexes are, what types of indexes there are, how to create, optimize, and delete them.

What are indexes in a database?

Index is a database object that is a data structure consisting of keys built from one or more columns of a table or view, and pointers that map to where specified data is stored. Indexes are designed to retrieve rows from a table faster, in other words, indexes provide quick search data in a table, which significantly improves query and application performance. Indexes can also be used to ensure that rows in a table are unique, thereby ensuring data integrity.

Types of Indexes in Microsoft SQL Server

The following types of indexes exist in Microsoft SQL Server:

Clustered (Clustered) is an index that stores table data sorted by index key value. A table can only have one clustered index because the data can only be sorted in one order. If possible, each table should have a clustered index; if a table does not have a clustered index, the table is called " in a bunch" A clustered index is created automatically when you create PRIMARY KEY constraints ( primary key) and UNIQUE if a clustered index on the table has not yet been defined. If you create a clustered index on a table ( heaps) that contains non-clustered indexes, then all of them must be rebuilt after creation.
Non-clustered (Nonclustered) is an index that contains the value of a key and a pointer to a data row containing the value of that key. A table can have multiple nonclustered indexes. Non-clustered indexes can be created on tables with or without a clustered index. It is this type of index that is used to improve the performance of frequently used queries, since non-clustered indexes provide fast search and access to data by key values;
Filterable (Filtered) is an optimized nonclustered index that uses a filter predicate to index a subset of rows in a table. If designed well, this type of index can improve query performance and also reduce index maintenance and storage costs compared to full-table indexes;
Unique (Unique) is an index that ensures that there are no duplicates ( identical) index key values, thereby guaranteeing the uniqueness of the rows according to this key. Both clustered and non-clustered indexes can be unique. If you create a unique index on multiple columns, the index ensures that each combination of values in the key is unique. When you create PRIMARY KEY or UNIQUE constraints, the SQL server automatically creates a unique index on the key columns. A unique index can only be created if the table currently has no duplicate values for key columns;
Columnar (Columnstore) is an index based on columnar data storage technology. This type The index is effective for large data stores because it can increase the performance of queries to the store by up to 10 times and also reduce the data size by up to 10 times, since the data in the Columnstore index is compressed. There are both clustered column indexes and non-clustered ones;
Full text (Full-text) is a special type of index that provides efficient support for complex word lookups on character string data. The process of creating and maintaining a full-text index is called " filling" There are such types of filling as: full filling and filling based on change tracking. By default, SQL Server fully populates a new full-text index immediately after it is created, but this can require a significant amount of resources, depending on the size of the table, so it is possible to delay the full population. Change tracking-based seeding is used to maintain the full-text index after it is initially fully seeded;
Spatial (Spatial) is an index that allows for more effective use specific operations on spatial objects in columns with a geometry or geography data type. This type of index can only be created on a spatial column, and the table on which the spatial index is defined must contain a primary key ( PRIMARY KEY);
XML is another special type of index that is designed for columns of type XML data. The XML index improves the efficiency of processing queries against XML columns. There are two types of XML index: primary and secondary. A primary XML index indexes all tags, values, and paths stored in an XML column. It can only be created if the table has a clustered index on the primary key. A secondary XML index can be created only if the table has a primary XML index and it is used to improve the performance of queries on a certain type of access to the XML column, in this regard, there are several types of secondary indexes: PATH, VALUE and PROPERTY;
There are also special indexes for memory-optimized tables ( In-Memory OLTP) such as: Hash ( Hash) memory-optimized indexes and nonclustered indexes that are created for range scans and ordered scans.

Creating and Deleting Indexes in Microsoft SQL Server

Before you start creating an index, it is necessary to design it well in order to use the index effectively, since poorly designed indexes may not improve performance, but rather reduce it. For example, a large number of indexes on a table reduces the performance of INSERT, UPDATE, DELETE, and MERGE statements because when data in the table changes, all indexes must be updated accordingly. We will look at general recommendations for designing indexes in a separate article, but now let’s move on directly to examining the process of creating and deleting indexes.

Note! My SQL server is Microsoft SQL Server 2016 Express.

Creating Indexes

There are two ways to create indexes in Microsoft SQL Server: the first is using GUI environment SQL Server Management Studio (SSMS), and the second is using the Transact-SQL language, we will analyze both methods.

Source data for examples

Let's imagine that we have a product table called TestTable, which has three columns:

ProductId – product identifier;
ProductName – product name;
CategoryID – product category.

CREATE TABLE TestTable(ProductId INT IDENTITY(1,1) NOT NULL, ProductName VARCHAR(50) NOT NULL, CategoryID INT NULL,) ON

Example of creating a clustered index

As I already said, a clustered index is created automatically if, for example, when creating a table we specify a specific column as primary key (PRIMARY KEY), but since we haven't done that, let's look at an example of creating a clustered index yourself.

To create a clustered index, we can specify a primary key for the table, and thus the clustered index will be created automatically, or we can create a clustered index separately.

For example, let's just create a clustered index, without creating a primary key. First we'll do this using Management Studio.

Open SSMS and find the desired table in the object browser and right-click on the item “ Indexes", select " Create index" and index type, in our case " Clustered».

The form “ New index", where we need to specify the name of the new index ( it must be unique within the table), we also indicate whether this index will be unique; if we are talking about the product identifier in the product table, then, of course, it must be unique. Then select the column ( index key), on the basis of which we will create a clustered index, i.e. the data rows in the table will be sorted using the " Add».

After entering all the necessary parameters, click “ OK", eventually a clustered index will be created.

Similarly, one could create a clustered index using T-SQL statement CREATURE INDEX, for example, like this

CREATE UNIQUE CLUSTERED INDEX IX_Clustered ON TestTable (ProductId ASC) GO

Or, as we already said, we could also use a statement to create a primary key, for example

ALTER TABLE TestTable ADD CONSTRAINT PK_TestTable PRIMARY KEY CLUSTERED (ProductId ASC) GO

Example of creating a nonclustered index with included columns

Now let's look at an example of creating a non-clustered index, in which we will indicate columns that will not be key, but will be included in the index. This is useful in cases where you are creating an index for a specific query, for example, so that the index completely covers the query, i.e. contained all columns ( this is called "Request Coverage"). Query coverage improves performance because the query optimizer can find all the column values in the index without accessing the table data, resulting in fewer disk I/O operations. But remember that including non-key columns in the index entails an increase in the size of the index, i.e. storing the index will require more disk space and may also result in reduced performance for INSERT, UPDATE, DELETE, and MERGE operations on the base table.

In order to create a non-clustered index using the Management Studio GUI, we also find the desired table and the Indexes item, only in this case we select “ Create -> Non-Clustered Index».

After opening the form " New index"we specify the name of the index, add a key column or columns using the button " Add", for example, for our test case, let's specify the CategoryID.

In Transact-SQL it would look like this.

CREATE NONCLUSTERED INDEX IX_NonClustered ON TestTable (CategoryID ASC) INCLUDE (ProductName) GO

Example of deleting an index in Microsoft SQL Server

In order to delete an index, you can right-click on the desired index and click “ Delete", then confirm your actions by clicking " OK».

or you can also use the instructions DROP INDEX, For example

DROP INDEX IX_NonClustered ON TestTable

It should be noted that the DROP INDEX statement does not apply to indexes that were created by creating PRIMARY KEY and UNIQUE constraints. In this case, to drop the index, you must use the ALTER TABLE statement with the DROP CONSTRAINT clause.

Optimizing Indexes in Microsoft SQL Server

As a result of updating, adding or deleting data in SQL tables, the server automatically makes corresponding changes to the indexes, but over time all these changes can cause fragmentation of the data in the index, i.e. they will end up scattered throughout the database. Fragmentation of indexes entails a decrease in query performance, so periodically it is necessary to perform index maintenance operations, namely defragmentation, such as index reorganization and rebuilding operations.

When to use index reorganization and when to rebuild?

To answer this question, you first need to determine the degree of index fragmentation, since depending on the fragmentation of the index, one or another defragmentation method will be preferable and more effective. You can use the system table function to determine the degree of index fragmentation sys.dm_db_index_physical_stats, which returns detailed information about the size and fragmentation of indexes. For example, using the following query, you can find out the degree of index fragmentation for all tables in the current database.

SELECT OBJECT_NAME(T1.object_id) AS NameTable, T1.index_id AS IndexId, T2.name AS IndexName, T1.avg_fragmentation_in_percent AS Fragmentation FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL) AS T1 LEFT JOIN sys. indexes AS T2 ON T1.object_id = T2.object_id AND T1.index_id = T2.index_id

In this case, we are interested in the column avg_fragmentation_in_percent, i.e. percentage of logical fragmentation.

If the degree of fragmentation is less than 5%, then reorganizing or rebuilding the index should not be started at all;
If the degree of fragmentation is from 5 to 30%, then it makes sense to start reorganizing the index, since this operation uses minimal system resources and does not require long-term blocking;
If the degree of fragmentation is more than 30%, then it is necessary to rebuild the index, since this operation, with significant fragmentation, gives a greater effect than the index reorganization operation.

Personally, I can add the following if you have a small company and the database does not require maximum output 24 hours a day, i.e. Since it is not a super active database, you can safely periodically perform the operation of rebuilding indexes, without even determining the degree of fragmentation.

Reorganizing Indexes

Index reorganization is an index defragmentation process that defragments leaf-level clustered and non-clustered indexes on tables and views by physically reordering leaf-level pages according to logical order ( from left to right) end nodes.

You can use either the SSMS graphical tool or the Transact-SQL statement to reorganize the index.

Reorganizing an Index Using Management Studio

Reorganizing an Index Using Transact-SQL

ALTER INDEX IX_NonClustered ON TestTable REORGANIZE GO

Rebuilding indexes

Rebuilding the index is a process that removes the old index and creates a new one, thereby eliminating fragmentation.

You can use two methods to rebuild indexes.

First. Using the ALTER INDEX statement with the REBUILD clause. This statement replaces the DBCC DBREINDEX statement. Typically, this is the method used to rebuild indexes en masse.

Example

ALTER INDEX IX_NonClustered ON TestTable REBUILD GO

And the second, using the CREATE INDEX statement with the DROP_EXISTING clause. Can be used, for example, to rebuild an index by changing its definition, i.e. adding or removing key columns.

Example

CREATE NONCLUSTERED INDEX IX_NonClustered ON TestTable (CategoryID ASC) WITH(DROP_EXISTING = ON) GO

Rebuild functionality is also available in Management Studio. Right click by the required index " Rebuild».

This concludes the material on the basics of indexes in Microsoft SQL Server. If you are interested in the T-SQL language, I recommend reading my book “

1) Concept of index
Index is a means of providing fast access to table rows based on the values of one or more columns.

There is a lot of variety in this operator because it is not standardized, since standards do not address performance issues.

2) Creating indexes
CREATE INDEX
ON()

3) Changing and deleting indexes
To control the activity of the index, the operator is used:
ALTER INDEX
To remove an index, use the operator:
DROP INDEX

a) Table selection rules
1. It is advisable to index tables in which no more than 5% of the rows are selected.
2. Tables that do not have duplicates in the WHERE clause of the SELECT statement should be indexed.
3. It is not practical to index frequently updated tables.
4. It is inappropriate to index tables that occupy no more than 2 pages (for Oracle this is less than 300 rows), since its full scan does not take longer.

b) Column selection rules
1. Primary and foreign keys - often used to join tables, retrieve data, and search. These are always unique indexes with maximum utility
2. When using referential integrity options, you always need an index on the FK.
3. Columns by which data is often sorted and/or grouped.
4. Columns that are frequently searched in the WHERE clause of a SELECT statement.
5. You should not create indexes on long descriptive columns.

c) Principles for creating composite indexes
1. Composite indexes are good if the individual columns have few unique values, but a composite index provides more uniqueness.
2. If all values selected SELECT statement, belong to the composite index, then the values are selected from the index.
3. A compound index should be created if the WHERE clause uses two or more values combined with the AND operator.

d) It is not recommended to create
It is not recommended to create indexes on columns, including composite ones, that:
1. Rarely used for searching, merging, and sorting query results.
2. Contain frequently changing values, which requires frequent updates index slows down database performance.
3. Contain a small number of unique values (less than 10% m/f) or a predominant number of lines with one or two values (the supplier’s city of residence is Moscow).
4. Functions or an expression are applied to them in the WHERE clause, and the index does not work.

e) We must not forget
You should strive to reduce the number of indexes, since a large number of indexes reduces the speed of data updating. Thus, MS SQL Server recommends creating no more than 16 indexes per table.
Typically, indexes are created for query purposes and to maintain referential integrity.
If the index is not used for queries, then it should be deleted and referential integrity should be ensured using triggers.

Designed to improve productivity data search. Tables in a database can have a large number of rows that are stored in random order, and searching for them based on a given criterion by sequentially scanning the table row by row can take a long time. An index is formed from the values of one or more table columns and pointers to corresponding lines table and thus allows you to search for rows that satisfy the search criteria. Speeding up work using indexes is achieved primarily due to the fact that the index has a structure optimized for search - for example, balanced tree.

Sequence of columns in a compound index

The order in which columns are presented in a composite index is important. The fact is that it is possible to obtain a set of data using a query that affects only the first of the indexed columns. However, in most DBMSs it is impossible or inefficient to retrieve data from only the second and further indexed columns (without restrictions on the first column).

For example, imagine a telephone directory sorted first by city, then by last name, and then by first name. If you know the city, you can easily find all the phone numbers of that city. However, in such a directory it will be very laborious to find all the telephone numbers recorded under a certain surname - to do this, you need to look at the section of each city and look for the desired surname there. Some DBMSs do this job, while others simply do not use such an index.

Performance

For optimal query performance, indexes are typically created on table columns that are frequently used in queries. Multiple indexes can be created for one table. However, increasing the number of indexes slows down the operations of adding, updating, and deleting table rows, since the indexes themselves must be updated. In addition, indexes take up additional memory, so before creating an index, you should ensure that the expected benefit in query performance exceeds the additional cost of computer resources to maintain the index.

Restrictions

Indexes are useful for many applications, but there are limitations to their use. Let's take this request SQL :

SELECT first_name FROM people WHERE last_name = "Frankenstein"; .

To run such a query without an index, the DBMS must check the last_name field in each row of the table (this mechanism is known as "brute force" or "brute force" full scan table", in the plan may be displayed with the word NATURAL). When using an index, the DBMS simply traverses the B-tree until it finds the Frankenstein entry. This pass requires much less resources than a full table search.

Now let's take this query:

SELECT email_address FROM customers WHERE email_address LIKE "%@yahoo.com"; .

This query should find all clients whose email ends with @yahoo.com, but even if there is an index on the email_address column, the DBMS will still use a full search of the table. This is because indexes are built on the assumption that words/characters go from left to right. Using a wildcard at the beginning of a search condition prevents the DBMS from using B-tree search. This problem can be solved by creating an additional index using the expression reverse(email_address) and generating a query like:

SELECT email_address FROM customers WHERE reverse(email_address) LIKE reverse("%@yahoo.com"); .

In this case, the wildcard character will be in the rightmost position (moc.oohay@%), which does not exclude the use of the index by reverse(email_address) .

To provide quick access to Oracle DBMS table rows, indexes are used. Indexes provide fast access to data during operations that select a relatively small number of table rows.

Although Oracle allows an unlimited number of indexes on a table, indexes are only useful when they are used to speed up queries. Otherwise, they only take up space and reduce server performance when updating indexed columns. You must use the EXPLAIN PLAN (execution and statistics plan) feature to determine how indexes are used in your queries. Sometimes, if the index is not used by default, you can use query hints to use the index.

Create indexes after inserting table data

Typically, you insert or load data into a table before creating indexes. Otherwise, the overhead of updating indexes will slow down insert or load operations. The only exception to this rule is an index on a cluster key. It can only be created for an empty cluster.

Switch to temporary tablespace to avoid free space issues when creating indexes

When creating an index on a table that already contains data, Oracle requires additional memory for sorting. This uses the sorting memory area allocated to the index creator (the amount allocated for each user is set by the initialization parameter SORT_AREA_SIZE), in addition, the Oracle server must flush and swap information from temporary segments allocated during index creation. If the index is very large, it is recommended to do the following:

Create a new temporary table space using the CREATE TABLESPACE statement.
Specify this new temporary space in the TEMPORARY TABLESPACE parameter of the ALTER USER statement.
Create an index with the CREATE INDEX statement.
Drop this tablespace with the DROP TABLESPACE command. Then use the ALTER USER statement to restore the original tablespace as temporary.

Choose the right tables and columns for indexing

Use the following guidelines to determine when to create an index.

Create an index if you frequently retrieve a relatively small number (less than 15%) of rows from a large table. This percentage is highly dependent on the relative speed of the table scan and how clustered the row data is in the index key. The higher the browsing speed, the lower the percentage; the more clustered the row data is, the higher the percentage.
Index columns that are used in joins to improve the performance of joining multiple tables.
Indexes are created automatically based on the primary and unique keys.
Small tables do not need to be indexed. If you notice that the query execution time has increased significantly, then, most likely, it has become large.

Some columns are clear candidates for indexing. They have the following characteristics:

the values in the column are relatively unique;
large range of values (suitable for regular indexes);
small range of values (suitable for bit indexes);
highly sparse columns (lots of undefined, "empty" values), but queries are mostly about meaningful rows. In this case, a comparison that matches all non-null values is preferable:
WHERE COL_X > -9.99 *power(10, 125) rather than
WHERE COL_X IS NOT NULL This is because the first case uses the COL_X index (assuming the COL_X column is a numeric type).

Limit the number of indexes per table

The more indexes, the higher the overhead when modifying a table. When rows are added or deleted, all indexes on the table are updated. When a column is updated, all indexes in which it participates must also be updated.

In the case of indexes, you need to weigh the performance gains for queries versus the performance penalties for updates. For example, if the table is primarily read-only, you might make extensive use of indexes; but if the table is frequently updated, it is advisable to minimize the use of indexes.

Choose the order of columns in compound indexes

Although columns can be specified in any order in the CREATE INDEX statement, the order of the columns in the CREATE INDEX statement can affect query performance. In general, the columns that will be used most frequently are listed first in the index. You can create a compound index (using multiple columns), which can be used to query all the columns included in the index or just some.

Collect statistics for proper use of indexes

Indexes can be used more effectively if the database collects and maintains statistics about the tables used in queries. You can collect statistics during index creation by specifying keyword COMPUTE STATISTICS in the CREATE INDEX statement. Because data is constantly updated and the distribution of values changes, statistics should be updated periodically using the DBMS_STATS.GATHER_TABLE_STATISTICS and DBMS_STATS.GATHER_SCHEMA_STATISTICS procedures.

Destroy unnecessary indexes

The index is deleted in the following cases:

if using an index does not improve query performance. This situation occurs if the table is too small or if the table has many rows, but few of them are index entries;
if your requests for proposals do not use the index;
if the index is also dropped before it is rebuilt.

When an index is dropped, all extents of the index segment are returned to the table space to which they belong and made available to other objects in that table space. Indexes are deleted automatically when the table for which they were built is destroyed.

Popular in the category: