APPLIES TO: SQL Server, Azure SQL Database, Azure SQL Data Warehouse, Parallel Data Warehouse
Poorly designed indexes and a lack of indexes are primary sources of database application bottlenecks. Designing efficient indexes is paramount to achieving good database and application performance. This SQL Server index design guide contains information on index architecture, and best practices to help you design effective indexes to meet the needs of your application.
This guide assumes the reader has a general understanding of the index types available in SQL Server. For a general description of index types, see Index Types.
This guide covers the following types of indexes:
For information about XML indexes, see XML Indexes Overview.
For information about Spatial indexes, see Spatial Indexes Overview.
For information about Full-text indexes, see Populate Full-Text Indexes.
Index Design Basics
An index is an on-disk or in-memory structure associated with a table or view that speeds retrieval of rows from the table or view. An index contains keys built from one or more columns in the table or view. For on-disk indexes, these keys are stored in a structure (B-tree) that enables SQL Server to find the row or rows associated with the key values quickly and efficiently.
An index stores data logically organized as a table with rows and columns, and physically stored in a row-wise data format called rowstore¹, or stored in a column-wise data format called columnstore.
The selection of the right indexes for a database and its workload is a complex balancing act between query speed and update cost. Narrow indexes, or indexes with few columns in the index key, require less disk space and maintenance overhead. Wide indexes, on the other hand, cover more queries. You may have to experiment with several different designs before finding the most efficient index. Indexes can be added, modified, and dropped without affecting the database schema or application design. Therefore, you should not hesitate to experiment with different indexes.
The query optimizer in SQL Server reliably chooses the most effective index in the vast majority of cases. Your overall index design strategy should provide a variety of indexes for the query optimizer to choose from and trust it to make the right decision. This reduces analysis time and produces good performance over a variety of situations. To see which indexes the query optimizer uses for a specific query, in SQL Server Management Studio, on the Query menu, select Include Actual Execution Plan.
Do not always equate index usage with good performance, and good performance with efficient index use. If using an index always helped produce the best performance, the job of the query optimizer would be simple. In reality, an incorrect index choice can cause less than optimal performance. Therefore, the task of the query optimizer is to select an index, or combination of indexes, only when it will improve performance, and to avoid indexed retrieval when it will hinder performance.
¹ Rowstore has been the traditional way to store relational table data. In SQL Server, rowstore refers to a table where the underlying data storage format is a heap, a B-tree (clustered index), or a memory-optimized table.
Index Design Tasks
The following tasks make up our recommended strategy for designing indexes:
General Index Design Guidelines
Experienced database administrators can design a good set of indexes, but this task is very complex, time-consuming, and error-prone even for moderately complex databases and workloads. Understanding the characteristics of your database, queries, and data columns can help you design optimal indexes.
Database Considerations
When you design an index, consider the following database guidelines:
Query Considerations
When you design an index, consider the following query guidelines:
¹ The term SARGable in relational databases refers to a Search ARGument-able predicate that can leverage an index to speed up the execution of the query.
Column Considerations
When you design an index consider the following column guidelines:
Index Characteristics
After you have determined that an index is appropriate for a query, you can select the type of index that best fits your situation. Index characteristics include the following:
You can also customize the initial storage characteristics of the index to optimize its performance or maintenance by setting an option such as FILLFACTOR. Also, you can determine the index storage location by using filegroups or partition schemes to optimize performance.
Index Placement on Filegroups or Partition Schemes
As you develop your index design strategy, you should consider the placement of the indexes on the filegroups associated with the database. Careful selection of the filegroup or partition scheme can improve query performance.
By default, indexes are stored in the same filegroup as the base table on which the index is created. A nonpartitioned clustered index and the base table always reside in the same filegroup. However, you can do the following:
By creating the nonclustered index on a different filegroup, you can achieve performance gains if the filegroups are using different physical drives with their own controllers. Data and index information can then be read in parallel by the multiple disk heads. For example, if Table_A on filegroup f1 and Index_A on filegroup f2 are both being used by the same query, performance gains can be achieved because both filegroups are being fully used without contention. However, if Table_A is scanned by the query but Index_A is not referenced, only filegroup f1 is used. This creates no performance gain.
Because you cannot predict what type of access will occur and when it will occur, it could be a better decision to spread your tables and indexes across all filegroups. This would guarantee that all disks are being accessed because all data and indexes are spread evenly across all disks, regardless of which way the data is accessed. This is also a simpler approach for system administrators.
Partitions across multiple Filegroups
You can also consider partitioning clustered and nonclustered indexes across multiple filegroups. Partitioned indexes are partitioned horizontally, or by row, based on a partition function. The partition function defines how each row is mapped to a set of partitions based on the values of certain columns, called partitioning columns. A partition scheme specifies the mapping of the partitions to a set of filegroups.
Partitioning an index can provide the following benefits:
For more information, see Partitioned Tables and Indexes.
Index Sort Order Design Guidelines
When defining indexes, you should consider whether the data for the index key column should be stored in ascending or descending order. Ascending is the default and maintains compatibility with earlier versions of SQL Server. The syntax of the CREATE INDEX, CREATE TABLE, and ALTER TABLE statements supports the keywords ASC (ascending) and DESC (descending) on individual columns in indexes and constraints.
Specifying the order in which key values are stored in an index is useful when queries referencing the table have ORDER BY clauses that specify different directions for the key column or columns in that index. In these cases, the index can remove the need for a SORT operator in the query plan; therefore, this makes the query more efficient. For example, the buyers in the Adventure Works Cycles purchasing department have to evaluate the quality of products they purchase from vendors. The buyers are most interested in finding products sent by these vendors with a high rejection rate. As shown in the following query, retrieving the data to meet these criteria requires the RejectedQty column in the Purchasing.PurchaseOrderDetail table to be sorted in descending order (large to small) and the ProductID column to be sorted in ascending order (small to large).
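A query along these lines retrieves that data (a sketch against the AdventureWorks sample schema; the RejectionRate expression and the OrderQty and DueDate columns are illustrative):

```sql
SELECT RejectedQty, ((RejectedQty / OrderQty) * 100) AS RejectionRate,
       ProductID, DueDate
FROM Purchasing.PurchaseOrderDetail
ORDER BY RejectedQty DESC, ProductID ASC;
```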
The following execution plan for this query shows that the query optimizer used a SORT operator to return the result set in the order specified by the ORDER BY clause.
If an index is created with key columns that match those in the ORDER BY clause in the query, the SORT operator can be eliminated in the query plan and the query plan is more efficient.
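For example, a nonclustered index along the following lines matches both sort directions (the index name and the trailing DueDate and OrderQty key columns are illustrative assumptions consistent with the query above):

```sql
CREATE NONCLUSTERED INDEX IX_PurchaseOrderDetail_RejectedQty
ON Purchasing.PurchaseOrderDetail (RejectedQty DESC, ProductID ASC, DueDate, OrderQty);
```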
After the query is executed again, the following execution plan shows that the SORT operator has been eliminated and the newly created nonclustered index is used.
The Database Engine can move equally efficiently in either direction. An index defined as (RejectedQty DESC, ProductID ASC) can still be used for a query in which the sort direction of the columns in the ORDER BY clause are reversed. For example, a query with the ORDER BY clause ORDER BY RejectedQty ASC, ProductID DESC can use the index.
Sort order can be specified only for key columns. The sys.index_columns catalog view and the INDEXKEY_PROPERTY function report whether an index column is stored in ascending or descending order.
Metadata
Use these metadata views to see attributes of indexes. More architectural information is embedded in some of these views.
Note
For columnstore indexes, all columns are stored in the metadata as included columns. The columnstore index does not have key columns.
Clustered Index Design Guidelines
Clustered indexes sort and store the data rows in the table based on their key values. There can only be one clustered index per table, because the data rows themselves can only be sorted in one order. With few exceptions, every table should have a clustered index defined on the column, or columns, that offer the following:
If the clustered index is not created with the UNIQUE property, the Database Engine automatically adds a 4-byte uniqueifier column to the table. When it is required, the Database Engine automatically adds a uniqueifier value to a row to make each key unique. This column and its values are used internally and cannot be seen or accessed by users.
Clustered Index Architecture
In SQL Server, indexes are organized as B-Trees. Each page in an index B-tree is called an index node. The top node of the B-tree is called the root node. The bottom nodes in the index are called the leaf nodes. Any index levels between the root and the leaf nodes are collectively known as intermediate levels. In a clustered index, the leaf nodes contain the data pages of the underlying table. The root and intermediate level nodes contain index pages holding index rows. Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a data row in the leaf level of the index. The pages in each level of the index are linked in a doubly-linked list.
Clustered indexes have one row in sys.partitions, with index_id = 1 for each partition used by the index. By default, a clustered index has a single partition. When a clustered index has multiple partitions, each partition has a B-tree structure that contains the data for that specific partition. For example, if a clustered index has four partitions, there are four B-tree structures; one in each partition.
Depending on the data types in the clustered index, each clustered index structure will have one or more allocation units in which to store and manage the data for a specific partition. At a minimum, each clustered index will have one IN_ROW_DATA allocation unit per partition. The clustered index will also have one LOB_DATA allocation unit per partition if it contains large object (LOB) columns. It will also have one ROW_OVERFLOW_DATA allocation unit per partition if it contains variable length columns that exceed the 8,060 byte row size limit.
The pages in the data chain and the rows in them are ordered on the value of the clustered index key. All inserts are made at the point where the key value in the inserted row fits in the ordering sequence among existing rows.
This illustration shows the structure of a clustered index in a single partition.
Query Considerations
Before you create clustered indexes, understand how your data will be accessed. Consider using a clustered index for queries that do the following:
Column Considerations
Generally, you should define the clustered index key with as few columns as possible. Consider columns that have one or more of the following attributes:
Clustered indexes are not a good choice for the following attributes:
Nonclustered Index Design Guidelines
A nonclustered index contains the index key values and row locators that point to the storage location of the table data. You can create multiple nonclustered indexes on a table or indexed view. Generally, nonclustered indexes should be designed to improve the performance of frequently used queries that are not covered by the clustered index.
Similar to the way you use an index in a book, the query optimizer searches for a data value by searching the nonclustered index to find the location of the data value in the table and then retrieves the data directly from that location. This makes nonclustered indexes the optimal choice for exact match queries because the index contains entries describing the exact location in the table of the data values being searched for in the queries. For example, to query the HumanResources.Employee table for all employees that report to a specific manager, the query optimizer might use the nonclustered index IX_Employee_ManagerID; this has ManagerID as its key column. The query optimizer can quickly find all entries in the index that match the specified ManagerID. Each index entry points to the exact page and row in the table, or clustered index, in which the corresponding data can be found. After the query optimizer finds all entries in the index, it can go directly to the exact page and row to retrieve the data.
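For instance, a query of the following shape lets the optimizer seek on that index (a sketch; the ManagerID column and IX_Employee_ManagerID index follow the description above and may not exist in current AdventureWorks versions):

```sql
SELECT EmployeeID, Title
FROM HumanResources.Employee
WHERE ManagerID = 16;  -- seek on IX_Employee_ManagerID, then fetch the rows it points to
```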
Nonclustered Index Architecture
Nonclustered indexes have the same B-tree structure as clustered indexes, except for the following significant differences:
The row locators in nonclustered index rows are either a pointer to a row or are a clustered index key for a row, as described in the following:
Nonclustered indexes have one row in sys.partitions with index_id > 1 for each partition used by the index. By default, a nonclustered index has a single partition. When a nonclustered index has multiple partitions, each partition has a B-tree structure that contains the index rows for that specific partition. For example, if a nonclustered index has four partitions, there are four B-tree structures, with one in each partition.
Depending on the data types in the nonclustered index, each nonclustered index structure will have one or more allocation units in which to store and manage the data for a specific partition. At a minimum, each nonclustered index will have one IN_ROW_DATA allocation unit per partition that stores the index B-tree pages. The nonclustered index will also have one LOB_DATA allocation unit per partition if it contains large object (LOB) columns. Additionally, it will have one ROW_OVERFLOW_DATA allocation unit per partition if it contains variable length columns that exceed the 8,060 byte row size limit.
The following illustration shows the structure of a nonclustered index in a single partition.
Database Considerations
Consider the characteristics of the database when designing nonclustered indexes.
Query Considerations
Before you create nonclustered indexes, you should understand how your data will be accessed. Consider using a nonclustered index for queries that have the following attributes:
Column Considerations
Consider columns that have one or more of these attributes:
Use Included Columns to Extend Nonclustered Indexes
You can extend the functionality of nonclustered indexes by adding nonkey columns to the leaf level of the nonclustered index. By including nonkey columns, you can create nonclustered indexes that cover more queries. This is because the nonkey columns have the following benefits:
An index with included nonkey columns can significantly improve query performance when all columns in the query are included in the index either as key or nonkey columns. Performance gains are achieved because the query optimizer can locate all the column values within the index; table or clustered index data is not accessed resulting in fewer disk I/O operations.
Note
When an index contains all the columns referenced by the query it is typically referred to as covering the query.
While key columns are stored at all levels of the index, nonkey columns are stored only at the leaf level.
Using Included Columns to Avoid Size Limits
You can include nonkey columns in a nonclustered index to avoid exceeding the current index size limitations of a maximum of 16 key columns and a maximum index key size of 900 bytes. The Database Engine does not consider nonkey columns when calculating the number of index key columns or index key size.
For example, assume that you want to index the following columns in the Document table:
Because the nchar and nvarchar data types require 2 bytes for each character, an index that contains these three columns would exceed the 900 byte size limitation by 10 bytes (455 * 2). By using the INCLUDE clause of the CREATE INDEX statement, the index key could be defined as (Title, Revision) and FileName defined as a nonkey column. In this way, the index key size would be 110 bytes (55 * 2), and the index would still contain all the required columns. The following statement creates such an index.
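A sketch of that statement (assuming the columns are Title nvarchar(50), Revision nchar(5), and FileName nvarchar(400), consistent with the byte counts above; the index name is illustrative):

```sql
CREATE INDEX IX_Document_Title
ON Production.Document (Title, Revision)
INCLUDE (FileName);
```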
Index with Included Columns Guidelines
When you design nonclustered indexes with included columns consider the following guidelines:
Column Size Guidelines
Column Modification Guidelines
When you modify a table column that has been defined as an included column, the following restrictions apply:
Design Recommendations
Redesign nonclustered indexes with a large index key size so that only columns used for searching and lookups are key columns. Make all other columns that cover the query included nonkey columns. In this way, you will have all columns needed to cover the query, but the index key itself is small and efficient.
For example, assume that you want to design an index to cover the following query.
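A query of roughly this shape fits the scenario (a sketch against Person.Address; the column list is an assumption consistent with the 334-byte key size discussed below):

```sql
SELECT AddressLine1, AddressLine2, City, StateProvinceID, PostalCode
FROM Person.Address
WHERE PostalCode BETWEEN N'98000' AND N'99999';
```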
To cover the query, each column must be defined in the index. Although you could define all columns as key columns, the key size would be 334 bytes. Because the only column actually used as search criteria is the PostalCode column, having a length of 30 bytes, a better index design would define PostalCode as the key column and include all other columns as nonkey columns.
The following statement creates an index with included columns to cover the query.
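A sketch of that statement (the index name is illustrative):

```sql
CREATE INDEX IX_Address_PostalCode
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID);
```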
Performance Considerations
Avoid adding unnecessary columns. Adding too many index columns, key or nonkey, can have the following performance implications:
You will have to determine whether the gains in query performance outweigh the effect on performance during data modification and the additional disk space requirements.
Unique Index Design Guidelines
A unique index guarantees that the index key contains no duplicate values and therefore every row in the table is in some way unique. Specifying a unique index makes sense only when uniqueness is a characteristic of the data itself. For example, if you want to make sure that the values in the NationalIDNumber column in the HumanResources.Employee table are unique, when the primary key is EmployeeID, create a UNIQUE constraint on the NationalIDNumber column. If the user tries to enter the same value in that column for more than one employee, an error message is displayed and the duplicate value is not entered.
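A constraint of this shape enforces that rule (a minimal sketch; the constraint name is illustrative):

```sql
ALTER TABLE HumanResources.Employee
ADD CONSTRAINT AK_Employee_NationalIDNumber UNIQUE (NationalIDNumber);
```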
With multicolumn unique indexes, the index guarantees that each combination of values in the index key is unique. For example, if a unique index is created on a combination of LastName, FirstName, and MiddleName columns, no two rows in the table could have the same combination of values for these columns.
Both clustered and nonclustered indexes can be unique. Provided that the data in the column is unique, you can create both a unique clustered index and multiple unique nonclustered indexes on the same table.
The benefits of unique indexes include the following:
Creating a PRIMARY KEY or UNIQUE constraint automatically creates a unique index on the specified columns. There are no significant differences between creating a UNIQUE constraint and creating a unique index independent of a constraint. Data validation occurs in the same manner and the query optimizer does not differentiate between a unique index created by a constraint or manually created. However, you should create a UNIQUE or PRIMARY KEY constraint on the column when data integrity is the objective. By doing this the objective of the index will be clear.
Considerations
Filtered Index Design Guidelines
A filtered index is an optimized nonclustered index, especially suited to cover queries that select from a well-defined subset of data. It uses a filter predicate to index a portion of rows in the table. A well-designed filtered index can improve query performance, reduce index maintenance costs, and reduce index storage costs compared with full-table indexes.
Applies to: SQL Server 2008 through SQL Server 2017.
Filtered indexes can provide the following advantages over full-table indexes:
Filtered indexes are useful when columns contain well-defined subsets of data that queries reference in SELECT statements. Examples are:
Reduced maintenance costs for filtered indexes are most noticeable when the number of rows in the index is small compared with a full-table index. If the filtered index includes most of the rows in the table, it could cost more to maintain than a full-table index. In this case, you should use a full-table index instead of a filtered index.
Filtered indexes are defined on one table and only support simple comparison operators. If you need a filter expression that references multiple tables or has complex logic, you should create a view.
Design Considerations
In order to design effective filtered indexes, it is important to understand what queries your application uses and how they relate to subsets of your data. Some examples of data that have well-defined subsets are columns with mostly NULL values, columns with heterogeneous categories of values and columns with distinct ranges of values. The following design considerations give a variety of scenarios for when a filtered index can provide advantages over full-table indexes.
Tip
The nonclustered columnstore index definition supports using a filtered condition. To minimize the performance impact of adding a columnstore index on an OLTP table, use a filtered condition to create a nonclustered columnstore index on only the cold data of your operational workload.
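A minimal sketch of that pattern, assuming a hypothetical dbo.Orders table where OrderStatus = 5 marks rows that are no longer hot (requires SQL Server 2016 or later):

```sql
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders_Cold
ON dbo.Orders (OrderId, CustomerId, Quantity, OrderDate)
WHERE OrderStatus = 5;  -- index only the cold rows of the operational workload
```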
Filtered Indexes for subsets of data
When a column only has a small number of relevant values for queries, you can create a filtered index on the subset of values. For example, when the values in a column are mostly NULL and the query selects only from the non-NULL values, you can create a filtered index for the non-NULL data rows. The resulting index will be smaller and cost less to maintain than a full-table nonclustered index defined on the same key columns.
For example, the AdventureWorks2012 database has a Production.BillOfMaterials table with 2679 rows. The EndDate column has only 199 rows that contain a non-NULL value and the other 2480 rows contain NULL. The following filtered index would cover queries that return the columns defined in the index and that select only rows with a non-NULL value for EndDate.
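A sketch of that filtered index (the key columns shown follow the AdventureWorks example; adjust them to the columns your queries actually use):

```sql
CREATE NONCLUSTERED INDEX FIBillOfMaterialsWithEndDate
ON Production.BillOfMaterials (ComponentID, StartDate)
WHERE EndDate IS NOT NULL;
```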
The filtered index FIBillOfMaterialsWithEndDate is valid for the following query. You can display the query execution plan to determine if the query optimizer used the filtered index.
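For example, a query along these lines (the specific ComponentID and StartDate values are illustrative):

```sql
SELECT ProductAssemblyID, ComponentID, StartDate
FROM Production.BillOfMaterials
WHERE EndDate IS NOT NULL
    AND ComponentID = 5
    AND StartDate > '20080101';
```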
For more information about how to create filtered indexes and how to define the filtered index predicate expression, see Create Filtered Indexes.
Filtered Indexes for heterogeneous data
When a table has heterogeneous data rows, you can create a filtered index for one or more categories of data.
For example, the products listed in the Production.Product table are each assigned to a ProductSubcategoryID, which is in turn associated with one of the product categories Bikes, Components, Clothing, or Accessories. These categories are heterogeneous because their column values in the Production.Product table are not closely correlated. For example, the columns Color, ReorderPoint, ListPrice, Weight, Class, and Style have unique characteristics for each product category. Suppose that there are frequent queries for accessories which have subcategories between 27 and 36 inclusive. You can improve the performance of queries for accessories by creating a filtered index on the accessories subcategories as shown in the following example.
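A sketch of that filtered index, following the AdventureWorks example:

```sql
CREATE NONCLUSTERED INDEX FIProductAccessories
ON Production.Product (ProductSubcategoryID, ListPrice)
INCLUDE (Name)
WHERE ProductSubcategoryID >= 27 AND ProductSubcategoryID <= 36;
```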
The filtered index FIProductAccessories covers the following query because the query results are contained in the index and the query plan does not include a base table lookup. For example, the query predicate expression ProductSubcategoryID = 33 is a subset of the filtered index predicate ProductSubcategoryID >= 27 and ProductSubcategoryID <= 36, the ProductSubcategoryID and ListPrice columns in the query predicate are both key columns in the index, and Name is stored in the leaf level of the index as an included column.
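For example, a covered query of this shape (the ListPrice threshold is illustrative):

```sql
SELECT Name, ProductSubcategoryID, ListPrice
FROM Production.Product
WHERE ProductSubcategoryID = 33 AND ListPrice > 25.00;
```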
Key Columns
It is a best practice to include a small number of key or included columns in a filtered index definition, and to incorporate only the columns that are necessary for the query optimizer to choose the filtered index for the query execution plan. The query optimizer can choose a filtered index for the query regardless of whether it does or does not cover the query. However, the query optimizer is more likely to choose a filtered index if it covers the query.
In some cases, a filtered index covers the query without including the columns in the filtered index expression as key or included columns in the filtered index definition. The following guidelines explain when a column in the filtered index expression should be a key or included column in the filtered index definition. The examples refer to the filtered index FIBillOfMaterialsWithEndDate that was created previously.
A column in the filtered index expression does not need to be a key or included column in the filtered index definition if the filtered index expression is equivalent to the query predicate and the query does not return the column in the filtered index expression with the query results. For example, FIBillOfMaterialsWithEndDate covers the following query because the query predicate is equivalent to the filter expression, and EndDate is not returned with the query results. FIBillOfMaterialsWithEndDate does not need EndDate as a key or included column in the filtered index definition.
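For example, a query of this shape is covered without EndDate appearing in the index definition:

```sql
SELECT ComponentID, StartDate
FROM Production.BillOfMaterials
WHERE EndDate IS NOT NULL;
```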
A column in the filtered index expression should be a key or included column in the filtered index definition if the query predicate uses the column in a comparison that is not equivalent to the filtered index expression. For example, FIBillOfMaterialsWithEndDate is valid for the following query because it selects a subset of rows from the filtered index. However, it does not cover the following query because EndDate is used in the comparison EndDate > '20040101', which is not equivalent to the filtered index expression. The query processor cannot execute this query without looking up the values of EndDate. Therefore, EndDate should be a key or included column in the filtered index definition.
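For example, a query of this shape selects a subset of the filtered rows but is not covered, because the comparison on EndDate is narrower than the filter expression:

```sql
SELECT ComponentID, StartDate
FROM Production.BillOfMaterials
WHERE EndDate > '20040101';
```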
A column in the filtered index expression should be a key or included column in the filtered index definition if the column is in the query result set. For example, FIBillOfMaterialsWithEndDate does not cover the following query because it returns the EndDate column in the query results. Therefore, EndDate should be a key or included column in the filtered index definition.
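For example, a query of this shape returns EndDate and therefore is not covered by the index as defined above:

```sql
SELECT ComponentID, StartDate, EndDate
FROM Production.BillOfMaterials
WHERE EndDate IS NOT NULL;
```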
The clustered index key of the table does not need to be a key or included column in the filtered index definition. The clustered index key is automatically included in all nonclustered indexes, including filtered indexes.
Data Conversion Operators in the Filter Predicate
If the comparison operator specified in the filtered index expression of the filtered index results in an implicit or explicit data conversion, an error will occur if the conversion occurs on the left side of a comparison operator. A solution is to write the filtered index expression with the data conversion operator (CAST or CONVERT) on the right side of the comparison operator.
The following example creates a table with a variety of data types.
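A minimal sketch of such a table (the table name is illustrative; only the varbinary column b matters for the example that follows):

```sql
CREATE TABLE dbo.TestTable (a int, b varbinary(4));
```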
In the following filtered index definition, column b is implicitly converted to an integer data type for the purpose of comparing it to the constant 1. This generates error message 10611 because the conversion occurs on the left hand side of the operator in the filtered predicate.
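A sketch of the failing definition:

```sql
-- Fails with error 10611: the implicit conversion of column b is on the
-- left-hand side of the comparison operator in the filter predicate.
CREATE NONCLUSTERED INDEX TestTabIndex
ON dbo.TestTable (a, b)
WHERE b = 1;
```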
The solution is to convert the constant on the right hand side to be of the same type as column b, as seen in the following example:
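A sketch of the corrected definition:

```sql
CREATE NONCLUSTERED INDEX TestTabIndex
ON dbo.TestTable (a, b)
WHERE b = CONVERT(varbinary(4), 1);
```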
Moving the data conversion from the left side to the right side of a comparison operator might change the meaning of the conversion. In the above example, when the CONVERT operator was added to the right side, the comparison changed from an integer comparison to a varbinary comparison.
Columnstore Index Design Guidelines
A columnstore index is a technology for storing, retrieving and managing data by using a columnar data format, called a columnstore. For more information, refer to Columnstore Indexes overview.
For version information, see Columnstore indexes - What's new.
Columnstore Index Architecture
Knowing these basics will make it easier to understand other columnstore articles that explain how to use them effectively.
Data storage uses columnstore and rowstore compression
When discussing columnstore indexes, we use the terms rowstore and columnstore to emphasize the format for the data storage. Columnstore indexes use both types of storage.
Operations are performed on rowgroups and column segments
The columnstore index groups rows into manageable units. Each of these units is called a rowgroup. For best performance, the number of rows in a rowgroup is large enough to improve compression rates and small enough to benefit from in-memory operations.
For example, the columnstore index performs these operations on rowgroups:
The deltastore consists of one or more rowgroups called delta rowgroups. Each delta rowgroup is a clustered B-tree index that stores small bulk loads and inserts until the rowgroup contains 1,048,576 rows, or until the index is rebuilt. When a delta rowgroup contains 1,048,576 rows it is marked as closed, and waits for a process called the tuple-mover to compress it into the columnstore.
Each column has some of its values in each rowgroup. These values are called column segments. Each rowgroup contains one column segment for every column in the table.
When the columnstore index compresses a rowgroup, it compresses each column segment separately. To uncompress an entire column, the columnstore index only needs to uncompress one column segment from each rowgroup.
Small loads and inserts go to the deltastore
A columnstore index improves columnstore compression and performance by compressing at least 102,400 rows at a time into the columnstore index. To compress rows in bulk, the columnstore index accumulates small loads and inserts in the deltastore. The deltastore operations are handled behind the scenes. To return the correct query results, the clustered columnstore index combines query results from both the columnstore and the deltastore.
Rows go to the deltastore when they are:
The deltastore also stores a list of IDs for deleted rows that have been marked as deleted but not yet physically deleted from the columnstore.
When delta rowgroups are full they get compressed into the columnstore
Clustered columnstore indexes collect up to 1,048,576 rows in each delta rowgroup before compressing the rowgroup into the columnstore. This improves the compression of the columnstore index. When a delta rowgroup contains 1,048,576 rows, the columnstore index marks the rowgroup as closed. A background process, called the tuple-mover, finds each closed rowgroup and compresses it into the columnstore.
You can force delta rowgroups into the columnstore by using ALTER INDEX to rebuild or reorganize the index. Note that if there is memory pressure during compression, the columnstore index might reduce the number of rows in the compressed rowgroup.
Each table partition has its own rowgroups and delta rowgroups
The concept of partitioning is the same for a clustered index, a heap, and a columnstore index. Partitioning a table divides the table into smaller groups of rows according to a range of column values. It is often used for managing the data. For example, you could create a partition for each year of data, and then use partition switching to archive data to less expensive storage. Partition switching works on columnstore indexes and makes it easy to move a partition of data to another location.
Rowgroups are always defined within a table partition. When a columnstore index is partitioned, each partition has its own compressed rowgroups and delta rowgroups.
Each partition can have multiple delta rowgroups
Each partition can have more than one delta rowgroup. When the columnstore index needs to add data to a delta rowgroup and the delta rowgroup is locked, the columnstore index will try to obtain a lock on a different delta rowgroup. If there are no delta rowgroups available, the columnstore index will create a new delta rowgroup. For example, a table with 10 partitions could easily have 20 or more delta rowgroups.
You can combine columnstore and rowstore indexes on the same table
A nonclustered index contains a copy of part or all of the rows and columns in the underlying table. The index is defined as one or more columns of the table, and has an optional condition that filters the rows.
Starting with SQL Server 2016 (13.x), you can create an updatable nonclustered columnstore index on a rowstore table. The columnstore index stores a copy of the data so you do need extra storage. However, the data in the columnstore index will compress to a smaller size than the rowstore table requires. By doing this, you can run analytics on the columnstore index and transactions on the rowstore index at the same time. The column store is updated when data changes in the rowstore table, so both indexes are working against the same data.
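A minimal sketch of that combination, using hypothetical table and index names (SQL Server 2016 or later):

```sql
-- Rowstore (OLTP) table with a clustered primary key...
CREATE TABLE dbo.Sales (
    SalesId     int           NOT NULL PRIMARY KEY CLUSTERED,
    CustomerId  int           NOT NULL,
    SalesAmount decimal(18,2) NOT NULL,
    SalesDate   datetime2     NOT NULL
);

-- ...plus an updatable nonclustered columnstore index to serve analytics queries.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Sales
ON dbo.Sales (CustomerId, SalesAmount, SalesDate);
```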
Starting with SQL Server 2016 (13.x), you can have one or more nonclustered rowstore indexes on a columnstore index. By doing this, you can perform efficient table seeks on the underlying columnstore. Other options become available too. For example, you can enforce a primary key constraint by using a UNIQUE constraint on the rowstore table. Since a non-unique value will fail to insert into the rowstore table, SQL Server cannot insert the value into the columnstore.
Performance considerations
For more information, refer to Columnstore indexes - Query performance.
Design Guidance
For more information, refer to Columnstore indexes - Design Guidance.
Hash Index Design Guidelines
All memory-optimized tables must have at least one index, because it is the indexes that connect the rows together. On a memory-optimized table, every index is also memory-optimized. Hash indexes are one of the possible index types in a memory-optimized table. For more information, see Indexes for Memory-Optimized Tables.
Applies to: SQL Server 2014 (12.x) through SQL Server 2017.
Hash Index Architecture
A hash index consists of an array of pointers, and each element of the array is called a hash bucket.
The number of buckets must be specified at index definition time:
Tip
To determine the right BUCKET_COUNT for your data, see Configuring the hash index bucket count.
The hash function is applied to the index key columns and the result of the function determines what bucket that key falls into. Each bucket has a pointer to rows whose hashed key values are mapped to that bucket.
The hashing function used for hash indexes has the following characteristics:
The interplay of the hash index and the buckets is summarized in the following image.
Configuring the hash index bucket count
The hash index bucket count is specified at index create time, and can be changed using the ALTER TABLE ... ALTER INDEX REBUILD syntax.
In most cases the bucket count would ideally be between 1 and 2 times the number of distinct values in the index key.
You may not always be able to predict how many values a particular index key may have, or will have. Performance is usually still good if the BUCKET_COUNT value is within 10 times of the actual number of key values, and overestimating is generally better than underestimating.
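A sketch of that rebuild syntax (the table, index, and bucket count are placeholders; altering indexes on memory-optimized tables requires SQL Server 2016 or later):

```sql
ALTER TABLE dbo.MyMemOptTable
    ALTER INDEX ix_hash_CustomerId
    REBUILD WITH (BUCKET_COUNT = 2000000);
```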
Too few buckets has the following drawbacks:
Too many buckets has the following drawbacks:
Note
Adding more buckets does nothing to reduce the chaining together of entries that share a duplicate value. The rate of value duplication is used to decide whether a hash is the appropriate index type, not to calculate the bucket count.
Performance considerations
The performance of a hash index is:
Tip
The predicate must include all columns in the hash index key. The hash index requires a key (to hash) to seek into the index. If an index key consists of two columns and the WHERE clause only provides the first column, SQL Server does not have a complete key to hash. This will result in an index scan query plan.
If a hash index is used and the number of unique index keys is 100 times (or more) the row count, consider either increasing the bucket count to avoid large row chains, or using a nonclustered index instead.
Declaration considerations
A hash index can exist only on a memory-optimized table. It cannot exist on a disk-based table.
A hash index can be declared as:
The following is an example of the syntax to create a hash index, outside of the CREATE TABLE statement:
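A sketch of that syntax (dbo.MyMemOptTable and Column2 are placeholder names; adding an index to an existing memory-optimized table with ALTER TABLE requires SQL Server 2016 or later):

```sql
ALTER TABLE dbo.MyMemOptTable
    ADD INDEX ix_hash_Column2
    HASH (Column2) WITH (BUCKET_COUNT = 64);
```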
Row versions and garbage collection
In a memory-optimized table, when a row is affected by an UPDATE, the table creates an updated version of the row. During the update transaction, other sessions might be able to read the older version of the row and thereby avoid the performance slowdown associated with a row lock.
The hash index might also have different versions of its entries to accommodate the update.
Later when the older versions are no longer needed, a garbage collection (GC) thread traverses the buckets and their link lists to clean away old entries. The GC thread performs better if the link list chain lengths are short. For more information, refer to In-Memory OLTP Garbage Collection.
Memory-Optimized Nonclustered Index Design Guidelines
Nonclustered indexes are one of the possible index types in a memory-optimized table. For more information, see Indexes for Memory-Optimized Tables.
Applies to: SQL Server 2014 (12.x) through SQL Server 2017.
In-memory Nonclustered Index Architecture
In-memory nonclustered indexes are implemented using a data structure called a Bw-Tree, originally envisioned and described by Microsoft Research in 2011. A Bw-Tree is a lock and latch-free variation of a B-Tree. For more details please see The Bw-Tree: A B-tree for New Hardware Platforms.
At a very high level the Bw-Tree can be understood as a map of pages organized by page ID (PidMap), a facility to allocate and reuse page IDs (PidAlloc) and a set of pages linked in the page map and to each other. These three high level sub-components make up the basic internal structure of a Bw-Tree.
The structure is similar to a normal B-Tree in the sense that each page has a set of key values that are ordered and there are levels in the index each pointing to a lower level and the leaf levels point to a data row. However there are several differences.
Just like hash indexes, multiple data rows can be linked together (versions). The page pointers between the levels are logical page IDs, which are offsets into a page mapping table, that in turn has the physical address for each page.
There are no in-place updates of index pages. New delta pages are introduced for this purpose.
The key value in each non-leaf level page is the highest value contained by the child page it points to, and each row also contains that child page's logical page ID. On leaf-level pages, along with the key value, each row contains the physical address of the data row.

Point lookups are similar to B-Trees, except that because pages are linked in only one direction, the SQL Server Database Engine follows right page pointers, where each non-leaf page has the highest value of its child, rather than the lowest value as in a B-Tree.
If a Leaf-level page has to change, the SQL Server Database Engine does not modify the page itself. Rather, the SQL Server Database Engine creates a delta record that describes the change, and appends it to the previous page. Then it also updates the page map table address for that previous page, to the address of the delta record which now becomes the physical address for this page.
There are three different operations that can be required for managing the structure of a Bw-Tree: consolidation, split and merge.
Delta Consolidation
A long chain of delta records can eventually degrade search performance as it could mean we are traversing long chains when searching through an index. If a new delta record is added to a chain that already has 16 elements, the changes in the delta records will be consolidated into the referenced index page, and the page will then be rebuilt, including the changes indicated by the new delta record that triggered the consolidation. The newly rebuilt page will have the same page ID but a new memory address.
Split page
An index page in a Bw-Tree grows on an as-needed basis, starting from storing a single row up to a maximum of 8 KB. Once the index page grows to 8 KB, a new insert of a single row will cause the index page to split. For an internal page, this means there is no more room to add another key value and pointer; for a leaf page, it means that the row would be too big to fit on the page once all the delta records are incorporated. The statistics information in the page header for a leaf page keeps track of how much space would be required to consolidate the delta records, and that information is adjusted as each new delta record is added.
A Split operation is done in two atomic steps. In the picture below, assume a Leaf-page forces a split because a key with value 5 is being inserted, and a non-leaf page exists pointing to the end of the current Leaf-level page (key value 4).
Step 1: Allocate two new pages P1 and P2, and split the rows from old P1 page onto these new pages, including the newly inserted row. A new slot in Page Mapping Table is used to store the physical address of page P2. These pages, P1 and P2 are not accessible to any concurrent operations yet. In addition, the logical pointer from P1 to P2 is set. Then, in one atomic step update the Page Mapping Table to change the pointer from old P1 to new P1.
Step 2: The non-leaf page points to P1 but there is no direct pointer from a non-leaf page to P2. P2 is only reachable via P1. To create a pointer from a non-leaf page to P2, allocate a new non-leaf page (internal index page), copy all the rows from old non-leaf page, and add a new row to point to P2. Once this is done, in one atomic step, update the Page Mapping Table to change the pointer from old non-leaf page to new non-leaf page.
Merge page
When a DELETE operation results in a page having less than 10% of the maximum page size (currently 8 KB), or with a single row on it, that page will be merged with a contiguous page.
When a row is deleted from a page, a delta record for the delete is added. Additionally, a check is made to determine if the index page (non-leaf page) qualifies for Merge. This check verifies if the remaining space after deleting the row will be less than 10% of maximum page size. If it does qualify, the Merge is performed in three atomic steps.
In the picture below, assume a DELETE operation will delete the key value 10.
Step 1: A delta page representing key value 10 (blue triangle) is created and its pointer in the non-leaf page Pp1 is set to the new delta page. Additionally a special merge-delta page (green triangle) is created, and it is linked to point to the delta page. At this stage, both pages (delta page and merge-delta page) are not visible to any concurrent transaction. In one atomic step, the pointer to the Leaf-level page P1 in the Page Mapping Table is updated to point to the merge-delta page. After this step, the entry for key value 10 in Pp1 now points to the merge-delta page.
Step 2: The row representing key value 7 in the non-leaf page Pp1 needs to be removed, and the entry for key value 10 updated to point to P1. To do this, a new non-leaf page Pp2 is allocated and all the rows from Pp1 are copied except for the row representing key value 7; then the row for key value 10 is updated to point to page P1. Once this is done, in one atomic step, the Page Mapping Table entry pointing to Pp1 is updated to point to Pp2. Pp1 is no longer reachable.
Step 3: The Leaf-level pages P2 and P1 are merged and the delta pages removed. To do this, a new page P3 is allocated and the rows from P2 and P1 are merged, and the delta page changes are included in the new P3. Then, in one atomic step, the Page Mapping Table entry pointing to page P1 is updated to point to page P3.
Performance considerations
The performance of a nonclustered index is better than nonclustered hash indexes when querying a memory-optimized table with inequality predicates.
Note
A column in a memory-optimized table can be part of both a hash index and a nonclustered index.
Tip
When the key columns of a nonclustered index have many duplicate values, performance can degrade for updates, inserts, and deletes. One way to improve performance in this situation is to add another column to the nonclustered index.
Additional Reading
CREATE INDEX (Transact-SQL)
ALTER INDEX (Transact-SQL)
CREATE XML INDEX (Transact-SQL)
CREATE SPATIAL INDEX (Transact-SQL)
Reorganize and Rebuild Indexes
Improving Performance with SQL Server 2008 Indexed Views
Partitioned Tables and Indexes
Create a Primary Key
Indexes for Memory-Optimized Tables
Columnstore Indexes overview
Troubleshooting Hash Indexes for Memory-Optimized Tables
Memory-Optimized Table Dynamic Management Views (Transact-SQL)
Index Related Dynamic Management Views and Functions (Transact-SQL)
Indexes on Computed Columns
Indexes and ALTER TABLE
Adaptive Index Defrag
APPLIES TO: 2013, 2016, 2019, SharePoint Online
Best practices for backup and restore help make sure that backup and restore operations in SharePoint Server are successful and that the environment is protected against data loss or continuity gaps.
Performance best practices for SharePoint backup and restore operations
Backup and restore operations consume server resources and limit server performance while the operations are running. Follow these recommended practices to help reduce resource usage and increase the performance of servers and the backup or restore task.
Minimize latency between SQL Server and the backup location
In general, it is efficient to back up to a local disk on the database server instead of a network drive. You can then copy the data later to a shared folder on the network. Network drives with 1 millisecond or less latency between them and the database server perform well.
Note
If you cannot back up to local drives, use network drives with similar latency. Because network backups are subject to network errors, verify the backup action after it finishes. For more information, see 'Backing Up to a File on a Network Share' in Backup Devices (SQL Server).
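For example, a hedged sketch of a backup to a network share followed by verification (the database name and UNC path are placeholders):

```sql
BACKUP DATABASE [WSS_Content]
TO DISK = N'\\BackupServer\SQLBackups\WSS_Content.bak'
WITH CHECKSUM, INIT;

RESTORE VERIFYONLY
FROM DISK = N'\\BackupServer\SQLBackups\WSS_Content.bak'
WITH CHECKSUM;
```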
To avoid I/O bottlenecks, perform the main backup to a separate disk from the disk running SQL Server 2017 RTM, 2016, 2014, 2012, or 2008 R2 with Service Pack 1 (SP1). For more information, see Define a Logical Backup Device for a Disk File (SQL Server).
By design, most backup jobs consume all available I/O resources to complete the job. Therefore, you might see disk queuing, which can result in greater than usual latency for I/O requests. This is typical and should not be considered a problem. For more information, see Monitor Disk Usage.
Avoid processing conflicts
Do not run backup jobs during times when users need access to the system. Typically, systems run 24 hours a day, seven days a week. A best practice is to always run incremental backups to safeguard against server failure. Consider staggering backups so that all databases are not backed up at the same time.
Keep databases small for faster recovery times
Keep databases small to speed both backup and restore. For example, use multiple content databases for a web application instead of one large content database. For more information, see Database types and descriptions in SharePoint Server.
For a graphical overview of the databases that support SharePoint Server 2016, see Quick reference guide: SharePoint Server 2016 databases.
Use incremental backups for large databases
Use incremental backups for large databases because you can make them quickly and maintain performance of the environment. Although you can restore full backups faster than incremental backups, continuous incremental backups minimize data loss. For more information about types of backups, see Backup Overview (SQL Server).
Use compression during backup
In some circumstances, you can use compression to decrease the size of backups and the time to complete each backup. Backup compression was introduced in SQL Server 2008 Enterprise. Backup compression increases CPU usage and this can affect SQL Server concurrent operations.
Important
SharePoint Server supports SQL Server backup compression. SQL Server data compression is not supported for SharePoint Server databases.
For more information about how backup compression affects performance in SQL Server, see Backup Compression (SQL Server).
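A minimal sketch of a compressed backup (placeholder names; requires a SQL Server edition that supports backup compression):

```sql
BACKUP DATABASE [WSS_Content]
TO DISK = N'E:\SQLBackups\WSS_Content.bak'
WITH COMPRESSION, CHECKSUM;
```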
Follow SQL Server backup and restore optimization recommendations
SQL Server backups use a combination of full, differential, and transaction log backups (for the full or bulk-logged recovery model) to minimize recovery time. Differential database backups are usually faster to create than full database backups and reduce the number of transaction logs required to recover the database.
If you are using the full recovery model, we recommend that you periodically truncate the transaction log files to avoid maintenance issues.
For detailed recommendations about how to optimize SQL Server backup and restore performance, see Optimizing Backup and Restore Performance in SQL Server.
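As an illustration of that combination (a sketch with placeholder database names and paths):

```sql
-- Weekly full backup
BACKUP DATABASE [WSS_Content]
TO DISK = N'E:\SQLBackups\WSS_Content_full.bak';

-- Nightly differential backup (changes since the last full backup)
BACKUP DATABASE [WSS_Content]
TO DISK = N'E:\SQLBackups\WSS_Content_diff.bak' WITH DIFFERENTIAL;

-- Frequent transaction log backups (full or bulk-logged recovery model only)
BACKUP LOG [WSS_Content]
TO DISK = N'E:\SQLBackups\WSS_Content_log.trn';
```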
Use RAID 10 if you use RAID
Carefully consider whether to use redundant array of independent disks (RAID) on the device to which you back up data. For example, RAID 5 has slow write performance, approximately the same speed as for a single disk. This is because RAID 5 has to maintain parity information. RAID 10 can provide faster backups because it doesn't need to manage parity. Therefore, it reads and writes data faster. For more information about how to use RAID with backups, see Configure RAID for maximum SQL Server I/O throughput and RAID Levels and SQL Server.
Configure SharePoint settings to improve backup or restore performance
You can only configure file compression and log file settings in PowerShell. You can configure backup and restore threads in both the SharePoint Central Administration website and PowerShell to increase backup or restore efficiency and performance.
If you use the Export-SPWeb PowerShell cmdlet, you can use the NoFileCompression parameter. By default, SharePoint Server uses file compression while exporting web applications, site collections, lists, or document libraries. You can use this parameter to suppress file compression while exporting and importing. File compression can use up to 30% more resources. However, the exported file uses approximately 25% less disk space. If you use the NoFileCompression parameter when you export, you have to also use it when you import the same content.
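For example (a sketch; the site URL and export path are placeholders):

```powershell
# Export a site without file compression; use -NoFileCompression again when importing.
Export-SPWeb -Identity "https://sites.contoso.com/marketing" `
    -Path "\\BackupServer\Exports\marketing.cmp" -NoFileCompression
```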
You can also use the NoLogFile parameter. By default, SharePoint Server always creates a log file when you export content. Although you can use this parameter to suppress log file creation to save resources, we recommend that you always create logs. Logs are important for troubleshooting and log creation does not use many resources such as CPU or memory.
When you use the Backup-SPFarm cmdlet, you can also use the BackupThreads parameter to specify how many threads SharePoint Server will use during the backup process. A higher number of threads will consume more resources during backup. But the overall time to make the backup is decreased. Because each thread is recorded in the log files, the number of threads does affect log file interpretation. By default, three threads are used. The maximum number of available threads is 10.
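For example (a sketch; the backup share is a placeholder):

```powershell
# Full farm backup using 6 backup threads (the default is 3, the maximum is 10).
Backup-SPFarm -Directory "\\BackupServer\SPBackups" -BackupMethod Full -BackupThreads 6
```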
Note
The backup threads setting is also available through Central Administration on the Default Backup and Restore Settings page in the Backup and Restore section.
Consider site collection size when you determine the tools to use
If the business requires site collection backups in addition to farm-level or database-level backups, choose a backup tool that is based on the size of the site collection.
Quality assurance best practices to back up a SharePoint farm
Follow these best practices to help ensure the quality of the backups of the farm environment and reduce the chances of data loss.
Ensure you have enough storage space
Be certain that the system has enough disk space to accommodate the backup. Configure a backup job in Central Administration to verify the required disk space.
Routinely test backup quality
Routinely test backups and validate their consistency. Run practice recovery operations to validate the contents of the backup and to make sure that you can restore the complete environment. To prepare for disaster recovery of geographically dispersed environments, set up a remote farm. Then you can restore the environment by using the database-attach method to upload a copy of the database to the remote farm and redirect users. Periodically perform a trial data recovery action to verify that the process correctly backs up files. A trial restoration can expose hardware problems that do not come up with software verifications and can also make sure that the recovery time objectives (RTO) are met.
Back up ULS trace logs
The SharePoint Server backup process doesn't back up the Unified Logging Service (ULS) trace logs. Data in ULS trace logs can be useful for performance analysis, troubleshooting, and monitoring compliance with service level agreements. Therefore, protect this data as part of the routine maintenance.
By default, SharePoint log files are at C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\<16 or 15>\Logs. The files are named with the server name followed by the date and time stamp. The SharePoint trace logs are created at set intervals and when you use the IISRESET command.
Store a copy of backup files off-site
To safeguard against loss from a natural disaster that destroys the primary data center, maintain duplicate copies of backups in separate locations from the servers. Duplicate copies can help prevent the loss of critical data. As a best practice, keep three copies of the backup media, and keep at least one copy offsite in a controlled environment. This should include all backup and recovery materials, documents, database and transaction log backups, and usage and trace log backups.
Procedural best practices to back up and restore SharePoint Server
Use the following procedural best practices to plan and perform backup and restore operations.
Use FQDN server names
When you refer to servers in a different domain, always use fully qualified domain names (FQDN).
Keep accurate records
When you deploy SharePoint Server, record the accounts that you create, the computer names, passwords, and setup options. Keep this information in a safe and secure location. Possibly, keep multiple records to make sure this information is always available.
Have a recovery environment ready
Use a farm in a secondary location to validate the success of restore operations as part of your disaster recovery strategy. For more information, see Choose a disaster recovery strategy for SharePoint Server. In a disaster recovery situation, you can then restore the environment by using the database-attach method to upload a copy of the database to the remote farm and redirect users. For more information, review and follow the steps in Restore farms in SharePoint Server. Also for a high availability solution, you can set up a standby environment that runs the same version of software as the production environment so that you can restore the databases and recover documents quickly. For more information, see Describing high availability.
Schedule backup operations
Use PowerShell backup and recovery cmdlets to create a script file (*.ps1) and then schedule it to run with Windows Task Scheduler. This makes sure that all backup operations are run at the best time when the system is least busy and users are not accessing it. For more information, see the following:
Use the SQL FILESTREAM provider with BLOB storage
Remote BLOB Storage (RBS) is supported in a SharePoint Server farm. There are both pros and cons associated with using RBS in SharePoint Server. One related limitation of RBS with a SharePoint farm is that System Center Data Protection Manager cannot use the FILESTREAM provider to back up or restore RBS. SharePoint Server supports the FILESTREAM provider for backup and restore operations. A benefit of RBS with a SharePoint farm is that you can use either SharePoint tools or SQL Server tools to back up and restore the content database with the Remote BLOB Store (RBS) defined. This backs up and restores both the RBS and the content database. We do not recommend that you use RBS with other restore methods. For more information about the benefits and limitations of using RBS, see Deciding to use RBS in SharePoint Server. Download the Microsoft SQL Server 2014 Feature Pack, which includes RBS.
Note
SharePoint Server 2019 supports the FILESTREAM provider that is included with SQL Server 2017. SharePoint Server 2016 supports the FILESTREAM provider that is included with SQL Server 2014. For more information, see Enable and Configure FILESTREAM.
Note
SharePoint Server 2013 supports the FILESTREAM provider that is included in the Microsoft SQL Server 2008 R2 Feature Pack. The SQL Server 2012 and SQL Server 2014 installation media include RBS as an optional add-on component.