Transforming your data in Azure SQL Database to columnstore format
We are excited to reveal a public preview of a new feature in Azure SQL Database, both in logical server and Managed Instance, called CLUSTERED COLUMNSTORE ONLINE INDEX build. This operation enables you to migrate your data stored in row-store format to the columnstore format and maintain your columnstore data structures with the minimal downtime on your workload.
Why columnstore format?
Azure SQL Database enables you to fine-tune and optimize data structures and indexes in your database to get the best performance of your queries depending on your workload and size of data. Relational data in Azure SQL Database can be organized in two formats:
- Row-store format, which is an ideal option for OLTP workloads where the queries are accessing individual rows or set of rows in the table. This is the general-purpose table format used for most of the data in relational databases.
- Columnstore format, which is optimized for analytical queries and high compression of data (up to 100x). This format is perfect for the large data sets that can be efficiently compressed using this format and the analytical queries with complex calculations that use subset of the table columns.
In some cases, you might notice that your existing table that is organized in row-store format is not suitable for the queries that are executed on the table. Also, in some cases, you might want to apply high compression in columnstore tables to minimize the size of your table. In that case, you would need to transform your data into column-store format to compress the data and boost the performance of your analytical queries.
Transforming data to columnstore format
You can transform the existing row-store tables into the column-store format by creating CLUSTERED COLUMNSTORE INDEX on a table. Clustered columnstore index will take your original dataset from the table, organize it by columns, and apply efficient high compression algorithms to minimize the size of your data.
A Transact-SQL statement that creates CLUSTERED COLUMNSTORE INDEX on a table and transforms the data to columnstore format is shown in the following example:
CREATE CLUSTERED COLUMNSTORE INDEX cci ON Sales.Orders
Clustered columnstore index created on a table will reorganize data in the table and convert rows into the columns with high compression.
The limitation of this operation is that all incoming transactions that are trying to update the rows in the table that is going to be transformed from row-store to columnstore format must be blocked until the transformation finishes. This is known as offline index build shown on the following picture:
All incoming transactions are blocked while the table is transformed from the row-store to the column-store format. This process might cause some downtime in your workload while the table is transformed so you would need to choose the time when to transform the data to the columnstore format.
Online transition to columnstore format
The latest release of Azure SQL Database enables you to transform your row-store tables to the columnstore format without blocking incoming transactions using the online version of columnstore index build (currently in public preview).
You can use the following T-SQL syntax to transform your row-store table into the columnstore format:
CREATE COLUMNSTORE INDEX cci ON Sales.Orders WITH ( ONLINE = ON )
Create clustered index operation with the option WITH (ONLINE = ON) will take all incoming transactions and continuously include the data changes into the target columnstore data structure while the original data is transformed:
Online clustered columnstore index build enables you to optimize and compress your data with minimal downtime without major blocking operations on the queries that are executing while you are transforming the data.
Besides online transformation from row-store to columnstore format, the following online features in clustered columnstore indexes are available in Azure SQL Database:
- Existing clustered columnstore indexes can be rebuilt in the online mode, meaning that workload that is working with the table don’t need to be blocked while you are performing this index maintenance operation.
- Non-clustered indexes on the tables organized in columnstore format can be rebuilt using the online option, without blocking the workload.
ONLINE CLUSTERED COLUMNSTORE index build operation would help you to perform data transformation and maintenance operations on clustered columnstore indexes with minimal downtime on the incoming workload. This feature is currently in preview in all flavors of Azure SQL Database including logical servers, elastic pools, and Managed Instances.
For more information, please see the columnstore indexes documentation page.
Source: Azure Blog Feed