Channel: SQL Server 2016 – LittleKendra.com

Telemetry_xevents, NT SERVICE\SQLTELEMETRY, and XE_LIVE_TARGET_TVF in SQL Server 2016

When kicking the tires of SQL Server 2016 CTP3, I was interested to find a new session defined in the Extended Events folder. Hello, telemetry_xevents!

Why, I don’t remember creating this myself.

Telemetry_xevents Extended Events Session Definition

Scripting it out, this session contains the following events:

CREATE EVENT SESSION [telemetry_xevents] ON SERVER 
ADD EVENT sqlserver.data_masking_ddl_column_definition,
ADD EVENT sqlserver.error_reported(
    WHERE ([severity]>=(20) OR ([error_number]=(18456) OR [error_number]=(17803) OR [error_number]=(701) OR [error_number]=(802) OR [error_number]=(8645) OR [error_number]=(8651) OR [error_number]=(8657) OR [error_number]=(8902) OR [error_number]=(41354) OR [error_number]=(41355) OR [error_number]=(41367) OR [error_number]=(41384) OR [error_number]=(41336) OR [error_number]=(41309) OR [error_number]=(41312) OR [error_number]=(41313)))),
ADD EVENT sqlserver.missing_column_statistics,
ADD EVENT sqlserver.missing_join_predicate,
ADD EVENT sqlserver.server_memory_change,
ADD EVENT sqlserver.server_start_stop,
ADD EVENT sqlserver.stretch_database_disable_completed,
ADD EVENT sqlserver.stretch_database_enable_completed,
ADD EVENT sqlserver.stretch_database_events_submitted,
ADD EVENT sqlserver.stretch_table_codegen_completed,
ADD EVENT sqlserver.stretch_table_remote_creation_completed,
ADD EVENT sqlserver.stretch_table_row_migration_results_event,
ADD EVENT sqlserver.stretch_table_unprovision_completed,
ADD EVENT sqlserver.stretch_table_validation_error,
ADD EVENT sqlserver.temporal_ddl_period_add,
ADD EVENT sqlserver.temporal_ddl_period_drop,
ADD EVENT sqlserver.temporal_ddl_schema_check_fail,
ADD EVENT sqlserver.temporal_ddl_system_versioning,
ADD EVENT sqlserver.temporal_dml_transaction_fail
WITH (MAX_MEMORY=4096 KB, EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS, MAX_DISPATCH_LATENCY=120 SECONDS, MAX_EVENT_SIZE=0 KB, MEMORY_PARTITION_MODE=NONE, TRACK_CAUSALITY=OFF, STARTUP_STATE=ON)
GO

There’s something quite odd about this session. It has no target! The data isn’t being written to memory in the ring buffer or to a file or even a counter.

So I did a little testing. I right-clicked the session and selected ‘Watch Live Data’ to see if I could consume the data flowing through in SQL Server Management Studio, even though the session didn’t have a target. And then I ran this in another session:

RAISERROR ('HALLO!',20,1) WITH LOG;
GO

Sure enough, after a little while, my error appeared:

telemetry_xevents_error

HALLOOOO!

So just because the telemetry_xevents session doesn’t have a target doesn’t mean that the data can’t be consumed.

Meet the NT SERVICE\SQLTELEMETRY Login

When observing the instance using Adam Machanic’s free sp_WhoIsActive procedure, I can see the SQLTELEMETRY login collecting data. It looks like this:

Not exactly a big CPU user.

SQLTELEMETRY is querying sys.fn_MSxe_read_event_stream

Here’s what the query it’s running looks like. This is a documented function, but it’s intended for internal use.

SELECT type, data FROM sys.fn_MSxe_read_event_stream (@source, @sourceopt)

Should I Worry About the XE_LIVE_TARGET_TVF Wait Type?

It’s early days, but based on what I’ve seen so far, this wait type looks ignorable. As you can see from the screenshot, this wait is accruing, but that session is using very little resources and is just accessing the documented sys.fn_MSxe_read_event_stream function.
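If you want to see how much of this wait type your own instance has accrued, a quick look at the instance-level wait stats (cumulative since startup) does the trick:

SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'XE_LIVE_TARGET_TVF';
GO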

If you are concerned about this query or wait type, it is possible to stop the telemetry_xevents session, but it’s unclear what impact that has at this point, since the session isn’t documented.
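If you do decide to experiment with that on a test instance, stopping the session uses the standard Extended Events syntax. Note that the session is defined with STARTUP_STATE=ON, so it will come back when the instance restarts:

ALTER EVENT SESSION [telemetry_xevents] ON SERVER STATE = STOP;
GO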

Will telemetry_xevents Ship in 2016 RTM?

Stay tuned, we’ll find out later.


SQL Server’s YEAR() Function and Index Performance

SQL Server’s really clever about a lot of things. It’s not super clever about YEAR() when it comes to indexes, even using SQL Server 2016 — but you can either make your TSQL more clever, or work around it with computed columns.

Short on time? Scroll to the bottom of the post for the summary.

The Problem With YEAR()

I’ve created a table named dbo.FirstNameByBirthDate_2005_2009 in the SQLIndexWorkbook database. I’ve taken the history of names by year, and made them into a fake fact table– as if a row was inserted every time a baby was born. The table looks like this:

FirstNameByBirthDate_2005_2009
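The screenshot doesn’t reproduce here, so here’s a rough sketch of the table’s shape. Only Gender and FakeBirthDateStamp are confirmed by the queries below; the other column names and types are guesses for illustration:

CREATE TABLE dbo.FirstNameByBirthDate_2005_2009 (
    FirstNameByBirthDateId BIGINT IDENTITY NOT NULL,  -- hypothetical key column
    FakeBirthDateStamp DATETIME2(0) NOT NULL,
    FirstNameId INT NOT NULL,                         -- hypothetical
    Gender CHAR(1) NOT NULL,
    CONSTRAINT pk_FirstNameByBirthDate_2005_2009
        PRIMARY KEY CLUSTERED (FirstNameByBirthDateId)
);
GO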

I want to count the females born in 2006. The most natural way to write this query is:

SELECT
    COUNT(*)
FROM dbo.FirstNameByBirthDate_2005_2009
WHERE 
    Gender = 'F'
    AND YEAR(FakeBirthDateStamp) = 2006
GO

Pretty simple, right?

This looks like it’d be a really great index for the query:

CREATE INDEX ix_women 
    ON dbo.FirstNameByBirthDate_2005_2009
        (Gender, FakeBirthDateStamp);
GO

All rows are sorted in the index by Gender, so we can immediately seek to the ‘F’ rows. The next column is a datetime2 column, and sorting the rows by date will put all the 2006 rows together. That seems seekable as well. Right? Right.

After creating our index, here’s the actual execution plan. At first, it looks like it worked. There’s a seek at the very right of this plan!

Index-Seek-Execution-Plan

But if we hover over that index seek, we can see in the tooltip that there’s a hidden predicate that is NOT a seek predicate. This is a hidden filter. And because this is SQL Server 2016, we can see “Number of Rows Read” — it had to read 9.3 million rows to count 1.9 million rows. It didn’t realize the 2006 rows were together– it checked all the females and examined the FakeBirthDateStamp column for each row.

Hidden-Predicate-Actual-Rows-Read

Solution 1: Rewrite Our TSQL

We can make this better with a simple query change. Let’s explain to the optimizer, in detail, what we mean by 2006, like this:

SELECT
	COUNT(*)
FROM dbo.FirstNameByBirthDate_2005_2009
WHERE 
	Gender = 'F'
	AND FakeBirthDateStamp >= CAST('1/1/2006' AS DATETIME2(0)) 
		and FakeBirthDateStamp < CAST('1/1/2007' AS DATETIME2(0))
GO

Our actual execution plan looks the same from the outer shape. We still have a seek, but the relative cost of it has gone up from 86% to 89%. Hm. Did it get worse?

Index-Seek-Execution-Plan-Better-Query

Hovering over the index seek, the tooltip tells us it got much better. We have two seek predicates, and we only needed to read the rows that we actually counted. Way more efficient!

Two-Seek-Predicates-Actual-Rows-Read

Solution 2: Add an Indexed Computed Column

What if you can’t change the code? There’s a really cool optimization with computed columns that can help.

First, I’m going to add a column to my table called BirthYear, which uses the YEAR() function, like this:

ALTER TABLE dbo.FirstNameByBirthDate_2005_2009
    ADD BirthYear AS YEAR(FakeBirthDateStamp);
GO

Then I’m going to index BirthYear and Gender:

CREATE INDEX ix_BirthYear on dbo.FirstNameByBirthDate_2005_2009 (BirthYear, Gender);
GO

Now here’s the really cool part of the trick. I don’t have to change my code at all to take advantage of the BirthYear column. I’m going to run the same old query that uses the year function. (Here it is, just to be clear.)

SELECT
    COUNT(*)
FROM dbo.FirstNameByBirthDate_2005_2009
WHERE 
    Gender = 'F'
    AND YEAR(FakeBirthDateStamp) = 2006
GO

SQL Server auto-magically matches YEAR(FakeBirthDateStamp) to my computed column, and figures out it can use the index. It does a beautiful seek, every bit as efficient as if I’d rewritten the code:

Index-Seek-Execution-Plan-Computed-Column

NitPicker’s Corner: Disclaimers and Notes

  • When considering indexed computed columns, keep their usual fine print in mind: the expression must be deterministic, and creating the index requires specific SET options (such as QUOTED_IDENTIFIER ON).
  • This issue isn’t specific to the DATETIME2 data type. It still happens with good old DATETIME as well.
  • My tests were all run against SQL Server 2016 CTP3.3.

TL;DR: Just the Facts Please

There are three main things to remember here:

  1. A seek isn’t always awesome. Check the tooltip on the seek operator, because there may be hidden predicates in there which are NOT seek predicates.
  2. The optimizer isn’t as smart with YEAR() as you might think, so consider other code constructs.
  3. If you can’t rewrite the code and these queries need optimization, test out indexed computed columns to see if they may help.

Learn Index Tuning at the PASS Summit in 2016!

I’m excited to announce that I’ll be giving a pre-conference session on index tuning, plus a general session on locking and blocking at the PASS Summit in Seattle this October! Here’s a description and a video to tell you all about these sessions.

SQL Server Index Formulas: Problems and Solutions

Tuesday, Oct 25, 2016
All day training session – $495
PASS Summit Preconference training day – Register here

Every SQL Server developer must know the essentials of clustered and non-clustered index design to build fast and reliable applications. You need scripts to show when a new non-clustered index, filtered index, or indexed view will improve performance on a live SQL Server workload, and the skills to design the best index.

In this example-packed full-day session, you will work through a series of problems and solutions to learn how to create and tune indexes. You will learn how to select and order key columns, when included columns help, when to use special indexes, and when to drop indexes. You will see how SQL Server 2016’s new Query Store feature offers you new tools for improving your indexes, and how Query Store compares to index dynamic management views.

You’ll leave with all the scripts for the problems and solutions covered in the course, plus a set of additional exercises to keep building your index skills.

Register for this day of learning on the SQLPASS website.


The Great Performance Robbery: Locking Problems and Solutions

Exact date TBD  (Oct 26-28, 2016)
75 minutes
PASS Summit general session – Register for the SQLPASS conference here

Your SQL Server is slow, and you suspect blocking. You need to identify the culprit and plan the right solution at the lowest cost. In this session you will work through three common scenarios where blocking problems steal your performance and leave you frustrated. You will learn how to cut through the confusion with the right scripts to prove your case and justify your solution.

I hope to see you in Seattle this fall!

Does OPTION (RECOMPILE) Prevent Query Store from Saving an Execution Plan?

Recompile hints have been tough to love in SQL Server for a long time. Sometimes it’s very tempting to use these hints to tell the optimizer to generate a fresh execution plan for a query, but there can be downsides:

  • This can drive up CPU usage for frequently run queries
  • This limits the information SQL Server keeps in its execution plan cache and related statistics in sys.dm_exec_query_stats and sys.dm_exec_procedure_stats
  • We’ve had some alarming bugs where recompile hints can cause incorrect results. (Oops! and Whoops!)
  • Some queries take a long time to compile (sometimes up to many seconds), and figuring out that this is happening can be extremely tricky when RECOMPILE hints are in place

The new SQL Server 2016 feature, Query Store, may help alleviate at least some of these issues. One of my first questions about Query Store was whether recompile hints would have the same limitations as in the execution plan cache, and how easy it might be to see compile duration and information.

Let’s Turn on Query Store

I’m running SQL Server 2016 CTP3. To enable Query Store, I click on the database properties, and there’s a Query Store tab to enable the feature. I choose “Read Write” as my new operation mode so that it starts collecting query info and writing it to disk:

Query Store: ACTIVATE!

If you script out the TSQL for that, it looks like this:

USE [master]
GO
ALTER DATABASE [ContosoRetailDW] SET QUERY_STORE = ON
GO
ALTER DATABASE [ContosoRetailDW] 
SET QUERY_STORE (OPERATION_MODE = READ_WRITE, 
CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 367), 
DATA_FLUSH_INTERVAL_SECONDS = 900, 
INTERVAL_LENGTH_MINUTES = 60, 
MAX_STORAGE_SIZE_MB = 100, 
QUERY_CAPTURE_MODE = ALL, 
SIZE_BASED_CLEANUP_MODE = AUTO)
GO

And Now Let’s Test Drive that RECOMPILE Hint

Now that Query Store’s on, I make up a few queries with RECOMPILE hints in them and run them– some once, some multiple times. After a little bit of this, I check to see what Query Store has recorded about them:

SELECT 
  qsq.query_id,
  qsq.query_hash,
  qsq.count_compiles,
  qrs.count_executions,
  qsq.avg_compile_duration,
  qsq.last_compile_duration,
  qsq.avg_compile_memory_kb,
  qsq.last_compile_memory_kb,
  qrs.avg_logical_io_reads,
  qrs.last_logical_io_reads,
  qsqt.query_sql_text,
  CAST(qsp.query_plan AS XML) AS mah_query_plan
FROM sys.query_store_query qsq
JOIN sys.query_store_query_text qsqt on qsq.query_text_id=qsqt.query_text_id
JOIN sys.query_store_plan qsp on qsq.query_id=qsp.query_id
JOIN sys.query_store_runtime_stats qrs on qsp.plan_id = qrs.plan_id
WHERE qsqt.query_sql_text like '%recompile%';
GO

Note: I’ve kept it simple here and am looking at all rows in sys.query_store_runtime_stats. That means that if I’ve had query store on for a while and have multiple intervals, I may get multiple rows for the same query. You can add qrs.runtime_stats_interval_id to the query to see that.
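For example, here’s a minimal sketch of that variation, returning one row per plan per interval. Here, though, let’s stick with the simple version:

SELECT 
  qsq.query_id,
  qrs.runtime_stats_interval_id,
  qrs.count_executions,
  qrs.avg_logical_io_reads
FROM sys.query_store_query qsq
JOIN sys.query_store_plan qsp on qsq.query_id=qsp.query_id
JOIN sys.query_store_runtime_stats qrs on qsp.plan_id = qrs.plan_id
ORDER BY qsq.query_id, qrs.runtime_stats_interval_id;
GO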

Here’s a sample of the results:

query store results for recompile queries

YAY! For all my queries that were run with RECOMPILE hints, I can see information about how many times they were run, execution stats, their query text and plan, and even information about compilation.

And yes, I have the execution plans, too — the “CAST(qsp.query_plan AS XML) AS mah_query_plan” totally works.

Want to Learn More about Query Store and Recompile?

In this post, I just talked about observing recompile overhead with Query Store. Grant Fritchey has an excellent post that addresses the question: what if you tell Query Store to freeze a plan for a query with a recompile hint? Will you still pay the price of recompile? Read the answer on Grant’s blog here.

What Resets sys.dm_db_index_usage_stats and Missing Index DMVs?

Managing indexes got trickier with SQL Server 2012. SQL Server has tracked and reported statistics on how often indexes are used and requested since SQL Server 2005. As of SQL Server 2012, suddenly all that information started getting reset whenever anyone ran ALTER INDEX REBUILD.

Confusingly for users, this only happened with one specific command: REBUILD. ALTER INDEX REORGANIZE has never reset index usage stats or missing index requests.

In this post I’ll cover new changes in behavior in SQL Server 2016 RC0, encourage you to vote for Connect bug #2446044 to keep missing index requests from being reset by REBUILD, and close with a chart describing the behavior of different commands in different versions of SQL Server.

The Bug with sys.dm_db_index_usage_stats : Fixed!

Joe Sack filed Connect bug #739566 back in April 2012 on this issue. That bug was closed and marked as “won’t fix” for a while, but was recently reactivated. (Yay!)

Testing SQL Server 2016 RC0 today, I see that the bug for index_usage_stats has been fixed in this release! When I generate scans, seeks, and updates against an index, running ALTER INDEX REBUILD no longer resets the information. I can still tell which indexes have been used and which have not since the database came online, just like we had  in SQL Server 2008 R2 and prior.

The Bug with Missing Index DMVs: Still There

Another problem was introduced in SQL Server 2012 that seems to have slipped by here. Running ALTER INDEX REBUILD against any index on a table clears out all missing index requests that have accrued for the table.

I still see this problem occurring in 2016 RC0. Here’s what it looks like.

First, I run a query that generates missing index requests against the SQLIndexWorkbook database a bunch of times:

SET NOCOUNT ON;
GO
USE SQLIndexWorkbook
GO

DECLARE @garbage INT
SELECT 
    @garbage = NameCount
FROM agg.FirstNameByYear
WHERE  
    FirstNameId = 210;
-- GO <n> runs the preceding batch n times (an SSMS / sqlcmd feature)
GO 974

I verify that this generated missing index requests using the following query:

SELECT 
    deets.statement as db_schema_table,
    missin.avg_total_user_cost as [avg_est_query_cost],
    missin.avg_user_impact as [est_%_improvement],
    missin.user_scans,
    missin.user_seeks,
    missin.unique_compiles,
    deets.equality_columns,
    deets.inequality_columns,
    deets.included_columns
FROM sys.dm_db_missing_index_group_stats as missin
JOIN sys.dm_db_missing_index_groups as groups on missin.group_handle=groups.index_group_handle
JOIN sys.dm_db_missing_index_details as deets on groups.index_handle=deets.index_handle;
GO

Sure enough, it did. Here’s a partial screenshot of the output:

missing_index_request

I run the following code to rebuild one index on the table. In this case it’s the clustered primary key:

ALTER INDEX pk_aggFirstNameByYear on agg.FirstNameByYear REBUILD;
GO

After this completes, I get zero results from the missing index query for this table. They have been cleared.

And that’s a big bummer. Having this half fixed is arguably even more confusing.

In SQL Server 2008R2 and prior, index requests were not cleared upon rebuild. That’s much more desirable, as you may well have nightly or weekly index maintenance that kicks in and selectively rebuilds indexes on some tables.
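If you’d like to see the contrast yourself, repeat the test above but reorganize instead of rebuilding. Since REORGANIZE has never reset these DMVs, the missing index requests should survive:

ALTER INDEX pk_aggFirstNameByYear on agg.FirstNameByYear REORGANIZE;
GO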

Vote for Connect Bug #2446044 to Let the Product Team Know You’d Like this to Change

I think I know why the issue with missing indexes wasn’t fixed. We forgot to file a bug. Many of us knew the bug on index usage stats had been filed and had been closed for some time, and none of us thought to open a separate bug about the missing index DMVs.

Oops.

I created Connect bug #2446044 for this problem. Please vote for this bug– it just takes a second to create an account if you don’t have one.

Quick Rundown: What Happens In Which Version with Which Commands

Here’s the behavior you should expect to see by the version of SQL Server:

SQL Server 2005 & SQL Server 2008R2
  • sys.dm_db_index_usage_stats: Reset on database offline/restart.
  • Missing Index DMVs: Reset on any index drop/disable/create on that table. Reset on database offline/restart.

SQL Server 2012
  • sys.dm_db_index_usage_stats: Reset on ALTER INDEX REBUILD of that index until SP2+CU12 or SP3+CU3. Reset on database offline/restart.
  • Missing Index DMVs: Reset on any index drop/disable/create on that table. Reset on database offline/restart. Reset on ALTER INDEX REBUILD of any index on the table.

SQL Server 2014
  • sys.dm_db_index_usage_stats: Reset on ALTER INDEX REBUILD of that index until SP2 (planned as of 5/18). Reset on database offline/restart.
  • Missing Index DMVs: Reset on any index drop/disable/create on that table. Reset on database offline/restart. Reset on ALTER INDEX REBUILD of any index on the table.

SQL Server 2016
  • sys.dm_db_index_usage_stats: Reset on database offline/restart.
  • Missing Index DMVs: Reset on any index drop/disable/create on that table. Reset on database offline/restart. Reset on ALTER INDEX REBUILD of any index on the table.

Notes:

  • Missing index recommendations have always been cleared for a table whenever you create, drop, or disable an index on it. That makes total sense and is desirable, as the optimizer may make quite different decisions with different indexes available.

Don’t forget to vote!

Attribution: Bug photo by Tanguy Sauvin courtesy unsplash.com

Index Tuning Decision Tree for SQL Server

I recently mapped out my thought process for how I approach a new instance of SQL Server when it comes to index tuning. It now looks like this:

Index Tuning Decision Tree SQL Server-Kendra-Little

Highlight: Can I use Query Store?

One of the first things I think about is whether the new 2016 Query Store option is available to collect query runtime statistics and execution plans. Information on query duration, reads, and CPU use, along with execution plans, is so critical to index tuning that I care a ton about this new feature.
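A quick way to check whether Query Store is enabled and writing for a given database is to peek at its options from inside that database:

SELECT actual_state_desc, desired_state_desc, 
    current_storage_size_mb, max_storage_size_mb
FROM sys.database_query_store_options;
GO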

And it is a new feature. It’s even had its first big bugfix — if you’re running it on something other than Enterprise or Developer Edition, make sure you’ve tested and installed CU1, which contains this fix for query store cleanup.

I’m a big fan of SQL Server’s plan cache and index management dynamic management views — but I love that Query Store takes away the mystery of wondering what might be missing from the cache, or which missing index requests might have been cleared by an index rebuild.

Observation: Tuning indexes is most effective when you analyze the top execution plans to design your indexes — not the missing index DMVs

When I first began tuning indexes in SQL Server, I largely reviewed and followed missing index suggestions in the missing index DMVs. I learned to combine those suggestions with one another and with the indexes already on the tables.

My tuning style has evolved from this, though. SQL Server’s index recommendations are useful, but they’re very rough – sometimes they suggest columns for the includes which you don’t absolutely need. Sometimes they suggest a column as an include that would be better in the key. Sometimes they overestimate the benefit the index would provide. And sometimes you just don’t get a suggestion at all.

It’s not that the Missing Index feature doesn’t work, it’s simply that the missing index feature is not designed to fine-tune an index workload. And that’s totally fair – those index requests are generated during query optimization, and that’s definitely something that we want to be fast!

What I very much prefer these days is to look at the top running queries during the periods I want to tune. I like to examine the execution plans and CPU, reads, and duration for the statements along with the plans.
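With Query Store on, a rough sketch of that kind of “top queries” review looks like this (ordering by CPU here; swap in avg_duration or avg_logical_io_reads depending on what you’re tuning for):

SELECT TOP 25
  qsqt.query_sql_text,
  qrs.count_executions,
  qrs.avg_cpu_time,
  qrs.avg_duration,
  qrs.avg_logical_io_reads,
  CAST(qsp.query_plan AS XML) AS query_plan
FROM sys.query_store_query qsq
JOIN sys.query_store_query_text qsqt on qsq.query_text_id=qsqt.query_text_id
JOIN sys.query_store_plan qsp on qsq.query_id=qsp.query_id
JOIN sys.query_store_runtime_stats qrs on qsp.plan_id = qrs.plan_id
ORDER BY qrs.avg_cpu_time DESC;
GO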

I do still like to look at missing index suggestions, I just prefer to do it in the context of the plan asking for it.

The reason that I love the whole concept of Query Store is that it’s a built in way to make this a whole lot easier!

What’s Adaptive Query Processing? (Dear SQL DBA Episode 21)

I’m mixing things up a bit in this episode. I want to talk about a question that keynotes and sessions at the SQL PASS Summit got me thinking about last week. Let’s talk about Adaptive Query Processing.

Watch the 24 minute video, scroll down to read the transcript, or subscribe to the podcast.

This post includes a lot of speculation

I’m headed to the Microsoft MVP Summit next week. The cool thing about the MVP Summit is that you get to learn some things that aren’t public yet. The downside is that once you get some secret info, that closes off your ability to speculate a bit… because you don’t want to speculate too close to something that is “secret”.

Everything I’m talking about today was either revealed publicly last week at the Summit, or is speculation on my part (and is pure speculation, I have no privileged insights on this stuff).

I’ll do my best to be completely clear about what’s speculation and what isn’t here.

Keynote focus: predicting the future

Perhaps speculation feels like the right topic today because Microsoft folks talked a lot about the importance of prediction in the keynotes at the PASS Summit last week.

SQL Server 2016 features R Services. This brings the ability to learn patterns and make predictions into the database engine.

Using this new feature came up a lot in the keynote. And not just for performing predictions for a user application, either: there were quite a few references to using SQL Server’s predictive powers to make SQL Server itself smarter.

So what might that mean?

We’re used to SQL Server optimizing a query before it runs

When you execute a query, the SQL Server optimizer has to quickly analyze what all its options are for executing the query with the data structures it has at hand. It uses ‘statistics’ to help it estimate how much data it will get back from each structure.

It has a lot to figure out: what types of joins should it choose? Should it use a single core or multiple cores? How much memory should it allocate for operators that need to do things like sorting and creating hash tables in memory?

It has to figure all this out fast. Every microsecond spent optimizing a query is a microsecond the user is waiting.

Once a query starts executing, SQL Server doesn’t (currently) do “re-optimization”

Once the optimizer chooses a plan, the query goes off to the races. SQL Server doesn’t give it the option to turn back.

Some of us have wondered for a while if we might get a feature where SQL Server can change a query plan after it starts running if it looks like estimates from statistics weren’t accurate.

Oracle has a feature called “Adaptive Query Optimization” which stretches the optimization process out into the query execution phase. Oracle can start a query with a “default plan.”

I’m no Oracle expert, but here’s how their docs describe Adaptive Query Optimization:

  • Once the query is running, if it looks like estimates were way off, it can change portions of the plan based on what it’s finding.
  • It can change joins, parallelism, and even create “dynamic statistics” to get more detailed information where things looked fishy.
  • Oracle can also use what it learns about rowcounts after a query executes to help it optimize future executions of that query.

I’m not going through this to suggest that SQL Server will implement the same features. But it can be useful to think about what competitors are doing in terms of optimization to open up our view a little when we’re thinking about what’s possible. And of course, SQL Server can go beyond this.

Things have been changing in Azure with automatic index tuning in the SQL Database Advisor

This isn’t your old Database Tuning Advisor. When you use Azure SQL Database, you have a newer option with a similar name: the SQL Database Advisor.

The SQL Database Advisor in hosted databases can recommend indexes to create and drop, and it’ll note when queries aren’t parameterized or are getting a lot of recompiles to end up with the same plan.

You have the option to tell the SQL Database Advisor to automatically manage indexes. In this case, it’ll not only apply the index changes but watch performance after it makes the change. If things get slower, it’ll revert the change.

How well does this work in practice?

Honestly, I have no idea 🙂

But I’m starting to get really curious after the Summit this year, so I’m planning to start exploring this more.

Announced last week: Adaptive Query Processing

I attended a session called “What’s New in Azure SQL Database?” at PASS last week. This was presented by Lindsey Allen and other program managers on the SQL Server Engineering team.

There was a lot of cool stuff discussed in the session, but two bullet points in particular jumped out at me:

  • Performance insight and auto-tuning
  • Adaptive query processing

Adaptive query processing is basically a subset of what’s being called “performance intelligence”. We saw a very cool demo video that explained that Adaptive Query Processing is focusing on three things:

  1. Row estimates for “problematic subtrees”
  2. Adjusting memory grants
  3. Fixing join types when needed

How is Adaptive Query Processing going to work?

I have no idea. This is a totally new area, and it was a fast moving session that quickly moved on to other new features.

I got two somewhat conflicting ideas about how this might work, and I’m looking forward to sorting it out in the future.

Count this all as pure speculation, because I may have a very skewed understanding of what I heard at this point.

  1. This might be based on collecting information by observing a workload of queries — say, queries collected in Query Store– and using R Services to find queries where optimization needs to be improved, then giving feedback for future runs of the query.
    • Simple example I can think of when it comes to memory grants: if SQL Server always requests a lot more memory than it actually uses for a frequent query, this could be learned and the grant could be reduced. This could help avoid low workspace query situations on very busy systems (aka RESOURCE_SEMAPHORE waits)
  2. This might also involve some dynamic optimization at runtime. One slide I saw was talking about joins, and used the phrase “Defer the join choice until after the first join input has been scanned.”
    • That sounds a lot like optimization may be stretching out into the execution of the query, right?
    • I also saw the sentence “Materialize estimates for problematic subtrees”, which sounds like getting extra statistics for parts of the plan where estimated rows and actual rows differ. But no idea yet if this could happen on first execution of the query or would be observed across a workload after a bunch of things have run.

Speculation: to optimize a “herd” / workload of queries, wouldn’t Query Store need wait stats?

If I did understand correctly that Adaptive Query Processing at least in part requires using data collected from a workload of queries and analyzing it in R, then the 2016 Query Store feature seems like it’d be a big part of the picture. Query Store collects runtime statistics and execution plans.

But to do this well, wouldn’t the analysis also need to know why a query was slow? Perhaps it just couldn’t get started because it was waiting on a lock. That doesn’t necessarily mean it needs to have different joins or its memory grant changed.

This is pure speculation, but if Adaptive Query Processing uses Query Store data, this makes me think we might see Query Store collecting Wait Statistics sometime soon.

Will Adaptive Query Processing be Cloud-Only, or part of “boxed” SQL Server?

The session I was attending was specifically on Azure SQL Database.

I didn’t hear an announcement about whether this feature might be available outside of the cloud. But I also didn’t hear anything that sounded like it would prevent this feature from working in the “install and manage it yourself” boxed version of SQL Server.

A lot of times we don’t get a clear answer on this until they start to ship previews of new major versions of SQL Server — so treat anything you hear as speculation unless it’s directly from a Microsoft Program Manager.

You can sign up for the preview of Adaptive Query Processing

Check it out yourself! https://aka.ms/AdaptiveQPPreview

Got your own speculations? Or even (gasp) some facts?

Tell me all about it in the comments!

Unless you can’t tell me because of a non-disclosure agreement. Then keep it to yourself 🙂

Filtered Indexes: Rowstore vs Nonclustered Columnstore

SQL Server has two types of filtered indexes:

  1. The “classic” filtered nonclustered rowstore index, introduced in SQL Server 2008, available in all editions
  2. The newfangled filtered nonclustered columnstore index, introduced in SQL Server 2016, available in Enterprise Edition

These two filtered indexes are very different – and the SQL Server optimizer can use them very differently!

While a classic filtered nonclustered rowstore index must reliably “cover” the query for the optimizer to use it, filtered nonclustered columnstore indexes may be combined with other indexes to produce a plan returning a larger range of data.

This sounds a little weird. I’ll show you what I mean using the WideWorldImporters database.

Filtered nonclustered rowstore indexes (“Filtered Indexes”)

A filtered index is a nonclustered rowstore index with a “where” clause. This index contains only rows from Sales.Invoices which were last edited before February 1, 2013:

CREATE INDEX ix_nc_filter_LT
on Sales.Invoices (LastEditedWhen)
    INCLUDE (CustomerID)
    WHERE (LastEditedWhen < '2013-02-01');
GO

SQL Server may use this index if I run it for a query that also specifies LastEditedWhen < ‘2013-02-01’ (although possibly not if my query is parameterized and may be used for a date outside this range).
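For instance, a literal-date query like this is a candidate to use the filtered index:

SELECT CustomerID
FROM Sales.Invoices
WHERE LastEditedWhen < '2013-02-01';
GO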

What if I am querying all CustomerIds, and I force SQL Server to use this index with a hint?

SELECT CustomerID
FROM Sales.Invoices WITH (INDEX (ix_nc_filter_LT));
GO

SQL Server could potentially pick up some of the rows from the filtered index, then find the rest of the rows in the base table and combine them. It’d be expensive, but it’s theoretically possible.

However, SQL Server can’t. Instead, I get this error:

Msg 8622, Level 16, State 1, Line 21
Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.

My query wants a larger range of data than is in my filtered rowstore index, and SQL Server won’t use the filtered rowstore index and then go find the rest of the data in another index. The optimizer just isn’t written to do this.

We’ve been used to this for years with filtered indexes. But filtered nonclustered columnstore indexes behave differently!

Filtered nonclustered columnstore indexes (“Filtered NCCI”)

Let’s create a filtered nonclustered columnstore index on the same table:

CREATE NONCLUSTERED COLUMNSTORE INDEX ix_ncci_filter_LT
on Sales.Invoices (LastEditedWhen, CustomerID)
    WHERE (LastEditedWhen < '2013-02-01');
GO

I’m not saying this little demo table needs a nonclustered columnstore, I’m just reusing it for simplicity.

Now, I force SQL Server to use this index to get all the CustomerIDs:

SELECT CustomerID
FROM Sales.Invoices WITH (INDEX (ix_ncci_filter_LT));
GO

This time, the query doesn’t fail to get a plan! I get a plan, and I get all the 70,510 rows back. The execution plan looks like this:

filterednonclusteredcolumnstore-combined

This isn’t an awesome plan. SQL Server scanned the columnstore index, then scanned the clustered index of the table to find the rows that weren’t in the columnstore index, then combined them. It did this because I forced it to use the columnstore hint.

But SQL Server can make this plan. It was able to do this because it understands the filter in the nonclustered columnstore index. Hover over the clustered index scan in the plan, and you can see it figured out how to find the rest of the data in the clustered index:

filterednonclusteredcolumnstore-combined-predicate

Why are filtered nonclustered columnstore indexes smarter?

Columnstore indexes shine when it comes to scanning and aggregating lots of data. While nonclustered columnstore indexes are updatable in SQL Server 2016, it’s expensive to maintain them.

SQL Server is smarter about optimizing plans with filtered nonclustered columnstore indexes, so you can design your filter so that “cold” data which is unlikely to be modified lives in the columnstore index. This makes the index cheaper to maintain. The optimizer also has the ability to use the filtered NCCI and combine it with other indexes behind the scenes.

You do want to be careful with your filter and make sure that it doesn’t have to do a clustered index scan every time it’s going to do this trick, of course!
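One way to help with that (a sketch, not a requirement of the feature) is to give the “hot” rows outside the filter a cheap rowstore path on the same column the filter uses, so the residual lookup can seek rather than scan the clustered index:

-- Hypothetical companion index for the rows not covered by the filtered NCCI
CREATE INDEX ix_nc_hot_rows
on Sales.Invoices (LastEditedWhen)
    INCLUDE (CustomerID);
GO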

Read more about this feature on the SQL Server database engine blog in Sunil Agarwal’s post, “Real-Time Operational Analytics: Filtered nonclustered columnstore index (NCCI).”

SQL Server 2016 SP1: Features Added to Standard, Web, Express, Local DB Editions

Wouldn’t it be awesome if you could develop your application for a small SQL Server using the same features that you intend to use for scale?

And wouldn’t it be more awesome if you could start using bleeding edge features like In-Memory OLTP on some of your less-risky, smaller databases first?

In the past, cost inhibited feature adoption

This has been tough in the past, because SQL Server limits which Edition can use specific programmability features. Enterprise Edition gets all the goodies, and Enterprise costs more:

  • Enterprise Edition is ~$7,000 USD/core (minimum 4 cores per socket, sold in 2 core packs)
  • Standard Edition is ~$1,800 USD/core (minimum 4 cores per socket, sold in 2 core packs / also available for purchase by Client Access License)

With these prices, developers can’t usually afford to use Enterprise Edition features on new, small projects. And they were forced to introduce cutting edge EE features into the most critical databases first– because the least critical databases don’t get those licensing dollars. Software vendors often have to maintain two versions of their SQL Server codebase: one for Standard Edition customers, and one for Enterprise Edition customers.

That changes today.

Today is the Microsoft Connect() online developer conference. It’s also the day that SQL Server 2016 Service Pack 1 is dropping. SP1 includes some amazing licensing changes.

Download SQL Server 2016 SP 1 here.

Data Management Features added to Standard, Express, Web, and (mostly) Local DB Editions starting with SQL Server 2016 SP1

Developers will now be able to use far more features in production, whether or not they’re using Enterprise Edition. These features are now available in “lower” editions.

I’ve noted usage “caps” as I understand them from initial information:

  • Table Partitioning
  • Data Compression
  • Columnstore Indexes
    • Standard Edition limits DOP to 2
    • Web / Express limit DOP to 1.
    • Memory limited to 25% Buffer Pool limit for each non-EE edition (so you get 32GB for Standard Edition)
  • In-Memory OLTP
    • Memory limited to 25% Buffer Pool limit for each non-EE edition (so you get 32GB for Standard Edition)
    • Not in Local DB
  • Distributed Partitioned Views (writeable)
  • Multiple Filestream Containers
  • Change Data Capture
    • Not in Express or Local DB, as this requires SQL Server Agent, which is not present there
  • Database Snapshots
  • PolyBase
    • Not in Local DB
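By the way, if you want to see which of these edition-specific features a database is already using, there's a handy DMV for that. A quick sketch– note that what this DMV tracks as "edition-specific" may shift now that licensing has changed:

/* Lists edition-specific features in use in the current database.
   Handy to check before moving a database between editions. */
SELECT feature_name, feature_id
FROM sys.dm_db_persisted_sku_features;
GO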

Security Features added to Standard, Express, Web, and (mostly) Local DB Editions starting with SQL Server 2016 SP1

At this time I don’t know of any limitations on these features, or differences from Enterprise. It’s very cool to see SQL Server making enterprise grade security features available to everyone, in every edition.

  • Auditing (some)
  • Always Encrypted
  • Row-Level Security
    • Was already available in Standard Edition in 2016 RTM; now also in Web, Express, Local DB
  • Dynamic Data Masking
    • Was already available in Standard Edition in 2016 RTM; now also in Web, Express, Local DB
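As a taste, here's what Dynamic Data Masking looks like in a table definition– a minimal sketch with made-up table and column names:

/* A sketch of Dynamic Data Masking. Table and column names are hypothetical.
   Users without the UNMASK permission see masked values in query results. */
CREATE TABLE dbo.Customers (
    CustomerID INT IDENTITY NOT NULL,
    Email NVARCHAR(256) MASKED WITH (FUNCTION = 'email()') NULL,
    CreditCardNumber VARCHAR(19)
        MASKED WITH (FUNCTION = 'partial(0,"XXXX-XXXX-XXXX-",4)') NULL
);
GO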

dev-environment

Did the price for Standard Edition go up?

Nope. No pricing changes. Just wider feature availability!

Will people still need to buy Enterprise Edition?

Yes. Scalability limits by edition aren't changing. Enterprise still gets you more memory and more CPUs to scale your workload.

Fine Print: not all features have licensing changes

The clever reader may have noticed that I haven’t mentioned AlwaysOn Availability Groups, Transparent Data Encryption, Database Mirroring, or Peer-to-Peer Replication. No licensing changes have been announced for those features, or for services like SSAS or SSRS.

This is a huge licensing improvement, and it’s targeted at how you architect your schema and your codebase, and how you secure your data. But it doesn’t touch everything.

Don’t worry, Developer Edition still acts like Enterprise

Note that these changes don’t apply to Developer Edition. It continues to have all the features of Enterprise Edition, and it’s free (aww yeah), but it’s only for non-production environments.

Seems like a great time to upgrade to 2016, doesn’t it?

Traditionally, managers like to wait until Service Pack 1 comes out before they get serious about upgrading to a new SQL Server version. You’ve got an especially good reason to talk about upgrades with this one!

PS: you can try out SQL Server v.Next on Windows and Linux, too

There’s so much cool news today that I’m weirdly including this as a final note. The first preview of SQL Server v.Next is dropping today. You can run it on Windows. You can run it on Linux. You can run it in a Docker container on a Mac.

Download v.Next CTP1 here.

Wowzers.

Should I Upgrade to SQL Server 2016? (Dear SQL DBA Episode 22)

It’s a big week for SQL Server! And it’s the perfect week to talk about this week’s question, which is about explaining to your management why it’s worth it to upgrade to SQL Server 2016, and which features you can use right away.

Watch the 24 minute video, scroll down to read the article, or subscribe to the podcast.

Three different people asked me a variation of this question recently at the SQL PASS Summit:

Dear SQL DBA,

We recently got budget to upgrade our SQL Server. My manager wants us to go with SQL Server 2014, because 2016 is so new. How can I convince management that 2016 is better?

Lucky DBA

As of yesterday, you have more reasons than ever to go with SQL Server 2016.

SQL Server 2016 Service Pack 1 was released, and not only do Service Packs give managers warm fuzzy feelings, this one was full of goodies.

SQL Server 2016 Service Pack 1 adds more features to Standard Edition (plus Web, Express, and Local DB)

I know most of y’all care about Standard Edition.

With SP1, you can use all sorts of (formerly) super-expensive features like data compression, partitioning, Columnstore, Change Data Capture, Polybase, and more in Standard Edition.

A few features have scalability caps in "lower" editions. There's a memory limit for In-Memory OLTP (aka Hekaton) and for Columnstore. Columnstore also has restrictions on parallelism.

This isn’t every feature. Transparent Data Encryption and licensing for high-availability features hasn’t changed. Memory and CPU limits for editions haven’t changed either: Enterprise Edition is still needed for scalability and HA/DR.

But overall, you get way more bang for your licensing buck in Standard Edition in SQL Server 2016 SP1.

Or, in the case of Express Edition, you get more for free.

Read the list of features by edition here: https://www.microsoft.com/en-us/sql-server/sql-server-editions 

Quick DBA wins to start using in Standard Edition

Your management may ask, “which of these features can you use without code changes? And how much will they help?”

That’s a great question.

  • Database snapshots can be immediately useful as part of your release process. You take a database snapshot before deploying a change, use it for verification if needed, and drop it after the change is done. You incur write overhead for modifications made while the snapshot exists, but it doesn't copy the whole database. (There's a sketch of the commands after this list.)
  • Data compression can make better use of your storage and memory.
    • Fine print: don’t just compress all your indexes. You need to determine which tables might benefit, and which type of compression to use. There are CPU tradeoffs. Start with the Microsoft Whitepaper on Data Compression.
  • Table Partitioning – maybe. Technically, you can implement table partitioning without a bunch of code changes. But in practice, you frequently need to tune or rewrite some queries because of quirks in how things like TOP or GROUP BY behave against partitioned tables. For tables that are sensitive to read performance, it usually takes both developers and DBAs to implement partitioning. However, you may have some easier wins– like a table you almost exclusively write to for logging purposes, where partitioning lets you truncate or switch out old partitions instead of running deletes.
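Here's a rough sketch of the database snapshot workflow and a single-index compression rebuild. All of the names here are made up– adjust for your own database, logical file name, and index:

/* Create a snapshot before a deployment. The database name, logical
   file name (NAME), and snapshot file path are all hypothetical. */
CREATE DATABASE SalesApp_Snapshot
    ON ( NAME = SalesApp_Data,
         FILENAME = 'S:\MSSQL\Snapshots\SalesApp_Data.ss' )
    AS SNAPSHOT OF SalesApp;
GO

/* Drop it once the change is verified */
DROP DATABASE SalesApp_Snapshot;
GO

/* Compress one index you've tested, not all of them blindly */
ALTER INDEX PK_Orders ON dbo.Orders
    REBUILD WITH (DATA_COMPRESSION = PAGE);
GO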

Security can be pretty persuasive

When it comes to Standard Edition, the biggest change with 2016 SP1 is that Always Encrypted is now available.

If your database stores anything like credit card numbers or social security numbers, you want this. You REALLY want this if you’re the DBA, because your life is simpler if there’s no way you could exploit that data. Think of this as a really nice protective vault for your data kryptonite.

Note: there are other security features in 2016 that may also be attractive, like Dynamic Data Masking. They were available in Standard Edition as of RTM, but now are available in Web, Express, and Local DB.

Fine print: Transparent Data Encryption (TDE) is still an Enterprise Edition feature.

How I’d start talking about In-Memory OLTP

sqlserverexpressedition

Your developers may ask, "does this mean we can start using that Hekaton thing everywhere?"

Well, you can, but you might break a few things.

In-Memory OLTP, aka Hekaton, has a memory cap in the "lower" editions of 25% of the Buffer Pool Memory Limit. For Standard Edition, you'll be limited to 32GB of memory for In-Memory tables, and when you run out of memory, things get real dicey.

So you’ve got to be careful. But you’ve got to be careful even in Enterprise Edition, too– because your hardware doesn’t have an infinite amount of memory. Even in EE, you have to learn to set up monitoring of how much memory is being used by In-Memory tables, alert when it’s getting low, and learn how to adjust it.

Having In-Memory tables in Standard Edition gives you a much better place to learn.

I would look for less critical applications that might have a good use case for In-Memory tables. Applications where it’s not a disaster if you have an outage. You need something that has a lot of writes to make it worth your while for the experiment.

Logging databases where you don’t have to retain a ton of history come to mind– I’ve worked with a bunch of apps that can flip verbose logging to a database on and off, and the app is designed to keep going even if the logging database goes off the farm.

Essentially, you now have room to be creative and cautious with this feature. That’s gold.

What about Columnstore?

For Columnstore, there's the same memory limit as for In-Memory tables: 25% of the Buffer Pool Limit for that Edition. Plus, Standard Edition is limited to 2 cores for parallel queries, while Web and Express just get a single thread.

This is a bit easier to play with, as writes don't stop when you run out of memory (unless it's Clustered Columnstore on In-Memory OLTP). Reading Columnstore data from disk just isn't as fast as reading it from memory.

You also have to balance out making sure the overheads of writes to a Columnstore index aren’t slowing you down (and that you can monitor this), and that you’re maintaining the index properly.

For Columnstore, look for relatively narrow tables that have many millions of rows. I say narrow because you’ve got limited memory and parallelism to burn for this. And Columnstore shines when you’ve got many millions of rows to compress.

It’s really not unusual to have many millions of rows in OLTP tables anymore, and to have a diverse amount of queries hitting them. OLTP tables are often narrow as well, so even with the limits, I see this as a big deal for Standard Edition to get this feature.

SQL Server 2016 has Query Store (all editions)

This hasn’t changed, but it was already awesome.

Query Store gives you a way to track your query plans along with metrics on query resource usage and execution time. So you can do things like see if adding a filtered nonclustered Columnstore index made things faster… or oops, if it made things slower.

Query Store is a fantastic feature for DBAs and Developers. It helps them work better together, because nobody's stuck saving giant XML strings of query text and plans in spreadsheets anymore.
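Turning it on is a quick ALTER DATABASE, and the defaults are reasonable. A sketch (the storage cap here is just an example value):

/* Enable Query Store, then set a storage cap */
ALTER DATABASE WideWorldImporters SET QUERY_STORE = ON;
GO
ALTER DATABASE WideWorldImporters SET QUERY_STORE
    (OPERATION_MODE = READ_WRITE, MAX_STORAGE_SIZE_MB = 512);
GO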

Before yesterday, it was one of my biggest reasons to argue that a SQL Server 2016 upgrade is far more attractive, even for Standard Edition. Now it’s just another item on the list.

You’ve still gotta test it

Things can go wrong in service packs. You’ve got to test any upgrade, even if it’s full of goodies.

2016 isn’t the new kid anymore. Meet SQL Server v.Next CTP 1

This train is picking up speed.

SQL Server v.Next CTP1 is now available for download on Windows, Linux, and even Mac (via Docker containers).

Check it out online here: https://www.microsoft.com/en-us/evalcenter/evaluate-sql-server-vnext-ctp

So it’s not like 2016 is the “new version” anymore, anyway.


Does Truncate Table Reset Statistics?

Short answer: the SQL Server optimizer will know that the table was truncated, but statistics might not update when you expect.

For the long answer, let's walk through an example using the WideWorldImporters sample database. I'll be using Trace Flags 3604 and 2363 to get SQL Server to print information about how it optimized my query out to the messages tab. (Thanks to Paul White for blogging about these trace flags.)

First, a fresh restore of WideWorldImporters

USE master;
GO

IF DB_ID('WideWorldImporters') IS NOT NULL
ALTER DATABASE WideWorldImporters SET OFFLINE WITH ROLLBACK IMMEDIATE

RESTORE DATABASE WideWorldImporters FROM DISK=
    'S:\MSSQL\Backup\WideWorldImporters-Full.bak'
    WITH REPLACE
GO

USE WideWorldImporters;
GO

Before we do anything, what do the statistics look like on Sales.OrderLines?

Here’s the query that I’m using to inspect the statistics:

SELECT 
    sp.last_updated,
    stat.name as stats_name,
    STUFF((SELECT ', ' + cols.name
        FROM sys.stats_columns AS statcols
        JOIN sys.columns AS cols ON
            statcols.column_id=cols.column_id
            AND statcols.object_id=cols.object_id
        WHERE statcols.stats_id = stat.stats_id and
            statcols.object_id=stat.object_id
        ORDER BY statcols.stats_column_id
        FOR XML PATH(''), TYPE
    ).value('.', 'NVARCHAR(MAX)'), 1, 2, '')  as stat_cols,
    sp.modification_counter,
    sp.rows,
    sp.rows_sampled
FROM sys.stats as stat
CROSS APPLY sys.dm_db_stats_properties (stat.object_id, stat.stats_id) AS sp
JOIN sys.objects as so on 
    stat.object_id=so.object_id
JOIN sys.schemas as sc on
    so.schema_id=sc.schema_id
WHERE 
    sc.name= 'Sales'
    and so.name='OrderLines'
ORDER BY 1 DESC
GO

Statistics were last updated on June 2, 2016. We’ll be mostly looking at the statistic on Quantity throughout the example, so I’ve highlighted it:

statistics-before-changes

Let’s run a query that loads the statistic on Quantity

Before we truncate the table, let’s take a peek into how SQL Server optimizes a query that cares about rows in Sales.OrderLines with Quantity > 10. I’m using trace flags 3604 and 2363 to make SQL Server print information about how it used statistics to optimize this to my messages tab.

SELECT *
FROM Sales.OrderLines
WHERE Quantity > 10
    OPTION
(
    QUERYTRACEON 3604,
    QUERYTRACEON 2363,
    RECOMPILE
)
GO

Here’s the info on the messages tab:

Begin selectivity computation

Input tree:

  LogOp_Select

      CStCollBaseTable(ID=1, CARD=231412 TBL: Sales.OrderLines)

      ScaOp_Comp x_cmpGt

          ScaOp_Identifier QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity

          ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=10)

Plan for computation:

  CSelCalcColumnInInterval

      Column: QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity

Loaded histogram for column QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity from stats with id 7

Selectivity: 0.44231

Stats collection generated: 

  CStCollFilter(ID=2, CARD=102356)

      CStCollBaseTable(ID=1, CARD=231412 TBL: Sales.OrderLines)

End selectivity computation

Estimating distinct count in utility function

Input stats collection:

    CStCollFilter(ID=2, CARD=102356)

        CStCollBaseTable(ID=1, CARD=231412 TBL: Sales.OrderLines)

Columns to distinct on:QCOL: [WideWorldImporters].[Sales].[OrderLines].OrderLineID


Plan for computation:

  CDVCPlanUniqueKey

Result of computation: 102356


(102035 row(s) affected)

Highlights: one of the first things SQL thinks about is the number of rows in the table

Right at the beginning, we see: “CStCollBaseTable(ID=1, CARD=231412 TBL: Sales.OrderLines)”

That ‘CARD’ number is the optimizer thinking about how many rows are in this table. If you glance back up at the table statistics, the most recent statistic to be updated was on the ‘LastEditedWhen’ column. When that statistic was updated, there were 231,412 rows in the table.

SQL Server decides that it wants detail on the Quantity column to figure out how to run this query, so we see that it loads that statistic up to use: “Loaded histogram for column QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity from stats with id 7”

Alright, let’s truncate this table

I wipe out all the rows with this command:

TRUNCATE TABLE Sales.OrderLines;
GO

Now, I wouldn’t expect truncating the table to automatically update the statistics.

SQL Server updates statistics when they’re used to optimize a query — so if nobody queries this table for six months, I wouldn’t expect the stats to update for six months.
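If you didn't want to wait for a query to trigger an auto-update, you could always refresh the statistics yourself– a quick sketch (I'm not running this here, because it would change the demo results below):

/* Manually refresh all statistics on the table, sampling every row */
UPDATE STATISTICS Sales.OrderLines WITH FULLSCAN;
GO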

Let’s re-run our query, trace flags and all:

SELECT *
FROM Sales.OrderLines
WHERE Quantity > 10
    OPTION
(
    QUERYTRACEON 3604,
    QUERYTRACEON 2363,
    RECOMPILE
)
GO

The messages tab has less info this time– it's much more concise!

Begin selectivity computation

Input tree:

  LogOp_Select

      CStCollBaseTable(ID=1, CARD=1 TBL: Sales.OrderLines)

      ScaOp_Comp x_cmpGt

          ScaOp_Identifier QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity

          ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=10)

Plan for computation:

  CSelCalcFixedFilter (0.3)

Selectivity: 0.3

Stats collection generated: 

  CStCollFilter(ID=2, CARD=1)

      CStCollBaseTable(ID=1, CARD=1 TBL: Sales.OrderLines)

End selectivity computation


(0 row(s) affected)

SQL Server knows that we blew away all those rows

This time we see “CARD=1 TBL: Sales.OrderLines”

SQL Server doesn’t like to estimate 0 for empty tables. It likes to estimate 1. It knows this table is empty.

With this information, it chooses a different plan for computation. The plan doesn’t require looking at the quantity column this time– we don’t have any lines about that at all.

But the statistics don’t look any different

You might expect to see that the statistic on Quantity had updated. I expected it, before I ran through this demo.

But SQL Server never actually had to load up the statistic on Quantity for the query above. So it didn’t bother to update the statistic. It didn’t need to, because it knows that the table is empty, and this doesn’t show up in our column or index specific statistics.

To verify, I just rerun my metadata query above, and things look the same:

statistics-after-truncate-and-query 

What if the table has exactly one row?

Let’s insert one and find out:

INSERT INTO [Sales].[OrderLines] (OrderLineID, OrderID, StockItemID, Description, PackageTypeID, Quantity, UnitPrice, TaxRate, PickedQuantity, PickingCompletedWhen, LastEditedBy, LastEditedWhen)
     VALUES (1, 45, 164, '32 mm Double sided bubble wrap 50m', 7, 50, 112.00, 15.000, 50, '2013-01-02 11:00:00.0000000', 4, '2013-01-02 11:00:00.0000000')
GO

Now we run our familiar query, with all its merry trace flags:

SELECT *
FROM Sales.OrderLines
WHERE Quantity > 10
    OPTION
(
    QUERYTRACEON 3604,
    QUERYTRACEON 2363,
    RECOMPILE
)
GO

And here’s what SQL Server has to say about optimizing that…

Begin selectivity computation

Input tree:

  LogOp_Select

      CStCollBaseTable(ID=1, CARD=1 TBL: Sales.OrderLines)

      ScaOp_Comp x_cmpGt

          ScaOp_Identifier QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity

          ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=10)

Plan for computation:

  CSelCalcColumnInInterval

      Column: QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity

Loaded histogram for column QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity from stats with id 7

Selectivity: 1

Stats collection generated: 

  CStCollFilter(ID=2, CARD=1)

      CStCollBaseTable(ID=1, CARD=1 TBL: Sales.OrderLines)

End selectivity computation


(1 row(s) affected)

One row is enough to use our column statistic

Looking at the beginning, CARD=1 for Sales.OrderLines, just like it did after we truncated the table. But SQL Server does something different this time, indicating that it now knows that the table isn’t really empty.

It goes back to the CSelCalcColumnInInterval plan to optimize. And it loads up the column stat for the Quantity column.

Since this statistic was loaded into memory, it should have auto-updated based on my database settings. Sure enough, it did:

statistics-after-truncate-and-query-and-insert

SQL Server knows when you've truncated a table

And the fact that the table has been truncated may mean that it doesn’t need to use statistics on the table when optimizing queries. After all, it’s an empty table, so it can take shortcuts!

So don’t get too confused if statistics look way out of date for a truncated table. Instead, ask yourself, “why am I querying a truncated table?” (Related disclaimer: I only tested this on SQL Server 2016.)

Want to learn more about statistics in SQL Server? Start here.

Which Filegroup is that Partition Using? How Many Rows Does It Have?

Table Partitioning in SQL Server has a bit of a learning curve. It’s tricky to just figure out how much data you have and where the data is stored.

When you’re designing or managing partitioned tables, it’s useful to quickly verify:

  • Which tables are partitioned
  • The type of partition function they use (left or right)
  • Which boundary points are assigned to which filegroup
  • How many rows and pages are in each partition (and which boundary point they’re associated with)

This helps make sure that you're designing your tables correctly, and it also helps you avoid goofs like merging the wrong boundary point and causing a bunch of data to move into another partition– which can be slow and painful.

All this information is available in TSQL, it’s just an ugly query, and it doesn’t come in any built-in reports or views.

So I’ve got an ugly query for you!

Query Listing Partitioned Tables with Boundary Point, Filegroup, Row Count, Partition Size, and Partition Number By Index

This query gives an overview of partitioned tables and indexes in a database. The query is also in a Gist, if you prefer.

SELECT
    sc.name + N'.' + so.name as [Schema.Table],
    si.index_id as [Index ID],
    si.type_desc as [Structure],
    si.name as [Index],
    stat.row_count AS [Rows],
    stat.in_row_reserved_page_count * 8./1024./1024. as [In-Row GB],
    stat.lob_reserved_page_count * 8./1024./1024. as [LOB GB],
    p.partition_number AS [Partition #],
    pf.name as [Partition Function],
    CASE pf.boundary_value_on_right
        WHEN 1 then 'Right / Lower'
        ELSE 'Left / Upper'
    END as [Boundary Type],
    prv.value as [Boundary Point],
    fg.name as [Filegroup]
FROM sys.partition_functions AS pf
JOIN sys.partition_schemes as ps on ps.function_id=pf.function_id
JOIN sys.indexes as si on si.data_space_id=ps.data_space_id
JOIN sys.objects as so on si.object_id = so.object_id
JOIN sys.schemas as sc on so.schema_id = sc.schema_id
JOIN sys.partitions as p on 
    si.object_id=p.object_id 
    and si.index_id=p.index_id
LEFT JOIN sys.partition_range_values as prv on prv.function_id=pf.function_id
    and p.partition_number= 
        CASE pf.boundary_value_on_right WHEN 1
            THEN prv.boundary_id + 1
        ELSE prv.boundary_id
        END
        /* For left-based functions, partition_number = boundary_id, 
           for right-based functions we need to add 1 */
JOIN sys.dm_db_partition_stats as stat on stat.object_id=p.object_id
    and stat.index_id=p.index_id
    and stat.partition_id=p.partition_id
    and stat.partition_number=p.partition_number
JOIN sys.allocation_units as au on au.container_id = p.hobt_id
    and au.type_desc ='IN_ROW_DATA' 
        /* Avoiding double rows for columnstore indexes. */
        /* We can pick up LOB page count from partition_stats */
JOIN sys.filegroups as fg on fg.data_space_id = au.data_space_id
ORDER BY [Schema.Table], [Index ID], [Partition Function], [Partition #];
GO

Column Definitions and Notes

  • Schema.Table: Schema name concatenated with table name
  • Index ID: Included for reference and ordering
  • Structure: This decodes whether it's a partitioned heap, clustered index, nonclustered index, clustered columnstore index, or nonclustered columnstore index
  • Index: What it sounds like– the name of the index
  • Rows: Number of rows in that partition
  • In-Row GB: Reserved in-row pages for that partition
  • LOB GB: Reserved LOB pages for that partition (reminder – columnstore indexes use LOB pages)
  • Partition #: This can be useful in some queries. Remember that partition numbers are reassigned when you modify your partition function (split/merge)
  • Partition Function Name: The partition function is the “algorithm” that defines the boundary points for the partitions
  • Boundary Type: Whether the boundary point is a “right” type (lower inclusive boundary) or a “left” type (upper inclusive boundary)
  • Boundary Point: The value of the boundary point that goes with that particular partition
  • Filegroup: Where the data is located (defined by the partition scheme)

If you need to know the partition scheme name, it’s easy to add that column in (sys.partition_schemes is already in the query). The partition scheme is what maps your partition function to the filegroups. In most cases, people just want to know where things currently are, so I left that out of the query.
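If you do want it, sys.partition_schemes is already aliased as ps in the query above, so it's a one-line addition to the SELECT list:

    ps.name as [Partition Scheme],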

Index Types: Heaps, Primary Keys, Clustered and Nonclustered Indexes (Dear SQL DBA Episode 28)

Dear SQL DBA…

I see HEAP tables reported even when I know those tables have a clustered index, and I see a lot of forwarded records. This happens with 5 tables in my database. I can see the clustered indexes (and on some tables, nonclustered indexes too)… why are some scripts reporting them as heaps?

Here is an example of what I’m seeing from a script: 
dbo.AnonymousTable (0) [HEAP] [RID]  / 2,126,697 reads, 308,401 writes /  17,847 forwarded records fetched;

This puzzles a lot of people when they start working with indexes in SQL Server. The concepts here overlap and there are quite a few different ways you can do things.

Watch the 27 minute video discussing this, or scroll on down to read a written version of the video, complete with code samples.

Subscribe to the podcast, if you’d like to listen on the go! And a review on iTunes will help others find out about the show.

First up, let’s clarify the concepts.

Concept 1- How a disk-based table is physically ordered

I say “disk-based” because we’re not talking about in-Memory tables here. I’m not getting into those today for the sake of simplicity. (For a high level overview of disk-based vs in-Memory tables, check out this post.)

Clustered Index: This will always have IndexID = 1

Clustered rowstore tables – Traditional clustered index: you choose clustering key column(s) that determine the sort order of the data

Clustered columnstore tables – Clustered columnstore indexes don’t have key columns. Every column in the table is stored in columnar format, which uses LOB (large-object) pages

The syntax to create a clustered index may look like one of these samples:

/* These samples create clustered indexes that are NOT also a primary key */

/* 1. Two step process... */
CREATE TABLE dbo.ClusterMeToo (
    MakeMeAClusteredIndex BIGINT IDENTITY NOT NULL,
    Col2 NVARCHAR(100)
);
GO
CREATE UNIQUE CLUSTERED INDEX CX_ClusterMeToo_MakeMeAClusteredIndex 
    ON dbo.ClusterMeToo (MakeMeAClusteredIndex);
GO

/* 2. Inline index create. This syntax works in SQL Server 2014+ */
CREATE TABLE dbo.ClusterMe (
    MakeMeAClusteredIndex BIGINT IDENTITY NOT NULL,
    Col2 NVARCHAR(100),
    INDEX CX_ClusterMe_MakeMeAClusteredIndex UNIQUE CLUSTERED (MakeMeAClusteredIndex) 
);
GO

/* 3. Clustered Columnstore Example. This exists in SQL Server 2014+ */
CREATE TABLE dbo.ClusteredColumnstore (
    Col1 BIGINT IDENTITY NOT NULL,
    Col2 NVARCHAR(100),
    INDEX CX_ClusteredColumnstore_MakeMeAClusteredIndex CLUSTERED COLUMNSTORE 
);
GO

Notes:

  • You may want your clustered index to also be a primary key, in which case you want a different code sample. Keep reading!
  • It’s a best practice to make rowstore indexes unique, but it’s not required.
  • You can create a clustered columnstore index after the table has been created. You may even want to temporarily create a rowstore clustered index before you create the columnstore clustered index! Read the post, “Columnstore Index Performance: Rowgroup Elimination” from the SQL Server Tiger Team to learn more.

Heap: This will always have IndexID = 0

  • When you don’t define a clustered index, SQL Server uses a secret Row Identifier (RID) behind the scenes. You can’t use the RID in queries.
  • You may have nonclustered indexes (rowstore or columnstore) on a heap. Those are secondary physical structures. They'll always have an IndexID greater than 1, because IndexID 1 is reserved for clustered indexes.

The syntax to create a heap is something like this:

CREATE TABLE dbo.HeapExample (
    IMightBeAPK BIGINT IDENTITY NOT NULL,
    Col2 NVARCHAR(100)
);
GO

/* It remains a heap if you do NOT run any commands like these:
CREATE [UNIQUE] CLUSTERED INDEX [index name] ON [table name] ( [column name(s)] )
ALTER TABLE [table name] ADD CONSTRAINT [constraint name] PRIMARY KEY CLUSTERED ( [column name(s)] )
CREATE CLUSTERED COLUMNSTORE INDEX [index name] ON [table name]
*/

Concept 2- Primary Key: the column or columns that define a unique row for business purposes

A table may have only one primary key (PK). (You can enforce uniqueness in other ways with unique constraints and unique indexes, though.)

A primary key is secretly an index! It can be clustered or nonclustered.

Your primary key may technically be a “surrogate key”. That just means that it’s not a column that “naturally” identifies the data– it may be an INT, BIGINT, or UNIQUEIDENTIFIER column that was designed to uniquely identify the row, even though the number or uniqueidentifier itself isn’t meaningful to look at.

Clustered primary key: This will always have IndexID = 1 (it’s a clustered index behind the scenes, as well as a constraint)

  • You may choose to make the clustered index ALSO the primary key when you create the index / constraint. This means the column or columns that uniquely identify a row also define the physical sort order of the table on disk.
  • There’s no such thing as a clustered primary key on a clustered columnstore table. Clustered columnstore indexes don’t have key columns– every column in the table is stored in a columnar format.

The syntax to create a clustered primary key can look like this:

CREATE TABLE dbo.ClusteredPKExample (
    MakeMeACXPK BIGINT IDENTITY NOT NULL,
    Col2 NVARCHAR(100),
    CONSTRAINT PK_ClusteredPKExample_MakeMeACXPK
        PRIMARY KEY CLUSTERED (MakeMeACXPK)
);
GO

/* Or a two step create... */
CREATE TABLE dbo.AnotherClusteredPKExample (
    MakeMeACXPK BIGINT IDENTITY NOT NULL,
    Col2 NVARCHAR(100)
);
GO

ALTER TABLE dbo.AnotherClusteredPKExample
    ADD CONSTRAINT PK_AnotherClusteredPKExample_MakeMeACXPK
        PRIMARY KEY CLUSTERED (MakeMeACXPK);
GO

Nonclustered primary key: This will always have IndexID > 1

  • Nonclustered primary key constraints are nonclustered indexes behind the scenes
  • A nonclustered primary key may be created on a heap, or a table with a clustered index
  • Antipattern: sometimes people create a clustered index and a non-clustered primary key on the same column or columns. This means your table has to maintain TWO indexes on the same key column, when you could just have one. It’s more efficient to create a clustered primary key.

The syntax to create a nonclustered primary key may look something like this:

/* 1. This syntax works in SQL Server 2014+ */
CREATE TABLE dbo.NonclusteredPKExample (
    MakeMeCX BIGINT IDENTITY NOT NULL,
    BusinessKey NVARCHAR(50) NOT NULL,
    Col2 NVARCHAR(100),
    INDEX CX_NonclusteredPKExample_MakeMeCX UNIQUE CLUSTERED (MakeMeCX),
    CONSTRAINT PK_NonclusteredPKExample_BusinessKey
        PRIMARY KEY NONCLUSTERED (BusinessKey)
);
GO

/* 2. Three step create... */
CREATE TABLE dbo.AnotherNonclusteredPKExample (
    MakeMeCX BIGINT IDENTITY NOT NULL,
    BusinessKey NVARCHAR(50) NOT NULL,
    Col2 NVARCHAR(100)
);
GO

ALTER TABLE dbo.AnotherNonclusteredPKExample
    ADD CONSTRAINT PK_AnotherNonclusteredPKExample_BusinessKey
        PRIMARY KEY NONCLUSTERED (BusinessKey);
GO

CREATE UNIQUE CLUSTERED INDEX cx_AnotherNonclusteredPKExample_MakeMeCX
    on dbo.AnotherNonclusteredPKExample (MakeMeCX);
GO

/* 3. Clustered Columnstore with PK, two step version.
This works in SQL Server 2016+ */
CREATE TABLE dbo.ClusteredColumnstoreWithPKExample (
    BusinessKey NVARCHAR(50) NOT NULL,
    Col1 BIGINT IDENTITY NOT NULL,
    Col2 NVARCHAR(100),
    CONSTRAINT PK_ClusteredColumnstoreWithPKExample_BusinessKey
        PRIMARY KEY NONCLUSTERED (BusinessKey)
);
GO

CREATE CLUSTERED COLUMNSTORE INDEX cCx_ClusteredColumnstoreWithPKExample
    on dbo.ClusteredColumnstoreWithPKExample;
GO

Back to the question: what happened?

Based on the initial email, I was pretty sure that the tables in question were accidentally created as heaps, each with a nonclustered primary key.

The big giveaways were that indexid zero only ever exists on a heap, and forwarded records can also only occur in a heap object. In a brief email conversation back and forth, we confirmed that this was the case.

I’ve run into this quite a few times in the wild. Sometimes the tables were a heap and someone later thought to add primary keys, and made them nonclustered without thinking. Sometimes people just accidentally use the wrong syntax at create time.

Which type of indexes should I use?

In the case of our questioner, they likely want to just recreate their nonclustered primary keys and clustered primary keys. That’s kind of a pain if you’ve got a lot of foreign keys or SQL Server replication set up.

While it’s generally a bad practice to have a unique clustered index and a non-clustered primary key on the same columns, because they’re duplicate indexes… if the tables are small and don’t have a lot of modifications, I’m not going to pin a scarlet letter on you for doing it a few times.

But for general use, let’s make some generalizations!

Clustered primary key: When the set of columns that uniquely identify a row are also very frequently used in joins and the ‘where’ clause of your query, ordering the table by those columns is usually a great fit. The clustered index automagically has direct access to all the in-row columns in a table without having to look it up in another structure.

Unique clustered index with a different nonclustered primary key (rowstore): Sometimes you have a table where it makes sense to physically sort the table on different columns than the ones that make up the primary key on the table:

  • Maybe the most important use of the table is range scan on a different column or columns and those queries access lots of columns in the table. Using that as the clustered index can be very powerful.
  • Maybe the primary key is a very wide set of columns on a large table. Having a wide set of columns in the clustered index bloats all your nonclustered indexes (because it’s secretly added to them). Depending on how the table is queried, another column or set of columns may work better as the clustered index.

Heap tables (possibly with a nonclustered PK, depending what you’re doing): You don’t always need a clustered index. Or any index for that matter. Heaps can have some weird problems, like those forwarded records (there’s a quick check for them sketched after this list), but that’s for another day. Heaps can be good for:

  • Tables that you always scan, where you want all the columns. Scanning heaps can be really fast!
  • Tables that you query in a very specific, controlled, targeted manner which is suited to non-clustered indexes
  • Staging tables where you’re doing quick and dirty loads and queries. Sometimes it’s faster to not create a clustered index, depending on what you’re doing. Test and use what performs best while making sure your data is valid: no shame in that.
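Here's that quick check for forwarded records– a sketch using sys.dm_db_index_physical_stats against a single heap. I'm using DETAILED mode because LIMITED mode returns NULL for the forwarded record count, and the table name is just an example:

/* Check a heap for forwarded records. The table name is an example. */
SELECT
    OBJECT_NAME(ips.object_id) AS table_name,
    ips.forwarded_record_count,
    ips.page_count
FROM sys.dm_db_index_physical_stats
    (DB_ID(), OBJECT_ID('dbo.HeapExample'), 0, NULL, 'DETAILED') AS ips;
GO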

Columnstore indexes (clustered and nonclustered): These are extremely powerful when you need to scan a lot of rows to do aggregations.

  • Data warehouse tables are the obvious fit here, and are where the feature started.
    • Since data warehouses are all about analytics and massive tables, the natural patterns are updatable clustered columnstore indexes, likely with table partitioning
    • Whether you should use primary and foreign keys in your data warehouse is something people fight about. I accept people of all key choices here.
  • OLTP databases can be a good fit, also!
    • Writeable nonclustered columnstore indexes are a big feature in SQL Server 2016, and are designed to help people with busy workloads which combine OLTP and analytic queries in the same database. That’s an increasingly common requirement: lots of businesses just cannot wait for an ETL to run before analysis.
    • The natural pattern in an OLTP database using this would be a rowstore table with a Clustered PK and a nonclustered columnstore index on the columns you use for analytics
  • Also in SQL Server 2016:
    • Optimistic locking (snapshot / RCSI) is supported against columnstore indexes
    • You can read columnstore indexes on Availability Group secondaries (which automagically use snapshot behind the scenes)

Is my excitement for columnstore indexes in SQL Server 2016 showing?

On a related note…

Don’t be jealous, you can get one, too.

Sitting right next to me as I write this is the brand spanking new fifth edition of Louis Davidson’s Pro SQL Server Relational Database Design and Implementation, updated for SQL Server 2016. You can buy the book now from APress or Amazon.

Those are not affiliate links and this is not a sponsored post. I’m just excited to read the fifth edition, and if you got this far in this post then you’d probably like it, too.

If you’re in the process of modeling a SQL Server database, get Louis’ book. It will help you along the way, and in designing future projects as well.

Should I Learn Fulltext Indexing? (Dear SQL DBA Episode 29)

This week’s question is about a longstanding feature in SQL Server that sounds really cool: full-text search. If you’re learning performance tuning, how much time should you invest in researching and learning about full-text indexes?

Watch this 18 minute video, or scroll on down to read the written scoop on full-text search.

Dear SQL DBA…

I have been doing performance tuning for about 9 months now. It puzzles me that one type of index never gets much attention: full text indexes. Are fulltext indexes a cool feature that can really help performance (all those LIKE ‘%blabla%’ predicates application developers seem to love 🙂 ), or are they quite the opposite and not worth investing time in?

Best regards,

Puzzled about fulltext

The “dirty little secret” about full-text search indexes is that they don’t help with ‘%blabla%’ predicates.

Well, it’s not a secret, it’s right there in the documentation.

A lot of us get the impression that full-text search is designed to handle “full wildcard” searches, probably just because of the name. “Full-Text Searches” sounds like it means “All The Searches”. But that’s not actually what it means.

What is full-text search good for?

Full-text indexes can help with:

  • Prefix searches. It’s good for ‘bla%’
  • Phrases containing words. So it’s good for ‘So blabla to you’
  • Different forms of a word / synonyms (is there a synonym for blabla? I don’t know!)
  • Words near one another in a document (‘bla’ is in a document in proximity to ‘blop’)
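To make that concrete, here's a sketch of what those searches look like with CONTAINS and FREETEXT. The table and column names are hypothetical, and a full-text index on the column must already exist:

/* Prefix search: matches words starting with 'bla' */
SELECT DocumentID, DocText
FROM dbo.Documents
WHERE CONTAINS(DocText, '"bla*"');
GO

/* Proximity: 'bla' within 5 words of 'blop' */
SELECT DocumentID, DocText
FROM dbo.Documents
WHERE CONTAINS(DocText, 'NEAR((bla, blop), 5)');
GO

/* Different forms of a word (inflectional matching) */
SELECT DocumentID, DocText
FROM dbo.Documents
WHERE FREETEXT(DocText, 'blabla');
GO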

Full-text search also has special features like stoplists and stopwords to keep the index from becoming more bloated than it has to be, and help searches be more efficient.

One way to think about this is that full-text search is designed to be smart about language: it thinks about phrases, synonyms, how words are used, things like that.

A pure wildcard search of ‘%blabla%’ isn’t really about language. That’s just looking for a pattern somewhere in a string.

For wildcard searches and regular expression queries, secondary applications like Lucene are attractive, and these days in the cloud there are options like Lucene Query in Azure Search.

Aside: Azure Search is easy to play with for free

A while back I wrote a post called Wildcard vs Regular Expressions – Lucene Query in Azure Search.

It shows how easy it is to play around with testing non-sargable wildcard searches like ‘%blabla%’ against online sample data in Azure. All you need is a browser; it’s totally free and you don’t even have to create an Azure account.

Fulltext indexes and performance

I’ve run into quite a few companies using full-text search. Most of them were using it pretty lightly, and it rarely was something they asked me for help with: they set it up following the documentation, and it just worked. There were quite a few cases where I’d say something about seeing a full-text index when looking over an instance, and my client laughed and said they’d forgotten they even used full-text. (If you think about it, that’s a compliment to the feature.)

I’ve also run into some folks who’ve used full-text search so heavily that they pushed the boundaries of the feature: very large multi-terabyte databases pulling in large volumes of data.

Keeping data in sync with heavy update rates

With heavy to ultra-heavy usage, one issue with full-text indexes is that they don’t update synchronously with the base table. This helps the performance of inserts, updates, and deletes against the base table, because updating a large full-text index can take time. But it does mean that if your application queries both the base table AND the full-text index, people could see different, contradictory data if the full-text index is behind.

What if corruption strikes?

And as with any other index, you can get corruption in a full-text index. That’s not necessarily the SQL Server’s fault: corruption can come from the storage subsystem. If your full-text index gets corrupt, you’re probably going to have to rebuild it.

If you’re working with giant full-text indexes, recreating the index can add up to a lot of downtime. Thinking about how your tables are laid out and breaking your indexes into manageable chunks becomes very important at scale.

I think full-text search is here to stay, it’s just getting interesting company

This is an older feature, so there’s always that question as to how “fresh” it is.

Microsoft has invested in making full-text indexes perform better over time. The feature was revamped in 2008 and has received a variety of performance fixes since. A new DMV, sys.dm_fts_index_keywords_position_by_document, was added in SQL Server 2016 and also backported to previous versions.

Full-text search is well maintained by Microsoft. I don’t think it’s going anywhere.

What about semantic search?

In SQL Server 2012, Microsoft added the semantic search feature built on top of full-text search. Semantic search helps identify the main phrases in a document and can find and compare similar/related documents.

Semantic search is one of those features that dropped in and then seemed to disappear from the conversation, though.

I haven’t heard of its capabilities being strongly expanded in later versions, and I know people who evaluated it in SQL Server 2012 who found it to be too much of a “v1 feature” to fit their needs, compared to features offered by third-party vendors with semantic search tools.  (Of course, they were evaluating native semantic search because not everything was perfect with their third party app, either.)

Here is one such investigation into semantic search by Joe Sack – Exploring Semantic Search Key Term Relevance.

If you use semantic search in production and know about improvements that I’m unaware of, I’d love to hear about it in the comments!

How much time should you invest in learning full-text indexes?

To sum up, full-text indexing is fairly widely used, but most of the folks using it are doing so on a small scale where it “just works.” Those companies are unlikely to have a high bar on full-text index skills when it comes to hiring, and they may not even ask you questions about it at all in a job interview.

For most folks, I think it’s worth knowing the basic limitations of full-text and what the feature does.

A one-time investment of an hour to read and make notes for yourself is generally enough to get you to a point where you can identify potential use cases. If you ever find those use cases, at that point you can invest more time in evaluating how well full-text fits that implementation.

After getting the big picture from this post, reading the Books Online page on full-text search is probably good enough for most people. That’s where I’d spend the rest of your hour.

After that, I wouldn’t invest a bunch of time learning about full-text indexes unless you’ve got a specific reason. You’re better off investing your time learning about wait statistics, tuning TSQL using execution plans, rowstore indexes, columnstore indexes, Query Store, and In-Memory indexes.

Some fun related topics: building your own type of full-text index, and querying with regular expressions

Aaron Bertrand writes about building your own word-part index: One way to get an index seek for a leading %wildcard

Dev Nambi created an open-source project, sql-server-regex, that uses SQLCLR to let you “run regular expressions in T-SQL queries using scalar and table-valued functions.” I know for a fact that Dev is crazy good at this stuff, because I worked with him for several years out there in the real world. He’s a unicorn.
