Problem
My EXISTS checks used to look like this:
IF EXISTS (SELECT * FROM TABLE WHERE Columns=@Filters)
BEGIN
UPDATE TABLE SET ColumnsX=ValuesX WHERE Where Columns=@Filters
END
In a previous life, one of the DBAs advised me to use SELECT 1 instead of SELECT * when using an EXISTS clause.
IF EXISTS (SELECT 1 FROM TABLE WHERE Columns=@Filters)
BEGIN
UPDATE TABLE SET ColumnsX=ValuesX WHERE Columns=@Filters
END
Is there a real difference here?
Asked by Raj More
Solution #1
No, SQL Server is sophisticated enough to recognize that it is being used for an EXISTS and responds with NO DATA.
Quoth Microsoft: http://technet.microsoft.com/en-us/library/ms189259.aspx?ppud=4
To check yourself, try running the following:
SELECT whatever
FROM yourtable
WHERE EXISTS( SELECT 1/0
FROM someothertable
WHERE a_valid_clause )
It would throw a div by zero error if it was actually doing something with the SELECT list. It isn’t the case.
EDIT: It’s worth noting that this is addressed in the SQL Standard.
http://www.contrib.andrew.cmu.edu/shadow/sql/sql1992.txt ANSI SQL 1992 Standard, pg 191
Answered by Matt Rogish
Solution #2
This misunderstanding is most likely due to the assumption that it will read all of the columns. It’s obvious that this isn’t the case.
CREATE TABLE T
(
X INT PRIMARY KEY,
Y INT,
Z CHAR(8000)
)
CREATE NONCLUSTERED INDEX NarrowIndex ON T(Y)
IF EXISTS (SELECT * FROM T)
PRINT 'Y'
Gives plan
Despite the fact that the index does not include all columns, SQL Server was able to check the result using the narrowest index available. The index is accessed using a semi join operator, which implies it can stop searching after the first row has been returned.
As a result, it is evident that the preceding belief is incorrect.
However, Conor Cunningham of the Query Optimiser team explains that he often chooses SELECT 1 in this scenario because it can improve query compilation performance slightly.
On an empty table with variable numbers of columns, I tested four different ways of presenting this query. SELECT 1 vs SELECT * vs SELECT Primary Key vs SELECT Other Not Null Column vs SELECT Other Not Null Column vs SELECT Other Not Null Column vs SELECT Other Not Null_
I used OPTION (RECOMPILE) to run the queries in a loop and counted the average number of executions per second. The following are the outcomes.
+-------------+----------+---------+---------+--------------+
| Num of Cols | * | 1 | PK | Not Null col |
+-------------+----------+---------+---------+--------------+
| 2 | 2043.5 | 2043.25 | 2073.5 | 2067.5 |
| 4 | 2038.75 | 2041.25 | 2067.5 | 2067.5 |
| 8 | 2015.75 | 2017 | 2059.75 | 2059 |
| 16 | 2005.75 | 2005.25 | 2025.25 | 2035.75 |
| 32 | 1963.25 | 1967.25 | 2001.25 | 1992.75 |
| 64 | 1903 | 1904 | 1936.25 | 1939.75 |
| 128 | 1778.75 | 1779.75 | 1799 | 1806.75 |
| 256 | 1530.75 | 1526.5 | 1542.75 | 1541.25 |
| 512 | 1195 | 1189.75 | 1203.75 | 1198.5 |
| 1024 | 694.75 | 697 | 699 | 699.25 |
+-------------+----------+---------+---------+--------------+
| Total | 17169.25 | 17171 | 17408 | 17408 |
+-------------+----------+---------+---------+--------------+
As can be shown, neither SELECT 1 nor SELECT * consistently win, and the difference between the two ways is insignificant. However, the SELECT Not Null col and SELECT PK appear to be slightly faster.
All four of the queries degrade in performance as the number of columns in the table increases.
Due to the fact that the table is empty, this relationship appears to be explained solely by the amount of column metadata. From the below, it is clear that COUNT(1) is rewritten to COUNT(*) at some point during the process.
SET SHOWPLAN_TEXT ON;
GO
SELECT COUNT(1)
FROM master..spt_values
As a result, the following plan emerges.
|--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))
|--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))
|--Index Scan(OBJECT:([master].[dbo].[spt_values].[ix2_spt_values_nu_nc]))
Connecting a debugger to the SQL Server process and randomly crashing it while doing the following commands
DECLARE @V int
WHILE (1=1)
SELECT @V=1 WHERE EXISTS (SELECT 1 FROM ##T) OPTION(RECOMPILE)
Even when SELECT 1 is used, I found that in most cases where the table has 1,024 columns, the call stack looks like this, indicating that it is spending a significant amount of time loading column metadata (for the case where the table has 1 column, randomly breaking didn’t hit this bit of the call stack in 10 attempts).
sqlservr.exe!CMEDAccess::GetProxyBaseIntnl() - 0x1e2c79 bytes
sqlservr.exe!CMEDProxyRelation::GetColumn() + 0x57 bytes
sqlservr.exe!CAlgTableMetadata::LoadColumns() + 0x256 bytes
sqlservr.exe!CAlgTableMetadata::Bind() + 0x15c bytes
sqlservr.exe!CRelOp_Get::BindTree() + 0x98 bytes
sqlservr.exe!COptExpr::BindTree() + 0x58 bytes
sqlservr.exe!CRelOp_FromList::BindTree() + 0x5c bytes
sqlservr.exe!COptExpr::BindTree() + 0x58 bytes
sqlservr.exe!CRelOp_QuerySpec::BindTree() + 0xbe bytes
sqlservr.exe!COptExpr::BindTree() + 0x58 bytes
sqlservr.exe!CScaOp_Exists::BindScalarTree() + 0x72 bytes
... Lines omitted ...
msvcr80.dll!_threadstartex(void * ptd=0x0031d888) Line 326 + 0x5 bytes C
kernel32.dll!_BaseThreadStart@8() + 0x37 bytes
This human profiling effort is supported by the VS 2012 code profiler, which displays a considerably distinct set of routines requiring compilation time in the two scenarios (Top 15 Functions 1024 columns vs Top 15 Functions 1 column).
Both the SELECT 1 and SELECT * versions check column permissions and fail if the user does not have access to all of the table’s columns.
An example I cribbed from a conversation on the heap
CREATE USER blat WITHOUT LOGIN;
GO
CREATE TABLE dbo.T
(
X INT PRIMARY KEY,
Y INT,
Z CHAR(8000)
)
GO
GRANT SELECT ON dbo.T TO blat;
DENY SELECT ON dbo.T(Z) TO blat;
GO
EXECUTE AS USER = 'blat';
GO
SELECT 1
WHERE EXISTS (SELECT 1
FROM T);
/* ↑↑↑↑
Fails unexpectedly with
The SELECT permission was denied on the column 'Z' of the
object 'T', database 'tempdb', schema 'dbo'.*/
GO
REVERT;
DROP USER blat
DROP TABLE T
So it’s possible that the only difference between SELECT some not null col and SELECT some not null col is that it just checks permissions on that one column (though still loads the metadata for all). However, the data appear to contradict this, as the percentage difference between the two ways shrinks as the number of columns in the underlying table grows.
In any case, I’m not going to hurry out and convert all of my queries to this format because the difference is minimal and only noticeable during query compilation. The following result was obtained by removing the OPTION (RECOMPILE) so that subsequent executions might use a cached plan.
+-------------+-----------+------------+-----------+--------------+
| Num of Cols | * | 1 | PK | Not Null col |
+-------------+-----------+------------+-----------+--------------+
| 2 | 144933.25 | 145292 | 146029.25 | 143973.5 |
| 4 | 146084 | 146633.5 | 146018.75 | 146581.25 |
| 8 | 143145.25 | 144393.25 | 145723.5 | 144790.25 |
| 16 | 145191.75 | 145174 | 144755.5 | 146666.75 |
| 32 | 144624 | 145483.75 | 143531 | 145366.25 |
| 64 | 145459.25 | 146175.75 | 147174.25 | 146622.5 |
| 128 | 145625.75 | 143823.25 | 144132 | 144739.25 |
| 256 | 145380.75 | 147224 | 146203.25 | 147078.75 |
| 512 | 146045 | 145609.25 | 145149.25 | 144335.5 |
| 1024 | 148280 | 148076 | 145593.25 | 146534.75 |
+-------------+-----------+------------+-----------+--------------+
| Total | 1454769 | 1457884.75 | 1454310 | 1456688.75 |
+-------------+-----------+------------+-----------+--------------+
You can find the test script I used here.
Answered by Martin Smith
Solution #3
The best approach to find out is to run both versions through their paces and examine their execution plans. Choose a table with a large number of columns.
Answered by HLGEM
Solution #4
In SQL Server, there is no difference, and it has never been a problem. The optimizer recognizes that they are identical. If you look at the execution plans, you will see that they are identical.
Answered by Cade Roux
Solution #5
Personally, I find it difficult to believe that they don’t use the same query plan. However, the only way to know for sure in your own case is to put it to the test. Please let us know if you succeed!
Answered by Larry Lustig
Post is based on https://stackoverflow.com/questions/1597442/subquery-using-exists-1-or-exists